CategoricalNB#

class cuml.naive_bayes.CategoricalNB(*, alpha=1.0, fit_prior=True, class_prior=None, output_type=None, verbose=False)[source]#

Naive Bayes classifier for categorical features.

The categorical Naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution.

Parameters:
alphafloat, default=1.0

Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

fit_priorbool, default=True

Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

class_priorarray-like of shape (n_classes,), default=None

Prior probabilities of the classes. If specified the priors are not adjusted according to the data.

output_type{‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

verboseint or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

Attributes:
category_count_ndarray of shape (n_features, n_classes, n_categories)

With n_categories being the highest category of all the features. This array provides the number of samples encountered for each feature, class and category of the specific feature.

class_count_ndarray of shape (n_classes,)

Number of samples encountered for each class during fitting.

class_log_prior_ndarray of shape (n_classes,)

Smoothed empirical log probability for each class.

classes_ndarray of shape (n_classes,)

Class labels known to the classifier

feature_log_prob_ndarray of shape (n_features, n_classes, n_categories)

With n_categories being the highest category of all the features. Each array of shape (n_classes, n_categories) provides the empirical log probability of categories given the respective feature and class, P(x_i|y). This attribute is not available when the model has been trained with sparse data.

n_features_int

Number of features of each sample.

Examples

>>> import cupy as cp
>>> from cuml.naive_bayes import CategoricalNB
>>> rng = cp.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100), dtype=cp.int32)
>>> y = cp.array([1, 2, 3, 4, 5, 6])
>>> clf = CategoricalNB()
>>> clf.fit(X, y)
CategoricalNB()
>>> print(clf.predict(X[2:3]))
[3]