CategoricalNB

class cuml.naive_bayes.CategoricalNB(*, alpha=1.0, fit_prior=True, class_prior=None, output_type=None, verbose=False)

Naive Bayes classifier for categorical features.

The categorical Naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. Within each class, the categories of each feature are assumed to be drawn from that feature's own categorical distribution.
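The decision rule implied by this model can be sketched in plain Python. This is a minimal illustration, not cuML's GPU implementation: the helper `predict_one` and the toy probabilities are hypothetical, and the nested-list layout `feature_log_prob[i][c][t]` simply mirrors the (n_features, n_classes, n_categories) shape documented below.

```python
import math

def predict_one(x, class_log_prior, feature_log_prob):
    """Return the index of the most probable class for one sample.

    x                        : list of category indices, one per feature
    class_log_prior[c]       : log P(y = c)
    feature_log_prob[i][c][t]: log P(x_i = t | y = c)
    """
    scores = []
    for c, log_prior in enumerate(class_log_prior):
        # Naive Bayes assumption: features are independent given the class,
        # so the joint log-likelihood is a sum over features.
        score = log_prior
        for i, t in enumerate(x):
            score += feature_log_prob[i][c][t]
        scores.append(score)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy model: 2 classes, 2 features, 2 categories per feature.
class_log_prior = [math.log(0.5), math.log(0.5)]
feature_log_prob = [
    [[math.log(0.9), math.log(0.1)], [math.log(0.2), math.log(0.8)]],  # feature 0
    [[math.log(0.7), math.log(0.3)], [math.log(0.4), math.log(0.6)]],  # feature 1
]

pred_a = predict_one([0, 0], class_log_prior, feature_log_prob)  # class index 0
pred_b = predict_one([1, 1], class_log_prior, feature_log_prob)  # class index 1
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied together.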

Parameters:

alpha (float, default=1.0): Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
fit_prior (bool, default=True): Whether to learn class prior probabilities. If False, a uniform prior is used.
class_prior (array-like of shape (n_classes,), default=None): Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
output_type ({'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None): Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
verbose (int or bool, default=False): Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
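The effect of alpha, fit_prior and class_prior can be illustrated with a small pure-Python sketch. It uses the textbook additive-smoothing formula (count + alpha) / (class total + alpha * n_categories); the helper names `smoothed_log_prob` and `log_prior` are hypothetical, and cuML's internal computation may differ in detail.

```python
import math

def smoothed_log_prob(counts, alpha=1.0):
    """Log probabilities for one feature within one class.

    counts: per-category sample counts. With alpha > 0, categories that
    were never observed still receive a non-zero probability.
    """
    total = sum(counts) + alpha * len(counts)
    return [math.log((n + alpha) / total) for n in counts]

def log_prior(class_counts, fit_prior=True, class_prior=None):
    """Class log priors following the documented parameter semantics."""
    if class_prior is not None:
        # Explicit priors are used as given, regardless of the data.
        return [math.log(p) for p in class_prior]
    if fit_prior:
        # Empirical priors: relative class frequencies.
        n = sum(class_counts)
        return [math.log(k / n) for k in class_counts]
    # Uniform prior over classes.
    k = len(class_counts)
    return [math.log(1.0 / k)] * k

# Category 1 was never seen, yet gets probability 1/7 with alpha=1.
feature_lp = smoothed_log_prob([3, 0, 1], alpha=1.0)

learned = log_prior([2, 6])                           # log of [0.25, 0.75]
uniform = log_prior([2, 6], fit_prior=False)          # log of [0.5, 0.5]
fixed = log_prior([2, 6], class_prior=[0.9, 0.1])     # log of [0.9, 0.1]
```

In this sketch, alpha=0 combined with a zero count would make `math.log` raise a ValueError, which is one reason a positive alpha is the usual choice.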

Attributes:

category_count_ (ndarray of shape (n_features, n_classes, n_categories)): Number of samples encountered for each feature, class and category of that feature. Here n_categories is determined by the highest category value found across all features, so every feature shares the same last dimension.
class_count_ (ndarray of shape (n_classes,)): Number of samples encountered for each class during fitting.
class_log_prior_ (ndarray of shape (n_classes,)): Smoothed empirical log probability for each class.
classes_ (ndarray of shape (n_classes,)): Class labels known to the classifier.
feature_log_prob_ (ndarray of shape (n_features, n_classes, n_categories)): Empirical log probability of each category given the feature and class, P(x_i|y). For each feature, the slice of shape (n_classes, n_categories) holds these values, with n_categories determined by the highest category value found across all features. This attribute is not available when the model has been trained with sparse data.
n_features_ (int): Number of features of each sample.
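The layout of the count attributes can be pictured with nested Python lists. This is an illustrative sketch only: the helper `category_counts` is hypothetical, labels are assumed to be class indices 0..n_classes-1, and taking n_categories as the highest observed category value plus one is an assumption about how "the highest category of all the features" translates into an array size.

```python
def category_counts(X, y, n_classes, n_categories):
    """Build counts[i][c][t]: samples of class c whose feature i equals t."""
    n_features = len(X[0])
    counts = [
        [[0] * n_categories for _ in range(n_classes)]
        for _ in range(n_features)
    ]
    for row, label in zip(X, y):
        for i, t in enumerate(row):
            counts[i][label][t] += 1
    return counts

X = [[0, 2], [1, 2], [0, 0]]   # 3 samples, 2 features
y = [0, 1, 0]                  # class index of each sample

# Assumption: one shared size for all features, set by the largest value seen.
n_categories = max(max(row) for row in X) + 1
counts = category_counts(X, y, n_classes=2, n_categories=n_categories)

# Per-class sample counts, analogous to class_count_.
class_count = [y.count(c) for c in range(2)]
```

Because every feature shares the same last dimension, a feature that uses fewer categories than the largest one simply has zero counts in its unused slots.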

Examples

>>> import cupy as cp
>>> from cuml.naive_bayes import CategoricalNB
>>> rng = cp.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100), dtype=cp.int32)
>>> y = cp.array([1, 2, 3, 4, 5, 6])
>>> clf = CategoricalNB()
>>> clf.fit(X, y)
CategoricalNB()
>>> print(clf.predict(X[2:3]))
[3]