MultinomialNB#

class cuml.naive_bayes.MultinomialNB(*, alpha=1.0, fit_prior=True, class_prior=None, verbose=False, output_type=None)[source]#

Naive Bayes classifier for multinomial models.

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification).

The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

Parameters:
alpha : float (default=1.0)

Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

fit_prior : boolean (default=True)

Whether to learn class prior probabilities or not. If False, a uniform prior will be used.

class_prior : array-like, size (n_classes) (default=None)

Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

Attributes:
class_count_ : ndarray of shape (n_classes)

Number of samples encountered for each class during fitting.

class_log_prior_ : ndarray of shape (n_classes)

Log probability of each class (smoothed).

classes_ : ndarray of shape (n_classes,)

Class labels known to the classifier.

feature_count_ : ndarray of shape (n_classes, n_features)

Number of samples encountered for each (class, feature) during fitting.

feature_log_prob_ : ndarray of shape (n_classes, n_features)

Empirical log probability of features given a class, P(x_i|y).

n_features_ : int

Number of features of each sample.
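The fitted attributes above follow the standard multinomial Naive Bayes estimates. As a rough NumPy sketch of how they relate to each other (illustrative only, not cuML's GPU implementation; the toy data and variable names are assumptions):

```python
import numpy as np

# Toy word-count matrix: 4 documents, 3 vocabulary terms, 2 classes.
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [0, 1, 2],
              [0, 0, 4]], dtype=np.float64)
y = np.array([0, 0, 1, 1])
alpha = 1.0  # additive (Laplace) smoothing, as in the alpha parameter

n_classes = 2
# class_count_: samples seen per class.
class_count = np.array([(y == c).sum() for c in range(n_classes)], dtype=np.float64)
# feature_count_: per-(class, feature) count totals.
feature_count = np.array([X[y == c].sum(axis=0) for c in range(n_classes)])

# class_log_prior_: log of empirical class frequencies (fit_prior=True case).
class_log_prior = np.log(class_count) - np.log(class_count.sum())

# feature_log_prob_: smoothed empirical log P(x_i | y).
smoothed = feature_count + alpha
feature_log_prob = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))

# Each row of exp(feature_log_prob) is a probability distribution over features.
print(np.exp(feature_log_prob).sum(axis=1))
```

With alpha=0 the smoothing term vanishes and zero counts would produce -inf log-probabilities, which is why a small positive alpha is the default.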

Examples

Load the 20 newsgroups dataset from Scikit-learn and train a Naive Bayes classifier.

>>> from sklearn.datasets import fetch_20newsgroups
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from cuml.naive_bayes import MultinomialNB
>>> data = fetch_20newsgroups(subset='train', shuffle=True, random_state=42)
>>> X = CountVectorizer().fit_transform(data.data)
>>> model = MultinomialNB().fit(X, data.target)
>>> model.score(X, data.target)
0.9245...
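Prediction in multinomial Naive Bayes reduces to scoring each class by its joint log-likelihood and taking the argmax. A minimal NumPy sketch of that decision rule (the fitted state below is a made-up stand-in for a model's class_log_prior_ and feature_log_prob_ attributes):

```python
import numpy as np

# Assumed toy fitted state: 2 classes, 3 features.
class_log_prior = np.log(np.array([0.5, 0.5]))
feature_log_prob = np.log(np.array([[0.6, 0.3, 0.1],
                                    [0.1, 0.2, 0.7]]))

X_new = np.array([[3, 1, 0],   # heavy on feature 0 -> class 0 likely
                  [0, 1, 5]])  # heavy on feature 2 -> class 1 likely

# Joint log-likelihood: X @ log P(x|y).T + log P(y), then argmax over classes.
jll = X_new @ feature_log_prob.T + class_log_prior
pred = jll.argmax(axis=1)
print(pred)  # -> [0 1]
```

Because the scores are linear in the counts, fractional inputs such as tf-idf weights slot into the same formula, which is why they often work in practice despite the integer-count assumption.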