ComplementNB#

class cuml.naive_bayes.ComplementNB(*, alpha=1.0, fit_prior=True, class_prior=None, norm=False, output_type=None, verbose=False)[source]#

The Complement Naive Bayes classifier described in Rennie et al. (2003).

The Complement Naive Bayes classifier was designed to correct the “severe assumptions” made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.

Parameters:
alphafloat, default=1.0

Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

fit_priorbool, default=True

Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

class_priorarray-like of shape (n_classes,), default=None

Prior probabilities of the classes. If specified the priors are not adjusted according to the data.

normbool, default=False

Whether or not a second normalization of the weights is performed. The default behavior mirrors the implementation found in Mahout and Weka, which do not follow the full algorithm described in Table 9 of the paper.

output_type{‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

verboseint or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

Attributes:
class_count_ndarray of shape (n_classes)

Number of samples encountered for each class during fitting.

class_log_prior_ndarray of shape (n_classes)

Log probability of each class (smoothed).

classes_ndarray of shape (n_classes,)

Class labels known to the classifier

feature_count_ndarray of shape (n_classes, n_features)

Number of samples encountered for each (class, feature) during fitting.

feature_log_prob_ndarray of shape (n_classes, n_features)

Empirical log probability of features given a class, P(x_i|y).

n_features_int

Number of features of each sample.

References

Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Examples

>>> import cupy as cp
>>> from cuml.naive_bayes import ComplementNB
>>> rng = cp.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100), dtype=cp.int32)
>>> y = cp.array([1, 2, 3, 4, 4, 5])
>>> clf = ComplementNB()
>>> clf.fit(X, y)
ComplementNB()
>>> print(clf.predict(X[2:3]))
[3]