KNeighborsClassifier

class cuml.neighbors.KNeighborsClassifier(*, weights='uniform', verbose=False, output_type=None, **kwargs)

K-Nearest Neighbors Classifier is an instance-based learning technique that keeps the training samples for use at prediction time, rather than learning a generalizable set of model parameters.
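
To make the instance-based idea concrete, the following is a minimal pure-Python sketch, not cuML's GPU implementation. The class name `TinyKNN` and its `predict_one` method are hypothetical and exist only for illustration; the sketch assumes Euclidean distance and uniform voting.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical toy classifier for illustration only (not cuML's code)."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only stores the samples; no model parameters are learned.
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict_one(self, query):
        # Brute-force search: Euclidean distance to every stored sample.
        order = sorted(
            range(len(self.X_)),
            key=lambda i: math.dist(self.X_[i], query),
        )
        nearest = [self.y_[i] for i in order[: self.n_neighbors]]
        # Majority vote among the nearest neighbors (uniform weights).
        return Counter(nearest).most_common(1)[0][0]


X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y_train = [0, 0, 0, 1, 1, 1]

model = TinyKNN(n_neighbors=3).fit(X_train, y_train)
pred_near_origin = model.predict_one((0.2, 0.1))
pred_far = model.predict_one((5.5, 5.2))
print(pred_near_origin, pred_far)
```

All the work happens at prediction time, which is why the cost of a query grows with the number of stored training samples.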

Parameters:

n_neighbors : int (default=5)

Default number of neighbors to query.

algorithm : string (default=’auto’)

The query algorithm to use. Currently, only ‘brute’ is supported.

metric : string (default=’euclidean’)

Distance metric to use.

weights : {‘uniform’, ‘distance’} or callable, default=’uniform’

Weight function used in prediction. Possible values:

‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
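
The three `weights` options described above can be sketched in plain Python as follows. This is an illustrative sketch only: `weighted_vote` is a hypothetical helper, not part of cuML, and it takes the neighbor distances and labels as already computed.

```python
from collections import defaultdict


def weighted_vote(distances, labels, weights="uniform"):
    """Return the label with the largest total weight among the neighbors.

    Hypothetical helper for illustration; not a cuML function.
    """
    if weights == "uniform":
        w = [1.0] * len(distances)
    elif weights == "distance":
        # Inverse distance. A zero distance would need special handling
        # in real code; this sketch assumes all distances are positive.
        w = [1.0 / d for d in distances]
    else:
        # Callable: maps the distances to weights of the same length.
        w = weights(distances)

    totals = defaultdict(float)
    for label, wi in zip(labels, w):
        totals[label] += wi
    return max(totals, key=totals.get)


neighbor_distances = [0.5, 2.0, 4.0]
neighbor_labels = ["a", "b", "b"]

uniform_label = weighted_vote(neighbor_distances, neighbor_labels, "uniform")
distance_label = weighted_vote(neighbor_distances, neighbor_labels, "distance")
custom_label = weighted_vote(
    neighbor_distances,
    neighbor_labels,
    lambda ds: [1.0 / (1.0 + d) ** 2 for d in ds],
)
print(uniform_label, distance_label, custom_label)
```

With uniform weights the two more distant "b" neighbors outvote the single "a" neighbor. With inverse-distance weights the close "a" neighbor carries a weight of 2.0 against a combined 0.75 for "b", so the prediction flips.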

Attributes:

outputs_2d_ : Whether the output is 2D.

Methods

`fit`(self, X, y, *[, convert_dtype])	Fit a GPU index for k-nearest neighbors classifier model.
`predict`(self, X, *[, convert_dtype])	Use the trained k-nearest neighbors classifier to predict the labels for X
`predict_proba`(self, X, *[, convert_dtype])	Use the trained k-nearest neighbors classifier to predict the label probabilities for X

Notes

For additional docs, see scikit-learn’s KNeighborsClassifier.

Examples

>>> from cuml.neighbors import KNeighborsClassifier
>>> from cuml.datasets import make_blobs
>>> from cuml.model_selection import train_test_split

>>> X, y = make_blobs(n_samples=100, centers=5,
...                   n_features=10, random_state=5)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, train_size=0.80, random_state=5)

>>> knn = KNeighborsClassifier(n_neighbors=10)

>>> knn.fit(X_train, y_train)
KNeighborsClassifier()
>>> knn.predict(X_test)
array([1., 2., 2., 3., 4., 2., 4., 4., 2., 3., 1., 4., 3., 1., 3., 4., 3.,
       4., 1., 3.], dtype=float32)

fit(self, X, y, *, convert_dtype=True) → 'KNeighborsClassifier'

Fit a GPU index for k-nearest neighbors classifier model.

Parameters:

X : array-like (device or host), shape = (n_samples, n_features)

Dense matrix. If the datatype is other than floats or doubles, the data is converted to float, which increases memory utilization. Set convert_dtype to False to disable this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

y : array-like (device or host), shape = (n_samples, 1)

Dense matrix. If the datatype is other than floats or doubles, the data is converted to float, which increases memory utilization. Set convert_dtype to False to disable this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype : bool, optional (default = True)

When set to True, the method automatically converts the inputs to np.float32.

property outputs_2d_: Whether the output is 2D.

predict(self, X, *, convert_dtype=True)

Use the trained k-nearest neighbors classifier to predict the labels for X.

Parameters:

X : array-like (device or host), shape = (n_samples, n_features)

Dense matrix. If the datatype is other than floats or doubles, the data is converted to float, which increases memory utilization. Set convert_dtype to False to disable this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype : bool, optional (default = True)

When set to True, the method automatically converts the inputs to np.float32.

Returns:

X_new : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)

Predicted labels.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.

predict_proba(self, X, *, convert_dtype=True) → CumlArray | list[CumlArray]

Use the trained k-nearest neighbors classifier to predict the label probabilities for X.

Parameters:

X : array-like (device or host), shape = (n_samples, n_features)

Dense matrix. If the datatype is other than floats or doubles, the data is converted to float, which increases memory utilization. Set convert_dtype to False to disable this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype : bool, optional (default = True)

When set to True, the method automatically converts the inputs to np.float32.

Returns:

X_new : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)

Label probabilities.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
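
As a rough illustration of what label probabilities mean for a k-nearest neighbors classifier, the sketch below uses the conventional definition with uniform weights: the probability of a class is the fraction of the nearest neighbors that carry that label. This page does not confirm that cuML computes the values exactly this way, and `neighbor_class_probabilities` is a hypothetical helper, not a cuML function.

```python
def neighbor_class_probabilities(neighbor_labels, classes):
    """Fraction of neighbors belonging to each class (uniform weights).

    Hypothetical helper for illustration; not a cuML function.
    """
    k = len(neighbor_labels)
    return [neighbor_labels.count(c) / k for c in classes]


# Labels of the 5 nearest neighbors of one query point.
probs = neighbor_class_probabilities([0, 2, 2, 1, 2], classes=[0, 1, 2])
print(probs)

# The predicted label is the class with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)
```

Under this definition the probabilities for each query point sum to 1, and the most probable class matches the label chosen by a uniform majority vote.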