KNeighborsClassifier
- class cuml.neighbors.KNeighborsClassifier(*, weights='uniform', verbose=False, output_type=None, **kwargs)
K-Nearest Neighbors Classifier is an instance-based learning technique that keeps the training samples and uses them directly at prediction time, rather than learning a generalizable set of model parameters.
- Parameters:
- n_neighbors : int (default=5)
Default number of neighbors to query.
- algorithm : string (default=‘auto’)
The query algorithm to use. Currently, only ‘brute’ is supported.
- metric : string (default=‘euclidean’)
Distance metric to use.
- weights : {‘uniform’, ‘distance’} or callable, default=‘uniform’
Weight function used in prediction. Possible values:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point have a greater influence than neighbors that are farther away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
- verbose : int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
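The three weights options can be illustrated with a minimal pure-Python sketch. It is not cuML's GPU implementation: the helpers neighbor_weights and weighted_vote are hypothetical names introduced here, and the handling of zero distances follows scikit-learn's convention, which is an assumption as far as cuML is concerned.

```python
import math


def neighbor_weights(distances, weights="uniform"):
    """Return one weight per neighbor distance (illustrative helper only)."""
    if callable(weights):
        # User-defined function: takes the distances, returns weights
        # of the same shape.
        result = list(weights(distances))
        if len(result) != len(distances):
            raise ValueError("callable must return one weight per distance")
        return result
    if weights == "uniform":
        # Every neighbor counts equally.
        return [1.0] * len(distances)
    if weights == "distance":
        # Assumption (scikit-learn's convention): if any neighbor sits at
        # distance 0, only the zero-distance neighbors get a vote.
        if any(d == 0 for d in distances):
            return [1.0 if d == 0 else 0.0 for d in distances]
        return [1.0 / d for d in distances]
    raise ValueError(f"unknown weights option: {weights!r}")


def weighted_vote(labels, distances, weights="uniform"):
    """Pick the label with the largest total weight among the neighbors."""
    totals = {}
    for label, weight in zip(labels, neighbor_weights(distances, weights)):
        totals[label] = totals.get(label, 0.0) + weight
    # Iterate over sorted labels so ties are broken deterministically.
    return max(sorted(totals), key=totals.get)


labels = ["a", "b", "b"]
distances = [0.5, 2.0, 4.0]

uniform_pick = weighted_vote(labels, distances, "uniform")
distance_pick = weighted_vote(labels, distances, "distance")
gaussian_pick = weighted_vote(
    labels, distances, lambda d: [math.exp(-x * x) for x in d]
)
```

With uniform weights the two "b" neighbors outvote the single "a" neighbor. With inverse-distance weights the close "a" neighbor contributes 2.0 against 0.5 + 0.25 for "b", so the prediction flips to "a".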
- Attributes:
outputs_2d_ : KNeighborsClassifier.outputs_2d_(self)
Methods
fit(self, X, y, *[, convert_dtype]): Fit a GPU index for k-nearest neighbors classifier model.
predict(self, X, *[, convert_dtype]): Use the trained k-nearest neighbors classifier to predict the labels for X.
predict_proba(self, X, *[, convert_dtype]): Use the trained k-nearest neighbors classifier to predict the label probabilities for X.
Notes
For additional docs, see scikit-learn’s KNeighborsClassifier.
Examples
>>> from cuml.neighbors import KNeighborsClassifier
>>> from cuml.datasets import make_blobs
>>> from cuml.model_selection import train_test_split
>>> X, y = make_blobs(n_samples=100, centers=5,
...                   n_features=10, random_state=5)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, train_size=0.80, random_state=5)
>>> knn = KNeighborsClassifier(n_neighbors=10)
>>> knn.fit(X_train, y_train)
KNeighborsClassifier()
>>> knn.predict(X_test)
array([1., 2., 2., 3., 4., 2., 4., 4., 2., 3., 1., 4., 3., 1., 3., 4., 3.,
       4., 1., 3.], dtype=float32)
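To make the instance-based idea concrete without a GPU, here is a minimal pure-Python sketch of brute-force k-nearest-neighbors classification with Euclidean distance and uniform weights. TinyKNN is a hypothetical stand-in written for illustration. It mirrors the fit/predict shape of the API but says nothing about how cuML implements it.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical pure-Python stand-in; not cuML's implementation."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only stores the samples; no parameters are learned.
        self._X = [tuple(row) for row in X]
        self._y = list(y)
        return self

    def _nearest_labels(self, query):
        # Brute force: compute the distance from the query to every
        # stored sample, then keep the labels of the k closest ones.
        ranked = sorted(
            ((math.dist(query, row), label)
             for row, label in zip(self._X, self._y)),
            key=lambda pair: pair[0],
        )
        return [label for _, label in ranked[: self.n_neighbors]]

    def predict(self, X):
        # Uniform weights: the most common neighbor label wins.
        return [
            Counter(self._nearest_labels(query)).most_common(1)[0][0]
            for query in X
        ]


X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = [0, 0, 0, 1, 1, 1]

knn = TinyKNN(n_neighbors=3).fit(X_train, y_train)
predictions = knn.predict([(0.2, 0.1), (5.5, 5.5)])
```

Each query point is compared against all six stored samples. The first query lands in the cluster labeled 0 and the second in the cluster labeled 1.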
- fit(self, X, y, *, convert_dtype=True) -> 'KNeighborsClassifier'
Fit a GPU index for k-nearest neighbors classifier model.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense matrix. If the data type is anything other than float or double, the data is converted to float, which increases memory utilization. To avoid this, set convert_dtype to False; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y : array-like (device or host), shape = (n_samples, 1)
Dense matrix. If the data type is anything other than float or double, the data is converted to float, which increases memory utilization. To avoid this, set convert_dtype to False; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the method will automatically convert the inputs to np.float32.
- property outputs_2d_
Whether the output is 2D.
- predict(self, X, *, convert_dtype=True)
Use the trained k-nearest neighbors classifier to predict the labels for X.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense matrix. If the data type is anything other than float or double, the data is converted to float, which increases memory utilization. To avoid this, set convert_dtype to False; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the method will automatically convert the inputs to np.float32.
- Returns:
- X_new : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)
Predicted labels.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
- predict_proba(self, X, *, convert_dtype=True) -> CumlArray | list[CumlArray]
Use the trained k-nearest neighbors classifier to predict the label probabilities for X.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense matrix. If the data type is anything other than float or double, the data is converted to float, which increases memory utilization. To avoid this, set convert_dtype to False; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the method will automatically convert the inputs to np.float32.
- Returns:
- X_new : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, 1)
Label probabilities.
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
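One common way to derive label probabilities from a k-nearest-neighbors query is to take, for each class, the share of the k neighbors that carry that label. The sketch below assumes uniform weights and that definition, which matches scikit-learn's documented behavior but is an assumption as far as cuML is concerned. vote_fractions is a hypothetical helper, and the neighbor labels are made-up illustrative values.

```python
from collections import Counter


def vote_fractions(neighbor_labels, classes):
    """Share of neighbors voting for each class (illustrative helper)."""
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    # Counter returns 0 for classes that no neighbor voted for.
    return [counts[c] / k for c in classes]


classes = [0, 1, 2]

# Labels of the k=4 nearest neighbors for two query points (made up).
neighbor_labels = [
    [0, 0, 1, 0],
    [2, 2, 2, 1],
]

proba = [vote_fractions(row, classes) for row in neighbor_labels]
```

In this sketch each row holds one fraction per class and sums to 1, and taking the class with the largest fraction reproduces the majority-vote label.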