KNeighborsClassifier

class cuml.dask.neighbors.KNeighborsClassifier(*, client=None, streams_per_handle=0, verbose=False, **kwargs)

Multi-node Multi-GPU K-Nearest Neighbors Classifier Model.

K-Nearest Neighbors Classifier is an instance-based learning technique that keeps the training samples around for prediction rather than learning a generalizable set of model parameters.
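
To make the instance-based idea concrete, here is a minimal single-process sketch in pure Python. `TinyKNNClassifier` is a hypothetical illustration class, not part of cuML, and it assumes Euclidean distance with an unweighted majority vote; the distributed GPU implementation is not shown here.

```python
import math
from collections import Counter


class TinyKNNClassifier:
    """Hypothetical illustration only; not the cuML implementation."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Fitting" only stores the samples and labels; no parameters are learned.
        self._X = [tuple(row) for row in X]
        self._y = list(y)
        return self

    def predict(self, X):
        predictions = []
        for query in X:
            # Rank the stored samples by Euclidean distance to the query.
            order = sorted(
                range(len(self._X)),
                key=lambda i: math.dist(query, self._X[i]),
            )
            nearest_labels = [self._y[i] for i in order[: self.n_neighbors]]
            # Unweighted majority vote among the nearest labels (an assumption).
            predictions.append(Counter(nearest_labels).most_common(1)[0][0])
        return predictions


X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y_train = [0, 0, 0, 1, 1, 1]

model = TinyKNNClassifier(n_neighbors=3).fit(X_train, y_train)
predicted = model.predict([(0.05, 0.05), (5.0, 5.1)])
print(predicted)
```

All of the work happens at query time, which is why the stored index and the query batch size dominate memory use and throughput.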

Parameters:

n_neighbors: int (default=5): Default number of neighbors to query.
batch_size: int (optional, default=2000000): Maximum number of query rows processed at once. This parameter can greatly affect the throughput of the algorithm. The optimal value varies with the data layout and the index-to-query ratio, but any setting requires batch_size * n_features * 4 bytes of additional memory on each worker hosting index partitions.
verbose: int or boolean (default=False): Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
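
The memory cost quoted for batch_size can be worked out directly. The sketch below only evaluates the formula batch_size * n_features * 4 from the parameter description; the helper name `batch_buffer_bytes` and the feature count of 64 are assumptions chosen for illustration.

```python
def batch_buffer_bytes(batch_size, n_features, bytes_per_value=4):
    """Extra memory per worker hosting index partitions, per the documented formula."""
    return batch_size * n_features * bytes_per_value


default_batch_size = 2_000_000  # documented default
n_features = 64                 # hypothetical feature count

extra_bytes = batch_buffer_bytes(default_batch_size, n_features)
print(f"{extra_bytes / 1e6:.0f} MB of additional memory per worker")

# Halving batch_size halves the additional memory requirement.
half_bytes = batch_buffer_bytes(default_batch_size // 2, n_features)
print(f"{half_bytes / 1e6:.0f} MB with batch_size={default_batch_size // 2}")
```

If workers run short of GPU memory, lowering batch_size is one knob to try, at a possible cost in throughput.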

Methods

`fit`(X, y)	Fit a multi-node multi-GPU K-Nearest Neighbors Classifier index.
`predict`(X[, convert_dtype])	Predict labels for a query from previously stored index and index labels.
`predict_proba`(X[, convert_dtype])	Predict class probabilities for a query from previously stored index and index labels.
`score`(X, y[, convert_dtype])	Provide score by comparing predictions and ground truth.

fit(X, y)

Fit a multi-node multi-GPU K-Nearest Neighbors Classifier index.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Index data. Acceptable formats: dask CuPy/NumPy/Numba Array
y: array-like (device or host), shape = (n_samples, n_outputs): Index labels data. Acceptable formats: dask CuPy/NumPy/Numba Array

Returns:

self: KNeighborsClassifier model

predict(X, convert_dtype=True)

Predict labels for a query from previously stored index and index labels. The process is done in a multi-node multi-GPU fashion.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Query data. Acceptable formats: dask cuDF, dask CuPy/NumPy/Numba Array
convert_dtype: bool, optional (default = True): When set to True, the predict method will automatically convert the data to the right formats.

Returns:

predictions: Dask futures or Dask CuPy Arrays

predict_proba(X, convert_dtype=True)

Predict class probabilities for a query from previously stored index and index labels. The process is done in a multi-node multi-GPU fashion.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Query data. Acceptable formats: dask cuDF, dask CuPy/NumPy/Numba Array
convert_dtype: bool, optional (default = True): When set to True, the predict_proba method will automatically convert the data to the right formats.

Returns:

probabilities: Dask futures or Dask CuPy Arrays
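
As a rough illustration of what per-class probabilities mean for a nearest-neighbor classifier, the sketch below computes, for one query, the fraction of the k nearest training labels that fall in each class. Uniform (unweighted) voting and Euclidean distance are assumptions made here; the function `knn_vote_fractions` is hypothetical and not part of cuML.

```python
import math
from collections import Counter


def knn_vote_fractions(X_train, y_train, query, n_neighbors, classes):
    """Fraction of the n_neighbors nearest labels belonging to each class."""
    # Indices of the training samples, nearest first.
    order = sorted(
        range(len(X_train)),
        key=lambda i: math.dist(query, X_train[i]),
    )
    votes = Counter(y_train[i] for i in order[:n_neighbors])
    # Classes with no votes get 0.0 because Counter returns 0 for missing keys.
    return [votes[c] / n_neighbors for c in classes]


X_train = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
y_train = ["a", "a", "b", "b", "b"]

# The three nearest points to 1.4 are 1.0 ("a"), 2.0 ("b") and 0.0 ("a").
proba = knn_vote_fractions(X_train, y_train, (1.4,), n_neighbors=3, classes=["a", "b"])
print(proba)
```

Under this assumption each row of probabilities sums to 1, and the predicted label is the class with the largest fraction.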

score(X, y, convert_dtype=True)

Provide score by comparing predictions and ground truth. The process is done in a multi-node multi-GPU fashion.

Parameters:

X: array-like (device or host), shape = (n_samples, n_features): Query test data. Acceptable formats: dask CuPy/NumPy/Numba Array
y: array-like (device or host), shape = (n_samples, n_outputs): Labels test data. Acceptable formats: dask CuPy/NumPy/Numba Array

Returns:

score
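
This page does not state which metric the score is. A common choice for classifiers is mean accuracy, the fraction of predicted labels that match the ground truth, and the sketch below assumes exactly that. The `accuracy` helper is hypothetical and only illustrates the comparison of predictions with true labels.

```python
def accuracy(predicted, truth):
    """Fraction of predictions equal to the ground-truth labels (assumed metric)."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("cannot score an empty set of labels")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


predicted_labels = [0, 1, 1, 0]
true_labels = [0, 1, 0, 0]

# Three of the four predictions match the ground truth.
acc = accuracy(predicted_labels, true_labels)
print(acc)
```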