NearestNeighbors
- class cuml.dask.neighbors.NearestNeighbors(*, client=None, streams_per_handle=0, **kwargs)
Multi-node Multi-GPU NearestNeighbors Model.
- Parameters:
- n_neighbors: int (default=5)
Default number of neighbors to query
- batch_size: int (optional, default=2000000)
Maximum number of query rows processed at once. This parameter can greatly affect the throughput of the algorithm. The optimal value varies with the data layout and the index-to-query ratio, but any setting requires batch_size * n_features * 4 bytes of additional memory on each worker hosting index partitions.
- verbose: int or boolean (default=False)
Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
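The memory note for batch_size can be turned into a quick sizing check. The sketch below is plain Python and does not call cuML; the helper names `extra_memory_bytes` and `max_batch_size` are hypothetical, and the feature count of 128 is an arbitrary example value. It only applies the batch_size * n_features * 4 bytes formula stated above.

```python
def extra_memory_bytes(batch_size: int, n_features: int, bytes_per_value: int = 4) -> int:
    """Additional memory per worker hosting index partitions, per the formula above."""
    return batch_size * n_features * bytes_per_value


def max_batch_size(budget_bytes: int, n_features: int, bytes_per_value: int = 4) -> int:
    """Largest batch_size whose extra memory stays within budget_bytes (hypothetical helper)."""
    return budget_bytes // (n_features * bytes_per_value)


# Default batch_size from the class signature; n_features=128 is an example value.
needed = extra_memory_bytes(2_000_000, 128)
print(f"{needed / 2**30:.2f} GiB")  # 0.95 GiB

# Going the other way: how many query rows fit in a 256 MiB budget?
print(max_batch_size(256 * 2**20, 128))  # 524288
```

If the estimate exceeds what a worker can spare, passing a smaller batch_size to the constructor is the lever this parameter provides, at a possible cost in throughput.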
Methods
- fit(X): Fit a multi-node multi-GPU Nearest Neighbors index.
- get_neighbors(n_neighbors): Returns the default n_neighbors, initialized from the constructor, if n_neighbors is None.
- kneighbors([X, n_neighbors, ...]): Query the distributed nearest neighbors index.
- fit(X)
Fit a multi-node multi-GPU Nearest Neighbors index.
- Parameters:
- X: dask_cudf.DataFrame
- Returns:
- self: NearestNeighbors model
- get_neighbors(n_neighbors)
Returns the default n_neighbors, as initialized from the constructor, when the n_neighbors argument is None.
- Parameters:
- n_neighbors: int
Number of neighbors
- Returns:
- n_neighbors: int
The default n_neighbors if the n_neighbors parameter is None.
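The fallback described for get_neighbors can be pictured with a few lines of plain Python. This is a minimal sketch of the documented behavior, not cuML's source; the function name `resolve_n_neighbors` is hypothetical, and returning a non-None argument unchanged is an assumption, since the text above only describes the None case.

```python
from typing import Optional


def resolve_n_neighbors(requested: Optional[int], default: int = 5) -> int:
    """Return the constructor default when no explicit value is requested.

    `default=5` mirrors the documented constructor default. Passing a
    non-None value straight through is an assumption of this sketch.
    """
    return default if requested is None else requested


print(resolve_n_neighbors(None))      # 5
print(resolve_n_neighbors(10))        # 10
print(resolve_n_neighbors(None, 3))   # 3
```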
- kneighbors(X=None, n_neighbors=None, return_distance=True, _return_futures=False)
Query the distributed nearest neighbors index
- Parameters:
- X: dask_cudf.DataFrame
Vectors to query. If not provided, neighbors of each indexed point are returned.
- n_neighbors: int
Number of neighbors to query for each row in X. If not provided, the model's n_neighbors is used.
- return_distance: boolean (default=True)
If False, only indices are returned.
- Returns:
- ret: tuple (dask_cudf.DataFrame, dask_cudf.DataFrame)
The first dask-cuDF DataFrame contains the distances; the second contains the indices.
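To make the shape of the return value concrete, here is a CPU-only, standard-library sketch of a brute-force k-nearest-neighbors query that returns (distances, indices) in the order documented above. It is an illustration, not cuML's distributed implementation: the function `brute_force_kneighbors` is hypothetical, the Euclidean metric is an assumption, and so is the ordering of each row by ascending distance.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def brute_force_kneighbors(
    index: Sequence[Point], queries: Sequence[Point], n_neighbors: int
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return (distances, indices), one row per query, nearest first.

    Euclidean distance and ascending order are assumptions of this sketch.
    """
    distances: List[List[float]] = []
    indices: List[List[int]] = []
    for q in queries:
        # Score every indexed point, then keep the n_neighbors closest.
        scored = sorted((math.dist(q, p), i) for i, p in enumerate(index))
        top = scored[:n_neighbors]
        distances.append([d for d, _ in top])
        indices.append([i for _, i in top])
    return distances, indices


index_points = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
query_points = [(0.0, 0.0)]

dist, idx = brute_force_kneighbors(index_points, query_points, n_neighbors=2)
print(dist)  # [[0.0, 1.0]]
print(idx)   # [[0, 2]]
```

Each output row corresponds to one query row, and column j of the indices row identifies the indexed point whose distance appears in column j of the distances row.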