Brute Force KNN
Index
- class cuvs.neighbors.brute_force.Index
Brute Force index object. This object stores the trained Brute Force index, which can be used to perform nearest-neighbor searches.
- Attributes:
- trained
Index build
- cuvs.neighbors.brute_force.build(dataset, metric='sqeuclidean', metric_arg=2.0, resources=None)
Build the Brute Force index from the dataset for efficient search.
- Parameters:
- dataset: CUDA array interface compliant matrix of shape (n_samples, dim)
Supported dtype [float, int8, uint8]
- metric: Distance metric to use. Default is "sqeuclidean".
- metric_arg: Value of p for Minkowski distances. Default is 2.0.
- resources: Optional cuVS Resources handle for reusing CUDA resources.
If resources are not supplied, CUDA resources are allocated inside this function and synchronized before the function returns. If resources are supplied, you must synchronize explicitly by calling resources.sync() before accessing the output.
- Returns:
- index: cuvs.neighbors.brute_force.Index
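The metric and metric_arg parameters select how distances are computed. As a rough CPU illustration (pure Python, not the cuVS GPU kernels), the sketch below implements three of the metrics mentioned on this page. The function names are hypothetical, and cosine assumes the common convention of 1 minus cosine similarity. With metric_arg=2.0, the Minkowski distance is the ordinary Euclidean distance, which is the square root of sqeuclidean.

```python
import math

# Illustrative CPU versions of a few distance metrics. These are sketches,
# not the cuVS implementations; cosine assumes "1 - cosine similarity".

def sqeuclidean(x, y):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def cosine(x, y):
    """Cosine distance: 1 - (x . y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def minkowski(x, y, p=2.0):
    """Minkowski distance of order p (the role played by metric_arg)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x = [0.0, 3.0]
y = [4.0, 0.0]
print(sqeuclidean(x, y))       # 25.0
print(minkowski(x, y, p=2.0))  # 5.0 (Euclidean)
print(minkowski(x, y, p=1.0))  # 7.0 (Manhattan)
print(cosine(x, y))            # 1.0 (orthogonal vectors)
```

Because the square root is monotonic, sqeuclidean and Euclidean distance produce the same neighbor ordering. The squared form simply skips the square root.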
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import brute_force
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> index = brute_force.build(dataset, metric="cosine")
>>> distances, neighbors = brute_force.search(index, dataset, k)
>>> distances = cp.asarray(distances)
>>> neighbors = cp.asarray(neighbors)
Index search
- cuvs.neighbors.brute_force.search(Index index, queries, k, neighbors=None, distances=None, resources=None)
Find the k nearest neighbors for each query.
- Parameters:
- index: Index
Trained Brute Force index.
- queries: CUDA array interface compliant matrix of shape (n_queries, dim)
Supported dtype [float, int8, uint8]
- k: int
The number of neighbors to return for each query.
- neighbors: Optional CUDA array interface compliant matrix of shape (n_queries, k), dtype int64. If supplied, the neighbor indices are written here in place. (default None)
- distances: Optional CUDA array interface compliant matrix of shape (n_queries, k). If supplied, the distances to the neighbors are written here in place. (default None)
- resources: Optional cuVS Resources handle for reusing CUDA resources.
If resources are not supplied, CUDA resources are allocated inside this function and synchronized before the function returns. If resources are supplied, you must synchronize explicitly by calling resources.sync() before accessing the output.
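Brute-force search compares every query against every dataset row and keeps the k closest. The sketch below is a hypothetical pure-Python reference, not the cuVS implementation (which runs on the GPU). It assumes squared Euclidean distance and k <= len(dataset). It returns (distances, neighbors) with one row of k entries per query. When preallocated neighbors and distances are passed, it writes into them in place, mirroring the optional output parameters described above.

```python
import heapq

# Hypothetical CPU reference for exhaustive (brute-force) k-NN search.
# Not the cuVS implementation; assumes squared Euclidean distance.

def sqeuclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def brute_force_knn(dataset, queries, k, neighbors=None, distances=None):
    """Return (distances, neighbors), each shaped (n_queries, k).

    If neighbors / distances are given (preallocated lists of rows with
    shape (n_queries, k)), results are written into them in place.
    Assumes k <= len(dataset).
    """
    if neighbors is None:
        neighbors = [[0] * k for _ in queries]
    if distances is None:
        distances = [[0.0] * k for _ in queries]
    for qi, q in enumerate(queries):
        # Exhaustive scan: distance from this query to every dataset row.
        scored = ((sqeuclidean(q, row), idx) for idx, row in enumerate(dataset))
        # Keep the k closest, in ascending distance order (ties by index).
        best = heapq.nsmallest(k, scored)
        for j, (dist, idx) in enumerate(best):
            distances[qi][j] = dist
            neighbors[qi][j] = idx
    return distances, neighbors

dataset = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
queries = [[0.9, 0.1], [0.0, 1.9]]
dist, nbrs = brute_force_knn(dataset, queries, k=2)
print(nbrs)  # [[1, 0], [2, 0]]
```

The cost is O(n_queries × n_samples × dim) distance work, with no approximation. This is why brute-force results are exact, and why a GPU implementation pays off for large inputs.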
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import brute_force
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = brute_force.build(dataset, metric="sqeuclidean")
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> k = 10
>>> # A pooling memory allocator (not configured in this example) can
>>> # reduce the overhead of temporary array creation during search,
>>> # which helps when many searches use the same query size.
>>> distances, neighbors = brute_force.search(index, queries, k)
>>> neighbors = cp.asarray(neighbors)
>>> distances = cp.asarray(distances)