Attention

The vector search and clustering algorithms in RAFT are being migrated to a new library dedicated to vector search called cuVS. We will continue to support the vector search algorithms in RAFT during this move, but will no longer update them after the RAPIDS 24.06 (June) release. We plan to complete the migration by the RAPIDS 24.08 (August) release.

Sampling Without Replacement

#include <raft/random/sample_without_replacement.cuh>

namespace raft::random

template<typename DataT, typename IdxT, typename WeightsVectorType, class OutIndexVectorType>
void sample_without_replacement(raft::resources const &handle, RngState &rng_state, raft::device_vector_view<const DataT, IdxT> in, WeightsVectorType &&weights_opt, raft::device_vector_view<DataT, IdxT> out, OutIndexVectorType &&outIdx_opt)

Sample the input vector without replacement, optionally based on the input weight vector for each element in the array.

The implementation is based on the one-pass sampling algorithm described in “Accelerating weighted random sampling without replacement,” a technical report by Kirill Mueller.

If no input weight vector is provided, then input elements will be sampled uniformly. Otherwise, the elements sampled from the input vector will always appear in increasing order of their weights as computed using the exponential distribution. So, if the order of the samples matters (for example, for array permutations), then this might not be the right choice.
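The key-based idea behind this kind of sampling can be sketched on the host in plain C++. This is a minimal illustration, not RAFT's GPU implementation: the function name `sample_indices_sketch`, the use of `std::vector`, and the requirement that weights are strictly positive are all assumptions made for the example. Each element gets a key drawn from an exponential distribution and scaled by its weight, and the `k` elements with the smallest keys are kept in key order, which is why the result order is not a uniformly random order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <optional>
#include <random>
#include <vector>

// Host-side illustration only (hypothetical helper, not part of RAFT).
// Returns the indices of k elements drawn without replacement from
// {0, ..., n-1}. Element i gets the key Exp(1) / w_i and the k smallest
// keys win, so heavier elements tend to be selected first. Without
// weights, every element uses weight 1, which gives uniform sampling.
std::vector<std::size_t> sample_indices_sketch(
  std::size_t n,
  std::size_t k,
  const std::optional<std::vector<double>>& weights,
  std::mt19937_64& rng)
{
  assert(k <= n);                                        // samples <= inputs
  assert(!weights.has_value() || weights->size() == n);  // one weight per input

  std::exponential_distribution<double> expo(1.0);
  std::vector<double> keys(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double w = weights ? (*weights)[i] : 1.0;
    keys[i]        = expo(rng) / w;  // assumes w > 0
  }

  std::vector<std::size_t> idx(n);
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  // Keep the k smallest keys, sorted by key. The output order therefore
  // follows the keys, not the original positions.
  std::partial_sort(idx.begin(),
                    idx.begin() + static_cast<std::ptrdiff_t>(k),
                    idx.end(),
                    [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  idx.resize(k);
  return idx;
}
```

Passing `std::nullopt` for `weights` selects the uniform path, and passing a `std::vector<double>` selects the weighted one. The two `assert` calls mirror the preconditions on the number of samples and the number of weights.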

Note

Please do not specify template parameters explicitly, as the compiler can deduce them from the arguments.

Template Parameters:
  • DataT – type of each element of the input array in

  • IdxT – type of the dimensions of the arrays; output index type

  • WeightsVectorType – std::optional<raft::device_vector_view<const weight_type, IdxT>> of each element of the weights array weights_opt

  • OutIndexVectorType – std::optional<raft::device_vector_view<IdxT, IdxT>> of output indices outIdx_opt

Parameters:
  • handle[in] RAFT handle containing (among other resources) the CUDA stream on which to run.

  • rng_state[inout] Pseudorandom number generator state.

  • in[in] Input vector to be sampled.

  • weights_opt[in] std::optional weights vector. If not provided, uniform sampling will be used.

  • out[out] Vector of samples from the input vector.

  • outIdx_opt[out] std::optional vector of the indices sampled from the input array.

Pre:

The number of samples out.extent(0) is less than or equal to the number of inputs in.extent(0).

Pre:

If weights_opt has a value, the number of weights (*weights_opt).extent(0) equals the number of inputs in.extent(0).

template<typename ...Args, typename = std::enable_if_t<sizeof...(Args) == 5>>
void sample_without_replacement(Args... args)

Overload of sample_without_replacement to help the compiler find the above overload, in case users pass in std::nullopt for one or both of the optional arguments.

Please see above for documentation of sample_without_replacement.

#include <raft/random/permute.cuh>

namespace raft::random

template<typename InputOutputValueType, typename IntType, typename IdxType, typename Layout>
void permute(raft::resources const &handle, raft::device_matrix_view<const InputOutputValueType, IdxType, Layout> in, std::optional<raft::device_vector_view<IntType, IdxType>> permsOut, std::optional<raft::device_matrix_view<InputOutputValueType, IdxType, Layout>> out)

Randomly permute the rows of the input matrix.

We do not support in-place permutation, so that we can compute in parallel without race conditions. This function is useful for shuffling input data sets in machine learning algorithms.

Note

This is NOT a uniform permutation generator! It only generates a small fraction of all possible random permutations. If your application needs a high-quality permutation generator, then we recommend the Knuth shuffle.
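For reference, a Knuth (Fisher-Yates) shuffle of row indices followed by an out-of-place row gather might look like the host-side sketch below. It is not part of RAFT. The helper names `knuth_shuffle_indices` and `permute_rows_sketch`, the row-major `std::vector<float>` storage, and the convention that output row `r` is input row `perm[r]` are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Knuth (Fisher-Yates) shuffle of the row indices 0..n_rows-1.
// Every permutation is equally likely, given an unbiased generator.
std::vector<std::size_t> knuth_shuffle_indices(std::size_t n_rows, std::mt19937_64& rng)
{
  std::vector<std::size_t> perm(n_rows);
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  for (std::size_t i = n_rows; i > 1; --i) {
    // Pick j uniformly from [0, i-1] and swap it into slot i-1.
    std::uniform_int_distribution<std::size_t> pick(0, i - 1);
    std::swap(perm[i - 1], perm[pick(rng)]);
  }
  return perm;
}

// Out-of-place gather on a row-major matrix: out row r = in row perm[r].
// Reading from `in` and writing to a separate `out` lets every row be
// copied independently of the others.
std::vector<float> permute_rows_sketch(const std::vector<float>& in,
                                       std::size_t n_rows,
                                       std::size_t n_cols,
                                       const std::vector<std::size_t>& perm)
{
  assert(in.size() == n_rows * n_cols);
  assert(perm.size() == n_rows);
  std::vector<float> out(in.size());
  for (std::size_t r = 0; r < n_rows; ++r) {
    for (std::size_t c = 0; c < n_cols; ++c) {
      out[r * n_cols + c] = in[perm[r] * n_cols + c];
    }
  }
  return out;
}
```

The shuffle itself is sequential, so it trades parallelism for uniformity. Once the index vector exists, the gather step has no dependencies between rows.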

Template Parameters:
  • InputOutputValueType – Type of each element of the input matrix, and the type of each element of the output matrix (if provided)

  • IntType – Integer type of each element of permsOut

  • IdxType – Integer type of the extents of the mdspan parameters

  • Layout – Either raft::row_major or raft::col_major

Parameters:
  • handle[in] RAFT handle containing the CUDA stream on which to run.

  • in[in] input matrix

  • permsOut[out] If provided, the indices of the permutation.

  • out[out] If provided, the output matrix, containing the permuted rows of the input matrix in. (Not providing this is only useful if you provide permsOut.)

Pre:

If permsOut.has_value() is true, then (*permsOut).extent(0) == in.extent(0) is true.

Pre:

If out.has_value() is true, then (*out).extents() == in.extents() is true.

template<typename InputOutputValueType, typename IdxType, typename Layout, typename PermsOutType, typename OutType>
void permute(raft::resources const &handle, raft::device_matrix_view<const InputOutputValueType, IdxType, Layout> in, PermsOutType &&permsOut, OutType &&out)

Overload of permute that compiles if users pass in std::nullopt for either or both of permsOut and out.
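The reason such an overload is needed can be reproduced with the standard library alone. In the sketch below, the names `wants_perms_strict` and `wants_perms_forwarding` are hypothetical stand-ins, and `std::vector` takes the place of a device view. A template parameter that appears only inside `std::optional<...>` cannot be deduced from `std::nullopt`, whereas a forwarding-reference parameter accepts any argument and converts it inside the function. Falling back to `int` as the element type is an assumption of this sketch.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// IntType appears only inside std::optional<...>, so it can be deduced
// only when the caller passes an actual std::optional<std::vector<IntType>>.
// A call such as wants_perms_strict(std::nullopt) does not compile unless
// the caller spells out the template argument.
template <typename IntType>
bool wants_perms_strict(std::optional<std::vector<IntType>> perms_out)
{
  return perms_out.has_value();
}

// Forwarding variant: PermsOutType deduces to whatever is passed,
// including std::nullopt_t, and the conversion to a concrete optional
// happens inside the function. The element type int is a fixed fallback
// chosen for this sketch.
template <typename PermsOutType>
bool wants_perms_forwarding(PermsOutType&& perms_out)
{
  std::optional<std::vector<int>> perms = std::forward<PermsOutType>(perms_out);
  return perms.has_value();
}
```

With these definitions, `wants_perms_forwarding(std::nullopt)` compiles and returns `false`, while the strict version needs either a real `std::optional` argument or an explicit template argument such as `wants_perms_strict<int>(std::nullopt)`.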