Spectral Embedding#

Spectral embedding is a powerful dimensionality reduction technique that uses the eigenvectors of the graph Laplacian to embed high-dimensional data into a lower-dimensional space. This method is particularly effective for discovering non-linear manifold structures in data and is widely used in clustering, visualization, and feature extraction tasks.

Overview#

The spectral embedding algorithm works by:

  1. Graph Construction: Building a k-nearest neighbors graph from the input data

  2. Laplacian Computation: Computing the graph Laplacian matrix (normalized or unnormalized)

  3. Eigendecomposition: Finding the eigenvectors corresponding to the smallest eigenvalues

  4. Embedding: Using these eigenvectors as coordinates in the lower-dimensional space

Parameters#

#include <cuvs/preprocessing/spectral_embedding.hpp>

namespace cuvs::preprocessing::spectral_embedding

struct params#

Parameters for spectral embedding algorithm.

Spectral embedding is a dimensionality reduction technique that uses the eigenvectors of the graph Laplacian to embed data points into a lower-dimensional space. This technique is particularly useful for non-linear dimensionality reduction and clustering tasks.

Public Members

int n_components#

The number of components to reduce the data to.

int n_neighbors#

The number of neighbors to use for the nearest neighbors graph.

bool norm_laplacian#

Whether to normalize the Laplacian matrix.

If true, uses the normalized graph Laplacian (D^(-1/2) L D^(-1/2)). If false, uses the unnormalized graph Laplacian (L = D - W). Normalized Laplacian often leads to better results for clustering tasks.

bool drop_first#

Whether to drop the first eigenvector.

The first eigenvector of the normalized Laplacian is constant and uninformative. Setting this to true drops it from the embedding. This is typically set to true when norm_laplacian is true.

uint64_t seed#

Random seed for reproducibility.

Controls the random number generation for k-NN graph construction and eigenvalue solver initialization. Use the same seed value to ensure reproducible results across runs.

Functions#

#include <cuvs/preprocessing/spectral_embedding.hpp>

namespace cuvs::preprocessing::spectral_embedding

void transform(
raft::resources const &handle,
params config,
raft::device_matrix_view<float, int, raft::row_major> dataset,
raft::device_matrix_view<float, int, raft::col_major> embedding
)#

Perform spectral embedding on input dataset.

This function computes the spectral embedding of the input dataset by:

  1. Constructing a k-nearest neighbors graph from the input data

  2. Computing the graph Laplacian (normalized or unnormalized)

  3. Finding the eigenvectors corresponding to the smallest eigenvalues

  4. Using these eigenvectors as the embedding coordinates

#include <raft/core/resources.hpp>
#include <cuvs/preprocessing/spectral_embedding.hpp>

raft::resources handle;

// Set up parameters
cuvs::preprocessing::spectral_embedding::params params;
params.n_components = 2;
params.n_neighbors = 15;
params.norm_laplacian = true;
params.drop_first = true;
params.seed = 42;

// Create input dataset (n_samples x n_features)
auto dataset = raft::make_device_matrix<float, int>(handle, n_samples, n_features);
// ... fill dataset ...

// Create output embedding matrix (n_samples x n_components)
auto embedding = raft::make_device_matrix<float, int, raft::col_major>(
    handle, n_samples, params.n_components);

// Perform spectral embedding
cuvs::preprocessing::spectral_embedding::transform(
    handle, params, dataset.view(), embedding.view());
Parameters:
  • handle[in] RAFT resource handle for managing CUDA resources

  • config[in] Parameters controlling the spectral embedding algorithm

  • dataset[in] Input dataset in row-major format [n_samples x n_features]

  • embedding[out] Output embedding in column-major format [n_samples x n_components]

void transform(
raft::resources const &handle,
params config,
raft::device_coo_matrix_view<float, int, int, int> connectivity_graph,
raft::device_matrix_view<float, int, raft::col_major> embedding
)#

Perform spectral embedding using a precomputed connectivity graph.

This function computes the spectral embedding from a precomputed sparse connectivity graph (e.g., from a k-NN search or custom similarity matrix). This is useful when you want to use a custom graph construction method or when you have a precomputed similarity/affinity matrix.

The function:

  1. Converts the COO matrix to the graph Laplacian

  2. Computes eigenvectors of the Laplacian

  3. Returns the eigenvectors as the embedding

#include <raft/core/resources.hpp>
#include <cuvs/preprocessing/spectral_embedding.hpp>

raft::resources handle;

// Set up parameters
cuvs::preprocessing::spectral_embedding::params params;
params.n_components = 2;
params.norm_laplacian = true;
params.drop_first = true;
params.seed = 42;

// Assume we have a precomputed connectivity graph as COO matrix
// connectivity_graph represents weighted edges between samples
raft::device_coo_matrix<float, int, int, int> connectivity_graph(...);

// Create output embedding matrix (n_samples x n_components)
auto embedding = raft::make_device_matrix<float, int, raft::col_major>(
    handle, n_samples, params.n_components);

// Perform spectral embedding
cuvs::preprocessing::spectral_embedding::transform(
    handle, params, connectivity_graph.view(), embedding.view());
Parameters:
  • handle[in] RAFT resource handle for managing CUDA resources

  • config[in] Parameters controlling the spectral embedding algorithm (n_neighbors parameter is ignored when using precomputed graph)

  • connectivity_graph[in] Precomputed sparse connectivity/affinity graph in COO format representing weighted connections between samples

  • embedding[out] Output embedding in column-major format [n_samples x n_components]

struct params
#include <spectral_embedding.hpp>

Parameters for spectral embedding algorithm.

Spectral embedding is a dimensionality reduction technique that uses the eigenvectors of the graph Laplacian to embed data points into a lower-dimensional space. This technique is particularly useful for non-linear dimensionality reduction and clustering tasks.

Example Usage#

Basic Usage with Dataset#

#include <raft/core/resources.hpp>
#include <cuvs/preprocessing/spectral_embedding.hpp>

// Initialize RAFT resources
raft::resources handle;

// Configure spectral embedding parameters
cuvs::preprocessing::spectral_embedding::params params;
params.n_components = 2;        // Reduce to 2D for visualization
params.n_neighbors = 15;        // Local neighborhood size
params.norm_laplacian = true;   // Use normalized Laplacian
params.drop_first = true;       // Drop constant eigenvector
params.seed = 42;               // For reproducibility

// Create input dataset (n_samples x n_features)
int n_samples = 1000;
int n_features = 50;
auto dataset = raft::make_device_matrix<float, int>(handle, n_samples, n_features);
// ... populate dataset with your data ...

// Allocate output embedding matrix (n_samples x n_components)
auto embedding = raft::make_device_matrix<float, int, raft::col_major>(
    handle, n_samples, params.n_components);

// Perform spectral embedding
cuvs::preprocessing::spectral_embedding::transform(
    handle, params, dataset.view(), embedding.view());

Using Precomputed Graph#

#include <raft/core/resources.hpp>
#include <cuvs/preprocessing/spectral_embedding.hpp>

raft::resources handle;

// Configure parameters (n_neighbors is ignored with precomputed graph)
cuvs::preprocessing::spectral_embedding::params params;
params.n_components = 3;
params.norm_laplacian = true;
params.drop_first = true;
params.seed = 42;

// Assume we have a precomputed connectivity graph
// This could be from custom similarity computation or k-NN search
raft::device_coo_matrix<float, int, int, int> connectivity_graph(...);

// Allocate output embedding
auto embedding = raft::make_device_matrix<float, int, raft::col_major>(
    handle, n_samples, params.n_components);

// Perform spectral embedding with precomputed graph
cuvs::preprocessing::spectral_embedding::transform(
    handle, params, connectivity_graph.view(), embedding.view());