Spectral Embedding#
Spectral embedding is a powerful dimensionality reduction technique that uses the eigenvectors of the graph Laplacian to embed high-dimensional data into a lower-dimensional space. This method is particularly effective for discovering non-linear manifold structures in data and is widely used in clustering, visualization, and feature extraction tasks.
Overview#
The spectral embedding algorithm works by:
Graph Construction: Building a k-nearest neighbors graph from the input data
Laplacian Computation: Computing the graph Laplacian matrix (normalized or unnormalized)
Eigendecomposition: Finding the eigenvectors corresponding to the smallest eigenvalues
Embedding: Using these eigenvectors as coordinates in the lower-dimensional space
Parameters#
#include <cuvs/preprocessing/spectral_embedding.hpp>
namespace cuvs::preprocessing::spectral_embedding
-
struct params#
Parameters for spectral embedding algorithm.
Spectral embedding is a dimensionality reduction technique that uses the eigenvectors of the graph Laplacian to embed data points into a lower-dimensional space. This technique is particularly useful for non-linear dimensionality reduction and clustering tasks.
Public Members
-
int n_components#
The number of components to reduce the data to.
-
int n_neighbors#
The number of neighbors to use for the nearest neighbors graph.
-
bool norm_laplacian#
Whether to normalize the Laplacian matrix.
If true, uses the normalized graph Laplacian (D^(-1/2) L D^(-1/2)). If false, uses the unnormalized graph Laplacian (L = D - W). Normalized Laplacian often leads to better results for clustering tasks.
-
bool drop_first#
Whether to drop the first eigenvector.
The first eigenvector of the normalized Laplacian is constant and uninformative. Setting this to true drops it from the embedding. This is typically set to true when norm_laplacian is true.
-
uint64_t seed#
Random seed for reproducibility.
Controls the random number generation for k-NN graph construction and eigenvalue solver initialization. Use the same seed value to ensure reproducible results across runs.
-
int n_components#
Functions#
#include <cuvs/preprocessing/spectral_embedding.hpp>
namespace cuvs::preprocessing::spectral_embedding
- void transform(
- raft::resources const &handle,
- params config,
- raft::device_matrix_view<float, int, raft::row_major> dataset,
- raft::device_matrix_view<float, int, raft::col_major> embedding
Perform spectral embedding on input dataset.
This function computes the spectral embedding of the input dataset by:
Constructing a k-nearest neighbors graph from the input data
Computing the graph Laplacian (normalized or unnormalized)
Finding the eigenvectors corresponding to the smallest eigenvalues
Using these eigenvectors as the embedding coordinates
#include <raft/core/resources.hpp> #include <cuvs/preprocessing/spectral_embedding.hpp> raft::resources handle; // Set up parameters cuvs::preprocessing::spectral_embedding::params params; params.n_components = 2; params.n_neighbors = 15; params.norm_laplacian = true; params.drop_first = true; params.seed = 42; // Create input dataset (n_samples x n_features) auto dataset = raft::make_device_matrix<float, int>(handle, n_samples, n_features); // ... fill dataset ... // Create output embedding matrix (n_samples x n_components) auto embedding = raft::make_device_matrix<float, int, raft::col_major>( handle, n_samples, params.n_components); // Perform spectral embedding cuvs::preprocessing::spectral_embedding::transform( handle, params, dataset.view(), embedding.view());
- Parameters:
handle – [in] RAFT resource handle for managing CUDA resources
config – [in] Parameters controlling the spectral embedding algorithm
dataset – [in] Input dataset in row-major format [n_samples x n_features]
embedding – [out] Output embedding in column-major format [n_samples x n_components]
- void transform(
- raft::resources const &handle,
- params config,
- raft::device_coo_matrix_view<float, int, int, int> connectivity_graph,
- raft::device_matrix_view<float, int, raft::col_major> embedding
Perform spectral embedding using a precomputed connectivity graph.
This function computes the spectral embedding from a precomputed sparse connectivity graph (e.g., from a k-NN search or custom similarity matrix). This is useful when you want to use a custom graph construction method or when you have a precomputed similarity/affinity matrix.
The function:
Converts the COO matrix to the graph Laplacian
Computes eigenvectors of the Laplacian
Returns the eigenvectors as the embedding
#include <raft/core/resources.hpp> #include <cuvs/preprocessing/spectral_embedding.hpp> raft::resources handle; // Set up parameters cuvs::preprocessing::spectral_embedding::params params; params.n_components = 2; params.norm_laplacian = true; params.drop_first = true; params.seed = 42; // Assume we have a precomputed connectivity graph as COO matrix // connectivity_graph represents weighted edges between samples raft::device_coo_matrix<float, int, int, int> connectivity_graph(...); // Create output embedding matrix (n_samples x n_components) auto embedding = raft::make_device_matrix<float, int, raft::col_major>( handle, n_samples, params.n_components); // Perform spectral embedding cuvs::preprocessing::spectral_embedding::transform( handle, params, connectivity_graph.view(), embedding.view());
- Parameters:
handle – [in] RAFT resource handle for managing CUDA resources
config – [in] Parameters controlling the spectral embedding algorithm (n_neighbors parameter is ignored when using precomputed graph)
connectivity_graph – [in] Precomputed sparse connectivity/affinity graph in COO format representing weighted connections between samples
embedding – [out] Output embedding in column-major format [n_samples x n_components]
-
struct params
- #include <spectral_embedding.hpp>
Parameters for spectral embedding algorithm.
Spectral embedding is a dimensionality reduction technique that uses the eigenvectors of the graph Laplacian to embed data points into a lower-dimensional space. This technique is particularly useful for non-linear dimensionality reduction and clustering tasks.
Example Usage#
Basic Usage with Dataset#
#include <raft/core/resources.hpp>
#include <cuvs/preprocessing/spectral_embedding.hpp>
// Initialize RAFT resources
raft::resources handle;
// Configure spectral embedding parameters
cuvs::preprocessing::spectral_embedding::params params;
params.n_components = 2; // Reduce to 2D for visualization
params.n_neighbors = 15; // Local neighborhood size
params.norm_laplacian = true; // Use normalized Laplacian
params.drop_first = true; // Drop constant eigenvector
params.seed = 42; // For reproducibility
// Create input dataset (n_samples x n_features)
int n_samples = 1000;
int n_features = 50;
auto dataset = raft::make_device_matrix<float, int>(handle, n_samples, n_features);
// ... populate dataset with your data ...
// Allocate output embedding matrix (n_samples x n_components)
auto embedding = raft::make_device_matrix<float, int, raft::col_major>(
handle, n_samples, params.n_components);
// Perform spectral embedding
cuvs::preprocessing::spectral_embedding::transform(
handle, params, dataset.view(), embedding.view());
Using Precomputed Graph#
#include <raft/core/resources.hpp>
#include <cuvs/preprocessing/spectral_embedding.hpp>
raft::resources handle;
// Configure parameters (n_neighbors is ignored with precomputed graph)
cuvs::preprocessing::spectral_embedding::params params;
params.n_components = 3;
params.norm_laplacian = true;
params.drop_first = true;
params.seed = 42;
// Assume we have a precomputed connectivity graph
// This could be from custom similarity computation or k-NN search
raft::device_coo_matrix<float, int, int, int> connectivity_graph(...);
// Allocate output embedding
auto embedding = raft::make_device_matrix<float, int, raft::col_major>(
handle, n_samples, params.n_components);
// Perform spectral embedding with precomputed graph
cuvs::preprocessing::spectral_embedding::transform(
handle, params, connectivity_graph.view(), embedding.view());