approximate_predict

cuml.cluster.hdbscan.approximate_predict(clusterer, points_to_predict, convert_dtype=True)

Predict the cluster labels of new points. The returned labels are those of the original clustering found by clusterer, so they are not necessarily the labels that would be found by clustering the original data combined with points_to_predict; hence the ‘approximate’ label.

If you simply wish to assign new points to an existing clustering in the ‘best’ way possible, this is the function to use. If you instead want to predict how points_to_predict would cluster together with the original data under HDBSCAN, the most efficient existing approach is to recluster with the new point(s) added to the original dataset.

Parameters:
clusterer : HDBSCAN

A clustering object that has been fit to the data with prediction_data=True set.

points_to_predict : array or array-like, shape (n_samples, n_features)

The new data points to predict cluster labels for. They should have the same dimensionality as the original dataset over which clusterer was fit.

Returns:
labels : array, shape (n_samples,)

The predicted cluster labels for points_to_predict.

probabilities : array, shape (n_samples,)

The soft cluster scores for each of the points in points_to_predict.
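
The following is a minimal usage sketch, not an official example. It assumes cuML is installed on a machine with a CUDA-capable GPU and that HDBSCAN is importable from cuml.cluster. The helper names predict_new_points and confident_assignments, the threshold of 0.5, and the synthetic data are hypothetical and chosen only for illustration. The sketch also assumes that points not assigned to any cluster receive the label -1, as is conventional for HDBSCAN.

```python
def predict_new_points(train, new_points, min_cluster_size=5):
    """Fit HDBSCAN on `train`, then label `new_points` against that fit.

    Hypothetical helper for illustration only.
    """
    # Imported lazily because cuML requires a CUDA-capable GPU.
    from cuml.cluster import HDBSCAN
    from cuml.cluster.hdbscan import approximate_predict

    # prediction_data=True must be set before fitting so that
    # approximate_predict can be used on the fitted clusterer.
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, prediction_data=True)
    clusterer.fit(train)

    # Returns (labels, probabilities), each of shape (n_samples,).
    return approximate_predict(clusterer, new_points)


def confident_assignments(labels, probabilities, threshold=0.5):
    """Return (index, label) pairs for points that were assigned to a
    cluster (label != -1, assumed to mean noise) with a score of at
    least `threshold`. Hypothetical post-processing helper.
    """
    return [
        (i, int(label))
        for i, (label, prob) in enumerate(zip(labels, probabilities))
        if int(label) != -1 and float(prob) >= threshold
    ]


if __name__ == "__main__":
    try:
        import numpy as np

        # Two well-separated synthetic blobs as the original dataset.
        rng = np.random.default_rng(0)
        train = np.vstack([
            rng.normal(0.0, 0.3, size=(50, 2)),
            rng.normal(5.0, 0.3, size=(50, 2)),
        ]).astype(np.float32)

        # New points must have the same number of features as `train`.
        new_points = np.array(
            [[0.1, -0.1], [5.2, 4.9], [20.0, 20.0]], dtype=np.float32
        )

        labels, probabilities = predict_new_points(train, new_points)
        print(confident_assignments(labels, probabilities))
    except ImportError:
        print("cuML or NumPy is not installed; skipping the GPU demo.")
```

The labels returned here refer to the clusters found when clusterer was fit on train; the new points do not change that clustering.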