membership_vector

cuml.cluster.hdbscan.membership_vector(clusterer, points_to_predict, int batch_size=4096, convert_dtype=True)

Predict soft cluster membership. For each point in points_to_predict, the result contains a vector giving the probability that the point is a member of each of the clusters selected by the clusterer.

Parameters:
clusterer : HDBSCAN

A clustering object that has been fit to the data, either with prediction_data=True set or with the generate_prediction_data method called after fitting.

points_to_predict : array or array-like, shape (n_samples, n_features)

The new data points for which to predict soft cluster membership. They must have the same dimensionality as the original dataset on which clusterer was fit.

batch_size : int, optional, default=min(4096, n_points_to_predict)

Lowers the memory requirement by computing distance-based memberships in smaller batches of points from the prediction data. For example, a batch size of 1,000 computes distance-based memberships for 1,000 points at a time. The default batch size is 4,096.

Returns:
membership_vectors : array, shape (n_samples, n_clusters)

The probability that point i is a member of cluster j is in membership_vectors[i, j].
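
Examples:

The following is a minimal sketch of how the function might be called and how its output could be read, not an official example. The helper names predict_soft_membership and summarize_memberships are hypothetical, and min_cluster_size=10 is an arbitrary illustrative value. The first helper assumes a CUDA-capable GPU with cuML installed; the second assumes the returned array lives in host memory (if your cuML version returns a device array, copy it to the host first).

```python
import numpy as np


def predict_soft_membership(train, new_points, min_cluster_size=10, batch_size=4096):
    """Fit HDBSCAN on `train` and return soft memberships for `new_points`.

    Hypothetical helper; requires a CUDA-capable GPU with cuML installed.
    """
    # Imported lazily so this module still loads on machines without cuML.
    from cuml.cluster import HDBSCAN
    from cuml.cluster.hdbscan import membership_vector

    # prediction_data=True makes the fitted clusterer usable for prediction.
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, prediction_data=True)
    clusterer.fit(train)

    # Result has shape (n_samples, n_clusters).
    return membership_vector(clusterer, new_points, batch_size=batch_size)


def summarize_memberships(membership_vectors):
    """Return (most likely cluster index, its probability) for each point.

    Assumes a host-side array with at least one cluster column.
    """
    mv = np.asarray(membership_vectors)
    best = mv.argmax(axis=1)                      # column j with the largest probability
    best_prob = mv[np.arange(mv.shape[0]), best]  # membership_vectors[i, best[i]]
    return best, best_prob
```

On a GPU machine, summarize_memberships(predict_soft_membership(X_train, X_new)) would give, for each row of X_new, the index of its most probable cluster and that probability, where X_train and X_new are placeholder arrays with the same number of features. Passing a smaller batch_size, such as 1000, is one way to reduce peak memory as described for that parameter.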