PermutationExplainer

class cuml.explainer.PermutationExplainer(*, model, data, masker_type='independent', link='identity', is_gpu_model=None, random_state=None, dtype=<class 'numpy.float32'>, output_type=None, verbose=False)

GPU accelerated version of SHAP’s PermutationExplainer

cuML’s SHAP-based explainers accelerate the algorithmic part of SHAP. They are optimized for fast GPU-based models, like those in cuML. By creating the datasets and running the internal calculations on the GPU, while minimizing data copies and transfers, they can accelerate explanations significantly. They can also be used with CPU-based models, where speedups are still possible but may be capped by factors such as data transfers and the speed of the model itself.

PermutationExplainer is algorithmically similar to, and based on, the permutation explainer of the Python SHAP package: slundberg/shap

This method approximates the Shapley values by iterating through permutations of the inputs. From the SHAP library docs: it guarantees local accuracy (additivity) by iterating completely through entire permutations of the features in both forward and reverse directions.
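The forward and reverse passes described above can be illustrated with a minimal CPU-only sketch in plain NumPy. This is not cuML's implementation: the helper name `permutation_shap`, the toy linear model, and the way masked features are filled from the background rows are assumptions made for illustration only.

```python
import numpy as np


def permutation_shap(model, x, background, npermutations=10, seed=0):
    """Approximate Shapley values for one sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    phi = np.zeros(n_features)

    def value(mask):
        # Features where mask is True take the value from x; the others keep
        # the background values. The model output is averaged over all
        # background rows.
        data = np.where(mask, x, background)
        return model(data).mean()

    for _ in range(npermutations):
        perm = rng.permutation(n_features)
        mask = np.zeros(n_features, dtype=bool)
        prev = value(mask)
        # Forward pass: switch features on one at a time in permutation order.
        for j in perm:
            mask[j] = True
            curr = value(mask)
            phi[j] += curr - prev
            prev = curr
        # Reverse pass: switching features off in the same order visits the
        # coalitions of the reversed permutation.
        for j in perm:
            mask[j] = False
            curr = value(mask)
            phi[j] += prev - curr
            prev = curr

    # Each cycle contributes two marginal contributions per feature.
    return phi / (2 * npermutations)


# Toy linear model (an assumption for this sketch).
w = np.array([1.0, -2.0, 0.5])


def model(data):
    return data @ w


background = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
x = np.array([3.0, 1.0, 1.0])
phi = permutation_shap(model, x, background)

# Local accuracy: the values sum to f(x) minus the mean background output.
expected_gap = model(x[None, :])[0] - model(background).mean()
```

Because each pass telescopes from the fully masked to the fully unmasked input, `phi.sum()` matches `expected_gap` regardless of which permutations are drawn.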

Current characteristics of the GPU version:

  • Only tabular data is supported for now, via passing the background dataset explicitly.

  • Hierarchical clustering for Owen values is planned for the near future.

  • Sparse data support is planned for the near future.

Setting the random seed:

This explainer uses CuPy to generate its permutations, so for reproducible results use CuPy’s seeding mechanism.

Parameters:
model: function

A callable Python object that executes the model given a set of input data samples.

data: Dense matrix containing floats or doubles.

cuML’s permutation SHAP supports tabular data for now, so it expects a background dataset, as opposed to a shap.masker object. To respect a hierarchical structure of the data, use the (temporary) parameter masker_type. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

masker_type: {‘independent’, ‘partition’} (default = ‘independent’)

If ‘independent’ is used, this is equivalent to SHAP’s Independent masker and the algorithm is fully GPU accelerated. If ‘partition’ is used, it is equivalent to SHAP’s Partition masker, which respects a hierarchical structure in the background data.

link: function or str (default = ‘identity’)

The link function used to map between the output units of the model and the SHAP value units. From the SHAP package: by default it is identity, but logit can be useful so that expectations are computed in probability units while explanations remain in the (more naturally additive) log-odds units. For more details on how link functions work, see any overview of link functions for generalized linear models.

is_gpu_model: bool or None (default = None)

If None, the explainer will try to infer whether the model can take GPU data (as CuPy arrays); if it cannot, it will call the model with NumPy arrays. Set to True to force the explainer to use GPU data, or to False to force it to use NumPy data.

dtype: np.float32 or np.float64 (default = np.float32)

Precision of the data generated to call the model.

output_type: ‘cupy’ or ‘numpy’ (default = ‘numpy’)

Parameter to specify the type of data to output. If not specified, the explainer will default to ‘numpy’ for the time being to improve compatibility.
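To make the identity and logit options of the link parameter above concrete, here is a small NumPy sketch. The function names `identity_link`, `logit_link` and `inverse_logit` are hypothetical helpers written for this illustration; they are not taken from cuML or SHAP.

```python
import numpy as np


def identity_link(x):
    # SHAP values stay in the same units as the model output.
    return x


def logit_link(p):
    # Maps probabilities in (0, 1) to log-odds.
    return np.log(p / (1.0 - p))


def inverse_logit(z):
    # Maps log-odds back to probabilities.
    return 1.0 / (1.0 + np.exp(-z))


probs = np.array([0.1, 0.5, 0.9])
log_odds = logit_link(probs)
recovered = inverse_logit(log_odds)
```

With a logit link, the SHAP values would be expressed in log-odds units, where contributions add up more naturally than they do in probability units.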

Methods

shap_values(self, X[, npermutations, as_list])

Interface to estimate the SHAP values for a set of samples.

Examples

>>> from cuml import SVR
>>> from cuml import make_regression
>>> from cuml import train_test_split

>>> from cuml.explainer import PermutationExplainer

>>> X, y = make_regression(
...     n_samples=102,
...     n_features=10,
...     noise=0.1,
...     random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X,
...     y,
...     test_size=2,
...     random_state=42)
>>> model = SVR().fit(X_train, y_train)

>>> cu_explainer = PermutationExplainer(
...     model=model.predict,
...     data=X_train,
...     random_state=42)

>>> cu_shap_values = cu_explainer.shap_values(X_test)
>>> cu_shap_values
array([[ 0.16611198, 0.74156773, 0.05906528,  0.30015892, 2.5425286 ,
        0.0970122 , 0.12258395, 2.1998262 , -0.02968234, -0.8669155 ],
    [-0.10587756,  0.77705824, -0.08259875, -0.71874434,  1.781551  ,
        -0.05454511, 0.11826539, -1.1734306 , -0.09629871, 0.4571011]],
    dtype=float32)

shap_values(self, X, npermutations=10, as_list=True, **kwargs)

Interface to estimate the SHAP values for a set of samples. Corresponds to the SHAP package’s legacy interface, and is currently the main API.

Parameters:
X: Dense matrix containing floats or doubles.

Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

npermutations: int (default = 10)

Number of times to cycle through all the features, re-evaluating the model at each step. Each cycle evaluates the model function 2 * (# features + 1) times on a data matrix of (# background data samples) rows. An exception to this is when PermutationExplainer can avoid evaluating the model because a feature’s value is the same in X and the background dataset (which is common for example with sparse features).

as_list: bool (default = True)

Set to True to return a list of arrays for multi-dimensional models (like predict_proba functions) to match the SHAP package shap_values API behavior. Set to False to return them as an array of arrays.

Returns:
shap_values: array or list
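As a rough guide to choosing npermutations, the evaluation count quoted in its description can be turned into a small calculation. The helper `permutation_cost` is hypothetical, and the result is assumed to be an upper bound for a single explained row, since evaluations may be skipped when a feature's value matches the background.

```python
def permutation_cost(n_features, n_background, npermutations=10):
    """Upper bound on model calls and rows evaluated for one explained row."""
    # Each cycle calls the model 2 * (n_features + 1) times.
    evals_per_cycle = 2 * (n_features + 1)
    total_evals = npermutations * evals_per_cycle
    # Every call runs on a matrix with one row per background sample.
    total_rows = total_evals * n_background
    return total_evals, total_rows


# Sizes matching the example above: 10 features, 100 background rows.
total_evals, total_rows = permutation_cost(n_features=10, n_background=100)
print(total_evals, total_rows)  # 220 22000
```

The cost grows linearly in npermutations, in the number of features, and in the size of the background dataset, so a smaller background sample is often the cheapest way to reduce runtime.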