CLX LODA Anomaly Detection

This is an introduction to CLX LODA Anomaly Detection.

Introduction

Anomaly detection is an important problem that has been studied across a wide range of areas and application domains. Some anomaly detection algorithms are generic, while many others are developed specifically for a domain of interest. In practice, several ensemble-based anomaly detection algorithms have been shown to have superior performance on many benchmark datasets, for example Isolation Forest, the Lightweight Online Detector of Anomalies (LODA), and ensembles of Gaussian mixture models.

The LODA algorithm is one of the better-performing generic anomaly detection algorithms. LODA detects anomalies in a dataset by computing the likelihood of data points using an ensemble of one-dimensional histograms, where each histogram is built over a random projection of the input features.
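
To make this concrete, here is a minimal sketch of the LODA scoring idea, assuming only CuPy. It is illustrative rather than the CLX implementation: it uses dense random projections for brevity where the LODA paper uses sparse ones, and the function name loda_sketch is hypothetical.

import cupy as cp

def loda_sketch(x, n_random_cuts=100, n_bins=10):
    # Score each row of x by its average negative log-likelihood
    # across histograms built on random one-dimensional projections.
    n, d = x.shape
    scores = cp.zeros(n)
    for _ in range(n_random_cuts):
        w = cp.random.randn(d)    # random projection direction
        z = x @ w                 # project all samples to one dimension
        hist, edges = cp.histogram(z, bins=n_bins)
        prob = hist / hist.sum()  # empirical bin probabilities
        # map each projected value to its bin and accumulate -log(p)
        idx = cp.clip(cp.searchsorted(edges, z) - 1, 0, n_bins - 1)
        scores += -cp.log(prob[idx] + 1e-12)  # epsilon avoids log(0)
    return scores / n_random_cuts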

How to train a LODA anomaly detection model

First, initialize your new model:

[1]:
from clx.analytics.loda import Loda

# n_bins: Number of bins in each histogram
# n_random_cuts: Number of random cut projections
loda_ad = Loda(n_bins=None, n_random_cuts=100)

Next, train your LODA anomaly detector. The example below uses 100 random measurements, each with 5 features, for demonstration only; ideally you will want a larger training set. For an in-depth example, view this Jupyter Notebook.

[2]:
import cupy as cp

# 100 random samples with 5 features each (demonstration data only)
x = cp.random.randn(100, 5)
loda_ad.fit(x)

Evaluate detector

[3]:
score = loda_ad.score(x)  # generate negative log-likelihood (NLL) scores
[4]:
# Scores range between 0 and +inf; here the negative log-likelihood of each data point is used as its anomaly score.
score
[4]:
array([0.02640843, 0.02768199, 0.03147859, 0.04195542, 0.03008146,
       0.04265076, 0.02641434, 0.03313721, 0.02657809, 0.03276536,
       0.03662108, 0.0583036 , 0.03570133, 0.03573123, 0.03812958,
       0.03527145, 0.03877153, 0.03861965, 0.02603266, 0.02552872,
       0.03931126, 0.04589576, 0.03404597, 0.03120982, 0.03824891,
       0.03919073, 0.02837009, 0.04301027, 0.04108723, 0.04884877,
       0.03536417, 0.02920567, 0.07190172, 0.03709241, 0.02731557,
       0.03456131, 0.03576248, 0.02761115, 0.02706673, 0.03896215,
       0.03297062, 0.02895744, 0.05466592, 0.03186817, 0.02556774,
       0.0467988 , 0.03416863, 0.03966398, 0.02644935, 0.02726812,
       0.03451503, 0.04112752, 0.02732916, 0.02549557, 0.04406088,
       0.02979328, 0.02567999, 0.03121529, 0.02728798, 0.03653633,
       0.04039105, 0.02558433, 0.04804431, 0.03105184, 0.03728624,
       0.03818692, 0.02752913, 0.03663195, 0.03334081, 0.04295748,
       0.03328309, 0.04036   , 0.03819137, 0.02858496, 0.02836143,
       0.03043277, 0.0426332 , 0.04071417, 0.03029431, 0.03222212,
       0.04773601, 0.02946283, 0.02881164, 0.0451702 , 0.03133729,
       0.02639464, 0.02710749, 0.03917251, 0.0270029 , 0.03356491,
       0.02848636, 0.02776215, 0.03230875, 0.04161201, 0.0435702 ,
       0.04083894, 0.05152136, 0.0432116 , 0.03471389, 0.02888558])
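
Higher scores correspond to samples that are less likely under the ensemble of histograms. One simple way to flag anomalies, shown below as an illustration rather than part of the CLX API, is to threshold the scores at a high percentile (the 95th percentile here is an arbitrary choice):

threshold = cp.percentile(score, 95)  # arbitrary cutoff for illustration
anomaly_idx = cp.where(score > threshold)[0]
print("Indices of flagged samples: {}".format(anomaly_idx))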

Explanation of anomalies

To explain the cause of an anomaly, LODA uses the contribution of each feature across the one-dimensional histograms.

[5]:
feature_explanation = loda_ad.explain(x[5])
[6]:
print("Feature importance scores: {}".format(feature_explanation.ravel()))
Feature importance scores: [0.80967783 0.72935663 0.22464838 1.         0.        ]
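
Each feature receives a relative importance score for the chosen sample; the output above suggests the values are scaled to the range [0, 1]. Assuming explain() returns a CuPy array, the features can be ranked by their contribution as follows (illustrative post-processing, not part of the CLX API):

importance = feature_explanation.ravel()
ranked = cp.argsort(importance)[::-1]  # feature indices, most to least important
print("Features ranked by importance: {}".format(ranked))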

Conclusion

This example shows a GPU implementation of the LODA algorithm for anomaly detection and explanation. Users can experiment with other datasets and evaluate the model implementation to identify anomalies and explain their features using RAPIDS.

Reference

Pevný, T. Loda: Lightweight on-line detector of anomalies. Machine Learning 102, 275-304 (2016).