Our paper Multi-Class Hypersphere Anomaly Detection (MCHAD) has been accepted for presentation at ICPR 2022. In summary, we propose a new loss function for training neural networks that are able to detect anomalies in their inputs.
MCHAD is available via pytorch-ood. You can find example code here.
How does it work? §
The general idea is that we want a neural network $f_{\theta}: \mathcal{X} \rightarrow \mathcal{Z}$ that maps inputs from the input space to some lower dimensional representation in such a way that points from class $y$ cluster around a hypersphere with center $\mu_y$ in the output space. Because the neural network can learn non-linear functions, the classes in the input space can have arbitrarily complex shapes.
To train this neural network, we optimize its parameters $\theta$ to minimize a loss function. We then hope that the model maps only points from the known classes into their corresponding spheres, while points that are dissimilar to the training data (i.e., anomalies) are mapped further away, because the model never learned to map such points close to the centers.
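To make this concrete, the sketch below shows what such an embedding model and a distance-based anomaly score could look like in PyTorch. The architecture, the layer sizes, and the choice to learn the centers $\mu_y$ jointly with the encoder are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class HypersphereEmbedding(nn.Module):
    """Toy encoder f_theta that maps inputs into a low-dimensional embedding space."""

    def __init__(self, in_features: int, embedding_dim: int, n_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )
        # one center mu_y per known class (here simply learned together with the encoder)
        self.centers = nn.Parameter(torch.randn(n_classes, embedding_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)


def anomaly_score(model: HypersphereEmbedding, x: torch.Tensor) -> torch.Tensor:
    # squared distance to the nearest class center; large values indicate anomalies
    z = model(x)
    return torch.cdist(z, model.centers).pow(2).min(dim=1).values
```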
Omitting some details, the loss function we propose has three different components, each of which we will explain in the following.
Intra-Class Variance §
We want the representations $f(x)$ of each class to cluster as tightly around the class center $\mu_y$ as possible. For this, we can use the intra-class variance loss, which is defined as:
$$ \mathcal{L}_{\Lambda}(x,y) = \Vert \mu_y - f(x) \Vert^2 $$
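In PyTorch, a minimal version of this term could look as follows, averaged over a batch; `z` denotes the embeddings $f(x)$, `centers` the matrix of class centers, and `y` the class labels (the names are illustrative):

```python
import torch

def intra_class_loss(z: torch.Tensor, centers: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # z: embeddings f(x) with shape (batch, dim), centers: (n_classes, dim), y: (batch,)
    # squared Euclidean distance of each embedding to the center of its own class
    return (z - centers[y]).pow(2).sum(dim=1).mean()
```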
Inter-Class Variance §
A trivial solution to minimize $ \mathcal{L}_{\Lambda}$ would be to map all inputs to the same point, which would lead to the collapse of the model. To prevent this, we have to add a second term that ensures that the points remain separable:
$$ \mathcal{L}_{\Delta}(x,y) = \log \left( 1 + \sum_{j \neq y} e^{ \Vert \mu_y - f(x) \Vert^2 - \Vert \mu_j - f(x) \Vert^2} \right) $$
This expression might seem somewhat arbitrary, but it can, in fact, be derived from the method of maximum likelihood: it is the negative log-likelihood of a softmax over the negative squared distances to the class centers.
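A numerically stable way to compute this term is to treat the negative squared distances as logits: the expression above is then exactly the cross-entropy of the resulting softmax. A minimal sketch using that identity (tensor names illustrative, as before):

```python
import torch
import torch.nn.functional as F

def inter_class_loss(z: torch.Tensor, centers: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # squared distances to all class centers, shape (batch, n_classes)
    dists = torch.cdist(z, centers).pow(2)
    # log(1 + sum_{j != y} exp(d_y - d_j)) equals the cross-entropy of a
    # softmax over the negative squared distances
    return F.cross_entropy(-dists, y)
```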
Extra-Class Variance §
Sometimes, we have a set of example outliers at hand. Previous work showed that the robustness of models can be significantly improved by including these in the optimization. Therefore, we can add a term that incentivizes such outliers to be mapped sufficiently far away from the class centers:
$$ \mathcal{L}_{\Theta}(x) = \max \lbrace 0, r_y^2 - \Vert \mu_y - f(x) \Vert^2 \rbrace $$
where $x$ is some outlier and $r_y$ is a class-conditional radius. This term can also be applied to other methods that aim to learn spherical clusters in their output space; we refer to this generalization as Generalized MCHAD.
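One possible reading of this term, assumed in the sketch below, applies the hinge to every class center so that an outlier is pushed out of all hyperspheres; how the radii $r_y$ are chosen or learned is left open here. During training, the three terms would then be combined, for example as a (possibly weighted) sum over batches of in-distribution and outlier data.

```python
import torch

def extra_class_loss(z_out: torch.Tensor, centers: torch.Tensor, radii: torch.Tensor) -> torch.Tensor:
    # z_out: embeddings of example outliers (batch, dim)
    # centers: class centers (n_classes, dim), radii: class-conditional radii (n_classes,)
    dists = torch.cdist(z_out, centers).pow(2)  # (batch, n_classes)
    # hinge penalty whenever an outlier falls inside a class hypersphere
    return torch.clamp(radii.pow(2) - dists, min=0).mean()
```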
How well does it work? §
In our experiments, we found that both MCHAD and Generalized MCHAD outperform other hypersphere learning methods. In ablation studies, we also investigated the influence of each loss term and demonstrated that all of them contribute to the overall performance, both in terms of discriminative power on normal data and the ability to detect anomalies.