Addressing Randomness in Evaluation Protocols for Out-of-Distribution Detection

Paper: Here

Our paper "Addressing Randomness in Evaluation Protocols for Out-of-Distribution Detection" has been accepted at the IJCAI 2021 Workshop on Artificial Intelligence for Anomalies and Novelties (AI4AN).

In summary, we investigated the following phenomenon: when a neural network is trained several times and its performance on some task is measured after each run, the measurements vary, because the outcome of an experiment depends on several factors that are effectively controlled by the random seed. We studied how the performance measures of several evaluation protocols used in Anomaly Detection, Out-of-Distribution Detection, Open Set Recognition (OSR), and related fields change when the random seed is varied.
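As a rough illustration (not code from the paper), the seed typically controls things like weight initialization, data shuffling, and dropout masks. A minimal sketch of seeding the usual random number generators in a PyTorch setup:

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed the generators that affect a typical training run:
    Python's `random` (e.g. augmentation), NumPy (e.g. shuffling),
    and PyTorch (weight initialization, dropout masks)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


# Runs with the same seed are (largely) reproducible; changing the seed
# changes weight init, batch order, etc., and hence the measured performance.
set_seed(0)
```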

In some of these fields, like OSR, it is common to measure the average performance over 3-5 experiments. Is this sufficient to draw reliable conclusions regarding a possible performance difference between methods?

We found that the variance is so large that it may, in fact, not be. Consequently, experiments based on too few random seeds might provide a brittle foundation for conclusions. We then argue that such an experiment should rather be seen as a fundamentally random process. Therefore, we should measure the expected value of the performance $\mathbb{E}_{x \sim p} [ f(x) ]$, where $p$ is the distribution of the random seeds and $f$ maps a seed $x$ to the outcome of an experimental setting.
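In this view, a single training run with seed $x$ yields one sample of $f(x)$, and the expectation is estimated by averaging over many seeds. A minimal sketch of such a Monte Carlo estimate, where `run_experiment` is a hypothetical placeholder for training a detector and scoring it:

```python
import numpy as np


def run_experiment(seed: int) -> float:
    """Hypothetical placeholder: train a detector with the given seed and
    return its performance (e.g. AUROC) under the evaluation protocol."""
    rng = np.random.default_rng(seed)
    return 0.85 + 0.05 * rng.standard_normal()  # stand-in for a real training run


seeds = range(100)
scores = np.array([run_experiment(s) for s in seeds])

# Monte Carlo estimate of E_{x ~ p}[f(x)] and its spread across seeds.
print(f"mean performance: {scores.mean():.3f} +/- {scores.std(ddof=1):.3f}")
```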

Given a set of measurements, we can use statistical tests to determine if an observed difference can be considered significant. However, we found that in some cases even 1000 experiments were insufficient to infer significant differences in the results.
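For example (illustrative only, not necessarily the exact tests used in the paper), given per-seed scores for two methods, a Welch's t-test or a Mann-Whitney U test from `scipy.stats` can be used to check whether the observed difference is significant:

```python
import numpy as np
from scipy import stats

# Hypothetical per-seed AUROC scores for two methods (one value per seed).
scores_a = np.array([0.86, 0.83, 0.88, 0.84, 0.87, 0.85, 0.82, 0.89])
scores_b = np.array([0.87, 0.85, 0.86, 0.88, 0.84, 0.86, 0.83, 0.90])

# Welch's t-test (does not assume equal variances) ...
t_stat, p_t = stats.ttest_ind(scores_a, scores_b, equal_var=False)
# ... and a non-parametric alternative.
u_stat, p_u = stats.mannwhitneyu(scores_a, scores_b, alternative="two-sided")

print(f"Welch's t-test: p = {p_t:.3f}")
print(f"Mann-Whitney U: p = {p_u:.3f}")
# With few seeds and large variance across runs, such p-values often stay
# well above common significance thresholds (e.g. 0.05).
```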


Last Updated: 13 Jul. 2021
Categories: Anomaly Detection · Reproducibility
Tags: AI4AN · Anomaly Detection · Reproducibility