When we draw conclusions from an experiment, we would like them to also hold in a different experimental setting; that is, slight changes in the initial conditions should not falsify our hypotheses about the underlying mechanism.
In experiments involving deep learning models, it has been observed that results can vary drastically depending on sources of non-determinism that are often not accounted for, such as random weight initialization, data shuffling, or non-deterministic GPU kernels, which can ultimately lead to findings that are not reproducible. My research in this area focuses on investigating these effects and on improving and incentivizing the reproducibility of experiments.
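To make this concrete, a common baseline when studying these effects is to pin every controllable source of randomness before a run. The following is a minimal sketch assuming a PyTorch setup; the function name `seed_everything` and the seed value are illustrative, and the calls cover the usual software-level sources (the Python, NumPy, and PyTorch RNGs, plus cuDNN/cuBLAS kernel selection) but not, for example, hardware or library-version effects.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Pin the most common software-level sources of non-determinism."""
    # Must be set before the first CUDA matmul: required for deterministic
    # cuBLAS behavior on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Python's built-in RNG (used, e.g., by plain-Python shuffling code).
    random.seed(seed)
    # NumPy's global RNG (used by many data-preprocessing pipelines).
    np.random.seed(seed)
    # PyTorch's RNGs on the CPU and on all visible GPUs.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to use deterministic algorithms and disable autotuning,
    # which would otherwise select kernels non-deterministically.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Raise an error if an op without a deterministic implementation runs.
    torch.use_deterministic_algorithms(True)


if __name__ == "__main__":
    seed_everything(42)
    # Two runs with the same seed now produce bit-identical initial weights.
    print(torch.nn.Linear(4, 2).weight)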