I’m currently reading Meta’s paper “The Llama 3 Herd of Models”, and I’m particularly interested in the snippet below on contamination analysis, which cites a paper referred to as “Singh et al. (2024)”. I have made quite an effort to find this paper: I looked up the researchers and found nothing. Has anyone found it, or does anyone know how to find it? Any help is really appreciated!
5.1.4 Contamination Analysis
We conduct a contamination analysis to estimate to what extent benchmark scores may be influenced by contamination of the evaluation data in the pre-training corpus. In previous work, several different contamination methods have been used, with various different hyperparameters – we refer to Singh et al. (2024) for an overview. Any of these methods can suffer from false positives and negatives, and how to best run contamination analyses is currently still an open field of research. Here, we largely follow the suggestions of Singh et al. (2024).