Comparing Test Sets with Item Response Theory
In this paper, Clara Vania et al. use Item Response Theory (IRT) to evaluate 29 datasets, based on the predictions of 18 pretrained Transformer models on individual test examples. They find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models.
https://bit.ly/3xHDkJj
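
To make the idea of a "saturated" versus a "discriminating" test example more concrete, here is a minimal sketch using the textbook two-parameter logistic (2PL) IRT model. This is an illustration only: the paper's exact IRT variant may differ, and the item parameters, ability values, and function names below are hypothetical, not taken from the paper.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability that a model with ability `theta` answers
    an item with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items (assumed values, for illustration only).
ITEMS = {
    "saturated": {"a": 1.0, "b": -4.0},       # so easy that every model gets it right
    "discriminating": {"a": 2.0, "b": 1.0},   # difficulty sits between weak and strong
}

# Hypothetical ability values for a weaker and a stronger model.
WEAK_THETA, STRONG_THETA = 0.0, 2.0


def ability_gap(item: dict) -> float:
    """How much more likely the strong model is to solve the item than the weak one."""
    return p_correct(STRONG_THETA, **item) - p_correct(WEAK_THETA, **item)


if __name__ == "__main__":
    for name, item in ITEMS.items():
        print(f"{name}: gap = {ability_gap(item):.3f}")
```

Under these assumed values, the saturated item yields almost the same success probability for both models, while the discriminating item separates them clearly. That is the intuition behind calling some datasets better suited than others for telling strong models apart.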