Data Science Digest

Comparing Test Sets with Item Response Theory

In this paper, Clara Vania et al. use Item Response Theory (IRT) to evaluate 29 datasets, based on the predictions of 18 pretrained Transformer models on individual test examples. They find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models.

https://bit.ly/3xHDkJj
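
To give a feel for what "saturated" and "distinguishing" mean in IRT terms, here is a minimal sketch. It assumes the textbook two-parameter logistic (2PL) item response function, which may differ from the exact model fitted in the paper. The function names, ability values, and item parameters are all hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def p_correct(theta, discrimination, difficulty):
    """2PL item response function: probability that a model with
    ability `theta` answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def separation(theta_low, theta_high, discrimination, difficulty):
    """Gap in success probability between two models on one item.
    A larger gap means the item tells the two models apart better."""
    return (p_correct(theta_high, discrimination, difficulty)
            - p_correct(theta_low, discrimination, difficulty))


# Hypothetical item parameters (assumptions, not values from the paper).
# An item far easier than both models: nearly everyone gets it right.
saturated_item = {"discrimination": 1.0, "difficulty": -3.0}
# An item whose difficulty sits between the two models' abilities.
informative_item = {"discrimination": 2.0, "difficulty": 2.0}

# Two hypothetical strong models with abilities 1.5 and 2.5.
gap_saturated = separation(1.5, 2.5, **saturated_item)
gap_informative = separation(1.5, 2.5, **informative_item)

print(f"saturated item gap:   {gap_saturated:.3f}")
print(f"informative item gap: {gap_informative:.3f}")
```

Under these assumed numbers, both strong models almost always solve the easy item, so it barely separates them, while the harder, more discriminative item produces a much larger gap. That is the intuition behind calling one test set saturated and another well suited to comparing state-of-the-art models.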