How much smaller can you make your LM with overtraining?
This figure from the Chinchilla paper gives a clue about what to expect. Say you have a training budget of C = 6e20 FLOPs.
If N = 350M, the model reaches roughly the same loss as the compute-optimal model trained with C = 1e20 (N_opt = 900M).
=> 6x the training FLOPs for ~2.5x fewer inference FLOPs
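The trade-off arithmetic can be sketched in a few lines, assuming the standard Chinchilla approximations C ≈ 6·N·D for training FLOPs and ~2·N FLOPs per inference token (the "on par" loss equivalence itself is the figure's empirical claim, not derived here):

```python
# Overtraining trade-off, using the example numbers above.
# Assumptions: C ≈ 6*N*D (training), ~2*N FLOPs per token (inference).
N_small, C_small = 350e6, 6e20   # overtrained smaller model
N_opt, C_opt = 900e6, 1e20       # compute-optimal model at the lower budget

train_ratio = C_small / C_opt    # extra training compute paid: 6x
infer_ratio = N_opt / N_small    # per-token inference FLOPs saved: ~2.6x

# Tokens each run would see under C = 6*N*D
D_small = C_small / (6 * N_small)   # ~2.9e11 tokens
D_opt = C_opt / (6 * N_opt)         # ~1.9e10 tokens

print(f"{train_ratio:.1f}x training FLOPs for {infer_ratio:.1f}x fewer inference FLOPs")
```

So the inference saving tracks the parameter-count ratio (900M / 350M ≈ 2.6), paid for by a 6x larger training budget.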
https://twitter.com/arankomatsuzaki/status/1630257908238696449