

How much smaller can you make your LM with overtraining?

This figure from the Chinchilla paper gives you a clue about what to expect. Say you have a training compute budget of C = 6e20 FLOPs.

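If you spend that budget on a model with N = 350M parameters, it performs on par with L_opt, the compute-optimal loss, at C = 1e20 (where N_opt = 900M).

=> 6x the training FLOPs for about 2.5x fewer inference FLOPs (900M / 350M ≈ 2.6)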
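
The arithmetic behind that trade-off can be sketched in a few lines of Python. This is a rough sketch, not something taken from the figure: it assumes the common approximation C ≈ 6·N·D for training compute (D = training tokens) and that inference cost per token scales linearly with N. The function names are made up for illustration.

```python
def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Training tokens D implied by a budget, ASSUMING C ~= 6 * N * D."""
    return compute_flops / (6.0 * n_params)


def overtraining_tradeoff(c_over: float, n_small: float,
                          c_opt: float, n_opt: float) -> dict:
    """Compare an overtrained small model with a compute-optimal larger one.

    ASSUMPTION: per-token inference FLOPs are proportional to parameter
    count, so the inference saving is simply n_opt / n_small.
    """
    return {
        "training_flops_ratio": c_over / c_opt,
        "inference_flops_ratio": n_opt / n_small,
        "tokens_small": tokens_for_budget(c_over, n_small),
        "tokens_opt": tokens_for_budget(c_opt, n_opt),
    }


# Numbers from the post: 350M params at C = 6e20 vs. 900M params at C = 1e20.
result = overtraining_tradeoff(c_over=6e20, n_small=350e6,
                               c_opt=1e20, n_opt=900e6)

for name, value in result.items():
    print(f"{name}: {value:.3g}")
```

Under those assumptions, the 350M model sees roughly 286B tokens (about 800 tokens per parameter), while the 900M compute-optimal model sees roughly 18.5B tokens (about 20 per parameter).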

https://twitter.com/arankomatsuzaki/status/1630257908238696449