Transformer Module Optimization

Article on how to apply different methods to make your transformer network up to 10x smaller and faster:

- Plain model optimization and PyTorch tricks;
- How and why to use FFT instead of self-attention;
- Model factorization and quantization.

https://habr.com/ru/post/563778/

#deep_learning
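
Not the article's code — just a minimal PyTorch sketch of two of the listed ideas: FNet-style FFT token mixing in place of self-attention, and post-training dynamic quantization. The layer sizes and the toy encoder layout are illustrative assumptions, not taken from the article:

```python
import torch
import torch.nn as nn


class FFTMixer(nn.Module):
    """FNet-style token mixing: a 2D FFT replaces self-attention."""

    def forward(self, x):
        # FFT over the hidden dim, then over the sequence dim;
        # keep only the real part, as in the FNet paper.
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


# Toy encoder block where FFTMixer stands in for multi-head attention
# (residual connections omitted for brevity; sizes are illustrative).
layer = nn.Sequential(
    FFTMixer(),
    nn.LayerNorm(512),
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)

# Post-training dynamic quantization: weights of all Linear layers
# are stored as int8, activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    layer, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(8, 128, 512)   # (batch, seq_len, hidden)
print(quantized(x).shape)      # torch.Size([8, 128, 512])
```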