Transformer Module Optimization

Article on how to apply different methods to make your transformer network up to 10x smaller and faster:

- Plain model optimization and PyTorch tricks;
- How and why to use FFT instead of self-attention;
- Model factorization and quantization.

https://habr.com/ru/post/563778/

#deep_learning
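
Not the article's code — just a minimal PyTorch sketch of two of the listed ideas: FNet-style FFT token mixing in place of self-attention, and post-training dynamic quantization. The layer sizes and the toy encoder layout are illustrative assumptions, not taken from the article:

```python
import torch
import torch.nn as nn


class FFTMixer(nn.Module):
    """FNet-style token mixing: a 2D FFT replaces self-attention."""

    def forward(self, x):
        # FFT over the hidden dim, then over the sequence dim;
        # keep only the real part, as in the FNet paper.
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


# Toy encoder block where FFTMixer stands in for multi-head attention
# (residual connections omitted for brevity; sizes are illustrative).
layer = nn.Sequential(
    FFTMixer(),
    nn.LayerNorm(512),
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)

# Post-training dynamic quantization: weights of all Linear layers
# are stored as int8, activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    layer, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(8, 128, 512)   # (batch, seq_len, hidden)
print(quantized(x).shape)      # torch.Size([8, 128, 512])
```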