Long-Short Transformer: Efficient Transformers for Language and Vision

This paper tackles the quadratic time and memory complexity of self-attention in Transformers. The authors propose Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity on both language and vision tasks. It combines a novel long-range attention with dynamic projection, which models distant correlations, with a short-term attention that captures fine-grained local correlations. A dual normalization deals with the scale mismatch between the two attention mechanisms. Transformer-LS can be applied to both autoregressive and bidirectional models without additional complexity.
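
To make the mechanism concrete, here is a minimal single-head sketch of the long-short attention idea. All module and parameter names are hypothetical (not from the official code), the placement of the two LayerNorms is simplified, and the local branch uses an O(n²) band mask purely for readability; the paper's implementation segments the sequence to keep memory linear.

```python
import torch
import torch.nn as nn


class LongShortAttentionSketch(nn.Module):
    """Illustrative sketch: long-range attention via dynamic low-rank
    projection + short-term sliding-window attention, fused under one softmax,
    with a separate LayerNorm per branch ("dual normalization")."""

    def __init__(self, dim: int, r: int = 64, window: int = 16):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.dyn_proj = nn.Linear(dim, r, bias=False)  # dynamic projection to r "landmarks"
        self.ln_local = nn.LayerNorm(dim)              # dual normalization:
        self.ln_global = nn.LayerNorm(dim)             # one LayerNorm per branch
        self.window = window
        self.scale = dim ** -0.5

    def forward(self, x):                              # x: (batch, n, dim)
        b, n, d = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)

        # Long-range branch: compress n keys/values into r tokens with an
        # input-dependent (dynamic) projection, then normalize.
        p = torch.softmax(self.dyn_proj(k), dim=1)         # (b, n, r), softmax over sequence
        k_glob = self.ln_global(p.transpose(1, 2) @ k)     # (b, r, d)
        v_glob = self.ln_global(p.transpose(1, 2) @ v)     # (b, r, d)

        # Short-term branch: keys/values restricted to a sliding window,
        # normalized by its own LayerNorm.
        k_loc, v_loc = self.ln_local(k), self.ln_local(v)
        idx = torch.arange(n, device=x.device)
        band = (idx[None, :] - idx[:, None]).abs() <= self.window  # (n, n) window mask

        # Joint attention: each query attends to its local window plus the
        # r projected global tokens under a single softmax.
        s_loc = (q @ k_loc.transpose(1, 2)) * self.scale           # (b, n, n)
        s_loc = s_loc.masked_fill(~band, float("-inf"))
        s_glob = (q @ k_glob.transpose(1, 2)) * self.scale         # (b, n, r)
        attn = torch.softmax(torch.cat([s_loc, s_glob], dim=-1), dim=-1)
        return attn[..., :n] @ v_loc + attn[..., n:] @ v_glob      # (b, n, d)
```

Because the queries only ever see a window of w local tokens plus r projected tokens, cost grows linearly with sequence length once the band mask is replaced by a windowed (segmented) implementation.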

The method outperforms state-of-the-art models on multiple language and vision tasks. For instance, Transformer-LS achieves 0.97 test BPC on enwik8 with half as many parameters as previous methods, while running faster and handling sequences 3× as long. On ImageNet, it obtains 84.1% Top-1 accuracy while scaling better to high-resolution images.

Paper: https://arxiv.org/abs/2107.02192

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-transformerls

#deeplearning #cv #nlp #transformer #attention