Scaling Vision Transformers to 22 Billion Parameters
Google Research presents a recipe for training a highly efficient and stable Vision Transformer with 22 billion parameters (ViT-22B), the largest dense ViT model to date. Experiments show that performance on downstream tasks keeps improving as model scale grows. ViT-22B also exhibits a better fairness-performance tradeoff, state-of-the-art alignment with human visual perception in terms of shape/texture bias, and improved robustness. The authors argue that ViT-22B demonstrates the potential for "LLM-like" scaling in vision models and takes an important step toward that goal.
Paper: https://arxiv.org/abs/2302.05442
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-vit-22
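For intuition, here is a minimal PyTorch sketch of the stability recipe the paper describes for the transformer block: the attention and MLP branches run in parallel off a single LayerNorm, and queries and keys get an extra per-head LayerNorm to keep attention logits from diverging at scale. All dimensions and module names below are illustrative; the actual model is implemented in JAX at far larger scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelViTBlock(nn.Module):
    """Illustrative ViT-22B-style block (a PyTorch approximation, not the
    authors' JAX code): attention and MLP computed in parallel from one
    LayerNorm, with extra LayerNorms on queries and keys."""

    def __init__(self, dim=512, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.norm = nn.LayerNorm(dim)
        # The paper removes biases from the QKV projections (it also uses
        # bias-free LayerNorms, omitted here for simplicity).
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.attn_out = nn.Linear(dim, dim)
        self.mlp_in = nn.Linear(dim, mlp_ratio * dim)
        self.mlp_out = nn.Linear(mlp_ratio * dim, dim)
        # QK normalization: stabilizes training by preventing attention
        # logits from growing uncontrollably as the model scales.
        self.q_norm = nn.LayerNorm(self.head_dim)
        self.k_norm = nn.LayerNorm(self.head_dim)

    def forward(self, x):  # x: (batch, tokens, dim)
        b, n, d = x.shape
        h = self.norm(x)
        # Attention branch.
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(b, n, d)
        # MLP branch, computed from the same normalized input; in the paper
        # this parallel layout lets the big matmuls be fused for better
        # hardware utilization.
        mlp = self.mlp_out(F.gelu(self.mlp_in(h)))
        # Single residual sum over both parallel branches.
        return x + self.attn_out(attn) + mlp

x = torch.randn(2, 16, 512)
print(ParallelViTBlock()(x).shape)  # torch.Size([2, 16, 512])
```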