Get Mystery Box with random crypto!

​​Scaling Vision Transformers to 22 Billion Parameters Google | Data Science by ODS.ai 🦜

​​Scaling Vision Transformers to 22 Billion Parameters

Google Research authors present a recipe for training a highly efficient and stable Vision Transformer (ViT-22B) with 22B parameters, the largest dense ViT model to date. Experiments reveal that as the model's scale increases, its performance on downstream tasks improves. Additionally, ViT-22B shows an improved tradeoff between fairness and performance, state-of-the-art alignment with human visual perception in terms of shape/texture bias, and improved robustness. The authors suggest that ViT-22B demonstrates the potential for achieving “LLM-like” scaling in vision models and takes important steps toward that goal.

Paper: https://arxiv.org/abs/2302.05442

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-vit-22

#deeplearning #cv #transformer #sota