Get Mystery Box with random crypto!

​​Next-ViT: Next Generation Vision Transformer for Efficient D | Data Science by ODS.ai 🦜

​​Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios

While vision transformers demostrate high performance, they can't be deployed as efficiently as CNNs in realistic industrial deployment scenarios, e. g. TensorRT or CoreML.

The authors propose Next-ViT, which has a higher latency/accuracy trade-off than existing CNN and ViT models. They develop two new architecture blocks and a new paradigm to stack them. As a result, On TensorRT, Next-ViT surpasses ResNet by 5.4 mAP (from 40.4 to 45.8) on COCO detection and 8.2% mIoU (from 38.8% to 47.0%) on ADE20K segmentation. Also, it achieves comparable performance with CSWin, while the inference speed is accelerated by
3.6×. On CoreML, Next-ViT surpasses EfficientFormer by 4.6 mAP (from 42.6 to 47.2) on COCO detection and 3.5% mIoU (from 45.2% to 48.7%) on ADE20K segmentation under similar latency.

Paper: https://arxiv.org/abs/2207.05501

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-next-vit

#deeplearning #cv #transformer #computervision