NÜWA: Visual Synthesis Pre-training for Neural visUal World | Data Science by ODS.ai 🦜

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

In this paper, researchers from Microsoft Research Asia and Peking University present NÜWA, a unified multimodal pre-trained model (text, images, videos, sketches) that can generate new visual data or manipulate existing visual data across a range of visual synthesis tasks. They also design a 3D transformer encoder-decoder framework with a 3D Nearby Attention (3DNA) mechanism, which takes the nature of visual data into account and reduces computational complexity.
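To give a rough intuition for why attending only to nearby positions reduces cost, here is a minimal NumPy sketch of local attention over a 3D grid (height, width, frames). It is an illustration under stated assumptions, not the paper's implementation: it is single-head, uses no learned projections, and the names `nearby_indices`, `nearby_attention_3d`, and the `extent` parameter are hypothetical.

```python
import numpy as np

def nearby_indices(pos, shape, extent):
    """Flat indices of grid cells within `extent` of `pos` along each axis.

    Hypothetical helper: the window is clipped at the grid borders.
    """
    ranges = [
        range(max(0, p - e), min(n, p + e + 1))
        for p, n, e in zip(pos, shape, extent)
    ]
    grid = np.stack(np.meshgrid(*ranges, indexing="ij"), axis=-1).reshape(-1, 3)
    return np.ravel_multi_index(tuple(grid.T), shape)

def nearby_attention_3d(x, extent=(1, 1, 1)):
    """Local self-attention over a (H, W, F, D) float array.

    Assumption: queries, keys, and values are all the raw token vectors
    (no learned projections), to keep the sketch short.
    """
    h, w, f, d = x.shape
    flat = x.reshape(-1, d)          # tokens in C order, matching np.ndindex
    out = np.empty_like(flat)
    for i, pos in enumerate(np.ndindex(h, w, f)):
        idx = nearby_indices(pos, (h, w, f), extent)
        keys = flat[idx]             # only the local neighbourhood
        scores = keys @ flat[i] / np.sqrt(d)
        weights = np.exp(scores - scores.max())   # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ keys
    return out.reshape(h, w, f, d)

# Usage: a small random "video" of 3x4 tokens over 2 frames, 5 channels each.
x = np.random.default_rng(0).normal(size=(3, 4, 2, 5))
y = nearby_attention_3d(x, extent=(1, 1, 1))
```

In this sketch each token scores at most (2·e_h+1)(2·e_w+1)(2·e_f+1) neighbours, so the work grows with the number of tokens times the window size rather than with the number of tokens squared, as it would for full attention.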

NÜWA achieves state-of-the-art results on text-to-image generation, text-to-video generation, video prediction, and several other tasks. It also shows good zero-shot results on text-guided image and video manipulation.

Paper: https://arxiv.org/abs/2111.12417
Code: https://github.com/microsoft/NUWA

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-nuwa

#deeplearning #cv #transformer #pretraining

Data Science by ODS.ai 🦜

👨‍🦯 51.70K
Technologies

First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of f...
