Kandinsky 2.1 by Sber & AIRI
The main features:
- 3.3B parameters
- generation resolution: 768x768
- image prior transformer
- new MoVQ image autoencoder
- training on a cleaner set of 172M text-image pairs
- modes: text-to-image generation, image blending, generating images from a reference image, editing images by text, inpainting/outpainting
The FID on the COCO_30k dataset reaches 8.21
A few posts comparing Kandinsky 2.1 with other similar models:
- https://t.me/dushapitona/643
- https://t.me/antidigital/6153
Habr: https://habr.com/ru/companies/sberbank/articles/725282/
Telegram bot: https://t.me/kandinsky21_bot
ruDALL-E: https://rudalle.ru/
MLSpace: https://sbercloud.ru/ru/datahub/rugpt3family/kandinsky-2-1
GH: https://github.com/ai-forever/Kandinsky-2
HF model: https://huggingface.co/ai-forever/Kandinsky_2.1
HF space: https://huggingface.co/spaces/ai-forever/Kandinsky2.1
FusionBrain: https://fusionbrain.ai/diffusion