Gradient Dude

Categories: Technologies

Language: English

Subscribers: 2.46K

Description from channel

TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).
Most ML feeds go for fluff, we go for the real meat.
YouTube: youtube.com/c/gradientdude
IG: instagram.com/gradientdude

Rating: 4.00 (3 reviews)

2021-06-21 06:11:32

(1) High-level scheme of our method and (2) some more results.

772 views, 03:11

2021-06-21 06:11:32

Just a small announcement
Our new (with Facebook AI Research) #CVPR21 paper is out!

Discovering Relationships between Object Categories via Universal Canonical Maps

TL;DR: DensePose for animals on steroids. As a byproduct, the method can automatically discover correspondences between the 3D shapes of different animals using novel cycle losses.

I will present the paper today (21.06) at 11am EDT / 5pm CET. Feel free to join the live Q&A session and ask me a question.

Project page
Video explanation
Paper
Source code

783 views, 03:11

2021-06-15 23:07:44

This is the architecture. The content encoder encodes the text, the style encoder extracts the style, and the generator produces stylized text conditioned on a style vector.

653 views, 20:07

2021-06-15 23:07:44

Facebook AI has built a system called TextStyleBrush that can replace text both in scenes and in handwriting, in one shot, using only a single example word.
The model is self-supervised because it is extremely hard to collect labeled pairs of text in different conditions and to annotate segmentation masks for text (although I think it could be done using synthetic generation).

The model is trained to understand unlimited text styles: not just different typography and calligraphy, but also different transformations, like rotations, curved text, and the deformations that happen between paper and pen when handwriting, as well as background clutter and image noise. The main idea is to disentangle the content of a text image from all aspects of the appearance of the entire word box. The representation of the overall appearance can then be applied as a one-shot transfer, without retraining on the novel source style samples.

The model consists of a style encoder, content encoder, and stylized text generator (plus a bunch of losses).
The generator architecture is based on the StyleGAN2 model. However, the design of StyleGAN2 has an important limitation: StyleGAN2 is an unconditional model, meaning it generates images by sampling a random latent vector. For generating photo-realistic text images, however, one needs to control the output based on two separate sources: the desired text content and style. This is solved by extracting layer-specific style information and injecting it at each layer of the generator (it is some sort of conditional instance normalization).
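
To make the "conditional instance normalization" remark concrete, here is a minimal sketch in plain Python. It is not the TextStyleBrush code: StyleGAN2 itself injects style through weight modulation rather than an explicit normalization layer, and the function names, feature values, and per-layer scales and shifts below are all hypothetical. It only shows the general idea of normalizing each feature channel and then re-scaling and re-shifting it with style-dependent parameters.

```python
import math

def instance_norm(channel, eps=1e-5):
    """Normalize one feature channel (a flat list of activations) to zero mean, unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]

def inject_style(features, scales, shifts):
    """Conditional instance normalization: normalize each channel, then
    re-scale and re-shift it with style-dependent parameters."""
    out = []
    for channel, gamma, beta in zip(features, scales, shifts):
        out.append([gamma * v + beta for v in instance_norm(channel)])
    return out

# Hypothetical values: two content feature channels and the per-channel
# scale/shift that a style encoder might predict for this layer.
content_features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 20.0, 20.0]]
layer_scales = [2.0, 0.5]
layer_shifts = [1.0, -1.0]
styled = inject_style(content_features, layer_scales, layer_shifts)
```

After the call, each channel of `styled` has a mean equal to its shift and a standard deviation close to its scale, so the statistics come from the style while the spatial pattern still comes from the content.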

The losses are the following: 1) reconstruction and cycle losses; 2) a discriminator real/fake loss; 3) a recognizer, a network that reads the text in the stylized image and makes sure that no content is lost; 4) a typeface classifier, a pretrained network that measures how well the generator captures the style of the input.

Results are quite striking!
Now imagine driving through the busy streets of Hong Kong and seeing street signs translated on the fly and projected on the windshield of your car. Or one day we will send personalized messages by generating creative images with the text embedded in them (instead of stickers).

Blogpost
Paper

695 views, 20:07

2021-06-12 01:27:07

Chinese researchers are very fond of doing extensive surveys of a particular sub-field of machine learning, listing the main works and the major breakthrough ideas. There are so many articles published every day, and it is impossible to read everything. Therefore, such reviews are valuable (if they are well written, of course, which is quite rare).

Recently there was a very good paper reviewing various variants of Transformers with a focus on language modeling (NLP). This is a must-read for anyone getting into the world of NLP and interested in Transformers. The paper discusses the basic principles of self-attention and such details of modern variants of Transformers as architecture modifications, pre-training, and various applications.

Paper: A Survey of Transformers.

771 views, 22:27

2021-05-12 18:43:28

Scheme of the Denoising Diffusion Probabilistic Model.

Sampling process goes from left to right, while Diffusion goes from right to left by gradually adding noise to the input.

598 views, 15:43

2021-05-12 18:43:28

Another cool work from OpenAI: Diffusion Models Beat GANs on Image Synthesis.
New SOTA for image generation on ImageNet

A new type of generative model is proposed: the diffusion probabilistic model. A diffusion model is a parameterized Markov chain trained using variational inference to produce samples matching the data after a finite number of steps. The diffusion process is a Markov chain that gradually adds noise to the data, in the direction opposite to sampling, until the signal is destroyed. So we learn the reverse transitions of this chain, which undo the diffusion process. And of course, we parameterize everything with neural networks.
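
A minimal sketch of the noising (diffusion) direction, in plain Python. It is not the OpenAI code: the linear noise schedule, its endpoints, the 1000 steps, and all function names are assumptions for illustration. It uses the standard closed form for Gaussian diffusion, which lets you jump from the clean input straight to step t instead of adding noise one step at a time. The learned reverse chain (the neural network part) is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fraction(betas):
    """Cumulative product of (1 - beta_t): the share of the original
    signal's variance that survives after each step."""
    fractions, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        fractions.append(running)
    return fractions

def diffuse(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    keep = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule(1000)
alpha_bars = signal_fraction(betas)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]                    # a toy "image" with four pixels
x_early = diffuse(x0, 10, alpha_bars, rng)    # still close to x0
x_late = diffuse(x0, 999, alpha_bars, rng)    # almost pure Gaussian noise
```

With this schedule almost all of the signal survives the first few steps, while at the last step practically nothing is left, which is the "until the signal is destroyed" part. Sampling runs the chain the other way, with a network predicting each reverse transition.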

It produces very high-quality generations, even better than GANs (this is especially clear in the example of the man with a fish, which does not look that spectacular from the BigGAN model). The current disadvantage of diffusion models is slow training and inference.

Paper
Code

698 views, 15:43

2021-05-07 14:41:50

Moore's law is still working. Yesterday IBM announced that it had created the first 2nm chip!

They claim that their 2nm technology will improve performance by 45% at the same power, or cut energy use by 75% at the same performance, compared to modern 7nm processors (e.g., Intel's).

IBM is one of the world’s leading research centers for future semiconductor technology, but it sold its manufacturing business to GlobalFoundries in 2014, so currently IBM only develops IP in collaboration with others (Samsung and, as recently announced, Intel) for their manufacturing facilities.

NVIDIA's latest data-center GPUs based on the Ampere microarchitecture (2020) use TSMC's 7nm fabrication process. TSMC's 3nm is expected to enter production in 2022. But when is IBM/Intel's 2nm even coming? I'm also curious whether Intel can even manage its 5nm chips by 2024/25.

Source article.

678 views, 11:41

2021-05-04 16:14:15

Snap has released a new model for animating the entire human body (not just the face). Looks pretty good.

The principle is similar to their previous method, First Order Motion Model, which animates heads. The difference is that (a) the background motion is explicitly modeled here; and (b) instead of regressing local affine transformations for a set of keypoints, this method learns to find heatmaps of different body parts in an unsupervised way, and the transformation matrix of each body part is computed by applying principal component analysis (PCA) to the predicted heatmaps.
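
A rough sketch of how a transformation could be read off a heatmap with PCA, in plain Python. This is an illustration under my own assumptions, not the Snap implementation: the function name and the toy heatmap are hypothetical, and the choice to build the linear part as principal directions scaled by the square roots of the variances is one reasonable reading of "PCA on the heatmap". The weighted mean of pixel coordinates gives the position of the part, and the eigen-decomposition of the weighted covariance gives its orientation and extent.

```python
import math

def heatmap_to_affine(heatmap):
    """Estimate a 2x2 linear map and a translation for one body part
    from its (non-negative) heatmap via 2-D PCA."""
    total = sum(sum(row) for row in heatmap)
    # The weighted mean of pixel coordinates is the part center (translation).
    mean_x = sum(w * x for row in heatmap for x, w in enumerate(row)) / total
    mean_y = sum(w * y for y, row in enumerate(heatmap) for w in row) / total
    # Weighted covariance of pixel coordinates around the center.
    sxx = sxy = syy = 0.0
    for y, row in enumerate(heatmap):
        for x, w in enumerate(row):
            dx, dy = x - mean_x, y - mean_y
            sxx += w * dx * dx
            sxy += w * dx * dy
            syy += w * dy * dy
    sxx, sxy, syy = sxx / total, sxy / total, syy / total
    # Closed-form eigen-decomposition of the symmetric 2x2 covariance.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # direction of the major axis
    half_trace = 0.5 * (sxx + syy)
    spread = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + spread)           # std along the major axis
    minor = math.sqrt(max(half_trace - spread, 0.0)) # std along the minor axis
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Columns are the principal directions scaled by their standard deviations.
    linear = [[cos_a * major, -sin_a * minor],
              [sin_a * major, cos_a * minor]]
    return linear, (mean_x, mean_y)

# Toy heatmap: a horizontal bar, i.e. a part elongated along the x axis.
toy_heatmap = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
linear, center = heatmap_to_affine(toy_heatmap)
```

For the toy bar, the center lands in the middle of the bar and the major axis points along x, so the part's pose is described by a rotation, two scales, and a shift, with no keypoint regression involved.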

More details are on the project website. Most importantly, code and pretrained weights are available. So go ahead and animate!

P.S. Two years ago, another method for animating the whole body, "Everybody Dance Now", was released, but there you had to retrain the network for each new person.

815 views, 13:14

2021-04-24 13:00:16

Infinite image generation and resampling

This method can generate infinite images of diverse and complex scenes that transition naturally from one into another. It does so without any conditioning and trains without any supervision from a dataset of unrelated square images.

You can check an interactive demo on the project website.

Paper

1.5K views, 10:00
