Gradient Dude


Categories: Technologies

Language: English

Subscribers: 2.46K

Description from channel

TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV, ECCV).
Most ML feeds go for fluff, we go for the real meat.
YouTube: youtube.com/c/gradientdude
IG: instagram.com/gradientdude


2021-04-23 15:31:43

Researchers from Berkeley rolled out VideoGPT - a transformer that generates videos.

The results are not super "WOW", but the architecture is quite simple, so it can serve as a starting point for future work in this direction. As you know, GPT-3 for text generation was not built in a day either. So let's wait for the method to get faster and the quality to improve.

Paper
Code
Project page
Demo

11.1K views, 12:31

2021-04-14 21:46:31

I disappeared for a couple of days, and now I'm happy to announce that yesterday I defended my PhD in Computer Vision!

So more high quality posts are coming!

1.6K views, 18:46

2021-04-11 19:31:31

Main experiments

Pretrain on ImageNet -> fine-tune on COCO or PASCAL:
1. Pretrain on ImageNet in a self-supervised regime using the proposed DetCon approach.
2. Use the self-supervised pretrained backbone to initialize Mask-RCNN and fine-tune it with GT labels for 12 epochs on COCO or 45 epochs on PASCAL (Semantic Segmentation).
3. Achieve SOTA results while using 5x fewer pretraining epochs than SimCLR.

Pretrain on COCO -> fine-tune on PASCAL for the Semantic Segmentation task:
1. Pretrain on COCO in a self-supervised regime using the proposed DetCon approach.
2. Use the self-supervised pretrained backbone to initialize Mask-RCNN and fine-tune it with GT labels for 45 epochs on PASCAL (Semantic Segmentation).
3. Achieve SOTA results while using 4x fewer pretraining epochs than SimCLR.
4. The first time a self-supervised pretrained ResNet-50 backbone outperforms supervised pretraining on COCO.

Paper: Efficient Visual Pretraining with Contrastive Detection

2.0K views, 16:31

2021-04-11 19:31:30

DetCon: The Self-supervised Contrastive Detection Method
DeepMind

A new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations.

Object-based regions are identified with an approximate, automatic segmentation algorithm based on pixel affinity (bottom). These masks are carried through two stochastic data augmentations and a convolutional feature extractor, creating groups of feature vectors in each view (middle). The contrastive detection objective then pulls together pooled feature vectors from the same mask (across views) and pushes apart features from different masks and different images (top).

Highlights
+ SOTA Detection and Instance Segmentation results (on COCO) and Semantic Segmentation results (on PASCAL) when pretrained in a self-supervised regime on ImageNet, while requiring up to 5× fewer epochs than SimCLR.
+ It also outperforms supervised pretraining on ImageNet.
+ DetCon(SimCLR) converges much faster to reach SOTA: 200 epochs are sufficient to surpass supervised transfer to COCO, and 500 to PASCAL.
+ A linear increase in the number of model parameters (using ResNet-101, ResNet-152, and ResNet-200) brings a linear increase in accuracy on downstream tasks.
+ Despite only being trained on ImageNet, DetCon(BYOL) matches the performance of Facebook's SEER model, which used a higher-capacity RegNet architecture and was pretrained on 1 billion Instagram images.
+ The first time a ResNet-50 with self-supervised pretraining on COCO outperforms supervised pretraining for transfer to PASCAL.
+ The power of DetCon strongly correlates with the quality of the masks. The better the masks used during the self-supervised pretraining stage, the better the accuracy on downstream tasks.

Method details
There are two variants, DetConS and DetConB, based on two recent self-supervised baselines, SimCLR and BYOL respectively, each with a ResNet-50 backbone.
The authors adopt the data augmentation procedure and network architecture from these methods and apply the proposed Contrastive Detection loss to each.

Each image is randomly augmented twice, resulting in two views: x, x'.
In addition, for each image the authors compute a set of masks that segment the image into different components.
These masks can be computed using efficient, off-the-shelf, unsupervised segmentation algorithms. In particular, the authors use the Felzenszwalb-Huttenlocher algorithm, a classic segmentation procedure that iteratively merges regions using pixel-based affinity. This algorithm does not require any training and is available in scikit-image. If available, human-annotated segmentations can be used instead of automatically generated ones. Each mask (represented as a binary image) is transformed using the same cropping and resizing as the underlying RGB image, resulting in two sets of masks {m}, {m'} that are aligned with the augmented views x, x'.
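
A minimal NumPy sketch of the mask alignment step described above. It assumes the segmentation is already available as an integer label map (for example from an off-the-shelf segmenter); the helper names, the nearest-neighbour resize, and the crop sampling are illustrative choices, not the authors' code.

```python
import numpy as np

def labels_to_binary_masks(label_map):
    """Turn an (H, W) integer label map into a (K, H, W) stack of binary masks."""
    ids = np.unique(label_map)
    return np.stack([label_map == i for i in ids]).astype(np.uint8)

def crop_and_resize(array, top, left, height, width, out_size):
    """Crop the last two axes, then resize to out_size x out_size (nearest neighbour)."""
    crop = array[..., top:top + height, left:left + width]
    rows = np.arange(out_size) * height // out_size
    cols = np.arange(out_size) * width // out_size
    return crop[..., rows[:, None], cols[None, :]]

def random_view(image, masks, rng, out_size=4):
    """Sample one crop box and apply it to BOTH the image and its masks."""
    h, w = image.shape[-2:]
    crop_h = rng.integers(h // 2, h + 1)
    crop_w = rng.integers(w // 2, w + 1)
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    box = (top, left, crop_h, crop_w)
    return crop_and_resize(image, *box, out_size), crop_and_resize(masks, *box, out_size)

rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))              # toy RGB image, channels first
label_map = np.zeros((8, 8), dtype=int)    # toy segmentation with 3 regions
label_map[:, 4:] = 1
label_map[4:, :4] = 2
masks = labels_to_binary_masks(label_map)  # shape (3, 8, 8)

x, m = random_view(image, masks, rng)              # view x with masks {m}
x_prime, m_prime = random_view(image, masks, rng)  # view x' with masks {m'}
```

Because the same crop box is applied to the image and to its masks, every pixel of a view still belongs to exactly one mask. A mask can become empty when its region falls outside the crop, so downstream code has to tolerate that case.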

For every mask m associated with the image, the authors compute a mask-pooled hidden vector (i.e., regular average pooling, but applied only to the spatial locations belonging to that mask).
A 2-layer MLP is then used as a projection on top of the mask-pooled hidden vectors. Note that if you replace mask pooling with a single global average pooling, you get exactly the SimCLR or BYOL architecture.
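
Mask pooling can be sketched in a few lines of NumPy. The function name and array layout are assumptions for illustration, and the masks are assumed to be already resized to the feature map resolution.

```python
import numpy as np

def mask_pool(features, masks):
    """Average features over each mask.

    features: (H, W, C) feature map from the backbone.
    masks:    (K, H, W) binary masks at the same spatial resolution.
    Returns a (K, C) array with one pooled vector per mask.
    """
    masks = masks.astype(features.dtype)
    summed = np.einsum("khw,hwc->kc", masks, features)
    area = masks.sum(axis=(1, 2))[:, None]
    return summed / np.maximum(area, 1.0)  # guard against empty masks

rng = np.random.default_rng(0)
features = rng.random((4, 4, 8))

# Two masks: left half and right half of the feature map.
masks = np.zeros((2, 4, 4))
masks[0, :, :2] = 1
masks[1, :, 2:] = 1
pooled = mask_pool(features, masks)  # shape (2, 8)

# A single all-ones mask reduces to ordinary global average pooling.
global_pooled = mask_pool(features, np.ones((1, 4, 4)))
```

The last line illustrates the remark above: with one mask that covers the whole image, mask pooling is the same as global average pooling.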

A standard contrastive loss based on cross-entropy is used for learning. A positive pair consists of the latent representations of the same mask in the augmented views x and x'. Latent representations of different masks from the same image and from different images in the batch are used as negative samples. Moreover, negative masks are allowed to overlap with the positive one.
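
A simplified sketch of such a cross-entropy contrastive loss in NumPy. It assumes the projected mask latents of the whole batch are stacked so that row i of both arrays comes from the same mask; the temperature value and the function name are hypothetical, and the exact set of negatives in the paper may differ from this one-directional version.

```python
import numpy as np

def contrastive_detection_loss(z, z_prime, temperature=0.1):
    """Cross-entropy contrastive loss over mask latents.

    z, z_prime: (N, D) latents; row i of each is the same mask in views x and x'.
    All other rows of z_prime (other masks, other images) act as negatives.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_prime = z_prime / np.linalg.norm(z_prime, axis=1, keepdims=True)
    logits = z @ z_prime.T / temperature                 # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 16))
z_aligned = z + 0.01 * rng.normal(size=(6, 16))  # same masks, slightly perturbed
z_shuffled = z_aligned[::-1]                     # positives no longer match

loss_aligned = contrastive_detection_loss(z, z_aligned)
loss_shuffled = contrastive_detection_loss(z, z_shuffled)
```

When the rows are correctly paired, the loss is much lower than when the pairing is broken, which is the signal that pulls the two views of a mask together.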

1.5K views, 16:31

2021-04-11 17:03:34

Self-supervision paper from arXiv for histopathology CV.

The authors draw inspiration from how histopathologists review images and how those images are stored. Histopathology images are multiscale slices of enormous size (tens of thousands of pixels per side), and domain experts constantly move between magnification levels to keep both the fine and the coarse structure of the tissue in mind.

Therefore, the paper proposes a loss that captures the relation between different magnification levels. The authors train the network to order concentric patches by magnification level. They frame this as a classification task: the network predicts the ID of the order permutation instead of the order itself.
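
A small sketch of how an ordering task can be turned into classification over permutation IDs. The number of patches, the magnification labels, and the helper names are hypothetical and chosen only for illustration.

```python
from itertools import permutations
import random

NUM_PATCHES = 3  # hypothetical number of concentric patches per sample

# Enumerate every possible ordering once; the index in this list is the class ID.
PERMUTATIONS = list(permutations(range(NUM_PATCHES)))
PERM_TO_ID = {perm: idx for idx, perm in enumerate(PERMUTATIONS)}

def make_training_example(patches_by_magnification, rng):
    """Shuffle patches (given sorted by magnification) and return the class label.

    The label is the ID of the permutation that was applied, so the network
    solves a len(PERMUTATIONS)-way classification problem.
    """
    order = list(range(len(patches_by_magnification)))
    rng.shuffle(order)
    shuffled = [patches_by_magnification[i] for i in order]
    return shuffled, PERM_TO_ID[tuple(order)]

rng = random.Random(0)
patches = ["5x", "10x", "20x"]  # placeholders standing in for image patches
shuffled, label = make_training_example(patches, rng)

# The label fully determines the order: it can be decoded back.
decoded = [patches[i] for i in PERMUTATIONS[label]]
```

With 3 patches there are 6 classes. The number of classes grows factorially with the number of patches, so this formulation is only practical for a small number of magnification levels.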

The authors also propose a dedicated architecture for this task and add a self-training procedure, which has been shown to boost results even after pre-training.

Together, this lets them improve quality even in the high-data regime.

My description of the architecture and loss expanded here.
Source of the work here.

1.0K views, 14:03

2021-04-10 04:12:26

919 views, 01:12

2021-04-09 18:48:19

Monkey is playing Pong just using the power of its mind (no joystick)

New demo from Neuralink. A monkey called Pager is playing video games using brain signals for in-game manipulations.
I'm curious how much more precise the invasive Neuralink is compared with non-invasive electroencephalography-based sensors.

Now imagine someone with paralysis using a smartphone or computer with their mind. This will be invaluable. Not to mention controlling bionic arms and legs.

1.4K views, 15:48

2021-04-08 22:19:21

Project page with more results

1.2K views, 19:19

2021-04-08 22:19:21

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement

This paper proposes an improved way to project real images into the StyleGAN latent space (which is required for further image manipulations).

Instead of directly predicting the latent code of a given real image in a single pass, the encoder is tasked with predicting a residual with respect to the current estimate. The initial estimate is simply the average latent code across the dataset. Inversion is done with multiple forward passes, iteratively feeding the encoder the output of the previous step along with the original input.
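
The refinement loop can be illustrated with a toy example. Here the generator is a random linear map and the "encoder" is a hand-written function that recovers half of the remaining latent error per step; both are stand-ins assumed for illustration, not StyleGAN or the trained ReStyle encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM = 4, 16

# Toy stand-in for the generator: a fixed linear map from latent to "image".
A = rng.normal(size=(IMAGE_DIM, LATENT_DIM))
A_PINV = np.linalg.pinv(A)

def generator(w):
    return A @ w

def encoder(x, y):
    """Toy stand-in for the encoder.

    Takes the target image x and the current reconstruction y, and returns a
    RESIDUAL in latent space (here: half of the remaining error).
    """
    return 0.5 * (A_PINV @ (x - y))

def invert(x, w_avg, num_steps=5):
    """Iterative refinement: start from the average latent and add residuals."""
    w = w_avg.copy()
    for _ in range(num_steps):
        y = generator(w)        # current reconstruction
        w = w + encoder(x, y)   # update the estimate with the predicted residual
    return w

w_true = rng.normal(size=LATENT_DIM)
x = generator(w_true)          # "real" image to invert
w_avg = np.zeros(LATENT_DIM)   # plays the role of the dataset-average latent

w_hat = invert(x, w_avg, num_steps=5)
initial_error = np.linalg.norm(w_avg - w_true)
final_error = np.linalg.norm(w_hat - w_true)
```

In this toy setting each step halves the latent error, so a handful of steps is enough to get close to the target. The real method learns the residual predictor, but the control flow of the loop has the same shape.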

Notably, during inference, ReStyle converges its inversion after a small number of steps (e.g., < 5), taking less than 0.5 seconds per image. This is compared to several minutes per image when inverting using optimization techniques.

The results are impressive! The L2 and LPIPS loss values are comparable to those of optimization-based techniques, while inversion is two orders of magnitude faster!

Paper
Code
Colab

1.1K views, edited 19:19

2021-04-07 09:01:15

Joker Donald Trump Inauguration Speech

Look Ma, DeepFakes are getting amazingly good! No need to spend thousands of dollars anymore to create such realistic effects.

Borrowed from @NeuroLands

1.2K views, edited 06:01
