The SimCLRv2 Framework

2023-11-02

Machine Learning

Big Self-Supervised Models are Strong Semi-Supervised Learners.


Table of Contents:

  1. Introduction
  2. Key insight
  3. Results
  4. Why it matters
  5. I’m thinking

Introduction:

The long-standing problem in computer vision, where models struggle to learn from only a few labeled examples while making use of large amounts of unlabelled data during training, may be coming to an end.


The SimCLR Framework:

Researchers at Google Research, Brain Team, including Geoffrey Hinton, Ting Chen, and a few others, built the SimCLR framework. SimCLR is a simple framework for contrastive learning of visual representations. It first learns generic representations of images on an unlabelled dataset and can then be fine-tuned with a small number of labeled images to achieve good performance on a given classification task.


The generic representations are learned by simultaneously maximizing agreement between differently transformed views of the same image and minimizing agreement between transformed views of different images, following a method called contrastive learning. Updating the parameters of a neural network using this contrastive objective causes representations of corresponding views to “attract” each other, while representations of non-corresponding views “repel” each other.

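To make the "attract/repel" intuition concrete, here is a minimal PyTorch sketch of a normalized temperature-scaled cross-entropy (NT-Xent) style contrastive loss. The function name, the temperature value, and the batch handling are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent contrastive loss: matching views attract, all other pairs repel.
    z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit-norm
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # the positive for row i is the other augmented view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# example usage with random embeddings
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))
```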

Afterward, SimCLR computes a non-linear projection (the projection head) of the image representation using a fully-connected network (i.e., an MLP), which amplifies the invariant features and maximizes the ability of the network to identify different transformations of the same image.

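A minimal sketch of such a projection head is shown below. The two-layer depth and the layer widths follow the original SimCLR description and are assumptions here; SimCLRv2 later makes this head deeper, as discussed further down.

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Non-linear projection head g(.) applied on top of the encoder output h;
    the contrastive loss is computed on z = g(h). Widths are illustrative."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h):
        return self.net(h)

# example: project a batch of 2048-d ResNet features down to 128-d
z = ProjectionHead()(torch.randn(8, 2048))
```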

What’s new with the SimCLRv2 Framework:



In a recent paper titled “Big Self-Supervised Models are Strong Semi-Supervised Learners”, the same team of researchers at Google improved the performance of the state-of-the-art (SOTA) self-supervised model on the ImageNet dataset. This feat was achieved with the SimCLRv2 framework, which consists of a much bigger self-supervised ResNet model built on the earlier SimCLR architecture. Thus, SimCLRv2 is an improvement on the SimCLR framework.


Key insight:

Recent progress in natural language processing models such as BERT has shown that good results can be obtained by first pretraining on a large unlabelled dataset and then fine-tuning on a smaller labeled dataset. However, existing self-supervised methods for image data are complex and hard to adopt. SimCLRv2 aims to simplify and improve this process by enhancing an earlier method called contrastive learning.


How it works:


Studying both the SimCLR and SimCLRv2 papers shows that both methods begin with self-supervised pretraining on a large unlabelled dataset. This helps the model learn generic representations of the data by simultaneously maximizing agreement between differently transformed views of the same image and minimizing agreement between transformed views of different images.


SimCLRv2 introduces three major modifications to the SimCLR framework. These are reflected in the following steps:


Applying random cropping plus color distortion yields the best results.

Step 1:


In the self-supervised pretraining phase, each image is augmented via random cropping, random color distortion, and Gaussian blur. Size is crucial: SimCLRv2 uses a much deeper but less wide ResNet-152 (3x) model with Selective Kernels (SK), whereas SimCLR uses a ResNet-50 (4x) model.

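A minimal torchvision sketch of this augmentation pipeline follows. The crop size, jitter strengths, probabilities, and blur kernel are assumptions chosen to approximate the values reported for SimCLR, and the snippet assumes a torchvision version that provides transforms.GaussianBlur.

```python
from torchvision import transforms

# Illustrative pretraining augmentations: random crop, color distortion, Gaussian blur.
# Parameter values approximate those reported for SimCLR and are assumptions here.
pretrain_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
])

# each training image is augmented twice to produce the two contrastive views:
# view1, view2 = pretrain_augment(img), pretrain_augment(img)
```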

Step 2:


Supervised fine-tuning is done using a few labeled examples. SimCLRv2 uses a deeper non-linear projection head (MLP) that is incorporated into the base encoder, which is equivalent to fine-tuning from a middle layer of the projection head instead of from its input layer as in SimCLR.

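Below is a hedged sketch of what fine-tuning "from a middle layer of the projection head" can look like in PyTorch: the encoder and the first projection-head layer are kept, and a new task head is attached at that point. The use of a ResNet-50 encoder, the layer sizes, and num_classes are illustrative assumptions; in practice the encoder and projection-head weights would be loaded from the pretraining stage.

```python
import torch.nn as nn
from torchvision.models import resnet50

num_classes = 1000  # e.g. ImageNet; illustrative

# Encoder from the self-supervised stage (weights untrained in this sketch).
encoder = resnet50()
encoder.fc = nn.Identity()            # expose the 2048-d pooled features

finetune_model = nn.Sequential(
    encoder,
    nn.Linear(2048, 2048),            # first projection-head layer, kept for fine-tuning
    nn.ReLU(inplace=True),
    nn.Linear(2048, num_classes),     # new task-specific head trained on the labeled subset
)
```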

Step 3:


Self-training is done once again using the same unlabelled examples but in a task-specific way. The big, fine-tuned network is used as a Teacher to impute pseudo-labels for training a Student network. Thus, the Teacher can be distilled into a smaller Student network with minimal accuracy loss.

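A minimal sketch of the distillation step is shown below: the Student is trained to match the Teacher's softened prediction distribution on unlabelled images. The temperature and the plain soft cross-entropy form are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft cross-entropy between the Teacher's pseudo-label distribution
    and the Student's predictions on the same unlabelled images."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)
    return -(teacher_probs * student_log_probs).sum(dim=1).mean()

# example usage: the Teacher is run without gradients to produce pseudo-labels
# with torch.no_grad():
#     teacher_logits = teacher(unlabeled_batch)
# loss = distillation_loss(student(unlabeled_batch), teacher_logits)
```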

Results:

  • Using the ResNet-50 architecture and only a 1% label fraction of the ImageNet data, SimCLRv2 achieves 73.9% top-1 accuracy, which is 53% better than the previous SOTA (SimCLR).


  • On a 10% label fraction, SimCLRv2 achieves 77.5% top-1 accuracy, which is 18% better than the previous SOTA.

  • For larger networks, the SimCLRv2 ResNet-152 (3x+SK) model achieves 76.6% and 80.9% top-1 accuracy on the 1% and 10% label fractions, which is 22% and 9% better, respectively, than the SimCLR ResNet-50 (4x) model.


The paper shows that bigger models tend to produce larger improvements with fewer labels, and tend to continuously improve as the model size and training epochs increase.


The paper titled “Big Self-Supervised Models are Strong Semi-Supervised Learners” also reports the following impressive findings…


With a depth of 152 layers and a width of 3x, the SimCLRv2 ResNet-152 (3x+SK) model has over 795 million parameters, and it performs best on the 1%, 10%, and 100% label fractions of the ImageNet ILSVRC-2012 dataset. This stellar performance is also maintained on linear evaluation and supervised classification tasks.


Bigger models are more label-efficient.

The paper also shows that bigger models are more label-efficient for both supervised and semi-supervised learning, but the gains appear to be larger for semi-supervised learning. Furthermore, it is worth pointing out that although bigger models are better, some models (e.g., those with SK) are more parameter-efficient than others.


Why it matters:

The world is full of unlabelled data. Therefore, the findings in this paper can be harnessed to improve accuracy in any computer vision application where labeling additional data is more expensive or difficult than training larger models, such as in medical imaging.


I’m thinking:


Self-supervised models fine-tuned for specific tasks can greatly improve computer-vision applications. But there has to be a balance, because an entire industry is built around human data-labelling services, and technology that reduces the need for these services could lead to loss of income in these tough times.


Cheers!


About Me:

Lawrence is a Data Specialist at Tech Layer, passionate about fair and explainable AI and Data Science. I believe that sharing knowledge and experiences is the best way to learn. I hold both the Data Science Professional and Advanced Data Science Professional certifications from IBM, as well as the IBM Data Science Explainability badge. I have completed several projects using ML and DL libraries, and I love to code up my own functions as much as possible. Finally, I never stop learning and experimenting, and yes, I have written several highly recommended articles.


Feel free to find me on:


Github


Linkedin


Twitter


Translated from: https://medium.com/towards-artificial-intelligence/the-simclrv2-framework-6de26606b7ef
