OpenAI Self-Supervised Learning Notes: Self-Supervised Learning

2023-10-27

Reposted from a WeChat official account
原文链接: https://mp.weixin.qq.com/s?__biz=Mzg4MjgxMjgyMg==&mid=2247486049&idx=1&sn=1d98375dcbb9d0d68e8733f2dd0a2d40&chksm=cf51b898f826318ead24e414144235cfd516af4abb71190aeca42b1082bd606df6973eb963f0#rd

OpenAI Self-Supervised Learning Notes


Video: https://www.youtube.com/watch?v=7l6fttRJzeU
Slides: https://nips.cc/media/neurips-2021/Slides/21895.pdf

Self-Supervised Learning
Self-Prediction and Contrastive Learning

  • Self-Supervised Learning
    • a popular paradigm of representation learning

Outline

  • Introduction: motivation, basic concepts, examples
  • Early Work: a look into connections with older methods
  • Methods
    • Self-prediction
    • Contrastive Learning
    • (for each subsection, present the framework and categorization)
  • Pretext tasks: a broad review of the literature
  • Techniques: improve training efficiency

Introduction

What is self-supervised learning and why do we need it?

What is self-supervised learning?
  • Self-supervised learning (SSL):
    • a special type of representation learning that enables learning good data representations from unlabelled datasets
  • Motivation:
    • the idea of constructing supervised learning tasks out of unsupervised datasets

    • Why?

      ✅ Data labeling is expensive, so high-quality labeled datasets are limited

      ✅ Learning good representations makes it easier to transfer useful information to a variety of downstream tasks ⇒ e.g., few-shot learning / zero-shot transfer to new tasks

Self-supervised learning tasks are also known as pretext tasks

What’s Possible with Self-Supervised Learning?
  • Video Colorization (Vondrick et al. 2018)

    • a self-supervised learning method

    • resulting in a rich representation

    • can be used for video segmentation + unlabelled visual region tracking, without extra fine-tuning

    • just label the first frame

      picture 1

  • Zero-shot CLIP (Radford et al. 2021)

    • Despite not being trained on supervised labels

    • the zero-shot CLIP classifier achieves strong performance on challenging image classification tasks

      picture 2

Early Work

Precursors to recent self-supervised approaches

Early Work: Connecting the Dots

Some ideas:

  • Restricted Boltzmann Machines

  • Autoencoders

  • Word2Vec

  • Autoregressive Modeling

  • Siamese networks

  • Multiple Instance / Metric Learning

Restricted Boltzmann Machines
  • RBM:
    • a special case of Markov random fields

      picture 3

    • consisting of visible units and hidden units

    • has connections between every pair of visible and hidden units, but none within each group (a Gibbs-sampling sketch follows the figure)

      picture 4
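Because the graph is bipartite, all hidden units are conditionally independent given the visible units, and vice versa, which makes block Gibbs sampling cheap. A minimal sketch assuming Bernoulli units (the names W, b, c are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One block Gibbs step in a Bernoulli RBM.
    W: (n_visible, n_hidden) weights; b, c: visible / hidden biases."""
    p_h = sigmoid(c + v @ W)                      # hiddens independent given v
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(b + h @ W.T)                    # visibles independent given h
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h
```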

Autoencoder: Self-Supervised Learning for Vision in Early Days
  • Autoencoder: a precursor to modern self-supervised approaches
    • Such as Denoising Autoencoder
  • Has inspired many self-supervised approaches in later years
    • such as masked language models (e.g., BERT) and MAE

picture 5

Word2Vec: Self-Supervised Learning for Language
  • Word embeddings map words to vectors
    • extracting features of words
  • idea:
    • the sum of neighboring word embeddings is predictive of the word in the middle (a CBOW sketch follows the figure)

picture 6
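A minimal sketch of this idea, the CBOW variant: sum the context embeddings and score every vocabulary word as the candidate middle word. Sizes and ids here are toy assumptions; real Word2Vec trains this with negative sampling or hierarchical softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10_000, 100
E_in = rng.normal(scale=0.01, size=(vocab_size, dim))   # context embeddings
E_out = rng.normal(scale=0.01, size=(vocab_size, dim))  # target embeddings

def cbow_logits(context_ids):
    """Score every vocabulary word as the word in the middle."""
    h = E_in[context_ids].sum(axis=0)   # the sum of neighboring embeddings
    return E_out @ h                    # one logit per vocabulary word

logits = cbow_logits([5, 7, 42, 13])    # toy context token ids
predicted_word = int(np.argmax(logits))
```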

  • An interesting phenomenon resulting from Word2Vec:
    • you can observe linear substructures in the embedding space: the lines connecting comparable concepts, such as corresponding masculine and feminine words, are roughly parallel

      picture 7

Autoregressive Modeling
  • Autoregressive model:

    • Autoregressive (AR) models are a class of time series models in which the value at a given time step is modeled as a linear function of previous values

    • NADE: Neural Autoregressive Distribution Estimator

      picture 8

  • Autoregressive models have also been the basis for many self-supervised methods such as GPT
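As a concrete toy version of the definition above, here is a least-squares fit of a linear AR(p) model (the helper is hypothetical, not from the tutorial):

```python
import numpy as np

def fit_ar(series, p):
    """Fit a linear AR(p) model by least squares: each value is a linear
    function of the p values before it, plus an intercept."""
    n = len(series)
    X = np.stack([series[i:n - p + i] for i in range(p)], axis=1)  # lagged values
    X = np.hstack([X, np.ones((n - p, 1))])                        # intercept column
    y = series[p:]                                                 # targets x_t
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # lag coefficients (oldest lag first), then the intercept

# usage: one-step-ahead prediction from the last p observed values
x = np.sin(np.linspace(0, 20, 200))
coef = fit_ar(x, p=3)
next_value = x[-3:] @ coef[:-1] + coef[-1]
```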

Siamese Networks

Many contrastive self-supervised learning methods use a pair of neural networks and learn from their difference; this idea can be traced back to Siamese networks.

  • Self-organizing neural networks
    • where two neural networks take separate but related parts of the input and learn to maximize the agreement between the two outputs
  • Siamese Networks
    • if you believe that one network f can encode x well into a good representation f(x)

    • then, for two different inputs x1 and x2, their distance can be defined as d(x1, x2) = L(f(x1), f(x2))

    • the idea of running two identical CNNs on two different inputs and then comparing them is called a Siamese network

    • Train by (a loss sketch follows the figure):

      ✅ If xi and xj are the same person, ||f(xi) − f(xj)|| is small

      ✅ If xi and xj are different people, ||f(xi) − f(xj)|| is large

picture 9
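One standard way to turn that training rule into an objective is the pairwise contrastive loss of Hadsell et al. (2006); the margin value below is a common default, not the slide's exact setting:

```python
import numpy as np

def pair_loss(f_xi, f_xj, same_person, margin=1.0):
    """Pull embeddings of the same person together; push different people
    apart until they are at least `margin` away from each other."""
    d = np.linalg.norm(f_xi - f_xj)
    if same_person:
        return d ** 2                     # ||f(xi) - f(xj)|| should be small
    return max(0.0, margin - d) ** 2      # ||f(xi) - f(xj)|| should be large
```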

Multiple Instance Learning & Metric Learning

Precursors of the recent contrastive learning techniques: multiple instance learning and metric learning

  • these methods deviate from the typical framework of empirical risk minimization

    • they define the objective function in terms of multiple samples from the dataset ⇒ multiple instance learning
  • early work:

    • centered around non-linear dimensionality reduction
    • e.g., multi-dimensional scaling and locally linear embedding
    • better than PCA: they can preserve the local structure of data samples
  • metric learning:

    • x and y: two samples
    • A: a learnable positive semi-definite matrix
    • the learned metric is d_A(x, y) = sqrt((x − y)^T A (x − y))
  • Contrastive loss:

    • uses a spring-system analogy: decrease the distance between inputs of the same type, and increase it between inputs of different types
  • Triplet loss

    • another way to obtain a learned metric
    • defined using 3 data points
    • anchor, positive and negative
    • the anchor is trained to be similar to the positive and dissimilar to the negative
  • N-pair loss:

    • a generalization of the triplet loss
    • recent contrastive learning takes the N-pair loss as its prototype (both losses are sketched below)

picture 13
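Minimal sketches of the two losses just described, on plain embedding vectors (margin and similarity choices are common defaults, not necessarily the tutorial's):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """The anchor must be closer to the positive than to the negative,
    by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def n_pair_loss(anchor, positive, negatives):
    """N-pair loss (Sohn 2016): one positive against N-1 negatives,
    via a softmax over dot-product similarities."""
    pos = anchor @ positive
    negs = np.array([anchor @ n for n in negatives])
    return float(np.log(1.0 + np.exp(negs - pos).sum()))
```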

Methods

  • self-prediction
  • Contrastive learning
Methods for Framing Self-Supervised Learning Tasks
  • Self-prediction: Given an individual data sample, the task is to predict one part of the sample given the other part
    • i.e., “intra-sample” prediction

The part to be predicted pretends to be missing

  • Contrastive learning: Given multiple data samples, the task is to predict the relationship among them
    • relationship: can be based on the internal logic within the data

      ✅ such as different camera views of the same scene

      ✅ or create multiple augmented versions of the same sample

The multiple samples can be selected from the dataset based on some known logic (e.g., the order of words / sentences), or fabricated by altering the original version
i.e., we know the true relationship between samples but pretend not to know it

Self-Prediction
  • Self-prediction constructs prediction tasks within each individual data sample

    • to predict a part of the data from the rest while pretending we don’t know that part

    • The following figure demonstrates how flexible and diverse the options are for constructing self-prediction learning tasks

      ✅ can mask any dimensions

      picture 14

  • Categories:

    • Autoregressive generation
    • Masked generation
    • Innate relationship prediction
    • Hybrid self-prediction
Self-prediction: Autoregressive Generation
  • The autoregressive model predicts future behavior based on past behavior

    • Any data that comes with an innate sequential order can be modeled autoregressively
  • Examples :

    • Audio (WaveNet, WaveRNN)
    • Autoregressive language modeling (GPT, XLNet)
    • Images in raster scan (PixelCNN, PixelRNN, iGPT)
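All of these share the same training signal: the negative log-likelihood of each element given the elements before it. A minimal sketch of that loss, assuming the model has already produced causal logits (logits[t] computed only from tokens[:t+1]):

```python
import numpy as np

def next_token_nll(logits, tokens):
    """Average negative log-likelihood of tokens[t+1] under logits[t].
    logits: (len(tokens) - 1, vocab_size); tokens: integer ids."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    targets = tokens[1:]                                   # shift by one position
    return float(-log_probs[np.arange(len(targets)), targets].mean())
```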
Self-Prediction: Masked Generation
  • mask a random portion of the information and pretend it is missing, irrespective of the natural sequence

    • The model learns to predict the missing portion given other unmasked information
  • e.g.,

    • predicting random words based on other words in the same context around it
  • Examples :

    • Masked language modeling (BERT)
    • Images with masked patch (denoising autoencoder, context autoencoder, colorization)
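A minimal sketch of the masking step (BERT additionally leaves some selected tokens unchanged or replaces them with random ones; that refinement is omitted, and MASK_ID is a hypothetical reserved id):

```python
import numpy as np

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Hide a random portion of the tokens; the hidden originals become the
    prediction targets, and all other positions are ignored by the loss."""
    rng = np.random.default_rng(seed)
    tokens = np.asarray(tokens)
    is_masked = rng.random(tokens.shape) < mask_ratio
    inputs = np.where(is_masked, MASK_ID, tokens)
    targets = np.where(is_masked, tokens, -1)   # -1 = position not scored
    return inputs, targets
```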
Self-Prediction: Innate Relationship Prediction
  • Some transformations (e.g., segmentation, rotation) of a data sample should maintain the original information or follow the desired innate logic

  • Examples

    • Order of image patches

      ✅ e.g., shuffle the patches

      ✅ e.g., relative position, jigsaw puzzle

    • Image rotation

    • Counting features across patches
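For example, the rotation task (Gidaris et al. 2018) turns a single unlabeled image into a 4-way classification problem; a minimal sketch:

```python
import numpy as np

def rotation_pretext(image, seed=None):
    """Rotate an (H, W, C) image by k * 90 degrees; the model must classify k,
    which requires recognizing the object's canonical orientation."""
    rng = np.random.default_rng(seed)
    k = int(rng.integers(4))                      # label in {0, 1, 2, 3}
    return np.rot90(image, k=k, axes=(0, 1)), k
```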

Self-Prediction: Hybrid Self-Prediction Models

Hybrid self-prediction models combine different types of generative modeling.

  • VQ-VAE + AR
    • Jukebox (Dhariwal et al. 2020), DALL-E (Ramesh et al. 2021)
  • VQ-VAE + AR + Adversarial
    • VQGAN (Esser & Rombach et al. 2021)

    • VQ-VAE: learns a discrete codebook of context-rich visual parts

    • A transformer model: trained to autoregressively model the composition of this codebook

      picture 15

Contrastive Learning
  • Goal:

    • To learn such an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart

      picture 16

  • Contrastive learning can be applied to both supervised and unsupervised settings

    • when working with unsupervised data, contrastive learning is one of the most powerful approaches in self-supervised learning
  • Category

    • Inter-sample classification
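Inter-sample classification is commonly implemented with an InfoNCE-style loss: given an anchor, identify its positive among a set of candidates via cross-entropy over similarities. A minimal sketch (cosine similarity and the temperature value are common defaults, not necessarily the tutorial's):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy of picking the positive out of all candidates,
    using temperature-scaled cosine similarities."""
    cands = np.vstack([positive] + list(negatives))
    cands = cands / np.linalg.norm(cands, axis=1, keepdims=True)
    a = anchor / np.linalg.norm(anchor)
    sims = cands @ a / temperature       # index 0 is the positive pair
    sims = sims - sims.max()             # numerical stability
    return float(-(sims[0] - np.log(np.exp(sims).sum())))
```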

