2020 Paper Reading: Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

2023-05-16

Table of Contents

Contributions
1. Introduction
2. Related Work
- 2.1 General Object Detection
- 2.2 Few-shot learning
3. FSOD: A Highly-Diverse Few-Shot Object Detection Dataset
4. Methodology
- 4.1 Problem Definition
- 4.2 Attention-Based Region Proposal Network
  - 4.2.1 Attention-Based Region Proposal Network
  - 4.2.2 Multi-Relation Detector
- 4.3 Two-way Contrastive Training Strategy
5 Experiments
- 5.1 Training Details

Contributions

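A few-shot object detection network that detects objects by exploiting the similarity between a support set containing only a few images and the query set. The authors state that once the model is trained, it can detect object categories it has never seen, without further training or fine-tuning. https://github.com/fanq15/FSOD-code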
A large annotated dataset (FSOD) with 1000 object categories and only a few annotated examples per category. https://github.com/fanq15/Few-Shot-Object-Detection-Dataset

1. Introduction

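Real-world objects vary greatly in illumination, texture and shape, which makes few-shot learning very challenging. Recent important advances in few-shot learning have concentrated on image classification and rarely address object detection, most likely because moving from few-shot image classification to few-shot object detection is a non-trivial task.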

The authors cite several papers that made important progress on few-shot image classification:

[1] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NeurIPS, 2017.
[2] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In ICLR, 2017.
[3] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In ICML, 2016.
[4] Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NeurIPS, 2016.
[5] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.
[6] Qi Cai, Yingwei Pan, Ting Yao, Chenggang Yan, and Tao Mei. Memory matching networks for one-shot image recognition. In CVPR, 2018.
[7] Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In CVPR, 2018.
[8] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR, 2018.

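The main challenge of few-shot object detection is how to localize unseen objects in a cluttered background. In hindsight, this is a problem of localizing objects of novel categories from only a few labeled examples.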

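The authors' goal: given support images that all belong to novel categories (in the figure, the bicycle at top left and the helmet at top right), detect all foreground objects in the test images that belong to these novel categories.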

2. Related Work

2.1 General Object Detection

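Early traditional methods relied on hand-crafted features for detection. Recent CNN-based object detectors fall into two groups: proposal-free detectors (YOLO, YOLO9000, SSD, RetinaNet, etc.) and proposal-based detectors (the R-CNN family). Detectors that use an RPN generally perform somewhat better than those that do not. However, all of these models are trained under strong supervision and are hard to extend to new categories when only a few examples are available.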

2.2 Few-shot learning

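Few-shot learning is hard for traditional machine learning algorithms. Early work focused on learning generic features shared across categories, such as hand-crafted strokes or parts. Other work learned distance metrics between categories on top of hand-crafted features. A more recent trend is to design an agent or policy that learns to "supervise" learning across tasks; this direction is called meta-learning. In this area, siamese networks use a matching strategy to capture the intrinsic variation between the support (training examples) and the query (test examples), independent of the object category.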
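To make the matching idea concrete, here is a minimal sketch of metric-based few-shot classification. It is not this paper's method: it assumes embeddings have already been produced by some network, uses hand-made toy vectors, and labels a query by cosine similarity to the mean embedding (prototype) of each class, in the spirit of Prototypical Networks [1]. The names `cosine`, `class_prototypes` and `match` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_prototypes(support):
    """Average the K support embeddings of each class into one prototype vector."""
    return {
        label: [sum(col) / len(embs) for col in zip(*embs)]
        for label, embs in support.items()
    }


def match(query, support):
    """Return the class whose prototype is most cosine-similar to the query."""
    protos = class_prototypes(support)
    return max(protos, key=lambda label: cosine(query, protos[label]))


# Toy 2-way 2-shot support set with hand-made (hypothetical) 3-d embeddings.
support = {
    "bicycle": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "helmet": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.3]],
}
print(match([0.8, 0.0, 0.1], support))  # bicycle
```

Nothing in `match` is tied to a fixed label set, so a new category can be handled by adding its few support embeddings to the dictionary, with no retraining; this category-agnostic property is what the matching line of work relies on.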
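Within the matching paradigm, subsequent work includes:

enhancing the feature embedding, with one sub-direction building a memory module to capture global semantic relations among the training examples;
using local descriptors to extract additional knowledge from the few examples;
using graph networks to model relations between different categories;
repeatedly traversing the few-shot sets so that metric learning in high-dimensional space becomes more effective;
and others that focus on learning a general agent for parameter optimization.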

The authors' work builds on this paper:
Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Workshop, 2015.

Our work is motivated by the research line pioneered by the matching network. We propose a general few-shot object detection network that learns the matching metric between image pairs based on the Faster R-CNN framework equipped with our novel attention RPN and multi-relation detector trained using our contrastive training strategy.

Overview: on top of the Faster R-CNN framework, the authors add

attention RPN
multi-relation detector (trained using the authors' contrastive training strategy)

3. FSOD: A Highly-Diverse Few-Shot Object Detection Dataset

Next, we follow the few-shot learning setting to split our data into training set and test set without overlapping categories.
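A category-disjoint split like the one described above can be sketched as follows. This is an illustration under assumptions, not the dataset's actual split procedure: annotations are assumed to be `(image_id, category)` pairs, whole categories (not images) are assigned to the test side at random, and `split_by_category` is a hypothetical name.

```python
import random


def split_by_category(annotations, test_fraction, seed=0):
    """Assign whole categories to train or test so the two sides share none.

    annotations: list of (image_id, category) pairs (an assumed format).
    """
    categories = sorted({cat for _, cat in annotations})
    rng = random.Random(seed)
    rng.shuffle(categories)
    # At least one category always goes to the test side.
    n_test = max(1, round(len(categories) * test_fraction))
    test_cats = set(categories[:n_test])
    train = [a for a in annotations if a[1] not in test_cats]
    test = [a for a in annotations if a[1] in test_cats]
    return train, test


annotations = [("img1", "dog"), ("img2", "cat"), ("img3", "bicycle"),
               ("img4", "helmet"), ("img5", "dog")]
train, test = split_by_category(annotations, test_fraction=0.5)
train_cats = {c for _, c in train}
test_cats = {c for _, c in test}
print(train_cats.isdisjoint(test_cats))  # True
```

One design point the sketch leaves open: an image that contains objects from both sides appears in both lists, so a real split has to decide how to treat such images.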

Challenging setting: our dataset contains objects with large variance in box size and aspect ratio, and 26.5% of the images in the test set contain no fewer than three objects.
Our test set also contains a large number of boxes of categories that are not included in our label system, which poses a great challenge for a few-shot model.


4. Methodology

4.1 Problem Definition

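We are given a close-up image $S_c$ of a target object of category $c$ (the support image);
and a query image $Q_c$ to be searched, which contains objects of the support category $c$;
**Task: given a support image, find all target objects in the query image and label every detected object.** If the support set contains N categories with K examples each, the problem is called N-way K-shot detection;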
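The N-way K-shot setting can be illustrated by sampling one support set (an episode). A minimal sketch, assuming support images are stored in a hypothetical `{category: [image_id, ...]}` dictionary; `sample_episode` is an illustrative name, and how the paper itself samples episodes is not described here.

```python
import random


def sample_episode(pool, n_way, k_shot, seed=None):
    """Sample an N-way K-shot support set.

    pool: hypothetical {category: [support_image_id, ...]} dictionary.
    Returns {category: [k_shot image ids]} for n_way distinct categories.
    """
    rng = random.Random(seed)
    # Only categories with at least K examples can take part in a K-shot episode.
    eligible = sorted(c for c, imgs in pool.items() if len(imgs) >= k_shot)
    if len(eligible) < n_way:
        raise ValueError("not enough categories with at least k_shot examples")
    classes = rng.sample(eligible, n_way)
    return {c: rng.sample(pool[c], k_shot) for c in classes}


pool = {
    "bicycle": ["b1", "b2", "b3"],
    "helmet": ["h1", "h2"],
    "dog": ["d1", "d2", "d3", "d4"],
    "cat": ["c1"],  # too few examples for 2-shot, never chosen below
}
episode = sample_episode(pool, n_way=2, k_shot=2, seed=0)
print({c: len(v) for c, v in episode.items()})
```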

4.2 Attention-Based Region Proposal Network

The weight-shared framework has two kinds of branches, one for the query set and one for the support set, and there can be several support branches; the query branch is a Faster R-CNN network (with an RPN and a detector);
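Weight sharing means that the query branch and every support branch apply the same backbone parameters. A toy sketch, assuming a per-pixel linear map followed by ReLU (equivalent to a single 1×1 convolution) as a stand-in for the real ResNet backbone; `backbone` and `two_branch_features` are hypothetical names, and feature maps are nested H × W × C lists to keep the example dependency-free.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def backbone(image, weights):
    """Toy stand-in for the shared backbone: per-pixel linear map + ReLU.

    image: H x W x C_in nested lists; weights: C_in x C_out nested lists.
    Returns an H x W x C_out feature map.
    """
    c_out = len(weights[0])
    return [
        [
            [relu(sum(px[i] * weights[i][o] for i in range(len(px)))) for o in range(c_out)]
            for px in row
        ]
        for row in image
    ]


def two_branch_features(query, supports, weights):
    """Run the query and every support image through the SAME weights."""
    query_feat = backbone(query, weights)
    support_feats = [backbone(s, weights) for s in supports]
    return query_feat, support_feats


weights = [[1.0, -1.0], [0.5, 2.0]]           # C_in = 2, C_out = 2 (toy values)
query = [[[1.0, 2.0], [0.0, 1.0]]]            # 1 x 2 x 2 query "image"
supports = [[[[1.0, 2.0]]], [[[3.0, 0.0]]]]   # two 1 x 1 x 2 support "images"
q, s = two_branch_features(query, supports, weights)
print(q)  # [[[2.0, 3.0], [0.5, 2.0]]]
```

Because both branches use one `weights` object, an identical pixel produces an identical feature in either branch, which is what lets the later stages compare query and support features directly.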

4.2.1 Attention-Based Region Proposal Network

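The RPN generates bounding boxes related to (all) objects for the subsequent detection stage. When a support image is available, the RPN can instead generate boxes related to the support image rather than aimlessly producing many boxes. To guide the RPN toward proposals of the support image's category and away from objects of other categories, the authors propose the attention RPN (which focuses the RPN's attention on the category in the support image):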

Support feature map: obtained by passing the support image through the convolutional backbone; the support features are then average-pooled to $X \in \mathbb{R}^{S \times S \times C}$ (in this setting the authors found that the model performs well with $S = 1$);
Query feature map: obtained by passing the query image through the backbone; the authors take the output of res4_6, a relatively early layer of ResNet-50, giving $Y \in \mathbb{R}^{H \times W \times C}$;
The similarity is defined as $G_{h,w,c} = \sum_{i,j} X_{i,j,c} \cdot Y_{h+i-1,\,w+j-1,\,c}$, where $i, j \in \{1, \dots, S\}$;
The resulting $G$ is the attention feature map, which is then fed into the RPN (with an objectness classification layer and a box regression layer).
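The similarity $G$ is a depth-wise cross-correlation: each channel of the support feature $X$ is slid over the same channel of the query feature $Y$. A pure-Python sketch follows, with hypothetical function names. It uses 0-based indices (so $Y_{h+i-1,\,w+j-1,\,c}$ becomes `Y[h + i][w + j][c]`) and computes only positions where the support window fits entirely inside $Y$, so the output is $(H-S+1) \times (W-S+1) \times C$; whether the original implementation pads is not stated here, so treat that as an assumption. With $S = 1$ the output keeps the query size and reduces to scaling each channel of $Y$ by the pooled support value.

```python
def avg_pool_to_1x1(feat):
    """Global average pooling: H x W x C -> 1 x 1 x C (the S = 1 case)."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    return [[[sum(feat[y][x][k] for y in range(h) for x in range(w)) / (h * w)
              for k in range(c)]]]


def depthwise_xcorr(X, Y):
    """Attention map G[h][w][c] = sum_{i,j} X[i][j][c] * Y[h+i][w+j][c].

    X: S x S x C support feature, Y: H x W x C query feature (nested lists).
    0-based indices; only 'valid' positions, so G is (H-S+1) x (W-S+1) x C.
    """
    S, C = len(X), len(X[0][0])
    H, W = len(Y), len(Y[0])
    return [
        [
            [sum(X[i][j][c] * Y[h + i][w + j][c] for i in range(S) for j in range(S))
             for c in range(C)]
            for w in range(W - S + 1)
        ]
        for h in range(H - S + 1)
    ]


support_feat = [[[1.0, 0.0], [3.0, 2.0]],
                [[1.0, 2.0], [3.0, 0.0]]]   # 2 x 2 x 2 support feature
query_feat = [[[1.0, 1.0], [0.0, 2.0]],
              [[3.0, 0.0], [1.0, 1.0]]]     # 2 x 2 x 2 query feature
X = avg_pool_to_1x1(support_feat)           # [[[2.0, 1.0]]]
G = depthwise_xcorr(X, query_feat)
print(G)  # [[[2.0, 1.0], [0.0, 2.0]], [[6.0, 0.0], [2.0, 1.0]]]
```

In a deep-learning framework the same operation is usually written as a grouped convolution with one group per channel rather than explicit loops; the loops here only make the indexing of the formula visible.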

4.2.2 Multi-Relation Detector

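In classic R-CNN-style networks, the RPN is followed by a detector whose role is to score the proposals generated by the RPN and recognize the category inside each box. The detector therefore needs strong discriminative power between categories. The authors propose the following Multi-Relation Detector, which has three components: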

global-relation head
local-correlation head
patch-relation head
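The notes above only name the three heads, so the following is a heavily simplified, hypothetical sketch of the general idea rather than the paper's detector: each head compares a proposal's RoI feature with the support feature and outputs a matching score, and the head scores are combined. Assumptions, all labeled in the code: support and proposal features are already pooled to the same spatial size, the "global" head is a single linear layer on the concatenated pooled features, the "local" head is a per-channel correlation, the patch-relation head is omitted, and the scores are simply summed. Real heads have learned weights and nonlinear layers.

```python
def global_avg(feat):
    """Average an H x W x C feature map over space -> length-C vector."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    return [sum(feat[y][x][k] for y in range(h) for x in range(w)) / (h * w)
            for k in range(c)]


def global_relation_score(roi, support, weights):
    """Hypothetical global head: one linear layer on [pooled RoI ; pooled support]."""
    v = global_avg(roi) + global_avg(support)  # list concatenation, length 2C
    return sum(a * b for a, b in zip(v, weights))


def local_correlation_score(roi, support):
    """Hypothetical local head: per-channel correlation of same-size maps, averaged."""
    h, w, c = len(roi), len(roi[0]), len(roi[0][0])
    per_channel = [sum(roi[y][x][k] * support[y][x][k] for y in range(h) for x in range(w))
                   for k in range(c)]
    return sum(per_channel) / c


def multi_relation_score(roi, support, global_weights):
    """Sum of head scores (an assumption); the patch-relation head is omitted."""
    return (global_relation_score(roi, support, global_weights)
            + local_correlation_score(roi, support))


support = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]]    # 2 x 2 x 2 support RoI
roi_same = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]]   # resembles the support
roi_other = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]  # different pattern
w = [0.5, 0.5, 0.5, 0.5]                                         # toy weights, learned in practice
print(multi_relation_score(roi_same, support, w))   # 3.0
print(multi_relation_score(roi_other, support, w))  # 1.0
```

In this toy, the linear global head gives both proposals the same score, and only the correlation head separates them; a reason to combine heads that look at the features in different ways.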

4.3 Two-way Contrastive Training Strategy

5 Experiments

5.1 Training Details
