Paper Reading: AutoAssign

2023-10-31

1. Paper overview

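Personally, I think the end-to-end dynamic positive/negative sample assignment proposed in this paper is high-quality work. Although it was not accepted at ECCV, the authors have posted it on arXiv, and I expect it will be accepted at another top conference.
The background of the paper: positive/negative sample assignment in current object detectors relies on hand-crafted priors. Anchor-based detectors split positives and negatives by the IoU between the GT and each anchor, while anchor-free detectors treat the points within some radius R of the GT center as positives. Both rely on strong prior knowledge and introduce hyperparameters. The assignment is also fixed: once the network configuration and the dataset are decided, whether each anchor or grid cell is positive or negative is already determined, and it cannot change during training in response to how well the network is learning (this includes the assignment in ATSS, which is only pseudo-dynamic).
The rough idea of this paper: every grid cell inside a GT box is initially treated as both a positive and a negative candidate, and each cell carries two weights, w+ and w-.
(1) Generating w+ (the positive weight of each grid cell): w+ is 0 for cells outside the GT bbox. An offset is learned between the center of the GT bbox and the center of the foreground inside the bbox (this helps with irregularly shaped objects, such as line-shaped or ring-shaped ones). A confidence is then computed from the classification score and the localization score, and combining this confidence with the center prior just described yields w+.
(2) Generating w- (the negative weight of each grid cell): w- is 1 for cells outside the GT bbox. Inside the box, w- is determined by the IoU between the box predicted at that cell and the GT: the smaller the IoU, the larger w-.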
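To make the two weights more concrete, here is a minimal NumPy sketch for the locations around a single GT box. It is only my reading of the description above, not the authors' code: the function names, the temperature `tau = 1/3`, and the min-max normalization used for w- are all assumptions.

```python
import numpy as np

def positive_weights(confidence, center_prior, inside_box, tau=1.0 / 3):
    """w+ for one GT box: exp(confidence / tau) * center prior, normalized over the box.

    tau is an assumed temperature; locations outside the box get weight 0.
    """
    raw = np.exp(confidence / tau) * center_prior * inside_box
    total = raw.sum()
    return raw / total if total > 0 else raw

def negative_weights(pred_iou, inside_box):
    """w-: 1 outside the box; inside, it grows as the predicted box's IoU with the GT shrinks."""
    w_neg = np.ones_like(pred_iou, dtype=float)
    if inside_box.any():
        # 1 / (1 - IoU), then min-max normalized to [0, 1] (the normalization is an assumption).
        f = 1.0 / (1.0 - np.clip(pred_iou[inside_box], 0.0, 1.0 - 1e-6))
        span = f.max() - f.min()
        f = (f - f.min()) / span if span > 0 else np.zeros_like(f)
        w_neg[inside_box] = 1.0 - f
    return w_neg

# Four hypothetical locations; the last one lies outside the GT box.
confidence = np.array([0.9, 0.5, 0.1, 0.8])    # classification score * localization score
center_prior = np.array([1.0, 0.6, 0.3, 1.0])
inside_box = np.array([True, True, True, False])
pred_iou = np.array([0.8, 0.5, 0.1, 0.0])

w_pos = positive_weights(confidence, center_prior, inside_box)
w_neg = negative_weights(pred_iou, inside_box)
```

In this toy example the location with the best prediction gets the largest w+ and the smallest w-, which is the behaviour the paragraph above describes.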

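The authors note that the network architecture is based on FCOS, so the paper only gives a schematic of how w+ and w- are generated, not a complete network architecture.
[Figure: schematic of how w+ and w- are generated]
The Implicit Objectness branch resembles a binary classifier and is learned jointly with classification. Seen that way, isn't it somewhat similar to the positive/negative sample assignment in YOLOv3? To be honest, this paper is not easy to understand. I spent several hours on it and still have not fully worked out the complete training and testing pipeline... I can only wait for the code to be open-sourced. If I gain a new understanding of the paper later, I will add to these notes.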

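[Figure: dynamic assignment of positive and negative samples]
The figure above shows the dynamic positive/negative sample assignment that the authors want to achieve.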

In this work, we propose a fully differentiable strategy
for label assignment. As illustrated in Fig. 1, we first follow
the anchor-free manner like FCOS [20] to directly predict
objects on each locations without human-designed anchors.
In order to retain enough locations for further optimizing,
we initially treat all the locations inside a bounding box
as both positive and negative candidates at all scale levels.
Then we generate positive and negative weight maps
to modify the prediction in the training loss. To accommodate the distribution from different categories and domains,
we propose a category-wise weighting module named center weighting to learn the distribution of each category from
data. To get adapted to the appearance and scale of each instance, we propose a confidence weighting module to modify the positive and negative confidences of the locations
in both spatial and scale dimensions. Then we combine
the two modules to generate positive and negative weight
maps for all the locations. The entire process of weighting is differentiable and can be conveniently optimized by
back-propagation.

2. Shortcomings of positive/negative sample assignment in dense-prediction detectors such as FCOS

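On datasets such as COCO and VOC, the positive/negative sample assignment used by FCOS and RetinaNet does not affect model performance much, but on other datasets with harder objects this kind of assignment may not work as well.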

As shown in Fig. 1, existing detectors mainly sample the
positive and negative locations by human prior:
(1) Anchor-based detectors like RetinaNet [11] preset several anchors
of diverse scales and aspect ratios on each location and resort to the Intersection over Union (IoU) for sampling positives and negatives among spatial and scale-level feature
maps.
(2) Anchor-free detectors like FCOS [20] sample a
fixed fraction of center area as spatial positive locations for
each object, and select certain stages of FPN [10] by the
pre-defined scale constraints. These detectors follow the
prior distribution of the objects to design their assignment
strategies, which are proved to be effective on challenging
benchmarks, e.g., Pascal VOC [3, 4] and MS COCO [12].
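For contrast with the learned weights, the fixed IoU-based rule described in (1) can be sketched as follows. This is an illustrative sketch rather than RetinaNet's actual code: the function names are made up, and the thresholds 0.5 and 0.4 are the values I recall from the RetinaNet paper.

```python
import numpy as np

def box_iou(anchors, gt):
    """IoU between each anchor (N, 4) and one GT box (4,), boxes given as x1, y1, x2, y2."""
    x1 = np.maximum(anchors[:, 0], gt[0])
    y1 = np.maximum(anchors[:, 1], gt[1])
    x2 = np.minimum(anchors[:, 2], gt[2])
    y2 = np.minimum(anchors[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def assign_by_iou(anchors, gt, pos_thr=0.5, neg_thr=0.4):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    iou = box_iou(anchors, gt)
    labels = np.full(len(anchors), -1)
    labels[iou < neg_thr] = 0
    labels[iou >= pos_thr] = 1
    return labels

gt = np.array([0.0, 0.0, 10.0, 10.0])
anchors = np.array([
    [0.0, 0.0, 10.0, 10.0],    # IoU 1.0
    [0.0, 0.0, 10.0, 6.0],     # IoU 0.6
    [0.0, 0.0, 10.0, 4.5],     # IoU 0.45, falls between the two thresholds
    [20.0, 20.0, 30.0, 30.0],  # no overlap
])
labels = assign_by_iou(anchors, gt)
```

The labels depend only on the anchors and the GT box, never on what the network predicts, which is exactly the "fixed" property discussed in this section.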

Which FPN level a given GT should be assigned to is specified by hand in FCOS and RetinaNet, whereas in AutoAssign this is also learned.
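As an illustration of what "specified by hand" means for FCOS, the sketch below picks the FPN level from the largest regression distance of a location. The ranges are the ones I recall from the FCOS paper for P3 to P7; the function name is hypothetical, and real implementations add further details such as center sampling.

```python
import math

# Regression ranges for P3..P7 as recalled from the FCOS paper (treat them as an assumption).
FCOS_RANGES = [(0, 64), (64, 128), (128, 256), (256, 512), (512, math.inf)]

def fcos_levels_for_location(x, y, gt_box):
    """Return the FPN level indices (0 = P3) allowed to regress gt_box from point (x, y)."""
    x1, y1, x2, y2 = gt_box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return []  # the point lies outside the box
    max_dist = max(l, t, r, b)
    return [i for i, (lo, hi) in enumerate(FCOS_RANGES) if lo <= max_dist <= hi]

small = fcos_levels_for_location(50, 50, (0, 0, 100, 100))
large = fcos_levels_for_location(200, 200, (0, 0, 400, 400))
outside = fcos_levels_for_location(500, 500, (0, 0, 400, 400))
```

The level is decided purely by box size and the hand-picked ranges, so a small object can never be supervised on a coarse level even when that level would predict it better.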

However, as shown in Fig. 2, in the real world, appearances of
objects vary a lot across categories and scenarios. The fixed center
sampling strategy may pick locations outside objects as positives.
Intuitively, sampling locations on objects is better than the plain
background because these locations are prone to generate higher
classification confidences. On the other hand, although CNN can
learn offsets, the obstacle caused by feature shifting when
backgrounds are sampled as positives may decrease the performance.
Thus the fixed strategies above may not always select the most
appropriate locations among spatial and scale dimensions.

3. Comparison of label assignment between different typical detectors.

4. We transform the whole assignment step into two weight maps.

(1) To accommodate to the distributions of different categories,
we propose a category-wise and data-dependent
weighting module named center weighting. It starts from
the standard center prior and then learns the distribution of
each category from data.
(2) To get adapted to the appearance and scale of each instance, we further present an instance-wise weighting module called confidence weighting. It dynamically weights the
positions in the spatial and scale dimensions based on the
predicted confidences of each object.
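The center weighting in (1) can be sketched as a Gaussian over the offset from the box center, with a mean and a standard deviation per category that are learned from data. The sketch below only evaluates the prior for given parameters. The function name, the stride normalization and the example values of `mu` and `sigma` are my assumptions; in training they would be learnable parameters.

```python
import numpy as np

def center_prior(points, gt_box, mu, sigma, stride=1.0):
    """Gaussian prior exp(-(d - mu)^2 / (2 * sigma^2)), multiplied over the x and y axes.

    points: (N, 2) locations; mu, sigma: (2,) parameters of the GT's category.
    """
    cx = (gt_box[0] + gt_box[2]) / 2.0
    cy = (gt_box[1] + gt_box[3]) / 2.0
    d = (points - np.array([cx, cy])) / stride  # offset of each location from the box center
    return np.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)).prod(axis=1)

points = np.array([[5.0, 5.0], [8.0, 5.0], [5.0, 9.0]])
gt_box = (0.0, 0.0, 10.0, 10.0)
sigma = np.array([1.0, 1.0])

# With mu = 0 this is the standard center prior, peaking at the box center.
standard = center_prior(points, gt_box, np.array([0.0, 0.0]), sigma)
# A non-zero mu moves the peak away from the box center, e.g. 3 pixels to the right.
shifted = center_prior(points, gt_box, np.array([3.0, 0.0]), sigma)
```

A shifted mean is how a category whose visible body tends to sit off-center could end up with its positives away from the geometric center of the box.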

5. Formulas

[Figure: formulas from the paper]
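Since the formulas themselves are not reproduced in these notes, here is my reconstruction of the main ones from the descriptions in the earlier sections. The notation is an assumption and should be checked against the paper: S_n is the set of locations inside the n-th GT box, S is the set of all locations, tau is a temperature, and lambda balances the localization loss.

```latex
% Center prior for a location with offset \vec{d} from the box center
G(\vec{d} \mid \vec{\mu}, \vec{\sigma}) = \exp\left(-\frac{(\vec{d} - \vec{\mu})^2}{2\vec{\sigma}^2}\right)

% Positive confidence combines the classification and localization scores
P_i^{+} = P_i(\mathrm{cls}) \cdot P_i(\mathrm{loc}), \qquad P_i(\mathrm{loc}) = e^{-\lambda L_i^{\mathrm{loc}}}

% Positive weight: confidence weighting times center prior, normalized inside the box
w_i^{+} = \frac{e^{P_i^{+}/\tau} \, G(\vec{d}_i)}{\sum_{j \in S_n} e^{P_j^{+}/\tau} \, G(\vec{d}_j)}

% Negative weight: larger when the predicted box has a smaller IoU with the GT
w_i^{-} = 1 - f(\mathrm{iou}_i), \qquad f(\mathrm{iou}_i) = \frac{1}{1 - \mathrm{iou}_i} \ \text{normalized to } [0, 1]

% Overall loss, with P_j^{-} = 1 - P_j(\mathrm{cls})
L = -\sum_{n=1}^{N} \log\left(\sum_{i \in S_n} w_i^{+} P_i^{+}\right) - \sum_{j \in S} \log\left(w_j^{-} P_j^{-}\right)
```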

6. Visualization of learned center weighting weights of different categories

But when we look into some classes with unique distributions, e.g.,
bear, surfboard and hotdog, the improvements are notable

7. Analysis of ImpObj for P(cls).

[Figure: analysis of ImpObj for P(cls)]
As the results show, the ImpObj branch is indeed quite useful.
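My understanding of how ImpObj enters P(cls) is that a single objectness score per location is multiplied into the per-class scores, so both branches share the classification supervision. The sketch below shows only that product. The function names and the logits are hypothetical, and the exact way the paper combines the two branches should be checked once the code is released.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classification_confidence(cls_logits, obj_logit):
    """P(cls) as the product of per-class scores and one shared objectness score (assumed form)."""
    return sigmoid(cls_logits) * sigmoid(obj_logit)

cls_logits = np.array([2.0, -1.0])  # hypothetical logits for two classes at one location
obj_logit = 0.0                     # hypothetical objectness logit, sigmoid(0) = 0.5
p_cls = classification_confidence(cls_logits, obj_logit)
```

With this form, a low objectness suppresses every class at a background location at once, which is how an objectness branch would help filter out the many negative candidates.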

8. Generalization across datasets

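[Figure: results on different datasets]
This point definitely deserves a mention: the dynamic positive/negative sample assignment in this paper can be learned from the dataset, so it should work better than the earlier fixed assignment schemes.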

References:
1. 大白话《AutoAssign》 ("AutoAssign in Plain Language"), by Face++
