[IoU Loss] UnitBox: An Advanced Object Detection Network

2023-11-13

ACM MM 2016 (Proceedings of the 24th ACM International Conference on Multimedia)

Contents

1 Background and Motivation
2 Advantages / Contributions
3 Method
4 Experiments
5 Conclusion (own)
Appendix A: IoU Coding

1 Background and Motivation

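CNN-based object detection methods perform well across many applications. Most current methods follow this pipeline:

Extract region proposals, e.g. with Selective Search or EdgeBoxes
Use a CNN to recognize and categorize the region proposals
Apply bounding box regression to refine the localization

Detectors that follow this pipeline are often held back by the region proposal methods, in both effectiveness (proposals are generated from low-level features only, so their quality is often poor and they are sensitive to local appearance changes) and efficiency (there are many densely sampled proposals, which is slow).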

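Two approaches try to overcome these problems:

Faster R-CNN uses an RPN to speed things up, but because the anchor ratios and scales are pre-designed and fixed, it has difficulty handling large shape variations and small objects
DenseBox directly regresses the distances from each pixel to the four sides of the GT box, trained with an L2 loss

The drawback of DenseBox is that the four side distances are optimized in isolation. It goes against the intuition that those variables are correlated and should be regressed jointly.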

2 Advantages / Contributions

The paper proposes the IoU loss, which gives:

faster training convergence
enabled with variable-scale training
best performance among all published methods on the FDDB benchmark

3 Method

For any pixel $(i,j)$, the GT can be defined as a 4-d vector:

$$\widetilde{x}_{i,j} = (\widetilde{x}_{t_{i,j}}, \widetilde{x}_{b_{i,j}}, \widetilde{x}_{l_{i,j}}, \widetilde{x}_{r_{i,j}})$$

As in Figure 1 of the paper, $t$, $b$, $l$, $r$ stand for top, bottom, left and right, and $\widetilde{x}_{t_{i,j}}, \widetilde{x}_{b_{i,j}}, \widetilde{x}_{l_{i,j}}, \widetilde{x}_{r_{i,j}}$ are the distances from the current pixel to the top, bottom, left and right sides of the GT box.
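A minimal sketch of how this 4-d target could be computed for a single pixel. It assumes image coordinates with rows growing downward and a GT box given as (top, left, bottom, right); the function name `pixel_target` and the box layout are illustrative assumptions, not taken from the paper.

```python
def pixel_target(i, j, box):
    """Distances from pixel (row i, col j) to the four sides of a GT box.

    box is (top, left, bottom, right) in pixel coordinates (assumed layout).
    Returns (x_t, x_b, x_l, x_r), or all zeros if the pixel is outside the box.
    """
    top, left, bottom, right = box
    inside = top <= i <= bottom and left <= j <= right
    if not inside:
        return (0, 0, 0, 0)
    return (i - top, bottom - i, j - left, right - j)


print(pixel_target(30, 50, (10, 20, 70, 100)))  # (20, 40, 30, 50)
print(pixel_target(5, 50, (10, 20, 70, 100)))   # (0, 0, 0, 0)
```

Returning zeros outside the box mirrors the convention used later in these notes, where only pixels with a non-zero target are counted.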

3.1 L2 Loss Layer

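This is the loss used in DenseBox.

It has two drawbacks:

The bbox is treated as four independent variables, with no view of the box as a whole. During optimization one or two of the variables may fit perfectly while the others do not, so the overall result can still be poor. A bbox can have a small loss against the GT and still be badly localized, as in the following case

If the current pixel lies midway between two rectangles, one the GT and the other the predicted bbox, the loss is 0

It is unnormalized. At the same IoU, a localization error on a large bbox can be penalized more heavily than on a small one, because the L2 loss measures absolute size in pixels whereas IoU measures relative size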
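The normalization issue can be checked numerically. The sketch below is an illustration under assumed inputs (the helper names and the example boxes are made up here): it scales one GT/prediction pair by a factor of 10 and compares IoU with the L2 loss on the four distances.

```python
def l2_loss(pred, gt):
    """Sum of squared differences over the four side distances."""
    return sum((p - g) ** 2 for p, g in zip(pred, gt))


def iou_from_distances(pred, gt):
    """IoU of two boxes given as (t, b, l, r) distances from the same pixel."""
    area_pred = (pred[0] + pred[1]) * (pred[2] + pred[3])
    area_gt = (gt[0] + gt[1]) * (gt[2] + gt[3])
    inter_h = min(pred[0], gt[0]) + min(pred[1], gt[1])
    inter_w = min(pred[2], gt[2]) + min(pred[3], gt[3])
    inter = inter_h * inter_w
    return inter / (area_pred + area_gt - inter)


small_gt, small_pred = (10, 10, 10, 10), (8, 10, 10, 10)
# The same configuration, scaled by a factor of 10.
large_gt = tuple(10 * v for v in small_gt)
large_pred = tuple(10 * v for v in small_pred)

print(iou_from_distances(small_pred, small_gt), l2_loss(small_pred, small_gt))
print(iou_from_distances(large_pred, large_gt), l2_loss(large_pred, large_gt))
```

Both pairs have an IoU of 0.9, yet the L2 loss grows from 4 to 400, so the larger box dominates the gradient even though it is localized equally well in relative terms.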

3.2 IoU Loss Layer: Forward

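Algorithm 1 in the paper gives the detailed version of the IoU loss from Figure 1.

The condition $\widetilde{x} \neq 0$ is key: only pixels that fall inside the GT box are counted.

$I$ is the intersection area and $U$ is the union area; the IoU loss is the negative logarithm of their ratio.

The negative-log curve can be plotted as follows: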

import numpy as np
import matplotlib.pyplot as plt

def neg_log(x):
    # IoU loss as a function of IoU: L = -ln(IoU)
    return -np.log(x)

x = np.arange(0.01, 1, 0.01)
y = neg_log(x)

# gca = get current axes
ax = plt.gca()

# spines = the four border lines of the plot
ax.spines['right'].set_color('none')  # hide the right border
ax.spines['top'].set_color('none')    # hide the top border

ax.xaxis.set_ticks_position('bottom')  # draw x ticks on the bottom spine
ax.yaxis.set_ticks_position('left')    # draw y ticks on the left spine

ax.spines['bottom'].set_position(('data', 0))  # move the x axis to y = 0
ax.spines['left'].set_position(('data', 0))    # move the y axis to x = 0

plt.plot(x, y)
plt.show()

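The larger the IoU, the smaller the loss; when the two boxes coincide, the loss is 0.

The advantage is that the IoU loss treats the bbox as a whole, and since IoU itself lies in [0, 1], it is normalized by construction.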
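The forward pass for one pixel can be sketched in a few lines. This is an illustrative version under stated assumptions, not the paper's implementation: boxes are (t, b, l, r) distances from the same pixel, pixels with an all-zero GT target contribute a loss of 0, and every predicted distance is strictly positive so that the intersection is non-zero. The name `iou_loss` is chosen here for illustration.

```python
import math


def iou_loss(pred, gt):
    """Per-pixel IoU loss for boxes given as (t, b, l, r) distances.

    Returns 0.0 when gt is all zeros (pixel outside the GT box).
    Assumes every entry of pred is strictly positive, so that I > 0.
    """
    if not any(gt):
        return 0.0
    x_t, x_b, x_l, x_r = pred
    g_t, g_b, g_l, g_r = gt
    area_pred = (x_t + x_b) * (x_l + x_r)      # X
    area_gt = (g_t + g_b) * (g_l + g_r)        # X~
    inter_h = min(x_t, g_t) + min(x_b, g_b)    # I_h
    inter_w = min(x_l, g_l) + min(x_r, g_r)    # I_w
    inter = inter_h * inter_w                  # I
    union = area_pred + area_gt - inter        # U
    # ln(U / I) is the same quantity as -ln(I / U)
    return math.log(union / inter)


gt = (10, 10, 10, 10)
print(iou_loss(gt, gt))                # 0.0
print(iou_loss((8, 10, 10, 10), gt))   # IoU = 0.9
print(iou_loss((5, 5, 5, 5), gt))      # IoU = 0.25
```

The loss is written as ln(U / I) rather than -ln(I / U) only so that a perfect match prints 0.0 instead of -0.0.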

3.3 IoU Loss Layer: Backward

With the formulas of Algorithm 1 in hand, let's look at the backward pass of the IoU loss.

The area of the predicted bbox is

$$X = (x_t + x_b) * (x_l + x_r)$$

$$I = I_h * I_w = [\min(x_t,\widetilde{x}_t)+\min(x_b,\widetilde{x}_b)] * [\min(x_l,\widetilde{x}_l)+\min(x_r,\widetilde{x}_r)]$$

$U = X+\widetilde{X}-I$, $IoU = \frac{I}{U}$, $L=-\ln(IoU)$

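Writing $L = \ln U - \ln I$ with $U = X + \widetilde{X} - I$ and differentiating with respect to the predicted distances gives

$$\frac{\partial L}{\partial x} = \frac{1}{U}\nabla_x X - \frac{U+I}{UI}\nabla_x I$$

In the backward pass, the $\nabla_x X$ term penalizes the area of the predicted bbox: the larger the area, the larger the gradient and the bigger the update, which signals a larger error; conversely, the smaller the area, the smaller the gradient. The $\nabla_x I$ term acts on the overlap: the more the boxes overlap, the smaller the gradient, and the less they overlap, the larger the gradient. So backpropagation pushes the predicted bbox to be as small as possible and the intersection to be as large as possible. In the limiting case the intersection equals the predicted bbox, i.e. a perfect match, and the loss is 0.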
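The gradient can be sanity-checked against finite differences. The sketch below is an illustration under assumptions: it uses $\partial X/\partial x_t = x_l + x_r$ (and symmetrically for the other sides), takes $\partial I/\partial x_t = I_w$ when $x_t < \widetilde{x}_t$ and 0 otherwise, and evaluates at a point where no predicted distance equals its GT counterpart, since the min makes the loss non-smooth there. All function names are made up for this example.

```python
import math


def iou_loss(x, g):
    """L = ln(U / I) for boxes given as (t, b, l, r) distances."""
    X = (x[0] + x[1]) * (x[2] + x[3])
    G = (g[0] + g[1]) * (g[2] + g[3])
    I = (min(x[0], g[0]) + min(x[1], g[1])) * (min(x[2], g[2]) + min(x[3], g[3]))
    return math.log((X + G - I) / I)


def iou_loss_grad(x, g):
    """Analytic gradient dL/dx = (1/U) dX/dx - (U + I) / (U I) dI/dx."""
    x_t, x_b, x_l, x_r = x
    g_t, g_b, g_l, g_r = g
    X = (x_t + x_b) * (x_l + x_r)
    G = (g_t + g_b) * (g_l + g_r)
    I_h = min(x_t, g_t) + min(x_b, g_b)
    I_w = min(x_l, g_l) + min(x_r, g_r)
    I = I_h * I_w
    U = X + G - I
    dX = (x_l + x_r, x_l + x_r, x_t + x_b, x_t + x_b)
    dI = (I_w if x_t < g_t else 0, I_w if x_b < g_b else 0,
          I_h if x_l < g_l else 0, I_h if x_r < g_r else 0)
    return tuple(dx / U - (U + I) / (U * I) * di for dx, di in zip(dX, dI))


def numeric_grad(x, g, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grads = []
    for k in range(4):
        hi, lo = list(x), list(x)
        hi[k] += eps
        lo[k] -= eps
        grads.append((iou_loss(hi, g) - iou_loss(lo, g)) / (2 * eps))
    return tuple(grads)


pred, gt = (8.0, 12.0, 9.0, 11.0), (10.0, 10.0, 10.0, 10.0)
analytic = iou_loss_grad(pred, gt)
numeric = numeric_grad(pred, gt)
print(analytic)
print(numeric)
```

For this example the gradient is negative for the sides that fall short of the GT (top and left), so gradient descent grows them, and positive for the sides that overshoot (bottom and right), so gradient descent shrinks them.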

3.4 UnitBox Network

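The network has two branches: each pixel gets its four predicted distances and a corresponding confidence score.

The network is modeled on VGG and takes three inputs:

The original image
Confidence heatmap: positive inside the area covered by the GT and negative outside. It has the same size as the original image and is a binary mask; the positive region should be the ellipse given by the FDDB label
Bounding box heatmaps: within the positive region, the distances to the top, bottom, left and right sides of the GT

At prediction time, the confidence heatmap branch is followed by a sigmoid activation function.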
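The two heatmap inputs can be sketched for a toy image as follows. This is a simplified illustration, not the paper's data pipeline: the positive region is taken to be the GT rectangle rather than the FDDB ellipse, the box is (top, left, bottom, right) with inclusive pixel coordinates, and `build_targets` is a hypothetical helper name.

```python
def build_targets(height, width, box):
    """Build the binary confidence mask and the four-distance map for one GT box.

    box is (top, left, bottom, right), inclusive pixel coordinates (assumed layout).
    Returns (conf, dist): conf[i][j] is 0 or 1, dist[i][j] is (x_t, x_b, x_l, x_r).
    """
    top, left, bottom, right = box
    conf = [[0] * width for _ in range(height)]
    dist = [[(0, 0, 0, 0)] * width for _ in range(height)]
    # Clamp the loops to the image so a box that sticks out does not index out of range.
    for i in range(max(top, 0), min(bottom, height - 1) + 1):
        for j in range(max(left, 0), min(right, width - 1) + 1):
            conf[i][j] = 1
            dist[i][j] = (i - top, bottom - i, j - left, right - j)
    return conf, dist


conf, dist = build_targets(6, 8, (1, 2, 4, 6))
for row in conf:
    print(row)
print(dist[2][3])  # (1, 2, 1, 3)
```

Pixels outside the positive region keep an all-zero distance vector, which is what lets the loss skip them.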

4 Experiments

4.1 Datasets

FDDB

http://vis-www.cs.umass.edu/fddb/samples/

This dataset annotates faces with ellipses; a bbox is generated around each ellipse, centered at the ellipse center.
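One possible way to turn an ellipse into a box is to take the tightest axis-aligned rectangle around it, which is automatically centered at the ellipse center. The notes above do not say which rule the paper uses, so treat this as an assumption. The sketch also assumes the ellipse is described by a major radius, a minor radius, the angle of the major axis from the x axis in radians, and a center; `ellipse_to_bbox` is a hypothetical helper name.

```python
import math


def ellipse_to_bbox(major_r, minor_r, angle, cx, cy):
    """Tightest axis-aligned box around a rotated ellipse.

    angle is the rotation of the major axis from the x axis, in radians.
    Returns (top, left, bottom, right).
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w = math.sqrt((major_r * cos_a) ** 2 + (minor_r * sin_a) ** 2)
    half_h = math.sqrt((major_r * sin_a) ** 2 + (minor_r * cos_a) ** 2)
    return (cy - half_h, cx - half_w, cy + half_h, cx + half_w)


# Major axis along x: the box is 80 wide and 60 tall.
print(ellipse_to_bbox(40, 30, 0.0, 100, 80))  # (50.0, 60.0, 110.0, 140.0)
# Major axis (almost exactly) vertical: width and height swap, up to rounding.
print(ellipse_to_bbox(40, 30, math.pi / 2, 100, 80))
```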

4.2 Effectiveness of IoU Loss

The network is initialized from VGG and fine-tuned on the WIDER FACE dataset.

Convergence

The convergence curves show that the IoU loss converges faster and more steadily than the L2 loss, and reaches a lower miss rate.

FP-recall Curves

Scale Variation

The test images are resized to between 60 and 960 pixels to compare how the IoU loss and the L2 loss cope with scale variation; the IoU loss is clearly the more robust of the two.

4.3 Performance of UnitBox

The ROC curves show that UnitBox leads by a clear margin.

5 Conclusion (own)

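Explaining what the loss does from the viewpoint of backpropagation is a nice touch
Remember that FDDB labels faces with ellipses; its samples include Zhang Ziyi and Gong Li
The drawback of learning a bbox as four independent variables is the lack of a view of the box as a whole (one or two variables may be optimized perfectly while the others are not, so the overall result may still be poor)
This is also a good moment to review P-R curves and ROC curves
Drawbacks (excerpted from the article 深入浅出Yolo系列之Yolov3&Yolov4&Yolov5核心基础知识完整讲解)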

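Problem 1 (state 1 in the cited article): when the predicted box and the target box do not intersect, IoU = 0, which says nothing about how far apart the two boxes are. The loss is not differentiable in this case, so IOU_Loss cannot optimize boxes that do not intersect.
Problem 2 (states 2 and 3 in the cited article): when two predicted boxes have the same size and the same IoU, IOU_Loss cannot tell apart the different ways they intersect the target.
GIOU_Loss appeared in 2019 to address this.
Progression: Smooth L1 Loss -> IoU Loss (2016) -> GIoU Loss (2019) -> DIoU Loss (2020) -> CIoU Loss (2020)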
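Problem 1 is easy to reproduce. The sketch below compares IoU with GIoU, using the standard definition GIoU = IoU - (C - U) / C, where C is the area of the smallest axis-aligned box enclosing both boxes. The boxes use the same (top, left, bottom, right) layout as Appendix A; the function name and the example boxes are made up for illustration.

```python
def iou_and_giou(a, b):
    """IoU and GIoU for boxes given as (top, left, bottom, right)."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = area_a + area_b - inter
    # Smallest axis-aligned box that encloses both a and b.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / union
    giou = iou - (hull - union) / hull
    return iou, giou


gt = (0, 0, 10, 10)
near = (0, 12, 10, 22)   # 2 pixels to the right of the GT
far = (0, 40, 10, 50)    # 30 pixels to the right of the GT

print(iou_and_giou(near, gt))
print(iou_and_giou(far, gt))
```

Both predictions have an IoU of 0, so the IoU loss cannot rank them, while GIoU is less negative for the nearer box. The GIoU loss is 1 - GIoU, so it keeps a useful signal even without overlap.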

Appendix A: IoU Coding

def compute_iou(rec1, rec2):
    """
    computing IoU
    :param rec1: (y0, x0, y1, x1), which reflects
            (top, left, bottom, right)
    :param rec2: (y0, x0, y1, x1)
    :return: scalar value of IoU
    """
    areas1 = (rec1[3] - rec1[1]) * (rec1[2] - rec1[0])
    areas2 = (rec2[3] - rec2[1]) * (rec2[2] - rec2[0])
    left = max(rec1[1],rec2[1])
    right = min(rec1[3],rec2[3])
    top = max(rec1[0], rec2[0])
    bottom = min(rec1[2], rec2[2])
    w = max(0, right-left)
    h = max(0, bottom-top)
    union = areas1 + areas2 - w * h
    # Guard against two degenerate (zero-area) boxes
    return w * h / union if union > 0 else 0.0


if __name__ == '__main__':
    rect1 = [661, 27, 679, 47]
    # (top, left, bottom, right)
    rect2 = [662, 27, 682, 47]
    iou = compute_iou(rect1, rect2)
    print(iou)

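Taken from the article 目标检测算法中规则矩形和不规则四边形IOU的Python实现 (a Python implementation of IoU for regular rectangles and irregular quadrilaterals in object detection)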
