SSD Keras Code Walkthrough
1. Building the Model
1.1 Key Configuration Parameters
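aspect_ratios_per_layer = [[1.0, 2.0, 0.5],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5],
                           [1.0, 2.0, 0.5]]
aspect_ratios = aspect_ratios_per_layer
# Number of prior boxes per feature-map cell, one entry per predictor layer
n_boxes = []
for ar in aspect_ratios_per_layer:
    if (1 in ar) & two_boxes_for_ar1:
        n_boxes.append(len(ar) + 1)  # +1 for the second box with aspect ratio 1
    else:
        n_boxes.append(len(ar))
n_predictor_layers = 7
n_classes += 1  # account for the background class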
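The counting rule in the loop above can be checked in isolation. The following is a minimal standalone sketch, not the library's code: the function name `count_boxes_per_layer` is hypothetical, and it assumes that a layer without aspect ratio 1 simply gets one box per listed ratio.

```python
def count_boxes_per_layer(aspect_ratios_per_layer, two_boxes_for_ar1=True):
    """Return the number of prior boxes per cell for each predictor layer."""
    n_boxes = []
    for ar in aspect_ratios_per_layer:
        if (1 in ar) and two_boxes_for_ar1:
            # Aspect ratio 1 contributes a second, slightly larger box.
            n_boxes.append(len(ar) + 1)
        else:
            n_boxes.append(len(ar))
    return n_boxes


aspect_ratios_per_layer = [
    [1.0, 2.0, 0.5],
    [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0],
    [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0],
    [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0],
    [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0],
    [1.0, 2.0, 0.5],
    [1.0, 2.0, 0.5],
]
n_boxes = count_boxes_per_layer(aspect_ratios_per_layer)
print(n_boxes)  # [4, 6, 6, 6, 6, 4, 4]
```

With `two_boxes_for_ar1=False` the same input would give 3 or 5 boxes per cell instead of 4 or 6.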
1.2 VGG Base Network
conv1_1 = Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv1_1')(x1)
conv1_2 = Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv1_2')(conv1_1)
pool1 = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same', name='pool1')(conv1_2)
...
conv10_1 = Conv2D(128, (1, 1), activation='relu', padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv10_1')(conv9_2)
conv10_1 = ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv10_padding')(conv10_1)
conv10_2 = Conv2D(256, (4, 4), strides=(1, 1), activation='relu', padding='valid', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv10_2')(conv10_1)
...
1.3 Additional Detection Layers
1.3.1 Confidence Layers
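If there are $c$ object categories to detect, SSD actually has to predict $c+1$ confidence values per box, where the first value is the score for "no object", i.e. background. From here on, whenever we speak of $c$ class confidences, keep in mind that this count includes the special background class, so there are only $c-1$ real object classes. At prediction time, the class with the highest confidence is the class assigned to the bounding box. In particular, when the first confidence value is the highest, the box contains no object.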
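As a small illustration of this rule, the sketch below picks the class for a single box from its confidence vector. The helper name `predicted_class` and the example scores are made up for illustration; index 0 is taken to be the background class, as described above.

```python
def predicted_class(confidences):
    """Return (class_id, score) for one box; class_id 0 means background."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return best, confidences[best]


# Hypothetical softmax outputs for c = 4 entries (background + 3 object classes).
scores_object = [0.05, 0.10, 0.80, 0.05]
scores_background = [0.90, 0.04, 0.03, 0.03]

print(predicted_class(scores_object))      # (2, 0.8)
print(predicted_class(scores_background))  # (0, 0.9): no object in this box
```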
conv4_3_norm_mbox_conf = Conv2D(n_boxes[0] * n_classes, (3, 3), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv4_3_norm_mbox_conf')(conv4_3_norm)
...
conv10_2_mbox_conf = Conv2D(n_boxes[6] * n_classes, (3, 3), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv10_2_mbox_conf')(conv10_2)
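1.3.2 Localization Layers
A feature map of size $m \times n$ has $mn$ cells. Let $k$ be the number of prior boxes assigned to each cell, i.e. the corresponding entry of n_boxes[…]. Each cell then needs $(c+4) \times k$ predicted values, and the whole feature map needs $(c+4) \times kmn$ predicted values. Because SSD performs detection with convolutions, $(c+4) \times k$ convolution kernels are required to run detection on this feature map.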
n_boxes = [4, 6, 6, 6, 6, 4, 4]
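To make the $(c+4) \times k$ count concrete, the sketch below computes the number of filters of the confidence and localization convolutions for each predictor layer. It assumes 20 object classes plus background (so `n_classes = 21`, playing the role of $c$); that class count and the helper name `predictor_filters` are illustrative assumptions, not values taken from the code above.

```python
n_boxes = [4, 6, 6, 6, 6, 4, 4]
n_classes = 20 + 1  # assumption: 20 object classes plus the background class


def predictor_filters(k, n_classes):
    """Return (confidence filters, localization filters) for one layer."""
    conf_filters = k * n_classes  # matches Conv2D(n_boxes[i] * n_classes, ...)
    loc_filters = k * 4           # matches Conv2D(n_boxes[i] * 4, ...)
    return conf_filters, loc_filters


for i, k in enumerate(n_boxes):
    conf, loc = predictor_filters(k, n_classes)
    # conf + loc equals (c + 4) * k for every layer
    print(f"layer {i}: conf={conf}, loc={loc}, total={conf + loc}")
```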
Once the feature maps are available, each one is convolved to produce the detection results. The accompanying figure shows the detection process for a $5 \times 5$ feature map, where Priorbox generates the prior boxes. The detection values consist of two parts, class confidences and bounding-box locations, each produced by one $3 \times 3$ convolution. Every prior box predicts one bounding box, so SSD512 can predict a total of

$$64 \times 64 \times 4 + 32 \times 32 \times 6 + 16 \times 16 \times 6 + 8 \times 8 \times 6 + 4 \times 4 \times 6 + 2 \times 2 \times 4 + 1 \times 1 \times 4 = 24564$$

bounding boxes. This is why SSD is essentially a dense sampling method.
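The total can be reproduced with a few lines of arithmetic. The feature-map sizes below are read off the sum above (SSD512), and the boxes-per-cell list is the `n_boxes` value used in this section.

```python
feature_map_sizes = [64, 32, 16, 8, 4, 2, 1]  # side length of each square feature map
n_boxes = [4, 6, 6, 6, 6, 4, 4]               # prior boxes per cell in each layer

# One predicted bounding box per prior box per cell.
per_layer = [s * s * k for s, k in zip(feature_map_sizes, n_boxes)]
total = sum(per_layer)

print(per_layer)  # [16384, 6144, 1536, 384, 96, 16, 4]
print(total)      # 24564
```

Note that the first layer alone accounts for about two thirds of all boxes.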
conv4_3_norm_mbox_loc = Conv2D(n_boxes[0] * 4, (3, 3), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv4_3_norm_mbox_loc')(conv4_3_norm)
...
conv10_2_mbox_loc = Conv2D(n_boxes[6] * 4, (3, 3), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv10_2_mbox_loc')(conv10_2)
1.3.3 Prior Boxes
'''
Note that this tensor does not participate in any graph computations at runtime. It is being created
as a constant once during graph creation and is just being output along with the rest of the model output
during runtime. Because of this, all logic is implemented as Numpy array operations and it is sufficient
to convert the resulting Numpy array into a Keras tensor at the very end before outputting it.
'''
conv4_3_norm_mbox_priorbox = AnchorBoxes(img_height, img_width, this_scale=scales[0], next_scale=scales[1], aspect_ratios=aspect_ratios[0],
two_boxes_for_ar1=two_boxes_for_ar1, this_steps=steps[0], this_offsets=offsets[0], clip_boxes=clip_boxes,
variances=variances, coords=coords, normalize_coords=normalize_coords, name='conv4_3_norm_mbox_priorbox')(conv4_3_norm_mbox_loc)
...
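To show how `this_scale`, `next_scale`, `aspect_ratios` and `two_boxes_for_ar1` could determine the box shapes, here is a minimal sketch that follows the formulation in the SSD paper (width $s\sqrt{a}$, height $s/\sqrt{a}$, plus an extra square box of scale $\sqrt{s_k s_{k+1}}$ for aspect ratio 1). It is an assumption that the `AnchorBoxes` layer works this way in every detail; the function name `prior_box_sizes`, the scale values and the image size are hypothetical.

```python
import math


def prior_box_sizes(this_scale, next_scale, aspect_ratios, img_size,
                    two_boxes_for_ar1=True):
    """Return a list of (width, height) pairs in pixels for one cell."""
    sizes = []
    for ar in aspect_ratios:
        w = this_scale * img_size * math.sqrt(ar)
        h = this_scale * img_size / math.sqrt(ar)
        sizes.append((w, h))
        if ar == 1 and two_boxes_for_ar1:
            # Second square box at the geometric mean of this and the next scale.
            extra = math.sqrt(this_scale * next_scale) * img_size
            sizes.append((extra, extra))
    return sizes


# Hypothetical scales for the first predictor layer of a 512 x 512 input.
boxes = prior_box_sizes(0.1, 0.2, [1.0, 2.0, 0.5], img_size=512)
for w, h in boxes:
    print(f"{w:.1f} x {h:.1f}")
```

Under these assumptions every box of a given scale has the same area, and only its shape changes with the aspect ratio.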
1.3.4 Reshape Layers
conv4_3_norm_mbox_conf_reshape = Reshape((-1, n_classes), name='conv4_3_norm_mbox_conf_reshape')(conv4_3_norm_mbox_conf)
conv4_3_norm_mbox_loc_reshape = Reshape((-1, 4), name='conv4_3_norm_mbox_loc_reshape')(conv4_3_norm_mbox_loc)
conv4_3_norm_mbox_priorbox_reshape = Reshape((-1, 8), name='conv4_3_norm_mbox_priorbox_reshape')(conv4_3_norm_mbox_priorbox)
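The three reshapes are designed so that all three tensors end up with one row per prior box. The sketch below checks this by counting elements only, without Keras. It assumes a 64 × 64 feature map with 4 boxes per cell and 21 classes, and that the prior-box tensor carries 8 values per box (as the `Reshape((-1, 8))` above suggests, presumably four coordinates plus four variances); the helper name `reshaped_rows` is hypothetical.

```python
def reshaped_rows(height, width, channels, last_dim):
    """Rows produced by Reshape((-1, last_dim)) on a (height, width, channels) map."""
    total = height * width * channels
    if total % last_dim != 0:
        raise ValueError("element count is not divisible by last_dim")
    return total // last_dim


h = w = 64          # assumed feature-map size of the first predictor layer
k = 4               # prior boxes per cell in that layer
n_classes = 21      # assumed: 20 object classes plus background

conf_rows = reshaped_rows(h, w, k * n_classes, n_classes)
loc_rows = reshaped_rows(h, w, k * 4, 4)
prior_rows = reshaped_rows(h, w, k * 8, 8)

print(conf_rows, loc_rows, prior_rows)  # 16384 16384 16384
```

Because the row counts agree, the three outputs can later be concatenated along the last axis.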
1.3.5 Output Concatenation Layers
mbox_conf = Concatenate(axis=1, name='mbox_conf')([conv4_3_norm_mbox_conf_reshape,
fc7_mbox_conf_reshape,
conv6_2_mbox_conf_reshape,
conv7_2_mbox_conf_reshape,
conv8_2_mbox_conf_reshape,
conv9_2_mbox_conf_reshape,
conv10_2_mbox_conf_reshape])
mbox_loc = Concatenate(axis=1, name='mbox_loc')([conv4_3_norm_mbox_loc_reshape,
fc7_mbox_loc_reshape,
conv6_2_mbox_loc_reshape,
conv7_2_mbox_loc_reshape,
conv8_2_mbox_loc_reshape,
conv9_2_mbox_loc_reshape,
conv10_2_mbox_loc_reshape])
mbox_priorbox = Concatenate(axis=1, name='mbox_priorbox')([conv4_3_norm_mbox_priorbox_reshape,
fc7_mbox_priorbox_reshape,
conv6_2_mbox_priorbox_reshape,
conv7_2_mbox_priorbox_reshape,
conv8_2_mbox_priorbox_reshape,
conv9_2_mbox_priorbox_reshape,
conv10_2_mbox_priorbox_reshape])
mbox_conf_softmax = Activation('softmax', name='mbox_conf_softmax')(mbox_conf)
predictions = Concatenate(axis=2, name='predictions')([mbox_conf_softmax, mbox_loc, mbox_priorbox])
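After the final concatenation along axis 2, each row of `predictions` holds the class confidences, then the 4 localization offsets, then the 8 prior-box values for one box. The sketch below splits one such row back into its parts. The helper name `split_prediction` and the example numbers are hypothetical, and the row is a plain Python list rather than a tensor.

```python
def split_prediction(row, n_classes):
    """Split one prediction row into (confidences, offsets, prior-box values)."""
    if len(row) != n_classes + 4 + 8:
        raise ValueError("row length must be n_classes + 4 + 8")
    conf = row[:n_classes]
    loc = row[n_classes:n_classes + 4]
    prior = row[n_classes + 4:]
    return conf, loc, prior


# Made-up row for n_classes = 3 (background + 2 object classes).
row = [0.7, 0.2, 0.1,            # softmax confidences
       0.0, 0.1, -0.2, 0.3,      # predicted offsets
       0.5, 0.5, 0.2, 0.2,       # prior-box coordinates (assumed layout)
       0.1, 0.1, 0.2, 0.2]       # prior-box variances (assumed layout)
conf, loc, prior = split_prediction(row, n_classes=3)
print(len(conf), len(loc), len(prior))  # 3 4 8
```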
1.4 Building the Model
if mode == 'training':
    model = Model(inputs=x, outputs=predictions)
elif mode == 'inference':
    # The last axis contains the coordinates for each predicted box in the format
    # [class_id, confidence, xmin, ymin, xmax, ymax]
    decoded_predictions = DecodeDetections(confidence_thresh=confidence_thresh,
                                           iou_threshold=iou_threshold,
                                           top_k=top_k,
                                           nms_max_output_size=nms_max_output_size,
                                           coords=coords,
                                           normalize_coords=normalize_coords,
                                           img_height=img_height,
                                           img_width=img_width,
                                           name='decoded_predictions')(predictions)
    model = Model(inputs=x, outputs=decoded_predictions)
1.5 Encoding and Decoding
The location of a bounding box consists of four values $(cx, cy, w, h)$: the center coordinates and the width and height of the box. The values the network actually predicts, however, are only the transformation of the bounding box relative to its prior box (the paper calls this an offset, R-CNN calls it a transformation). Let the prior box be $d=\left(d^{cx}, d^{cy}, d^{w}, d^{h}\right)$ and its corresponding bounding box be $b=\left(b^{cx}, b^{cy}, b^{w}, b^{h}\right)$. The predicted value $l$ is then the transformation of $b$ relative to $d$:
$$l^{cx}=\left(b^{cx}-d^{cx}\right) / d^{w}, \quad l^{cy}=\left(b^{cy}-d^{cy}\right) / d^{h}$$
$$l^{w}=\log \left(b^{w} / d^{w}\right), \quad l^{h}=\log \left(b^{h} / d^{h}\right)$$
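The encoding formulas translate directly into code. The following is a minimal sketch for a single box, with boxes given as `(cx, cy, w, h)` tuples; the function name `encode_box` and the example coordinates are illustrative and not taken from the SSD implementation.

```python
import math


def encode_box(b, d):
    """Encode bounding box b relative to prior box d; both are (cx, cy, w, h)."""
    b_cx, b_cy, b_w, b_h = b
    d_cx, d_cy, d_w, d_h = d
    l_cx = (b_cx - d_cx) / d_w
    l_cy = (b_cy - d_cy) / d_h
    l_w = math.log(b_w / d_w)
    l_h = math.log(b_h / d_h)
    return (l_cx, l_cy, l_w, l_h)


prior = (100.0, 100.0, 50.0, 50.0)
box = (110.0, 95.0, 100.0, 25.0)
offsets = encode_box(box, prior)
print(offsets)  # approximately (0.2, -0.1, 0.693, -0.693)
```

The center offsets are measured in units of the prior box's width and height, and the size offsets are log ratios, so a box identical to its prior encodes to all zeros.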
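This process is conventionally called encoding the bounding box. At prediction time the process has to be reversed, i.e. decoded, to recover the actual bounding-box location $b$ from the predicted value $l$: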
$$b^{cx}=d^{w} l^{cx}+d^{cx}, \quad b^{cy}=d^{h} l^{cy}+d^{cy}$$
$$b^{w}=d^{w} \exp \left(l^{w}\right), \quad b^{h}=d^{h} \exp \left(l^{h}\right)$$
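Decoding is the inverse operation. The sketch below applies it to a single box, again using `(cx, cy, w, h)` tuples; the function name `decode_box` and the example values are illustrative only.

```python
import math


def decode_box(l, d):
    """Recover bounding box (cx, cy, w, h) from offsets l and prior box d."""
    l_cx, l_cy, l_w, l_h = l
    d_cx, d_cy, d_w, d_h = d
    b_cx = d_w * l_cx + d_cx
    b_cy = d_h * l_cy + d_cy
    b_w = d_w * math.exp(l_w)
    b_h = d_h * math.exp(l_h)
    return (b_cx, b_cy, b_w, b_h)


prior = (100.0, 100.0, 50.0, 50.0)
offsets = (0.2, -0.1, math.log(2.0), math.log(0.5))
box = decode_box(offsets, prior)
print(box)  # approximately (110.0, 95.0, 100.0, 25.0)
```

All-zero offsets return the prior box itself, which is a convenient sanity check.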
The Caffe source implementation of SSD contains one more trick: a variance hyperparameter that rescales the predicted values. Decoding then becomes:

$$b^{cx}=d^{w}\left(\text{variance}[0] \cdot l^{cx}\right)+d^{cx}, \quad b^{cy}=d^{h}\left(\text{variance}[1] \cdot l^{cy}\right)+d^{cy}$$
$$b^{w}=d^{w} \exp \left(\text{variance}[2] \cdot l^{w}\right), \quad b^{h}=d^{h} \exp \left(\text{variance}[3] \cdot l^{h}\right)$$
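With variances, encoding divides each offset by the matching variance and decoding multiplies it back. The round-trip sketch below illustrates this for one box. The variance values `(0.1, 0.1, 0.2, 0.2)` are an assumption (commonly seen defaults, not stated in this section), and the function names are hypothetical.

```python
import math

VARIANCES = (0.1, 0.1, 0.2, 0.2)  # assumed values, for illustration only


def encode_with_variances(b, d, variances=VARIANCES):
    """Encode box b relative to prior d, scaling each offset by 1 / variance."""
    b_cx, b_cy, b_w, b_h = b
    d_cx, d_cy, d_w, d_h = d
    return ((b_cx - d_cx) / d_w / variances[0],
            (b_cy - d_cy) / d_h / variances[1],
            math.log(b_w / d_w) / variances[2],
            math.log(b_h / d_h) / variances[3])


def decode_with_variances(l, d, variances=VARIANCES):
    """Invert encode_with_variances, following the decoding formulas above."""
    l_cx, l_cy, l_w, l_h = l
    d_cx, d_cy, d_w, d_h = d
    return (d_w * (variances[0] * l_cx) + d_cx,
            d_h * (variances[1] * l_cy) + d_cy,
            d_w * math.exp(variances[2] * l_w),
            d_h * math.exp(variances[3] * l_h))


prior = (100.0, 100.0, 50.0, 50.0)
box = (110.0, 95.0, 100.0, 25.0)
encoded = encode_with_variances(box, prior)
decoded = decode_with_variances(encoded, prior)
print(encoded)  # center offsets are 10x, size offsets 5x larger than without variances
print(decoded)  # approximately the original box
```

With variances below 1, the encoded targets are larger in magnitude than the plain offsets, which changes the scale of the values the network has to regress.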