SqueezeNet运用到Faster RCNN进行目标检测+OHEM

2023-11-17

一、SqueezeNet介绍

论文提交ICLR 2017
论文地址：https://arxiv.org/abs/1602.07360
代码地址：https://github.com/DeepScale/SqueezeNet
注：代码只放出了prototxt文件和训练好的caffemodel，因为整个网络都是基于caffe的，有这两样东西就足够了。
在这里只是简要的介绍文章的内容，具体细节的东西可以自行翻阅论文。

MOTIVATION

在相同的精度下，模型参数更少有3个好处：

More efficient distributed training
Less overhead when exporting new models to clients
Feasible FPGA and embedded deployment

即高效的分布式训练、更容易替换模型、更方便FPGA和嵌入式部署。
鉴于此，提出3种策略：

Replace 3x3 filters with 1x1 filters.
Decrease the number of input channels to 3x3 filters.
Downsample late in the network so that convolution layers have large activation maps.

即

使用1x1的核替换3x3的核，因为1x1核参数是3x3的1/9；
输入通道减少3x3核的数量，因为参数的数量由输入通道数、卷积核数、卷积核的大小决定。因此，减少1x1的核数量还不够，还需要减少输入通道数量，在文中，作者使用squeeze layer来达到这一目的；
后移池化层，得到更大的feature map。作者认为在网络的前段使用大的步长进行池化，后面的feature map将会减小，而大的feature map会有较高的准确率。

FIRE MODULE

由上面的思路，作者提出了Fire Module，结构如下：
这里写图片描述

ARCHITECTURE

这里写图片描述

关于SqueezeNet的构建细节在文中也有详细的描述

为了3x3的核输出的feature map和1x1的大小相同，padding取1（主要是为了concat）
squeezelayer和expandlayer后面跟ReLU激活函数
Dropout比例为0.5，跟在fire9后面
取消全连接，参考NIN结构
训练过程采用多项式学习率（我用来做检测时改为了step策略）
由于caffe不支持同一个卷积层既有1x1，又有3x3，所以需要concat，将两个分辨率的图在channel维度concat。这在数学上是等价的

EVALUATION

这里写图片描述

二、SqueezeNet与Faster RCNN结合

这里，我首先尝试的是使用alt-opt，但是很遗憾的是，出来的结果很糟糕，基本不能用，后来改为使用end2end，在最开始的时候，采用的就是faster rcnn官方提供的zfnet end2end训练的solvers，又很不幸的是，在网络运行大概400步后出现：

loss = NAN

遇到这个问题，把学习率改为以前的1/10，解决。
直接上prototxt文件，前面都是一样的，只需要改动zfnet中的conv1-con5部分，外加把fc6-fc7改成squeeze中的卷积链接。
prototxt太长，给出每个部分的前面和后面部分：

name: "Alex_Squeeze_v1.1"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 4"
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    stride: 2
  }
}
.
.
.
layer {
  name: "drop9"
  type: "Dropout"
  bottom: "fire9/concat"
  top: "fire9/concat"
  dropout_param {
    dropout_ratio: 0.5
  }
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "fire9/concat"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
.
.
.
layer {
  name: "drop9"
  type: "Dropout"
  bottom: "fire9/concat"
  top: "fire9/concat"
  dropout_param {
    dropout_ratio: 0.5
  }
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "fire9/concat"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
.
.
.
layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 4"
  }
}

#===================== RCNN =============

layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "fire9/concat"
  bottom: "rois"
  top: "roi_pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}

layer {
  name: "conv1_last"
  type: "Convolution"
  bottom: "roi_pool5"
  top: "conv1_last"
  param { lr_mult: 1.0 }
  param { lr_mult: 1.0 }
  convolution_param {
    num_output: 1000
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      mean: 0.0
      std: 0.01
    }
  }
}
layer {
  name: "relu/conv1_last"
  type: "ReLU"
  bottom: "conv1_last"
  top: "relu/conv1_last"
}


layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "relu/conv1_last"
  top: "cls_score"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 5
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "relu/conv1_last"
  top: "bbox_pred"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 20
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "cls_score"
  bottom: "labels"
  propagate_down: 1
  propagate_down: 0
  top: "loss_cls"
  loss_weight: 1
}
layer {
  name: "loss_bbox"
  type: "SmoothL1Loss"
  bottom: "bbox_pred"
  bottom: "bbox_targets"
  bottom: "bbox_inside_weights"
  bottom: "bbox_outside_weights"
  top: "loss_bbox"
  loss_weight: 1
}

后面一部分的结构如图：
这里写图片描述
注意红圈部分，以前的fc换成了squ中的卷积层，这样网络参数大大减少，因为我改动了rpn部分选proposal的比例和数量，共采用改了70种选择，所以最后训练出来的模型为17M，比初始化4.8M大很多，不过也已经很小了。

三、SqueezeNet+Faster RCNN+OHEM

OHEM无非就是多了一个readonly部分，不过加上之后效果会好很多，和上面的方式一致，放出一部分prototxt，其他的课自行补上。从rpn那里开始，前面部分和上面给出的完全一样

#====== RoI Proposal ====================
layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 140 dim: -1 dim: 0 } }
}
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 4"
  }
}
##########################
## Readonly RoI Network ##
######### Start ##########
layer {
  name: "roi_pool5_readonly"
  type: "ROIPooling"
  bottom: "fire9/concat"
  bottom: "rois"
  top: "pool5_readonly"
  propagate_down: false
  propagate_down: false
  roi_pooling_param {
    pooled_w: 6
    pooled_h: 6
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "conv1_last_readonly"
  type: "Convolution"
  bottom: "pool5_readonly"
  top: "conv1_last_readonly"
  propagate_down: false  
  param {
    name: "conv1_last_w"
  }
  param {
    name: "conv1_last_b"
  }
  convolution_param {
    num_output: 1000
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      mean: 0.0
      std: 0.01
    }
  }
}
layer {
  name: "relu/conv1_last_readonly"
  type: "ReLU"
  bottom: "conv1_last_readonly"
  top: "relu/conv1_last_readonly"
  propagate_down: false
}
layer {
  name: "cls_score_readonly"
  type: "InnerProduct"
  bottom: "relu/conv1_last_readonly"
  top: "cls_score_readonly"
  propagate_down: false
  param {
    name: "cls_score_w"
  }
  param {
    name: "cls_score_b"
  }
  inner_product_param {
    num_output: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred_readonly"
  type: "InnerProduct"
  bottom: "relu/conv1_last_readonly"
  top: "bbox_pred_readonly"
  propagate_down: false
  param {
    name: "bbox_pred_w"
  }
  param {
    name: "bbox_pred_b"
  }
  inner_product_param {
    num_output: 16
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "cls_prob_readonly"
  type: "Softmax"
  bottom: "cls_score_readonly"
  top: "cls_prob_readonly"
  propagate_down: false
}
layer {
  name: "hard_roi_mining"
  type: "Python"
  bottom: "cls_prob_readonly"
  bottom: "bbox_pred_readonly"
  bottom: "rois"
  bottom: "labels"
  bottom: "bbox_targets"
  bottom: "bbox_inside_weights"
  bottom: "bbox_outside_weights"
  top: "rois_hard"
  top: "labels_hard"
  top: "bbox_targets_hard"
  top: "bbox_inside_weights_hard"
  top: "bbox_outside_weights_hard"
  propagate_down: false
  propagate_down: false
  propagate_down: false
  propagate_down: false
  propagate_down: false
  propagate_down: false
  propagate_down: false
  python_param {
    module: "roi_data_layer.layer"
    layer: "OHEMDataLayer"
    param_str: "'num_classes': 4"
  }
}
########## End ###########
## Readonly RoI Network ##
##########################
#===================== RCNN =============
layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "fire9/concat"
  bottom: "rois_hard"
  top: "roi_pool5"
  propagate_down: true
  propagate_down: false
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "conv1_last"
  type: "Convolution"
  bottom: "roi_pool5"
  top: "conv1_last"
  param { 
      lr_mult: 1.0 
      name: "conv1_last_w"
      }
  param { 
      lr_mult: 1.0 
      name: "conv1_last_b"
      }
  convolution_param {
    num_output: 1000
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      mean: 0.0
      std: 0.01
    }
  }
}
layer {
  name: "relu/conv1_last"
  type: "ReLU"
  bottom: "conv1_last"
  top: "relu/conv1_last"
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "relu/conv1_last"
  top: "cls_score"
  param {
    lr_mult: 1
    name: "cls_score_w"
  }
  param {
    lr_mult: 2
    name: "cls_score_b"
  }
  inner_product_param {
    num_output: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "relu/conv1_last"
  top: "bbox_pred"
  param {
    lr_mult: 1
    name: "bbox_pred_w"
  }
  param {
    lr_mult: 2
    name: "bbox_pred_b"
  }
  inner_product_param {
    num_output: 16
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "cls_score"
  bottom: "labels_hard"
  propagate_down: true
  propagate_down: false
  top: "loss_cls"
  loss_weight: 1
}
layer {
  name: "loss_bbox"
  type: "SmoothL1Loss"
  bottom: "bbox_pred"
  bottom: "bbox_targets_hard"
  bottom: "bbox_inside_weights_hard"
  bottom: "bbox_outside_weights_hard"
  top: "loss_bbox"
  loss_weight: 1
  propagate_down: false
  propagate_down: false
  propagate_down: false
  propagate_down: false
}

结构图如下：
这里写图片描述
比前面训练的多一个readonly部分，具体可参考论文：
Training Region-based Object Detectors with Online Hard Example Mining
https://arxiv.org/abs/1604.03540

至此，SqueezeNet+Faster RCNN 框架便介绍完了，运行速度在GPU下大概是ZF的5倍，CPU下大概为2。5倍。

原文链接：

http://blog.csdn.net/u011956147/article/details/53714616

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)