Faster R-CNN in Detail

2023-11-08

1.2.2 FasterRCNN

Faster R-CNN builds on Fast R-CNN to achieve end-to-end training. The algorithm consists of three parts: a backbone network extracts features, the RPN generates region proposals, and the R-CNN head performs classification and regression.

Strengths of Faster R-CNN:

  1. High detection accuracy.
  2. The RPN learns to generate region proposals.
  3. General-purpose and robust.
  4. Plenty of room for further optimization.

Weaknesses of Faster R-CNN:

  1. The backbone provides only a single feature map (no multi-scale features).
  2. NMS handles occluded objects poorly and can cause missed detections.
  3. RoI Pooling rounds coordinates twice, losing localization precision (see the sketch after this list).
  4. The fully connected layers at the end make the parameter count large.
  5. Detection is relatively slow.
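
To make point 3 concrete, here is a minimal sketch of where the two roundings happen (illustrative numbers only; real implementations differ in the exact rounding used):

'''
# Toy illustration of RoI Pooling's two quantization steps.
x1, x2 = 53, 198                         # RoI edges in image pixels (hypothetical values)
stride = 16                              # backbone downsampling factor

fx1, fx2 = x1 // stride, x2 // stride    # 1st rounding: 3, 12 (53/16 = 3.3125 is floored)
bin_w = (fx2 - fx1) // 7                 # 2nd rounding: floor(9/7) = 1 feature cell per pooling bin
print(fx1, fx2, bin_w)                   # 3 12 1
'''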

1 Model

1.1 Backbone: VGG16 or ResNet50

input[600,600,3] --> VGG16 (4 downsamplings) --> output[37,37,512], or ResNet50 (4 downsamplings) --> output[38,38,1024]

(1) VGG16 code walkthrough
VGG16 is built from convolutions and pooling: a [600,600,3] input passes through 4 downsamplings (four max-pooling layers) to give base_layers[37,37,512].

'''
base_layers = VGG16(inputs)
x --> Conv2D*2 + MaxPooling2D --> Conv2D*2 + MaxPooling2D --> Conv2D*3 + MaxPooling2D --> Conv2D*3 + MaxPooling2D --> Conv2D*3
'''
from keras.layers import Conv2D, MaxPooling2D

def VGG16(inputs):
    # Block 1: 600,600,3 -> 300,300,64
    x = Conv2D(64,(3,3),activation = 'relu',padding = 'same',name = 'block1_conv1')(inputs)
    x = Conv2D(64,(3,3),activation = 'relu',padding = 'same', name = 'block1_conv2')(x)
    x = MaxPooling2D((2,2), strides = (2,2), name = 'block1_pool')(x)

    # Block 2: 300,300,64 -> 150,150,128
    x = Conv2D(128,(3,3),activation = 'relu',padding = 'same',name = 'block2_conv1')(x)
    x = Conv2D(128,(3,3),activation = 'relu',padding = 'same',name = 'block2_conv2')(x)
    x = MaxPooling2D((2,2),strides = (2,2), name = 'block2_pool')(x)

    # Block 3: 150,150,128 -> 75,75,256
    x = Conv2D(256,(3,3),activation = 'relu',padding = 'same',name = 'block3_conv1')(x)
    x = Conv2D(256,(3,3),activation = 'relu',padding = 'same',name = 'block3_conv2')(x)
    x = Conv2D(256,(3,3),activation = 'relu',padding = 'same',name = 'block3_conv3')(x)
    x = MaxPooling2D((2,2),strides = (2,2), name = 'block3_pool')(x)

    # Block 4: 75,75,256 -> 37,37,512
    x = Conv2D(512,(3,3),activation = 'relu',padding = 'same', name = 'block4_conv1')(x)
    x = Conv2D(512,(3,3),activation = 'relu',padding = 'same', name = 'block4_conv2')(x)
    x = Conv2D(512,(3,3),activation = 'relu',padding = 'same', name = 'block4_conv3')(x)
    x = MaxPooling2D((2,2),strides = (2,2), name = 'block4_pool')(x)

    # Block 5 (no pooling): 37,37,512
    x = Conv2D(512,(3,3),activation = 'relu', padding = 'same', name = 'block5_conv1')(x)
    x = Conv2D(512,(3,3),activation = 'relu', padding = 'same', name = 'block5_conv2')(x)
    x = Conv2D(512,(3,3),activation = 'relu', padding = 'same', name = 'block5_conv3')(x)    

    return x
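
A quick shape check of the backbone above (a sketch, assuming Keras with a TensorFlow backend is installed):

'''
from keras.layers import Input
from keras.models import Model

inputs = Input(shape=(600, 600, 3))
print(Model(inputs, VGG16(inputs)).output_shape)
# (None, 37, 37, 512): four 2x2 poolings, 600 -> 300 -> 150 -> 75 -> 37
'''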

(2) ResNet50 code walkthrough
ResNet50 is likewise built from convolutions and pooling: a [600,600,3] input passes through 4 downsamplings to give base_layers[38,38,1024]. ResNet50 has two basic blocks, conv_block and identity_block.

'''
base_layers = ResNet50(inputs)
conv_block: bottleneck + residual shortcut (with downsampling)
identity_block: bottleneck + residual shortcut (no downsampling)
ResNet50: ZeroPadding-Conv-BN-Activation-MaxPool --> conv_block + identity_block *2 --> conv_block + identity_block *3 --> conv_block + identity_block *5
'''
from keras import layers
from keras.initializers import random_normal
from keras.layers import Activation, BatchNormalization, Conv2D, MaxPooling2D, ZeroPadding2D

def identity_block(input_tensor, kernel_size, filters, stage, block):

    filters1, filters2, filters3 = filters

    conv_name_base  = 'res' + str(stage) + block + '_branch'
    bn_name_base    = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same', kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x


def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):

    filters1, filters2, filters3 = filters

    conv_name_base  = 'res' + str(stage) + block + '_branch'
    bn_name_base    = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), strides=strides, kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same', kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    shortcut = Conv2D(filters3, (1, 1), strides=strides, kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x

def ResNet50(inputs):
    #-----------------------------------#
    #   Assume the input image is 600,600,3
    #-----------------------------------#
    # 600,600,3 -> 300,300,64
    x = ZeroPadding2D((3, 3))(inputs)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)

    # 300,300,64 -> 150,150,64
    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    # 150,150,64 -> 150,150,256
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    # 150,150,256 -> 75,75,512
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    # 75,75,512 -> 38,38,1024
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    # the final shared feature map: 38,38,1024
    return x
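
The same check for the ResNet50 backbone (same assumptions as the VGG16 sketch above). The spatial path is 600 -> 300 -> 150 -> 75 -> 38: the final stride-2 block maps the odd 75 up to 38, one cell larger than VGG16's 37.

'''
inputs = Input(shape=(600, 600, 3))
print(Model(inputs, ResNet50(inputs)).output_shape)  # (None, 38, 38, 1024)
'''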

1.2 RPN: generating region proposals

  1. Generate 9 anchors at every point of the feature map.
  2. Convolve the feature map to obtain per-anchor confidences and predicted offsets.
  3. Match the anchors against the labels to get positive and negative samples, and compute the loss on the confidences and offsets.
  4. Decode the predicted scores and offsets into the proposals.
'''
rpn = get_rpn(base_layers, num_anchors)
base_layers[38,38,1024] -->  Conv2D((3, 3),512) -->  Conv2D((1, 1),num_anchors) +  Conv2D((1, 1),num_anchors * 4) --> Reshape --> [x_class, x_regr]
'''
def get_rpn(base_layers, num_anchors):
    #----------------------------------------------------#
    #   Integrate the features with a 512-channel 3x3 convolution
    #----------------------------------------------------#
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer=random_normal(stddev=0.02), name='rpn_conv1')(base_layers)

    #----------------------------------------------------#
    #   1x1 convolutions adjust the channel count to produce the predictions
    #----------------------------------------------------#
    x_class = Conv2D(num_anchors, (1, 1), activation = 'sigmoid', kernel_initializer=random_normal(stddev=0.02), name='rpn_out_class')(x)
    x_regr  = Conv2D(num_anchors * 4, (1, 1), activation = 'linear', kernel_initializer=random_normal(stddev=0.02), name='rpn_out_regress')(x)
    
    x_class = Reshape((-1, 1),name="classification")(x_class)
    x_regr  = Reshape((-1, 4),name="regression")(x_regr)
    return [x_class, x_regr]

1.3 R-CNN head: classification and regression

  1. Run the head on the proposals from the previous step to obtain class scores and predicted box offsets.
  2. Split the proposals into positive and negative samples by their IoU with the ground truth.
  3. Compute the loss over the positive and negative samples.
'''
classifier  = get_vgg_classifier(base_layers, roi_input, 7, num_classes)

base_layers, input_rois --> RoiPoolingConv(roi_size) -->  vgg_classifier_layers --> TimeDistributed(Dense) +  TimeDistributed(Dense)  --> out_class, out_regr 
'''
def get_vgg_classifier(base_layers, input_rois, roi_size=7, num_classes=21):  
    # batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512
    out_roi_pool = RoiPoolingConv(roi_size)([base_layers, input_rois])

    # batch_size, num_rois, 7, 7, 512 -> batch_size, num_rois, 4096
    out = vgg_classifier_layers(out_roi_pool)

    # batch_size, num_rois, 4096 -> batch_size, num_rois, num_classes
    out_class   = TimeDistributed(Dense(num_classes, activation='softmax', kernel_initializer=random_normal(stddev=0.02)), name='dense_class_{}'.format(num_classes))(out)
    # batch_size, num_rois, 4096 -> batch_size, num_rois, 4 * (num_classes-1)
    out_regr    = TimeDistributed(Dense(4 * (num_classes-1), activation='linear', kernel_initializer=random_normal(stddev=0.02)), name='dense_regress_{}'.format(num_classes))(out)
    return [out_class, out_regr]

RoIPooling : batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512

'''
# batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512
# roi_input   = Input(shape=(None, 4))
# Crop each RPN proposal out of the feature map, then pool it to a fixed size.
'''


out = vgg_classifier_layers(out_roi_pool)

'''
def vgg_classifier_layers(x):
    # batch_size, num_rois, 7, 7, 512 -> batch_size, num_rois, 4096
    x = TimeDistributed(Flatten(name='flatten'))(x)
    x = TimeDistributed(Dense(4096, activation='relu'), name='fc1')(x)
    x = TimeDistributed(Dense(4096, activation='relu'), name='fc2')(x)
    return x
'''

2 Prediction

2.1 Prediction pipeline

r_image = frcnn.detect_image(image)
  1. Preprocess the input image.
  2. Run the RPN to obtain its predictions and base_layers.
  3. Generate the anchors and decode the RPN outputs.
  4. Feed the proposals to the classifier network to get its predictions.
  5. Decode the proposals with the classifier outputs to obtain the final boxes.
  6. Draw the results on the image.

(1) The RPN predicts confidences and box offsets

'''
preds = self.model_rpn.predict(photo)  
1. model_rpn.predict: photo[1,600,600,3] --> model_rpn --> preds[x_class, x_regr, base_layers]

2. model_rpn: input[1,600,600,3]  -->  ResNet50(inputs) --> base_layers + num_anchors=9  -->  get_rpn(base_layers, num_anchors) --> rpn

3. ResNet50: inputs[1,600,600,3] --> ZeroPadding-Conv-BN-Activation-MaxPool (None, 150, 150, 64) --> conv_block + identity_block*2 (None, 150, 150, 256) --> conv_block + identity_block*3 (None, 75, 75, 512) --> conv_block + identity_block*5 --> base_layers(None, 38, 38, 1024)
 
4. get_rpn:  base_layers(None, 38, 38, 1024)-->  Conv2D(512) (None, 38, 38, 512)--> Conv2D(num_anchors)(None, 38, 38, 9),Conv2D(num_anchors * 4)(None, 38, 38, 36) --> x_class(None, 12996, 1) , x_regr(None, 12996, 4) , base_layers(None, 38, 38, 1024)
'''

(1.1) ResNet50

'''
base_layers = ResNet50(inputs) # [1,600,600,3] --> (None, 38, 38, 1024)
'''
def ResNet50(inputs):
    # input: 600*600*3
    img_input = inputs

    x = ZeroPadding2D((3, 3))(img_input)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)                  # 300*300*64
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)              # 150*150*64

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))  # 150*150*256
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')                 # 75*75*512
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')                 # 38*38*1024
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    return x   # (None, 38, 38, 1024)

(1.2) get_rpn

'''
rpn = get_rpn(base_layers, num_anchors) # (None, 38, 38, 1024) --> x_class(None, 12996, 1) , x_regr(None, 12996, 4) 
'''
def get_rpn(base_layers, num_anchors):
    # 1. 3x3 convolution over base_layers
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)
    # 2. per-anchor confidence and box offsets
    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)
    
    x_class = Reshape((-1,1),name="classification")(x_class)  # x_class is close to 1 when the anchor contains an object
    x_regr = Reshape((-1,4),name="regression")(x_regr)
    return [x_class, x_regr, base_layers] #  x_class(None, 12996, 1) , x_regr(None, 12996, 4) , base_layers(None, 38, 38, 1024)

(2) Generating the anchors

  1. Generate the anchor side lengths.
  2. Combine the side lengths with the grid centres so that every feature-map cell gets its anchors.
  3. Scale the boxes into [0, 1].
'''
anchors = get_anchors(self.get_img_output_length(width,height),width,height)
self.get_img_output_length(width,height): computes the output feature-map size from the input size. 600 --> [300,150,75,38]
[height:600, width:600]
'''
# Build the anchors
def get_anchors(shape,width,height):
    ''' Build the anchors.
    shape: feature-map size; width, height: image width and height '''
    # 1. generate the anchor side lengths
    anchors = generate_anchors()
    # 2. combine the side lengths with the grid centres: anchors for every cell
    network_anchors = shift(shape,anchors)
    # 3. scale the boxes into [0, 1]
    network_anchors[:,0] = network_anchors[:,0]/width
    network_anchors[:,1] = network_anchors[:,1]/height
    network_anchors[:,2] = network_anchors[:,2]/width
    network_anchors[:,3] = network_anchors[:,3]/height
    network_anchors = np.clip(network_anchors,0,1)  # clip the coordinates to the [0, 1] range
    return network_anchors

(2.1) Generating the anchor side lengths

  1. Determine the widths and heights from sizes and ratios: widths [128, 256, 512, 128, 256, 512, 256, 512, 1024], heights [128, 256, 512, 256, 512, 1024, 128, 256, 512].
  2. Subtract half of each width/height, so that later the centre plus/minus the half-size directly gives the top-left and bottom-right corners.
'''
anchors = generate_anchors()
'''
def generate_anchors(sizes=None, ratios=None):
    ''' Generate the side lengths of the anchors.
    Nine in total:
    [[ -64.,  -64.,   64.,   64.],
       [-128., -128.,  128.,  128.],
       [-256., -256.,  256.,  256.],
       [ -64., -128.,   64.,  128.],
       [-128., -256.,  128.,  256.],
       [-256., -512.,  256.,  512.],
       [-128.,  -64.,  128.,   64.],
       [-256., -128.,  256.,  128.],
       [-512., -256.,  512.,  256.]]
    '''
    if sizes is None:
        sizes = config.anchor_box_scales  # [128, 256, 512]

    if ratios is None:
        ratios =  config.anchor_box_ratios #  [[1, 1], [1, 2], [2, 1]]
    # number of anchors
    num_anchors = len(sizes) * len(ratios) # 3*3
    # allocate the anchor array
    anchors = np.zeros((num_anchors, 4))  # [9,4]

    anchors[:, 2:] = np.tile(sizes, (2, len(ratios))).T  # tile sizes to (2, 9), transpose into the w/h columns
    
    for i in range(len(ratios)):
        anchors[3*i:3*i+3, 2] = anchors[3*i:3*i+3, 2]*ratios[i][0]
        anchors[3*i:3*i+3, 3] = anchors[3*i:3*i+3, 3]*ratios[i][1]
    

    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
    return anchors
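
A quick sanity check of generate_anchors, passing sizes and ratios explicitly so the call does not depend on the config module:

'''
import numpy as np

base = generate_anchors(sizes=[128, 256, 512], ratios=[[1, 1], [1, 2], [2, 1]])
print(base.shape)  # (9, 4)
print(base[3])     # [ -64. -128.   64.  128.] -> a 128x256 anchor centred at (0, 0)
'''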

(2.2) Combining the side lengths with the grid centres

  1. Divide the image into a 38*38 grid according to the feature-map size.
  2. Compute the centre coordinate of each grid cell.
  3. Compute each anchor's top-left and bottom-right corners from the centre and the side lengths.
'''
network_anchors = shift(shape,anchors)
'''
def shift(shape, anchors, stride=config.rpn_stride):
    ''' Generate the grid centres (rpn_stride = 16) and attach the anchors to them.
    base_layers: (None, 38, 38, 1024)
    self.rpn_stride = 16
    [0,1,2,3,...,37]
    [0.5,1.5,2.5,...,37.5]
    [0.5,1.5,2.5,...,37.5]*stride
    '''

    shift_x = (np.arange(0, shape[0], dtype=keras.backend.floatx()) + 0.5) * stride
    shift_y = (np.arange(0, shape[1], dtype=keras.backend.floatx()) + 0.5) * stride

    shift_x, shift_y = np.meshgrid(shift_x, shift_y)

    shift_x = np.reshape(shift_x, [-1])  # (1444,)
    shift_y = np.reshape(shift_y, [-1])  # (1444,)

    shifts = np.stack([
        shift_x,
        shift_y,
        shift_x,
        shift_y
    ], axis=0)  # (4, 1444)

    shifts            = np.transpose(shifts)  # (1444, 4)
    number_of_anchors = np.shape(anchors)[0]  # 9

    k = np.shape(shifts)[0]
    # broadcasting [1,9,4] + [1444,1,4] gives the corner coordinates of every shifted anchor
    shifted_anchors = np.reshape(anchors, [1, number_of_anchors, 4]) + np.array(np.reshape(shifts, [k, 1, 4]), keras.backend.floatx())
    shifted_anchors = np.reshape(shifted_anchors, [k * number_of_anchors, 4])
    return shifted_anchors  # (12996, 4)
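
Chaining the two steps reproduces the 12996 figure quoted throughout this section, i.e. 38 * 38 * 9 (assuming the module-level imports — numpy, keras, config — that the code above relies on):

'''
anchors9 = generate_anchors(sizes=[128, 256, 512], ratios=[[1, 1], [1, 2], [2, 1]])
all_anchors = shift([38, 38], anchors9, stride=16)
print(all_anchors.shape)  # (12996, 4)
'''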

(3) Decoding the predictions + NMS filtering to obtain the proposals

  1. Decode.
  2. Filter the decoded boxes: keep those scoring above confidence_threshold, then apply IoU-based non-maximum suppression.
  3. Sort by confidence and keep the keep_top_k highest-scoring boxes.
'''
rpn_results = self.bbox_util.detection_out(preds,anchors,1,confidence_threshold=0.8)

preds : x_class(None, 12996, 1) , x_regr(None, 12996, 4)

anchors(12996, 4)
'''
    def detection_out(self, predictions, mbox_priorbox, num_classes, keep_top_k=300,
                        confidence_threshold=0.5):
        ''' Decode the RPN predictions with the anchors, then apply NMS. '''
        mbox_conf = predictions[0]  # per-anchor confidences
        mbox_loc = predictions[1]   # predicted offsets
        # mbox_priorbox: the anchors
        results = []
        # process each image in the batch
        for i in range(len(mbox_loc)):
            results.append([])
            # 1. decode: recover each predicted box's top-left and bottom-right corners
            decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
            # 2. filter the decoded boxes
            for c in range(num_classes):
                c_confs = mbox_conf[i, :, c]
                c_confs_m = c_confs > confidence_threshold
                if len(c_confs[c_confs_m]) > 0:
                    # keep the boxes scoring above confidence_threshold
                    boxes_to_process = decode_bbox[c_confs_m]
                    confs_to_process = c_confs[c_confs_m]
                    # IoU-based non-maximum suppression
                    feed_dict = {self.boxes: boxes_to_process,
                                    self.scores: confs_to_process}
                    # run the NMS op
                    idx = self.sess.run(self.nms, feed_dict=feed_dict)
                    # keep the boxes that survived NMS
                    good_boxes = boxes_to_process[idx]
                    confs = confs_to_process[idx][:, None]
                    # stack label, confidence, and box coordinates
                    labels = c * np.ones((len(idx), 1))
                    c_pred = np.concatenate((labels, confs, good_boxes),
                                            axis=1)
                    # append to the results
                    results[-1].extend(c_pred)

            if len(results[-1]) > 0:
                # sort by confidence
                results[-1] = np.array(results[-1])
                argsort = np.argsort(results[-1][:, 1])[::-1]
                results[-1] = results[-1][argsort]
                # keep the keep_top_k highest-scoring boxes
                results[-1] = results[-1][:keep_top_k]
        # The result: for each image, the high-confidence boxes, with their
        # positions decoded from the anchors and the RPN predictions.
        return results

(3.1) Decoding into top-left and bottom-right corners

  1. Get the anchors' widths, heights, and centres.
  2. Plug these and the predicted offsets into the decoding formulas to obtain the predicted centres and sizes.
  3. Clip the box coordinates to [0, 1].
'''
decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
'''
def decode_boxes(self, mbox_loc, mbox_priorbox):
        # 1.1 widths and heights of the anchors
        prior_width = mbox_priorbox[:, 2] - mbox_priorbox[:, 0]
        prior_height = mbox_priorbox[:, 3] - mbox_priorbox[:, 1]

        # 1.2 centres of the anchors
        prior_center_x = 0.5 * (mbox_priorbox[:, 2] + mbox_priorbox[:, 0])
        prior_center_y = 0.5 * (mbox_priorbox[:, 3] + mbox_priorbox[:, 1])

        # 2. offsets of the predicted centre relative to the anchor centre
        decode_bbox_center_x = mbox_loc[:, 0] * prior_width / 4
        decode_bbox_center_x += prior_center_x
        decode_bbox_center_y = mbox_loc[:, 1] * prior_height / 4
        decode_bbox_center_y += prior_center_y
        
        # width and height of the predicted box
        decode_bbox_width = np.exp(mbox_loc[:, 2] / 4)
        decode_bbox_width *= prior_width
        decode_bbox_height = np.exp(mbox_loc[:, 3] /4)
        decode_bbox_height *= prior_height

        # top-left and bottom-right corners of the predicted box
        decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
        decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
        decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
        decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height

        # stack the corners
        decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
                                      decode_bbox_ymin[:, None],
                                      decode_bbox_xmax[:, None],
                                      decode_bbox_ymax[:, None]), axis=-1)
        # clip to the range [0, 1]
        decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
        return decode_bbox
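
A toy run of the decoding formulas, in standalone NumPy with made-up numbers (the divisor 4 matches the variance used in the code above):

'''
import numpy as np

prior = np.array([0.4, 0.4, 0.6, 0.6])   # one anchor in normalized coordinates
loc   = np.array([0.5, 0.0, 0.0, 0.0])   # predicted offsets (tx, ty, tw, th)

pw = prior[2] - prior[0]                              # 0.2
cx = 0.5 * (prior[0] + prior[2]) + loc[0] * pw / 4    # 0.5 + 0.025 = 0.525
w  = np.exp(loc[2] / 4) * pw                          # exp(0) * 0.2 = 0.2
print(cx - 0.5 * w, cx + 0.5 * w)                     # 0.425 0.625: the box shifts right by 0.025
'''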

(3.2) NMS
The previous step produced decoded boxes. For each class, select the boxes whose confidence exceeds the threshold, then run NMS on them to remove duplicate detections.

  1. Sort the boxes of one class by confidence, from high to low.
  2. Compute the IoU of every remaining box with the first one; keep a box if the IoU is below the threshold, otherwise drop it.
  3. Repeat on the remaining boxes until none are left.
'''
self.nms = tf.image.non_max_suppression(self.boxes, self.scores,
                                                self._top_k,
                                                iou_threshold=self._nms_thresh)
'''
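
For reference, a pure-NumPy sketch of the greedy procedure described above (the actual code delegates to tf.image.non_max_suppression):

'''
import numpy as np

def nms_numpy(boxes, scores, iou_threshold=0.7):
    # boxes: (N, 4) as (x1, y1, x2, y2); returns the indices of the kept boxes
    order = scores.argsort()[::-1]                 # 1. sort by confidence, high to low
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # 2. IoU of the top box against the rest
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas  = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # 3. drop boxes whose IoU with the top box exceeds the threshold, repeat
        order = order[1:][iou < iou_threshold]
    return keep
'''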

(4) P_cls (class scores and confidences) and P_regr (box offsets)

  1. Cropping: loop over the proposals, crop each one's region from the feature map, and resize the crops into a common [1,32,14,14,1024] tensor.
'''
 [P_cls, P_regr] = self.model_classifier.predict([base_layer,ROIs]) 
'''
def get_classifier(base_layers, input_rois, num_rois, nb_classes=21, trainable=False):
    ''' roi --> cls + reg '''
    pooling_regions = 14
    input_shape = (num_rois, 14, 14, 1024)
    # base_layers[38,38,1024], input_rois[num_prior,4], num_prior=32, out_roi_pool.shape [1,32,14,14,1024]
    # 1. RoI pooling: base_layers[38,38,1024], input_rois[-1,4], pooling_regions=14, num_rois=32
    out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois]) # input_rois are the proposals
    # 2. convolve the pooled features, then classify and regress
    out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
    out = TimeDistributed(Flatten())(out)
    out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)
    out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)
    return [out_class, out_regr]  # 21, 20*4

(4.1) Cropping (RoI pooling)

'''
 out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])
'''
   def call(self, x, mask=None):

        assert(len(x) == 2)

        img = x[0]   # the feature map
        rois = x[1]  # the proposals

        outputs = []
        # loop over the proposals
        for roi_idx in range(self.num_rois):
            # x,y: top-left corner; w,h: width and height
            x = rois[0, roi_idx, 0]
            y = rois[0, roi_idx, 1]
            w = rois[0, roi_idx, 2]
            h = rois[0, roi_idx, 3]

            x = K.cast(x, 'int32')
            y = K.cast(y, 'int32')
            w = K.cast(w, 'int32')
            h = K.cast(h, 'int32')
            # crop from the feature map and resize to pool_size x pool_size
            rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
            outputs.append(rs)

        final_output = K.concatenate(outputs, axis=0)
        final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))

        final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))  # identity permutation (a no-op here)

        return final_output  # [1,32,14,14,1024]

(4.2) Extracting features from the crops with convolutions

'''
out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)

[1,32,14,14,1024] --> (None, 32, 1, 1, 2048)
'''
def classifier_layers(x, input_shape=(32, 14, 14, 1024), trainable=False):
    x = conv_block_td(x, 3, [512, 512, 2048], stage=5, block='a', input_shape=input_shape, strides=(2, 2), trainable=trainable)
    x = identity_block_td(x, 3, [512, 512, 2048], stage=5, block='b', trainable=trainable)
    x = identity_block_td(x, 3, [512, 512, 2048], stage=5, block='c', trainable=trainable)
    x = TimeDistributed(AveragePooling2D((7, 7)), name='avg_pool')(x)  # pool each RoI's 7x7 map down to 1x1

    return x   # (None, 32, 1, 1, 2048)

(4.3) Classification and regression

'''
 (None, 32, 1, 1, 2048) --> (None, 32, 2048) --> (None, 32, 21) + (None, 32, 80)
'''
out = TimeDistributed(Flatten())(out)  # (None, 32, 2048)
out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)   # (None, 32, 21)
out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)  # (None, 32, 80)

3 Training

3.1 Training pipeline

  1. Load the model.
  2. Build the data generator.
  3. Set the hyperparameters.
  4. Train the model.

3.2 Generating labels

  1. Data augmentation.
  2. Get the anchors.
  3. Match ground-truth boxes to anchors to build the labels.
  4. Select the positive and negative samples.
'''
gen = Generator(bbox_util, lines, NUM_CLASSES, solid=True)
rpn_train = gen.generate()
'''
    def generate(self):
        while True:
            shuffle(self.train_lines)
            lines = self.train_lines
            for annotation_line in lines:  
                # data augmentation
                img,y=self.get_random_data(annotation_line)
                height, width, _ = np.shape(img)
                
                if len(y)==0:
                    continue
                boxes = np.array(y[:,:4],dtype=np.float32)
                boxes[:,0] = boxes[:,0]/width
                boxes[:,1] = boxes[:,1]/height
                boxes[:,2] = boxes[:,2]/width
                boxes[:,3] = boxes[:,3]/height
                
                box_heights = boxes[:,3] - boxes[:,1]
                box_widths = boxes[:,2] - boxes[:,0]
                if (box_heights<=0).any() or (box_widths<=0).any():
                    continue

                y[:,:4] = boxes[:,:4]
                
                # get the anchors
                anchors = get_anchors(get_img_output_length(width,height),width,height)
                
                # match ground-truth boxes to anchors and compute the targets each anchor should predict
                assignment = self.bbox_util.assign_boxes(y,anchors)

                num_regions = 256

                classification = assignment[:, 4]
                regression = assignment[:, :]

                # cap the positives at num_regions/2; surplus ones are marked as ignored (-1).
                # Index the arrays directly: chained fancy indexing such as
                # classification[mask_pos][val_locs] = -1 would only write to a copy.
                pos_locs = np.where(classification > 0)[0]
                num_pos = len(pos_locs)
                if num_pos > num_regions / 2:
                    val_locs = random.sample(range(num_pos), int(num_pos - num_regions / 2))
                    classification[pos_locs[val_locs]] = -1
                    regression[pos_locs[val_locs], -1] = -1

                # cap the total number of samples at num_regions; surplus negatives are ignored
                neg_locs = np.where(classification == 0)[0]
                num_neg = len(neg_locs)
                if num_neg + num_pos > num_regions:
                    val_locs = random.sample(range(num_neg), int(num_neg - num_pos))
                    classification[neg_locs[val_locs]] = -1
                    
                classification = np.reshape(classification,[-1,1])
                regression = np.reshape(regression,[-1,5])

                tmp_inp = np.array(img)
                tmp_targets = [np.expand_dims(np.array(classification,dtype=np.float32),0),np.expand_dims(np.array(regression,dtype=np.float32),0)]

                yield preprocess_input(np.expand_dims(tmp_inp,0)), tmp_targets, np.expand_dims(y,0)

(1) Data augmentation

  1. Resize the image to the target width and height (800, 600).
  2. Randomly flip the image.
  3. Jitter the hue, brightness, and saturation.
  4. Correct the boxes accordingly.
'''
img,y=self.get_random_data(annotation_line)
'''
def get_random_data(self, annotation_line, random=True, jitter=.1, hue=.1, sat=1.1, val=1.1, proc_img=True):
        '''Random real-time data augmentation.'''
        line = annotation_line.split()
        image = Image.open(line[0])
        iw, ih = image.size
        if self.solid:
            w,h = self.solid_shape
        else:
            w, h = get_new_img_size(iw, ih)
        box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])

        # resize image
        new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
        scale = rand(.25, 2)
        if new_ar < 1:
            nh = int(scale*h)
            nw = int(nh*new_ar)
        else:
            nw = int(scale*w)
            nh = int(nw/new_ar)
        image = image.resize((nw,nh), Image.BICUBIC)

        # place image
        dx = int(rand(0, w-nw))
        dy = int(rand(0, h-nh))
        new_image = Image.new('RGB', (w,h), (128,128,128))
        new_image.paste(image, (dx, dy))
        image = new_image

        # flip image or not
        flip = rand()<.5
        if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

        # distort image
        hue = rand(-hue, hue)
        sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
        val = rand(1, val) if rand()<.5 else 1/rand(1, val)
        x = rgb_to_hsv(np.array(image)/255.)
        x[..., 0] += hue
        x[..., 0][x[..., 0]>1] -= 1
        x[..., 0][x[..., 0]<0] += 1
        x[..., 1] *= sat
        x[..., 2] *= val
        x[x>1] = 1
        x[x<0] = 0
        image_data = hsv_to_rgb(x)*255 # numpy array, 0 to 255

        # correct boxes
        box_data = np.zeros((len(box),5))
        if len(box)>0:
            np.random.shuffle(box)
            box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
            box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
            # flip
            if flip: box[:, [0,2]] = w - box[:, [2,0]]
            # clip the boxes to the image
            box[:, 0:2][box[:, 0:2]<0] = 0
            box[:, 2][box[:, 2]>w] = w
            box[:, 3][box[:, 3]>h] = h
            box_w = box[:, 2] - box[:, 0]
            box_h = box[:, 3] - box[:, 1]
            box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
            box_data = np.zeros((len(box),5))
            box_data[:len(box)] = box
        if len(box) == 0:
            return image_data, []

        if (box_data[:,:4]>0).any():
            return image_data, box_data
        else:
            return image_data, []

(2) Getting the anchors

  1. Generate the anchor side lengths.
  2. Combine the side lengths with the grid centres so that every feature-map cell gets its anchors.
  3. Scale the boxes into [0, 1].
'''
anchors = get_anchors(get_img_output_length(width,height),width,height)
'''
def get_anchors(shape,width,height):
    ''' Build the anchors.
    shape: feature-map size; width, height: image width and height '''
    # 1. generate the anchor side lengths
    anchors = generate_anchors()
    # 2. combine the side lengths with the grid centres: anchors for every cell
    network_anchors = shift(shape,anchors)
    # 3. scale the boxes into [0, 1]
    network_anchors[:,0] = network_anchors[:,0]/width
    network_anchors[:,1] = network_anchors[:,1]/height
    network_anchors[:,2] = network_anchors[:,2]/width
    network_anchors[:,3] = network_anchors[:,3]/height
    network_anchors = np.clip(network_anchors,0,1)
    return network_anchors

(3) Matching ground-truth boxes to anchors to generate labels

  1. Determine the anchors to ignore.
  2. Find the positive samples, making sure each anchor is responsible for at most one ground-truth box.
'''
assignment = self.bbox_util.assign_boxes(y,anchors)
'''
    def assign_boxes(self, boxes, anchors):
        self.num_priors = len(anchors)
        self.priors = anchors
        assignment = np.zeros((self.num_priors, 4 + 1))

        assignment[:, 4] = 0.0
        if len(boxes) == 0:
            return assignment
        
        # 1. determine the anchors to ignore
        # compute the IoU of every ground-truth box against all anchors
        ignored_boxes = np.apply_along_axis(self.ignore_box, 1, boxes[:, :4])
        ignored_boxes = ignored_boxes.reshape(-1, self.num_priors, 1)
        # (num_priors): the maximum overlap per anchor
        ignore_iou = ignored_boxes[:, :, 0].max(axis=0)
        # (num_priors)
        ignore_iou_mask = ignore_iou > 0

        assignment[:, 4][ignore_iou_mask] = -1

        # 2. find the positive samples; each anchor is responsible for at most one ground-truth box
        # (n, num_priors, 5)
        encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
        # the encoded regression targets and the IoU for every ground-truth box
        # (n, num_priors)
        encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)

        # take, per anchor, the best-overlapping ground-truth box and its index
        # (num_priors)
        best_iou = encoded_boxes[:, :, -1].max(axis=0)
        # (num_priors)
        best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)
        # (num_priors)
        best_iou_mask = best_iou > 0
        # which ground-truth box each positive anchor is responsible for
        best_iou_idx = best_iou_idx[best_iou_mask]

        assign_num = len(best_iou_idx)
        # keep the targets of the best-overlapping anchors,
        # i.e. of the anchors that actually contain a ground-truth box
        encoded_boxes = encoded_boxes[:, best_iou_mask, :]

        assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx,np.arange(assign_num),:4]
        # column 4 is the objectness label: 1 for anchors assigned to an object
        assignment[:, 4][best_iou_mask] = 1
        # assign_boxes thus produces, for this image, the targets the network should predict
        return assignment

(4) Selecting the positive and negative samples

  1. 256 samples in total (positives plus negatives).
  2. If there are more than 128 positives, randomly mark (num_pos - 128) of them as ignored.
  3. If the negatives plus the positives exceed 256, randomly mark (num_neg - num_pos) negatives as ignored.
num_regions = 256

classification = assignment[:, 4]
regression = assignment[:, :]

# cap the positives at num_regions/2; surplus ones are marked as ignored (-1).
# Index the arrays directly: chained fancy indexing such as
# classification[mask_pos][val_locs] = -1 would only write to a copy.
pos_locs = np.where(classification > 0)[0]
num_pos = len(pos_locs)
if num_pos > num_regions / 2:
    val_locs = random.sample(range(num_pos), int(num_pos - num_regions / 2))
    classification[pos_locs[val_locs]] = -1
    regression[pos_locs[val_locs], -1] = -1

# cap the total number of samples at num_regions; surplus negatives are ignored
neg_locs = np.where(classification == 0)[0]
num_neg = len(neg_locs)
if num_neg + num_pos > num_regions:
    val_locs = random.sample(range(num_neg), int(num_neg - num_pos))
    classification[neg_locs[val_locs]] = -1

classification = np.reshape(classification, [-1, 1])
regression = np.reshape(regression, [-1, 5])

tmp_inp = np.array(img)
tmp_targets = [np.expand_dims(np.array(classification, dtype=np.float32), 0), np.expand_dims(np.array(regression, dtype=np.float32), 0)]
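
The direct indexing above matters: NumPy's chained fancy indexing writes into a temporary copy and silently discards the assignment. A quick demonstration:

'''
import numpy as np

a = np.zeros(4)
m = a == 0
a[m][:2] = -1          # fancy indexing returns a copy; a is unchanged
print(a)               # [0. 0. 0. 0.]

idx = np.where(m)[0]
a[idx[:2]] = -1        # indexing a directly modifies it in place
print(a)               # [-1. -1.  0.  0.]
'''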

3.3 Loss functions

(1) The smooth L1 loss

  1. Find the positive samples.
  2. Plug them into the formula.
'''
smooth_l1()
'''
def smooth_l1(sigma=1.0):
    sigma_squared = sigma ** 2

    def _smooth_l1(y_true, y_pred):
        # y_true [batch_size, num_anchor, 4+1]
        # y_pred [batch_size, num_anchor, 4]
        regression        = y_pred
        regression_target = y_true[:, :, :-1]
        anchor_state      = y_true[:, :, -1]

        # select the positive samples
        indices           = tf.where(keras.backend.equal(anchor_state, 1))
        regression        = tf.gather_nd(regression, indices)
        regression_target = tf.gather_nd(regression_target, indices)

        # compute the smooth L1 loss
        # f(x) = 0.5 * (sigma * x)^2          if |x| < 1 / sigma / sigma
        #        |x| - 0.5 / sigma / sigma    otherwise
        regression_diff = regression - regression_target
        regression_diff = keras.backend.abs(regression_diff)
        regression_loss = tf.where(
            keras.backend.less(regression_diff, 1.0 / sigma_squared),
            0.5 * sigma_squared * keras.backend.pow(regression_diff, 2),
            regression_diff - 0.5 / sigma_squared
        )

        normalizer = keras.backend.maximum(1, keras.backend.shape(indices)[0])
        normalizer = keras.backend.cast(normalizer, dtype=keras.backend.floatx())
        loss = keras.backend.sum(regression_loss) / normalizer

        return loss

    return _smooth_l1
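
A quick numeric check of the piecewise formula with sigma = 1, in plain NumPy (independent of the Keras code above):

'''
import numpy as np

x = np.array([0.5, 2.0])  # one value with |x| < 1, one with |x| >= 1
loss = np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5)
print(loss)               # [0.125 1.5  ]
'''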

(2) The classification loss cls_loss

  1. Find the anchors that contain an object.
  2. Find the anchors that are actually background.
  3. Compute the cross-entropy for the positive and negative samples separately.
'''
def cls_loss(ratio=3):
    def _cls_loss(y_true, y_pred):
        # y_true [batch_size, num_anchor, num_classes+1]
        # y_pred [batch_size, num_anchor, num_classes]
        labels         = y_true
        anchor_state   = y_true[:,:,-1] # -1: ignore, 0: background, 1: object
        classification = y_pred

        # anchors that contain an object
        indices_for_object        = tf.where(keras.backend.equal(anchor_state, 1))
        labels_for_object         = tf.gather_nd(labels, indices_for_object)
        classification_for_object = tf.gather_nd(classification, indices_for_object)

        cls_loss_for_object = keras.backend.binary_crossentropy(labels_for_object, classification_for_object)

        # anchors that are actually background
        indices_for_back        = tf.where(keras.backend.equal(anchor_state, 0))
        labels_for_back         = tf.gather_nd(labels, indices_for_back)
        classification_for_back = tf.gather_nd(classification, indices_for_back)

        # cross-entropy loss for the background anchors
        cls_loss_for_back = keras.backend.binary_crossentropy(labels_for_back, classification_for_back)

        # normalizer: effectively the number of positive samples
        normalizer_pos = tf.where(keras.backend.equal(anchor_state, 1))
        normalizer_pos = keras.backend.cast(keras.backend.shape(normalizer_pos)[0], keras.backend.floatx())
        normalizer_pos = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer_pos)

        normalizer_neg = tf.where(keras.backend.equal(anchor_state, 0))
        normalizer_neg = keras.backend.cast(keras.backend.shape(normalizer_neg)[0], keras.backend.floatx())
        normalizer_neg = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer_neg)
        
        # divide each loss by its sample count
        cls_loss_for_object = keras.backend.sum(cls_loss_for_object)/normalizer_pos
        cls_loss_for_back = ratio*keras.backend.sum(cls_loss_for_back)/normalizer_neg

        # total loss
        loss = cls_loss_for_object + cls_loss_for_back

        return loss
    return _cls_loss
'''
