图像语义分割 DEEPLAB V3+的代码走读

2023-05-16

概述

GITHUB路径：
https://github.com/tensorflow/models/tree/master/research/deeplab
论文：
https://arxiv.org/abs/1802.02611
相对于DEEPLAB V3，有如下的修改：
1.网络层使用XCEPTION替换RESNET，性能更好
2.加入ENCODE-DECODE模块。
训练建议的参数：

python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=30000 \
    --train_split="train" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=513 \
    --train_crop_size=513 \
    --train_batch_size=1 \
    --dataset="pascal_voc_seg" \
    --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

网络

网络整体框架：
这里写图片描述
说实话，封装的过于完美，导致很难看懂。。。
调用链：
train.py:
_build_deeplab->model.multi_scale_logits
model.py:
multi_scale_logits->_get_logits->extract_features->feature_extractor.extract_features
feature_extractor.py:
extract_features->get_network

基础网络

基础网络层在CORE目录里实现。
这里主要看一下XCEPTION-65的实现和XCEPTION的区别。
先上图：
这是XCEPTION模型：
这里写图片描述
这是DEEPLAB使用的XCEPTION模型：

模型基础配置在：
xception.py的函数xception_65中实现。
但是有很多的细节。
除了基础模型上的改动，DEEPLAB在V2之后的核心思想是空洞卷积和ASPP。
空洞卷积以及网络的STRIDE是在：
xception.py的stack_blocks_dense函数中实现

  # The current_stride variable keeps track of the effective stride of the
  # activations. This allows us to invoke atrous convolution whenever applying
  # the next residual unit would result in the activations having stride larger
  # than the target output_stride.
  current_stride = 1

  # The atrous convolution rate parameter.
  rate = 1

  for block in blocks:
    with tf.variable_scope(block.scope, 'block', [net]) as sc:
      for i, unit in enumerate(block.args):
        if output_stride is not None and current_stride > output_stride:
          raise ValueError('The target output_stride cannot be reached.')
        with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
          # If we have reached the target output_stride, then we need to employ
          # atrous convolution with stride=1 and multiply the atrous rate by the
          # current unit's stride for use in subsequent layers.
          if output_stride is not None and current_stride == output_stride:
            net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
            rate *= unit.get('stride', 1)
          else:
            net = block.unit_fn(net, rate=1, **unit)
            current_stride *= unit.get('stride', 1)

      # Collect activations at the block's end before performing subsampling.
      net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)

ASPP

DEEPLAB V3+的ASPP代码在：
model.py的extract_features函数中实现。
从网络整体框架可以看到：
需要对基础的网络的FEATURE做相应的处理整合。
ASPP默认也是使用separable_conv做ASPP。

        # Merge branch logits.
        concat_logits = tf.concat(branch_logits, 3)
        concat_logits = slim.conv2d(
            concat_logits, depth, 1, scope=CONCAT_PROJECTION_SCOPE)
        concat_logits = slim.dropout(
            concat_logits,
            keep_prob=0.9,
            is_training=is_training,
            scope=CONCAT_PROJECTION_SCOPE + '_dropout')

最后做一层1*1卷积加一层DROPOUT，ASPP所有网络的深度都是depth = 256。

DECODER

回到model.py的_get_logits函数

  if model_options.decoder_output_stride is not None:
    if model_options.crop_size is None:
      height = tf.shape(images)[1]
      width = tf.shape(images)[2]
    else:
      height, width = model_options.crop_size
    decoder_height = scale_dimension(height,
                                     1.0 / model_options.decoder_output_stride)
    decoder_width = scale_dimension(width,
                                    1.0 / model_options.decoder_output_stride)
    features = refine_by_decoder(
        features,
        end_points,
        decoder_height=decoder_height,
        decoder_width=decoder_width,
        decoder_use_separable_conv=model_options.decoder_use_separable_conv,
        model_variant=model_options.model_variant,
        weight_decay=weight_decay,
        reuse=reuse,
        is_training=is_training,
        fine_tune_batch_norm=fine_tune_batch_norm)

看一下
decoder的节点定义

    'xception_65': {
        DECODER_END_POINTS: [
            'entry_flow/block2/unit_1/xception_module/'
            'separable_conv2_pointwise',
        ],

entry_flow/block2/unit_1/xception_module/separable_conv2_pointwise
对应的结构：
xception_block(‘entry_flow/block2’,
depth_list=[256, 256, 256],
skip_connection_type=’conv’,
activation_fn_in_separable_conv=False,
regularize_depthwise=regularize_depthwise,
num_units=1,
stride=2),
LOW-LEVEL FEATURES使用的是entry_flow的第二层特征。

使用如下的代码，做低层特征的1*1卷积

  for i, name in enumerate(feature_list):
            decoder_features_list = [decoder_features]

            # MobileNet variants use different naming convention.
            if 'mobilenet' in model_variant:
              feature_name = name
            else:
              feature_name = '{}/{}'.format(
                  feature_extractor.name_scope[model_variant], name)
            decoder_features_list.append(
                slim.conv2d(
                    end_points[feature_name],
                    48,
                    1,
                    scope='feature_projection' + str(i)))

使用下面的代码做基础网络的4*UNSAMPLE

            # Resize to decoder_height/decoder_width.
            for j, feature in enumerate(decoder_features_list):
              decoder_features_list[j] = tf.image.resize_bilinear(
                  feature, [decoder_height, decoder_width], align_corners=True)
              h = (None if isinstance(decoder_height, tf.Tensor)
                   else decoder_height)
              w = (None if isinstance(decoder_width, tf.Tensor)
                   else decoder_width)
              decoder_features_list[j].set_shape([None, h, w, None])

最后与论文中的图稍有不同的是默认decoder_use_separable_conv=TRUE，没有使用3*3的卷积，而是二层的split_separable_conv2d

This operation differs from tf.layers.separable_conv2d as this operation
applies activation function between depthwise and pointwise conv2d.

decoder_depth = 256
            if decoder_use_separable_conv:
              decoder_features = split_separable_conv2d(
                  tf.concat(decoder_features_list, 3),
                  filters=decoder_depth,
                  rate=1,
                  weight_decay=weight_decay,
                  scope='decoder_conv0')
              decoder_features = split_separable_conv2d(
                  decoder_features,
                  filters=decoder_depth,
                  rate=1,
                  weight_decay=weight_decay,
                  scope='decoder_conv1')
            else:
              num_convs = 2
              decoder_features = slim.repeat(
                  tf.concat(decoder_features_list, 3),
                  num_convs,
                  slim.conv2d,
                  decoder_depth,
                  3,
                  scope='decoder_conv' + str(i))

最后回到
model.py的multi_scale_logits函数

    # Resize the logits to have the same dimension before merging.
    for output in sorted(outputs_to_logits):
      outputs_to_logits[output] = tf.image.resize_bilinear(
          outputs_to_logits[output], [logits_height, logits_width],
          align_corners=True)

返回4*的UNSAMPLE结果。

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)