MMdetection系列之Config配置文件(V3更新后)

2023-11-07

1、了解配置

MMDetection 和其他 OpenMMLab 存储库使用MMEngine 的配置系统。模块化、继承性设计,便于进行各种实验。

2、配置文件的查看

在配置系统中采用模块化和继承性设计,便于进行各种实验。如果你想查看配置文件, 你可以运行

python tools/misc/print_config.py /PATH/TO/CONFIG to see the complete config.

3、配置文件案例

在 MMDetection 的配置中,我们model用来设置检测算法组件。神经网络除了 backbone,neck等组件外,它还需要data_preprocessor、train_cfg和test_cfg。data_preprocessor负责处理dataloader输出的一批数据。train_cfg,并且test_cfg在模型配置中用于训练和测试组件的超参数。

1、Mask CNN的例子

为了帮助用户对现代检测系统的完整配置和模块有一个基本的概念,我们使用ResNet50和FPN对掩码mask cnn的配置做简要的描述如下。要了解每个模块的更详细用法和相应的替代方法,请参阅API文档。

1、mmdetection v2.0版本配置文件

model = dict(
    type='MaskRCNN',  # The name of detector, 检测器的名字
    backbone=dict(  # The config of backbone,主干网络的配置信息
        type='ResNet',  # The type of the backbone, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/backbones/resnet.py#L308 for more details.(主干网络的类型)
        depth=50,  # The depth of backbone, usually it is 50 or 101 for ResNet and ResNext backbones.(骨干网的深度,ResNet和ResNext通常是50或101)
        num_stages=4,  # Number of stages of the backbone.主干的阶段数
        out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stages,每个阶段产生的输出特征图的索引
        frozen_stages=1,  # The weights in the first 1 stage are fronzen,对第1阶段的权重进行 fronzen
        norm_cfg=dict(  # The config of normalization layers.规范化层的配置
            type='BN',  # Type of norm layer, usually it is BN or GN,规范化层的类型,通常是BN或GN
            requires_grad=True),  # Whether to train the gamma and beta in BN,是否在BN中训练gamma和beta
        norm_eval=True,  # Whether to freeze the statistics in BN,是否冻结BN中的统计数据
        style='pytorch'# The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 conv, 'caffe' means stride 2 layers are in 1x1 convs.
    	init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),  # The ImageNet pretrained backbone to be loaded,ImageNet预训练的主干网将被加载
    neck=dict(
        type='FPN',  # The neck of detector is FPN. We also support 'NASFPN', 'PAFPN', etc. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/necks/fpn.py#L10 for more details.探测器的颈部是FPN,我们还支持'NASFPN', 'PAFPN'等
        in_channels=[256, 512, 1024, 2048],  # The input channels, this is consistent with the output channels of backbone,输入通道,这与骨干的输出通道是一致的
        out_channels=256,  # The output channels of each level of the pyramid feature map,金字塔特征图每一层的输出通道
        num_outs=5),  # The number of output scales,输出缩放数量
    rpn_head=dict(
        type='RPNHead',  # The type of RPN head is 'RPNHead', we also support 'GARPNHead', etc. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/rpn_head.py#L12 for more details.RPN头的类型是“RPNHead”,我们也支持“GARPNHead”等
        in_channels=256,  # The input channels of each input feature map, this is consistent with the output channels of neck,每个输入特征图的输入通道,这与颈部的输出通道是一致的
        feat_channels=256,  # Feature channels of convolutional layers in the head.头部的卷积层的特征通道
        anchor_generator=dict(  # The config of anchor generator
            type='AnchorGenerator',  # Most of methods use AnchorGenerator, SSD Detectors uses `SSDAnchorGenerator`. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/anchor/anchor_generator.py#L10 for more details,大多数方法使用锚定生成器,SSD检测器使用' SSDAnchorGenerator '
            scales=[8],  # Basic scale of the anchor, the area of the anchor in one position of a feature map will be scale * base_sizes,锚点的基本比例尺,特征图中一个位置上锚点的面积为scale * base_sizes
            ratios=[0.5, 1.0, 2.0],  # The ratio between height and width.高宽之比
            strides=[4, 8, 16, 32, 64]),  # The strides of the anchor generator. This is consistent with the FPN feature strides. The strides will be taken as base_sizes if base_sizes is not set.锚发生器的跨距。这与FPN的特性步调一致。如果没有设置base_sizes,则步长将作为base_size。
        bbox_coder=dict(  # Config of box coder to encode and decode the boxes during training and testing,配置盒子编码器,在训练和测试期间对盒子进行编码和解码
            type='DeltaXYWHBBoxCoder',  # Type of box coder. 'DeltaXYWHBBoxCoder' is applied for most of methods. 盒式编码器的类型。'DeltaXYWHBBoxCoder'应用于大多数方法。Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py#L9 for more details.
            target_means=[0.0, 0.0, 0.0, 0.0],  # The target means used to encode and decode boxes,目标用于编码和解码盒子
            target_stds=[1.0, 1.0, 1.0, 1.0]),  # The standard variance used to encode and decode boxes,用于编码和解码方框的标准方差
        loss_cls=dict(  # Config of loss function for the classification branch,分类分支损失函数的配置
            type='CrossEntropyLoss',  # Type of loss for classification branch, we also support FocalLoss etc.损失分类分支类型的,我们也支持FocalLoss等。
            use_sigmoid=True,  # RPN usually perform two-class classification, so it usually uses sigmoid function.RPN通常执行两类分类,因此通常使用sigmoid函数。
            loss_weight=1.0),  # Loss weight of the classification branch.分类分支减重
        loss_bbox=dict(  # Config of loss function for the regression branch.
            type='L1Loss',  # Type of loss, we also support many IoU Losses and smooth L1-loss, etc.损失类型,我们还支持许多IoU损失和平滑l1损失等。 Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/losses/smooth_l1_loss.py#L56 for implementation.
            loss_weight=1.0)),  # Loss weight of the regression branch.
    roi_head=dict(  # RoIHead encapsulates the second stage of two-stage/cascade detectors.RoIHead封装了二级/级联检测器的第二级。
        type='StandardRoIHead',  # Type of the RoI head. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/roi_heads/standard_roi_head.py#L10 for implementation.
        bbox_roi_extractor=dict(  # RoI feature extractor for bbox regression.RoI特征提取器用于bbox回归
            type='SingleRoIExtractor',  # Type of the RoI feature extractor, most of methods uses SingleRoIExtractor. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/roi_heads/roi_extractors/single_level.py#L10 for details.
            roi_layer=dict(  # Config of RoI Layer
                type='RoIAlign',  # Type of RoI Layer, DeformRoIPoolingPack and ModulatedDeformRoIPoolingPack are also supported.还支持“RoI层类型”、“DeformRoIPoolingPack”和“ModulatedDeformRoIPoolingPack” Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/ops/roi_align/roi_align.py#L79 for details.
                output_size=7,  # The output size of feature maps.特征映射的输出大小
                sampling_ratio=0),  # Sampling ratio when extracting the RoI features. 0 means adaptive ratio.提取RoI特征时的采样比。0表示自适应比。
            out_channels=256,  # output channels of the extracted feature.所提取特征的输出通道。
            featmap_strides=[4, 8, 16, 32]),  # Strides of multi-scale feature maps. It should be consistent to the architecture of the backbone.跨尺度特征地图。它应该与主干网的体系结构一致。
        bbox_head=dict(  # Config of box head in the RoIHead.
            type='Shared2FCBBoxHead',  # Type of the bbox head, Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py#L177 for implementation details.
            in_channels=256,  # Input channels for bbox head. This is consistent with the out_channels in roi_extractor
            fc_out_channels=1024,  # Output feature channels of FC layers.FC层输出特征通道
            roi_feat_size=7,  # Size of RoI features,RoI特征大小
            num_classes=80,  # Number of classes for classification,分类的类数
            bbox_coder=dict(  # Box coder used in the second stage.
                type='DeltaXYWHBBoxCoder',  # Type of box coder. 'DeltaXYWHBBoxCoder' is applied for most of methods.
                target_means=[0.0, 0.0, 0.0, 0.0],  # Means used to encode and decode box
                target_stds=[0.1, 0.1, 0.2, 0.2]),  # Standard variance for encoding and decoding. It is smaller since the boxes are more accurate. [0.1, 0.1, 0.2, 0.2] is a conventional setting.编码和解码的标准方差。它更小,因为盒子更精确。[0.1, 0.1, 0.2, 0.2]是常规设置。
            reg_class_agnostic=False,  # Whether the regression is class agnostic.回归是否与类别无关
            loss_cls=dict(  # Config of loss function for the classification branch
                type='CrossEntropyLoss',  # Type of loss for classification branch, we also support FocalLoss etc.
                use_sigmoid=False,  # Whether to use sigmoid.
                loss_weight=1.0),  # Loss weight of the classification branch.
            loss_bbox=dict(  # Config of loss function for the regression branch.
                type='L1Loss',  # Type of loss, we also support many IoU Losses and smooth L1-loss, etc.
                loss_weight=1.0)),  # Loss weight of the regression branch.
        mask_roi_extractor=dict(  # RoI feature extractor for mask generation.用于掩膜生成的RoI特征提取器
            type='SingleRoIExtractor',  # Type of the RoI feature extractor, most of methods uses SingleRoIExtractor.
            roi_layer=dict(  # Config of RoI Layer that extracts features for instance segmentation,用于实例分割的提取特征的RoI层配置
                type='RoIAlign',  # Type of RoI Layer, DeformRoIPoolingPack and ModulatedDeformRoIPoolingPack are also supported,还支持“RoI层类型”、“DeformRoIPoolingPack”和“ModulatedDeformRoIPoolingPack”
                output_size=14,  # The output size of feature maps.
                sampling_ratio=0),  # Sampling ratio when extracting the RoI features.
            out_channels=256,  # Output channels of the extracted feature.
            featmap_strides=[4, 8, 16, 32]),  # Strides of multi-scale feature maps.
        mask_head=dict(  # Mask prediction head
            type='FCNMaskHead',  # Type of mask head, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/roi_heads/mask_heads/fcn_mask_head.py#L21 for implementation details.
            num_convs=4,  # Number of convolutional layers in mask head.掩模头的卷积层数
            in_channels=256,  # Input channels, should be consistent with the output channels of mask roi extractor.
            conv_out_channels=256,  # Output channels of the convolutional layer.
            num_classes=80,  # Number of class to be segmented.要分段的类数。
            loss_mask=dict(  # Config of loss function for the mask branch.
                type='CrossEntropyLoss',  # Type of loss used for segmentation,用于分段的损耗类型
                use_mask=True,  # Whether to only train the mask in the correct class.
                loss_weight=1.0))))  # Loss weight of mask branch.
    train_cfg = dict(  # Config of training hyperparameters for rpn and rcnn,rpn和rcnn的训练超参数配置
        rpn=dict(  # Training config of rpn
            assigner=dict(  # Config of assigner
                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for many common detectors.分配器的类型,MaxIoUAssigner用于许多常见的检测器 Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/assigners/max_iou_assigner.py#L10 for more details.
                pos_iou_thr=0.7,  # IoU >= threshold 0.7 will be taken as positive samples,IoU &阈值 >=  0.7将作为阳性样本
                neg_iou_thr=0.3,  # IoU < threshold 0.3 will be taken as negative samples
                min_pos_iou=0.3,  # The minimal IoU threshold to take boxes as positive samples
                match_low_quality=True,  # Whether to match the boxes under low quality (see API doc for more details).低质量下是否匹配箱子
                ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes,忽略盒子的IoF阈值
            sampler=dict(  # Config of positive/negative sampler
                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported.类型采样器,PseudoSampler和其他采样器也支持 Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/samplers/random_sampler.py#L8 for implementation details.
                num=256,  # Number of samples
                pos_fraction=0.5,  # The ratio of positive samples in the total samples.阳性样本占总样本的比例
                neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.基于正样本数的负样本的上限
                add_gt_as_proposals=False),  # Whether add GT as proposals after sampling.取样后是否添加GT作为建议
            allowed_border=-1,  # The border allowed after padding for valid anchors.在填充有效锚后允许的边界。
            pos_weight=-1,  # The weight of positive samples during training.
            debug=False),  # Whether to set the debug mode
        rpn_proposal=dict(  # The config to generate proposals during training
            nms_across_levels=False,  # Whether to do NMS for boxes across levels. Only work in `GARPNHead`, naive rpn does not support do nms cross levels.是否跨级别对盒子进行NMS。只工作在“GARPNHead”,幼稚的rpn不支持做nms跨级别
            nms_pre=2000,  # The number of boxes before NMS, NMS前的盒数
            nms_post=1000,  # The number of boxes to be kept by NMS, Only work in `GARPNHead`.
            max_per_img=1000,  # The number of boxes to be kept after NMS.
            nms=dict( # Config of NMS
                type='nms',  # Type of NMS
                iou_threshold=0.7 # NMS threshold
                ),
            min_bbox_size=0),  # The allowed minimal box size,允许的最小框大小
        rcnn=dict(  # The config for the roi heads.
            assigner=dict(  # Config of assigner for second stage, this is different for that in rpn
                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for all roi_heads for now. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/assigners/max_iou_assigner.py#L10 for more details.
                pos_iou_thr=0.5,  # IoU >= threshold 0.5 will be taken as positive samples
                neg_iou_thr=0.5,  # IoU < threshold 0.5 will be taken as negative samples
                min_pos_iou=0.5,  # The minimal IoU threshold to take boxes as positive samples
                match_low_quality=False,  # Whether to match the boxes under low quality (see API doc for more details).
                ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
            sampler=dict(
                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/samplers/random_sampler.py#L8 for implementation details.
                num=512,  # Number of samples
                pos_fraction=0.25,  # The ratio of positive samples in the total samples.
                neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
                add_gt_as_proposals=True
            ),  # Whether add GT as proposals after sampling.
            mask_size=28,  # Size of mask
            pos_weight=-1,  # The weight of positive samples during training.
            debug=False))  # Whether to set the debug mode
    test_cfg = dict(  # Config for testing hyperparameters for rpn and rcnn
        rpn=dict(  # The config to generate proposals during testing
            nms_across_levels=False,  # Whether to do NMS for boxes across levels. Only work in `GARPNHead`, naive rpn does not support do nms cross levels.
            nms_pre=1000,  # The number of boxes before NMS
            nms_post=1000,  # The number of boxes to be kept by NMS, Only work in `GARPNHead`.
            max_per_img=1000,  # The number of boxes to be kept after NMS.
            nms=dict( # Config of NMS
                type='nms',  #Type of NMS
                iou_threshold=0.7 # NMS threshold
                ),
            min_bbox_size=0),  # The allowed minimal box size
        rcnn=dict(  # The config for the roi heads.
            score_thr=0.05,  # Threshold to filter out boxes
            nms=dict(  # Config of NMS in the second stage
                type='nms',  # Type of NMS
                iou_thr=0.5),  # NMS threshold
            max_per_img=100,  # Max number of detections of each image
            mask_thr_binary=0.5))  # Threshold of mask prediction
dataset_type = 'CocoDataset'  # Dataset type, this will be used to define the dataset
data_root = 'data/coco/'  # Root path of data
img_norm_cfg = dict(  # Image normalization config to normalize the input images,图像规范化配置来规范化输入图像
    mean=[123.675, 116.28, 103.53],  # Mean values used to pre-training the pre-trained backbone models
    std=[58.395, 57.12, 57.375],  # Standard variance used to pre-training the pre-trained backbone models
    to_rgb=True
)  # The channel orders of image used to pre-training the pre-trained backbone models,图像的通道顺序用于对经过预训练的骨干模型进行预训练
train_pipeline = [  # Training pipeline
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(
        type='LoadAnnotations',  # Second pipeline to load annotations for current image
        with_bbox=True,  # Whether to use bounding box, True for detection,是否使用包围框,True为检测
        with_mask=True,  # Whether to use instance mask, True for instance segmentation
        poly2mask=False),  # Whether to convert the polygon mask to instance mask, set False for acceleration and to save memory,是否将多边形掩码转换为实例掩码,加速设置为False,节省内存
    dict(
        type='Resize',  # Augmentation pipeline that resize the images and their annotations,调整图像及其注释大小的增强管道
        img_scale=(1333, 800),  # The largest scale of image,最大尺度的图像
        keep_ratio=True
    ),  # whether to keep the ratio between height and width.是否保持高宽比
    dict(
        type='RandomFlip',  # Augmentation pipeline that flip the images and their annotations,翻转图像及其注释的增强管道
        flip_ratio=0.5),  # The ratio or probability to flip
    dict(
        type='Normalize',  # Augmentation pipeline that normalize the input images
        mean=[123.675, 116.28, 103.53],  # These keys are the same of img_norm_cfg since the
        std=[58.395, 57.12, 57.375],  # keys of img_norm_cfg are used here as arguments
        to_rgb=True),
    dict(
        type='Pad',  # Padding config
        size_divisor=32),  # The number the padded images should be divisible,填充图像的数字应该是可分割的
    dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline,用于收集管道中的数据的默认格式bundle
    dict(
        type='Collect',  # Pipeline that decides which keys in the data should be passed to the detector,决定数据中的哪些键应该传递给检测器的管道
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(
        type='MultiScaleFlipAug',  # An encapsulation that encapsulates the testing augmentations
        img_scale=(1333, 800),  # Decides the largest scale for testing, used for the Resize pipeline
        flip=False,  # Whether to flip images during testing
        transforms=[
            dict(type='Resize',  # Use resize augmentation
                 keep_ratio=True),  # Whether to keep the ratio between height and width, the img_scale set here will be suppressed by the img_scale set above.
            dict(type='RandomFlip'),  # Thought RandomFlip is added in pipeline, it is not used because flip=False
            dict(
                type='Normalize',  # Normalization config, the values are from img_norm_cfg
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(
                type='Pad',  # Padding config to pad images divisible by 32.
                size_divisor=32),
            dict(
                type='ImageToTensor',  # convert image to tensor
                keys=['img']),
            dict(
                type='Collect',  # Collect pipeline that collect necessary keys for testing.
                keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,  # Batch size of a single GPU,单个GPU的批量大小
    workers_per_gpu=2,  # Worker to pre-fetch data for each single GPU,Worker为每个GPU预取数据
    train=dict(  # Train dataset config
        type='CocoDataset',  # Type of dataset, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py#L19 for details.
        ann_file='data/coco/annotations/instances_train2017.json',  # Path of annotation file
        img_prefix='data/coco/train2017/',  # Prefix of image path
        pipeline=[  # pipeline, this is passed by the train_pipeline created before.
            dict(type='LoadImageFromFile'),
            dict(
                type='LoadAnnotations',
                with_bbox=True,
                with_mask=True,
                poly2mask=False),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ]),
    val=dict(  # Validation dataset config
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[  # Pipeline is passed by test_pipeline created before
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(  # Test dataset config, modify the ann_file for test-dev/test submission
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[  # Pipeline is passed by test_pipeline created before
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        samples_per_gpu=2  # Batch size of a single GPU used in testing
        ))
evaluation = dict(  # The config to build the evaluation hook, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/evaluation/eval_hooks.py#L7 for more details.
    interval=1,  # Evaluation interval
    metric=['bbox', 'segm'])  # Metrics used during evaluation,评估期间使用的度量标准
optimizer = dict(  # Config used to build optimizer, support all the optimizers in PyTorch whose arguments are also the same as those in PyTorch
    type='SGD',  # Type of optimizers, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/optimizer/default_constructor.py#L13 for more details
    lr=0.02,  # Learning rate of optimizers, see detail usages of the parameters in the documentation of PyTorch,优化器的学习率
    momentum=0.9,  # Momentum
    weight_decay=0.0001)  # Weight decay of SGD,SGD的权重衰减
optimizer_config = dict(  # Config used to build the optimizer hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py#L8 for implementation details.
    grad_clip=None)  # Most of the methods do not use gradient clip
lr_config = dict(  # Learning rate scheduler config used to register LrUpdater hook
    policy='step',  # The policy of scheduler, also support CosineAnnealing, Cyclic, etc. 调度策略,也支持余弦退火,循环等Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9.
    warmup='linear',  # The warmup policy, also support `exp` and `constant`.预热策略,也支持' exp '和' constant
    warmup_iters=500,  # The number of iterations for warmup,热身的迭代次数
    warmup_ratio=
    0.001,  # The ratio of the starting learning rate used for warmup,用于热身的起始学习率的比率
    step=[8, 11])  # Steps to decay the learning rate,衰减学习率的步骤
runner = dict(
    type='EpochBasedRunner', # Type of runner to use,要使用的流道类型 (i.e. IterBasedRunner or EpochBasedRunner)
    max_epochs=12) # Runner that runs the workflow in total max_epochs. For IterBasedRunner use `max_iters`,运行总共max_epoch的工作流的运行器。对于IterBasedRunner使用' max_isters '
checkpoint_config = dict(  # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation.
    interval=1)  # The save interval is 1
log_config = dict(  # config to register logger hook
    interval=50,  # Interval to print the log
    hooks=[
        # dict(type='TensorboardLoggerHook')  # The Tensorboard logger is also supported
        dict(type='TextLoggerHook')
    ])  # The logger used to record the training process.
dist_params = dict(backend='nccl')  # Parameters to setup distributed training, the port can also be set.参数设置分布式培训,也可以设置端口
log_level = 'INFO'  # The level of logging.
load_from = None  # load models as a pre-trained model from a given path. This will not resume training.将模型作为预先训练的模型从给定的路径加载。这不会恢复训练。
resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint's is saved.从一个给定的路径恢复检查点,训练将从检查点保存时开始恢复。
workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model by 12 epochs according to the total_epochs.[('train', 1)]表示只有一个工作流,名为'train'的工作流将执行一次。工作流根据total_epoch对模型进行12个epoch的训练。
work_dir = 'work_dir'  # Directory to save the model checkpoints and logs for the current experiments.保存当前实验的模型检查点和日志的目录。

2、mmdetection v3.0版本配置文件

model = dict(
    type='MaskRCNN',  # The name of detector
    data_preprocessor=dict(  # The config of data preprocessor, usually includes image normalization and padding,数据预处理的配置,通常包括图像归一化和填充
        type='DetDataPreprocessor',  # The type of the data preprocessor, refer to https://mmdetection.readthedocs.io/en/dev-3.x/api.html#module-mmdet.models.data_preprocessors
        mean=[123.675, 116.28, 103.53],  # Mean values used to pre-training the pre-trained backbone models, ordered in R, G, B
        std=[58.395, 57.12, 57.375],  # Standard variance used to pre-training the pre-trained backbone models, ordered in R, G, B
        bgr_to_rgb=True,  # whether to convert image from BGR to RGB
        pad_mask=True,  # whether to pad instance masks
        pad_size_divisor=32),  # The size of padded image should be divisible by ``pad_size_divisor``
    backbone=dict(  # The config of backbone
        type='ResNet',
        depth=50,  # The depth of backbone, usually it is 50 or 101 for ResNet and ResNext backbones.
        num_stages=4,  # Number of stages of the backbone.
        out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stage
        frozen_stages=1,  # The weights in the first stage are frozen
        norm_cfg=dict(  # The config of normalization layers.
            type='BN',  # Type of norm layer, usually it is BN or GN
            requires_grad=True),  # Whether to train the gamma and beta in BN
        norm_eval=True,  # Whether to freeze the statistics in BN
        style='pytorch', # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 Conv, 'caffe' means stride 2 layers are in 1x1 Convs.
    	init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),  # The ImageNet pretrained backbone to be loaded
    neck=dict(
        type='FPN',  # The neck of detector is FPN. We also support 'NASFPN', 'PAFPN', etc. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/necks/fpn.py#L10 for more details.
        in_channels=[256, 512, 1024, 2048],  # The input channels, this is consistent with the output channels of backbone
        out_channels=256,  # The output channels of each level of the pyramid feature map
        num_outs=5),  # The number of output scales
    rpn_head=dict(
        type='RPNHead',  # The type of RPN head is 'RPNHead', we also support 'GARPNHead', etc. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/dense_heads/rpn_head.py#L12 for more details.
        in_channels=256,  # The input channels of each input feature map, this is consistent with the output channels of neck
        feat_channels=256,  # Feature channels of convolutional layers in the head.
        anchor_generator=dict(  # The config of anchor generator
            type='AnchorGenerator',  # Most of methods use AnchorGenerator, SSD Detectors uses `SSDAnchorGenerator`. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/task_modules/prior_generators/anchor_generator.py for more details
            scales=[8],  # Basic scale of the anchor, the area of the anchor in one position of a feature map will be scale * base_sizes
            ratios=[0.5, 1.0, 2.0],  # The ratio between height and width.
            strides=[4, 8, 16, 32, 64]),  # The strides of the anchor generator. This is consistent with the FPN feature strides. The strides will be taken as base_sizes if base_sizes is not set.
        bbox_coder=dict(  # Config of box coder to encode and decode the boxes during training and testing
            type='DeltaXYWHBBoxCoder',  # Type of box coder. 'DeltaXYWHBBoxCoder' is applied for most of the methods. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py#L9 for more details.
            target_means=[0.0, 0.0, 0.0, 0.0],  # The target means used to encode and decode boxes
            target_stds=[1.0, 1.0, 1.0, 1.0]),  # The standard variance used to encode and decode boxes
        loss_cls=dict(  # Config of loss function for the classification branch
            type='CrossEntropyLoss',  # Type of loss for classification branch, we also support FocalLoss etc.
            use_sigmoid=True,  # RPN usually performs two-class classification, so it usually uses the sigmoid function.
            loss_weight=1.0),  # Loss weight of the classification branch.
        loss_bbox=dict(  # Config of loss function for the regression branch.
            type='L1Loss',  # Type of loss, we also support many IoU Losses and smooth L1-loss, etc. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/losses/smooth_l1_loss.py#L56 for implementation.
            loss_weight=1.0)),  # Loss weight of the regression branch.
    roi_head=dict(  # RoIHead encapsulates the second stage of two-stage/cascade detectors.
        type='StandardRoIHead',
        bbox_roi_extractor=dict(  # RoI feature extractor for bbox regression.
            type='SingleRoIExtractor',  # Type of the RoI feature extractor, most of methods uses SingleRoIExtractor. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py#L10 for details.
            roi_layer=dict(  # Config of RoI Layer
                type='RoIAlign',  # Type of RoI Layer, DeformRoIPoolingPack and ModulatedDeformRoIPoolingPack are also supported. Refer to https://mmcv.readthedocs.io/en/latest/api.html#mmcv.ops.RoIAlign for details.
                output_size=7,  # The output size of feature maps.
                sampling_ratio=0),  # Sampling ratio when extracting the RoI features. 0 means adaptive ratio.
            out_channels=256,  # output channels of the extracted feature.
            featmap_strides=[4, 8, 16, 32]),  # Strides of multi-scale feature maps. It should be consistent with the architecture of the backbone.
        bbox_head=dict(  # Config of box head in the RoIHead.
            type='Shared2FCBBoxHead',  # Type of the bbox head, Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py#L177 for implementation details.
            in_channels=256,  # Input channels for bbox head. This is consistent with the out_channels in roi_extractor
            fc_out_channels=1024,  # Output feature channels of FC layers.
            roi_feat_size=7,  # Size of RoI features
            num_classes=80,  # Number of classes for classification
            bbox_coder=dict(  # Box coder used in the second stage.
                type='DeltaXYWHBBoxCoder',  # Type of box coder. 'DeltaXYWHBBoxCoder' is applied for most of the methods.
                target_means=[0.0, 0.0, 0.0, 0.0],  # Means used to encode and decode box
                target_stds=[0.1, 0.1, 0.2, 0.2]),  # Standard variance for encoding and decoding. It is smaller since the boxes are more accurate. [0.1, 0.1, 0.2, 0.2] is a conventional setting.
            reg_class_agnostic=False,  # Whether the regression is class agnostic.
            loss_cls=dict(  # Config of loss function for the classification branch
                type='CrossEntropyLoss',  # Type of loss for classification branch, we also support FocalLoss etc.
                use_sigmoid=False,  # Whether to use sigmoid.
                loss_weight=1.0),  # Loss weight of the classification branch.
            loss_bbox=dict(  # Config of loss function for the regression branch.
                type='L1Loss',  # Type of loss, we also support many IoU Losses and smooth L1-loss, etc.
                loss_weight=1.0)),  # Loss weight of the regression branch.
        mask_roi_extractor=dict(  # RoI feature extractor for mask generation.
            type='SingleRoIExtractor',  # Type of the RoI feature extractor, most of methods uses SingleRoIExtractor.
            roi_layer=dict(  # Config of RoI Layer that extracts features for instance segmentation
                type='RoIAlign',  # Type of RoI Layer, DeformRoIPoolingPack and ModulatedDeformRoIPoolingPack are also supported
                output_size=14,  # The output size of feature maps.
                sampling_ratio=0),  # Sampling ratio when extracting the RoI features.
            out_channels=256,  # Output channels of the extracted feature.
            featmap_strides=[4, 8, 16, 32]),  # Strides of multi-scale feature maps.
        mask_head=dict(  # Mask prediction head
            type='FCNMaskHead',  # Type of mask head, refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/roi_heads/mask_heads/fcn_mask_head.py#L21 for implementation details.
            num_convs=4,  # Number of convolutional layers in mask head.
            in_channels=256,  # Input channels, should be consistent with the output channels of mask roi extractor.
            conv_out_channels=256,  # Output channels of the convolutional layer.
            num_classes=80,  # Number of class to be segmented.
            loss_mask=dict(  # Config of loss function for the mask branch.
                type='CrossEntropyLoss',  # Type of loss used for segmentation
                use_mask=True,  # Whether to only train the mask in the correct class.
                loss_weight=1.0))),  # Loss weight of mask branch.
    train_cfg = dict(  # Config of training hyperparameters for rpn and rcnn
        rpn=dict(  # Training config of rpn
            assigner=dict(  # Config of assigner
                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for many common detectors. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/task_modules/assigners/max_iou_assigner.py for more details.
                pos_iou_thr=0.7,  # IoU >= threshold 0.7 will be taken as positive samples
                neg_iou_thr=0.3,  # IoU < threshold 0.3 will be taken as negative samples
                min_pos_iou=0.3,  # The minimal IoU threshold to take boxes as positive samples
                match_low_quality=True,  # Whether to match the boxes under low quality (see API doc for more details).
                ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
            sampler=dict(  # Config of positive/negative sampler
                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/task_modules/samplers/random_sampler.py for implementation details.
                num=256,  # Number of samples
                pos_fraction=0.5,  # The ratio of positive samples in the total samples.
                neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
                add_gt_as_proposals=False),  # Whether add GT as proposals after sampling.
            allowed_border=-1,  # The border allowed after padding for valid anchors.
            pos_weight=-1,  # The weight of positive samples during training.
            debug=False),  # Whether to set the debug mode
        rpn_proposal=dict(  # The config to generate proposals during training
            nms_across_levels=False,  # Whether to do NMS for boxes across levels. Only work in `GARPNHead`, naive rpn does not support do nms cross levels.
            nms_pre=2000,  # The number of boxes before NMS
            nms_post=1000,  # The number of boxes to be kept by NMS. Only work in `GARPNHead`.
            max_per_img=1000,  # The number of boxes to be kept after NMS.
            nms=dict( # Config of NMS
                type='nms',  # Type of NMS
                iou_threshold=0.7 # NMS threshold
                ),
            min_bbox_size=0),  # The allowed minimal box size
        rcnn=dict(  # The config for the roi heads.
            assigner=dict(  # Config of assigner for second stage, this is different for that in rpn
                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for all roi_heads for now. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/task_modules/assigners/max_iou_assigner.py for more details.
                pos_iou_thr=0.5,  # IoU >= threshold 0.5 will be taken as positive samples
                neg_iou_thr=0.5,  # IoU < threshold 0.5 will be taken as negative samples
                min_pos_iou=0.5,  # The minimal IoU threshold to take boxes as positive samples
                match_low_quality=False,  # Whether to match the boxes under low quality (see API doc for more details).
                ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
            sampler=dict(
                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/task_modules/samplers/random_sampler.py for implementation details.
                num=512,  # Number of samples
                pos_fraction=0.25,  # The ratio of positive samples in the total samples.
                neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
                add_gt_as_proposals=True
            ),  # Whether add GT as proposals after sampling.
            mask_size=28,  # Size of mask
            pos_weight=-1,  # The weight of positive samples during training.
            debug=False)),  # Whether to set the debug mode
    test_cfg = dict(  # Config for testing hyperparameters for rpn and rcnn
        rpn=dict(  # The config to generate proposals during testing
            nms_across_levels=False,  # Whether to do NMS for boxes across levels. Only work in `GARPNHead`, naive rpn does not support do nms cross levels.
            nms_pre=1000,  # The number of boxes before NMS
            nms_post=1000,  # The number of boxes to be kept by NMS. Only work in `GARPNHead`.
            max_per_img=1000,  # The number of boxes to be kept after NMS.
            nms=dict( # Config of NMS
                type='nms',  #Type of NMS
                iou_threshold=0.7 # NMS threshold
                ),
            min_bbox_size=0),  # The allowed minimal box size
        rcnn=dict(  # The config for the roi heads.
            score_thr=0.05,  # Threshold to filter out boxes
            nms=dict(  # Config of NMS in the second stage
                type='nms',  # Type of NMS
                iou_thr=0.5),  # NMS threshold
            max_per_img=100,  # Max number of detections of each image
            mask_thr_binary=0.5)))  # Threshold of mask prediction

2、faster_rcnn_r50_fpn_1x.py配置信息

首先介绍一下这个配置文件所描述的框架,它是基于resnet50的backbone,有着5个fpn特征层的faster-RCNN目标检测网络,训练迭代次数为标准的12次epoch。

# model settings
model = dict(
	type='FasterRCNN',                         # model类型
    pretrained='modelzoo://resnet50',          # 预训练模型:imagenet-resnet50
    backbone=dict(
        type='ResNet',                         # backbone类型
        depth=50,                              # 网络层数
        num_stages=4,                          # resnet的stage数量
        out_indices=(0, 1, 2, 3),              # 输出的stage的序号
        frozen_stages=1,                       # 冻结的stage数量,即该stage不更新参数,-1表示所有的stage都更新参数
        style='pytorch'),                      # 网络风格:如果设置pytorch,则stride为2的层是conv3x3的卷积层;如果设置caffe,则stride为2的层是第一个conv1x1的卷积层
    neck=dict(
        type='FPN',                            # neck类型
        in_channels=[256, 512, 1024, 2048],    # 输入的各个stage的通道数
        out_channels=256,                      # 输出的特征层的通道数
        num_outs=5),                           # 输出的特征层的数量
    rpn_head=dict(
        type='RPNHead',                        # RPN网络类型
        in_channels=256,                       # RPN网络的输入通道数
        feat_channels=256,                     # 特征层的通道数
        anchor_scales=[8],                     # 生成的anchor的baselen,baselen = sqrt(w*h),w和h为anchor的宽和高
        anchor_ratios=[0.5, 1.0, 2.0],         # anchor的宽高比
        anchor_strides=[4, 8, 16, 32, 64],     # 在每个特征层上的anchor的步长(对应于原图)
        target_means=[.0, .0, .0, .0],         # 均值
        target_stds=[1.0, 1.0, 1.0, 1.0],      # 方差
        use_sigmoid_cls=True),                 # 是否使用sigmoid来进行分类,如果False则使用softmax来分类
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',                                   # RoIExtractor类型
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),   # ROI具体参数:ROI类型为ROIalign,输出尺寸为7,sample数为2
        out_channels=256,                                            # 输出通道数
        featmap_strides=[4, 8, 16, 32]),                             # 特征图的步长
    bbox_head=dict(
        type='SharedFCBBoxHead',                     # 全连接层类型
        num_fcs=2,                                   # 全连接层数量
        in_channels=256,                             # 输入通道数
        fc_out_channels=1024,                        # 输出通道数
        roi_feat_size=7,                             # ROI特征层尺寸
        num_classes=81,                              # 分类器的类别数量+1,+1是因为多了一个背景的类别
        target_means=[0., 0., 0., 0.],               # 均值
        target_stds=[0.1, 0.1, 0.2, 0.2],            # 方差
        reg_class_agnostic=False))                   # 是否采用class_agnostic的方式来预测,class_agnostic表示输出bbox时只考虑其是否为前景,后续分类的时候再根据该bbox在网络中的类别得分来分类,也就是说一个框可以对应多个类别
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',            # RPN网络的正负样本划分
            pos_iou_thr=0.7,                  # 正样本的iou阈值
            neg_iou_thr=0.3,                  # 负样本的iou阈值
            min_pos_iou=0.3,                  # 正样本的iou最小值。如果assign给ground truth的anchors中最大的IOU低于0.3,则忽略所有的anchors,否则保留最大IOU的anchor
            ignore_iof_thr=-1),               # 忽略bbox的阈值,当ground truth中包含需要忽略的bbox时使用,-1表示不忽略
        sampler=dict(
            type='RandomSampler',             # 正负样本提取器类型
            num=256,                          # 需提取的正负样本数量
            pos_fraction=0.5,                 # 正样本比例
            neg_pos_ub=-1,                    # 最大负样本比例,大于该比例的负样本忽略,-1表示不忽略
            add_gt_as_proposals=False),       # 把ground truth加入proposal作为正样本
        allowed_border=0,                     # 允许在bbox周围外扩一定的像素
        pos_weight=-1,                        # 正样本权重,-1表示不改变原始的权重
        smoothl1_beta=1 / 9.0,                # 平滑L1系数
        debug=False),                         # debug模式
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',            # RCNN网络正负样本划分
            pos_iou_thr=0.5,                  # 正样本的iou阈值
            neg_iou_thr=0.5,                  # 负样本的iou阈值
            min_pos_iou=0.5,                  # 正样本的iou最小值。如果assign给ground truth的anchors中最大的IOU低于0.3,则忽略所有的anchors,否则保留最大IOU的anchor
            ignore_iof_thr=-1),               # 忽略bbox的阈值,当ground truth中包含需要忽略的bbox时使用,-1表示不忽略
        sampler=dict(
            type='RandomSampler',             # 正负样本提取器类型
            num=512,                          # 需提取的正负样本数量
            pos_fraction=0.25,                # 正样本比例
            neg_pos_ub=-1,                    # 最大负样本比例,大于该比例的负样本忽略,-1表示不忽略
            add_gt_as_proposals=True),        # 把ground truth加入proposal作为正样本
        pos_weight=-1,                        # 正样本权重,-1表示不改变原始的权重
        debug=False))                         # debug模式
test_cfg = dict(
    rpn=dict(                                 # 推断时的RPN参数
        nms_across_levels=False,              # 在所有的fpn层内做nms
        nms_pre=2000,                         # 在nms之前保留的的得分最高的proposal数量
        nms_post=2000,                        # 在nms之后保留的的得分最高的proposal数量
        max_num=2000,                         # 在后处理完成之后保留的proposal数量
        nms_thr=0.7,                          # nms阈值
        min_bbox_size=0),                     # 最小bbox尺寸
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)   # max_per_img表示最终输出的det bbox数量
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)            # soft_nms参数
)
# dataset settings
dataset_type = 'CocoDataset'                # 数据集类型
data_root = 'data/coco/'                    # 数据集根目录
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)   # 输入图像初始化,减去均值mean并处以方差std,to_rgb表示将bgr转为rgb
data = dict(
    imgs_per_gpu=2,                # 每个gpu计算的图像数量
    workers_per_gpu=2,             # 每个gpu分配的线程数
    train=dict(
        type=dataset_type,                                                 # 数据集类型
        ann_file=data_root + 'annotations/instances_train2017.json',       # 数据集annotation路径
        img_prefix=data_root + 'train2017/',                               # 数据集的图片路径
        img_scale=(1333, 800),                                             # 输入图像尺寸,最大边1333,最小边800
        img_norm_cfg=img_norm_cfg,                                         # 图像初始化参数
        size_divisor=32,                                                   # 对图像进行resize时的最小单位,32表示所有的图像都会被resize成32的倍数
        flip_ratio=0.5,                                                    # 图像的随机左右翻转的概率
        with_mask=False,                                                   # 训练时附带mask
        with_crowd=True,                                                   # 训练时附带difficult的样本
        with_label=True),                                                  # 训练时附带label
    val=dict(
        type=dataset_type,                                                 # 同上
        ann_file=data_root + 'annotations/instances_val2017.json',         # 同上
        img_prefix=data_root + 'val2017/',                                 # 同上
        img_scale=(1333, 800),                                             # 同上
        img_norm_cfg=img_norm_cfg,                                         # 同上
        size_divisor=32,                                                   # 同上
        flip_ratio=0,                                                      # 同上
        with_mask=False,                                                   # 同上
        with_crowd=True,                                                   # 同上
        with_label=True),                                                  # 同上
    test=dict(
        type=dataset_type,                                                 # 同上
        ann_file=data_root + 'annotations/instances_val2017.json',         # 同上
        img_prefix=data_root + 'val2017/',                                 # 同上
        img_scale=(1333, 800),                                             # 同上
        img_norm_cfg=img_norm_cfg,                                         # 同上
        size_divisor=32,                                                   # 同上
        flip_ratio=0,                                                      # 同上
        with_mask=False,                                                   # 同上
        with_label=False,                                                  # 同上
        test_mode=True))                                                   # 同上
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)   # 优化参数,lr为学习率,momentum为动量因子,weight_decay为权重衰减因子
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))          # 梯度均衡参数
# learning policy
lr_config = dict(
    policy='step',                        # 优化策略
    warmup='linear',                      # 初始的学习率增加的策略,linear为线性增加
    warmup_iters=500,                     # 在初始的500次迭代中学习率逐渐增加
    warmup_ratio=1.0 / 3,                 # 起始的学习率
    step=[8, 11])                         # 在第8和11个epoch时降低学习率
checkpoint_config = dict(interval=1)      # 每1个epoch存储一次模型
# yapf:disable
log_config = dict(
    interval=50,                          # 每50个batch输出一次信息
    hooks=[
        dict(type='TextLoggerHook'),      # 控制台输出信息的风格
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12                               # 最大epoch数
dist_params = dict(backend='nccl')              # 分布式参数
log_level = 'INFO'                              # 输出信息的完整度级别
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x' # log文件和模型文件存储路径
load_from = None                                # 加载模型的路径,None表示从预训练模型加载
resume_from = None                              # 恢复训练模型的路径
workflow = [('train', 1)]                       # 当前工作区名称

3、cascade_rcnn_r50_fpn_1x.py配置文件

cascade-RCNN是cvpr2018的文章,相比于faster-RCNN的改进主要在于其RCNN有三个stage,这三个stage逐级refine检测的结果,使得结果达到更高的精度。下面逐条解释其config的含义,与faster-RCNN相同的部分就不再赘述。

# model settings
model = dict(
    type='CascadeRCNN',
    num_stages=3,                     # RCNN网络的stage数量,在faster-RCNN中为1
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=[
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.05, 0.05, 0.1, 0.1],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.033, 0.033, 0.067, 0.067],
            reg_class_agnostic=True)
    ])
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        smoothl1_beta=1 / 9.0,
        debug=False),
    rcnn=[                    # 注意,这里有3个RCNN的模块,对应开头的那个RCNN的stage数量
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.6,
                neg_iou_thr=0.6,
                min_pos_iou=0.6,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.7,
                min_pos_iou=0.7,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ],
    stage_loss_weights=[1, 0.5, 0.25])     # 3个RCNN的stage的loss权重
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100),
    keep_all_stages=False)         # 是否保留所有stage的结果
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/cascade_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]

4、数据集和评估器配置

运行器的训练、验证和测试需要数据加载器。需要设置数据集和数据管道来构建数据加载器。由于这部分的复杂性,我们使用中间变量来简化数据加载器配置的编写。

4.1 数据集配置

dataset_type = 'CocoDataset'  # Dataset type, this will be used to define the dataset
data_root = 'data/coco/'  # Root path of data
file_client_args = dict(backend='disk')  # file client arguments,文件客户端参数

train_pipeline = [  # Training data processing pipeline
    dict(type='LoadImageFromFile', file_client_args=file_client_args),  # First pipeline to load images from file path
    dict(
        type='LoadAnnotations',  # Second pipeline to load annotations for current image
        with_bbox=True,  # Whether to use bounding box, True for detection
        with_mask=True,  # Whether to use instance mask, True for instance segmentation
        poly2mask=True),  # Whether to convert the polygon mask to instance mask, set False for acceleration and to save memory
    dict(
        type='Resize',  # Pipeline that resizes the images and their annotations
        scale=(1333, 800),  # The largest scale of the images
        keep_ratio=True  # Whether to keep the ratio between height and width
        ),
    dict(
        type='RandomFlip',  # Augmentation pipeline that flips the images and their annotations
        prob=0.5),  # The probability to flip
    dict(type='PackDetInputs')  # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples,该管道格式化注释数据,并决定数据中的哪些键应该打包到data_samples中
]
test_pipeline = [  # Testing data processing pipeline
    dict(type='LoadImageFromFile', file_client_args=file_client_args),  # First pipeline to load images from file path
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),  # Pipeline that resizes the images
    dict(
        type='PackDetInputs',  # Pipeline that formats the annotation data and decides which keys in the data should be packed into data_samples
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
train_dataloader = dict(   # Train dataloader config
    batch_size=2,  # Batch size of a single GPU
    num_workers=2,  # Worker to pre-fetch data for each single GPU
    persistent_workers=True,  # If ``True``, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed.如果' ' True ' ',数据加载器将不会在epoch结束后关闭worker进程,这可以加快训练速度。
    sampler=dict(  # training data sampler
        type='DefaultSampler',  # DefaultSampler which supports both distributed and non-distributed training.支持分布式和非分布式训练的DefaultSampler, Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/sampler.py
        shuffle=True),  # randomly shuffle the training data in each epoch
    batch_sampler=dict(type='AspectRatioBatchSampler'),  # Batch sampler for grouping images with similar aspect ratio into a same batch. It can reduce GPU memory cost.批量采样器用于将具有相似宽高比的图像分组为同一批次。它可以降低GPU的内存成本。
    dataset=dict(  # Train dataset config
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_train2017.json',  # Path of annotation file
        data_prefix=dict(img='train2017/'),  # Prefix of image path
        filter_cfg=dict(filter_empty_gt=True, min_size=32),  # Config of filtering images and annotations
        pipeline=train_pipeline))
val_dataloader = dict(  # Validation dataloader config
    batch_size=1,  # Batch size of a single GPU. If batch-size > 1, the extra padding area may influence the performance.单个GPU的批量大小。如果批量&gt;1、多余的填充面积可能会影响性能。
    num_workers=2,  # Worker to pre-fetch data for each single GPU
    persistent_workers=True,  # If ``True``, the dataloader will not shut down the worker processes after an epoch end, which can accelerate training speed.
    drop_last=False,  # Whether to drop the last incomplete batch, if the dataset size is not divisible by the batch size,如果数据集大小不能被批处理大小整除,则是否删除最后一个不完整批处理
    sampler=dict(
        type='DefaultSampler',
        shuffle=False),  # not shuffle during validation and testing
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        test_mode=True,  # Turn on the test mode of the dataset to avoid filtering annotations or images,打开数据集的测试模式,以避免过滤注释或图像
        pipeline=test_pipeline))
test_dataloader = val_dataloader  # Testing dataloader config
4.1 数据集配置(关于数据变换的形状顺序)

在OpenMMLab 2.0中,为了与OpenCV的输入参数保持一致,数据转换管道中有关图像形状的参数始终是按(width, height) 顺序排列的。相反,为了计算方便,字段通过数据管道和模型的顺序为 (height, width)。具体来说,各个数据转换管道处理的结果中,各个字段及其值含义如下:
img_shape: (height, width)
ori_shape: (height, width)
pad_shape: (height, width)
batch_input_shape: (height, width)

例如,初始化参数Mosaic如下:

@TRANSFORMS.register_module()
class Mosaic(BaseTransform):
    def __init__(self,
                img_scale: Tuple[int, int] = (640, 640),
                center_ratio_range: Tuple[float, float] = (0.5, 1.5),
                bbox_clip_border: bool = True,
                pad_val: float = 114.0,
                prob: float = 1.0) -> None:
       ...

       # img_scale order should be (width, height)
       self.img_scale = img_scale

    def transform(self, results: dict) -> dict:
        ...

        results['img'] = mosaic_img
        # (height, width)
        results['img_shape'] = mosaic_img.shape[:2]

4.2 评估器配置

val_evaluator = dict(  # Validation evaluator config
    type='CocoMetric',  # The coco metric used to evaluate AR, AP, and mAP for detection and instance segmentation
    ann_file=data_root + 'annotations/instances_val2017.json',  # Annotation file path
    metric=['bbox', 'segm'],  # Metrics to be evaluated, `bbox` for detection and `segm` for instance segmentation
    format_only=False)
test_evaluator = val_evaluator  # Testing evaluator config

由于测试数据集没有标注文件,MMDetection中的test_dataloader和test_evaluator配置一般和val的一样。如果想保存测试数据集上的检测结果,可以这样写config:

# inference on test dataset and
# format the output results for submission.
test_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'annotations/image_info_test-dev2017.json',
        data_prefix=dict(img='test2017/'),
        test_mode=True,
        pipeline=test_pipeline))
test_evaluator = dict(
    type='CocoMetric',
    ann_file=data_root + 'annotations/image_info_test-dev2017.json',
    metric=['bbox', 'segm'],  # Metrics to be evaluated
    format_only=True,  # Only format and save the results to coco json file,只将结果格式化并保存到coco json文件中
    outfile_prefix='./work_dirs/coco_detection/test')  # The prefix of output json files,输出json文件的路径

5、训练和测试配置

MMEngine 的运行器使用 Loop 来控制训练、验证和测试过程。用户可以使用这些字段设置最大训练周期和验证间隔

train_cfg = dict(
    type='EpochBasedTrainLoop',  # The training loop type. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/loops.py
    max_epochs=12,  # Maximum training epochs
    val_interval=1)  # Validation intervals. Run validation every epoch.
val_cfg = dict(type='ValLoop')  # The validation loop type
test_cfg = dict(type='TestLoop')  # The testing loop type

6、优化配置

optim_wrapper是配置优化相关设置的字段。优化器包装器不仅提供了优化器的功能,还支持梯度裁剪、混合精度训练等功能,更多内容参见优化器包装器教程

optim_wrapper = dict(  # Optimizer wrapper config
    type='OptimWrapper',  # Optimizer wrapper type, switch to AmpOptimWrapper to enable mixed precision training.优化包装器类型,切换到AmpOptimWrapper以启用混合精确训练
    optimizer=dict(  # Optimizer config. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
        type='SGD',  # Stochastic gradient descent optimizer,随机梯度下降优化器
        lr=0.02,  # The base learning rate,基础学习率
        momentum=0.9,  # Stochastic gradient descent with momentum,带动量的随机梯度下降
        weight_decay=0.0001),  # Weight decay of SGD
    clip_grad=None,  # Gradient clip option. Set None to disable gradient clip.渐变剪辑选项。设置None禁用渐变剪辑 Find usage in https://mmengine.readthedocs.io/en/latest/tutorials/optimizer.html
    )

param_scheduler是配置调整优化超参数(例如学习率和动量)的方法的字段。用户可以组合多个调度器来创建所需的参数调整策略。在参数调度器教程和参数调度器 API 文档中查找更多信息

param_scheduler = [
    dict(
        type='LinearLR',  # Use linear policy to warmup learning rate,使用线性策略来热身学习率
        start_factor=0.001, # The ratio of the starting learning rate used for warmup,用于热身的起始学习率的比率
        by_epoch=False,  # The warmup learning rate is updated by iteration,通过迭代更新热身学习率
        begin=0,  # Start from the first iteration
        end=500),  # End the warmup at the 500th iteration
    dict(
        type='MultiStepLR',  # Use multi-step learning rate policy during training,在培训过程中使用多步学习率策略
        by_epoch=True,  # The learning rate is updated by epoch,学习率按时代更新
        begin=0,   # Start from the first epoch
        end=12,  # End at the 12th epoch
        milestones=[8, 11],  # Epochs to decay the learning rate,学习率下降的EPOCH
        gamma=0.1)  # The learning rate decay ratio,学习率衰减比
]

7、钩子配置(使用参见MMDetection 系列之高级指南(自定义运行设置之优化器、其他设置、学习率、工作流、有用的和自定义钩子))

用户可以将 Hooks 附加到训练、验证和测试循环中,以在运行过程中插入一些操作。有两个不同的钩子域,一个是default_hooks,另一个是custom_hooks。

default_hooks是钩子配置的字典,它们是运行时必须需要的钩子。它们具有默认优先级,不应修改。如果未设置,runner 将使用默认值。要禁用默认挂钩,用户可以将其配置设置为None.

default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=1),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='DetVisualizationHook'))

custom_hooks是所有其他钩子配置的列表。用户可以开发自己的钩子并将其插入到该字段中。

custom_hooks = []

8、运行时配置

default_scope = 'mmdet'  # The default registry scope to find modules. 用于查找模块的默认注册表范围Refer to https://mmengine.readthedocs.io/en/latest/tutorials/registry.html

env_cfg = dict(
    cudnn_benchmark=False,  # Whether to enable cudnn benchmark,是否启用cudnn基准测试
    mp_cfg=dict(  # Multi-processing config
        mp_start_method='fork',  # Use fork to start multi-processing threads. 'fork' usually faster than 'spawn' but maybe unsafe. See discussion in https://github.com/pytorch/pytorch/issues/1355
        opencv_num_threads=0),  # Disable opencv multi-threads to avoid system being overloaded,关闭opencv多线程以避免系统过载
    dist_cfg=dict(backend='nccl'),  # Distribution configs,分布配置
)

vis_backends = [dict(type='LocalVisBackend')]  # Visualization backends. Refer to TODO: visualization documents
visualizer = dict(
    type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(
    type='LogProcessor',  # Log processor to process runtime logs,日志处理器处理运行时日志
    window_size=50,  # Smooth interval of log values,平滑的日志值间隔
    by_epoch=True)  # Whether to format logs with epoch type. Should be consistent with the train loop's type.是否使用epoch类型格式化日志

log_level = 'INFO'  # The level of logging.
load_from = None  # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
resume = False  # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.

9、基于迭代器的配置

MMEngine 的 Runner 除了基于 epoch 之外,还提供了基于 iter 的训练循环。要使用基于 iter 的训练,用户应修改train_cfg、param_scheduler、train_dataloader、default_hooks和log_processor。下面是一个将基于 epoch 的 RetinaNet 配置更改为基于 iter 的示例:configs/retinanet/retinanet_r50_fpn_90k_coco.py

train_cfg = dict(
    type='EpochBasedTrainLoop',  # The training loop type. Refer to https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/loops.py
    max_epochs=12,  # Maximum training epochs
    val_interval=1)  # Validation intervals. Run validation every epoch.

# Iter-based training config
train_cfg = dict(
    _delete_=True,  # Ignore the base config setting (optional),忽略基本配置设置
    type='IterBasedTrainLoop',  # Use iter-based training loop
    max_iters=90000,  # Maximum iterations
    val_interval=10000)  # Validation interval


# Change the scheduler to iter-based
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    dict(
        type='MultiStepLR',
        begin=0,
        end=90000,
        by_epoch=False,
        milestones=[60000, 80000],
        gamma=0.1)
]

# Switch to InfiniteSampler to avoid dataloader restart
train_dataloader = dict(sampler=dict(type='InfiniteSampler'))

# Change the checkpoint saving interval to iter-based
default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000))

# Change the log format to iter-based
log_processor = dict(by_epoch=False)

10、配置文件继承

下有4个基本组件类型config/base,dataset、model、schedule、default_runtime。许多方法可以使用这些模型之一轻松构建,例如 Faster R-CNN、Mask R-CNN、Cascade R-CNN、RPN、SSD。由来自的组件组成的配置_base_称为原语

对于同一文件夹下的所有配置,建议只有一个 原始配置。所有其他配置都应继承自原始配置。这样,继承级别的最大值为3。

为了便于理解,我们建议贡献者继承现有的方法。比如在Faster R-CNN的基础上做一些修改,用户可以先通过指定继承Faster R-CNN的基本结构,然后在配置文件中修改必要的字段。base = …/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py

如果您正在构建一个不与任何现有方法共享结构的全新方法,您可以xxx_rcnn在 下创建一个文件夹configs,
详细文档请参考mmengine配置教程
通过设置该_base_字段,我们可以设置当前配置文件继承自哪些文件。

当_base_是一串文件路径时,表示继承一个配置文件的内容。

_base_ = './mask-rcnn_r50_fpn_1x_coco.py'

当_base_是多个文件路径的列表时,表示从多个文件继承。

_base_ = [
    '../_base_/models/mask-rcnn_r50_fpn.py',
    '../_base_/datasets/coco_instance.py',#做目标检测,修改为coco_detection.py
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]

1、忽略基本配置中的一些字段

有时,您可能会设置_delete_=True忽略基本配置中的某些字段。

例如,在 MMDetection 中,使用以下配置更改 Mask R-CNN 的主干。

model = dict(
    type='MaskRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(...),
    rpn_head=dict(...),
    roi_head=dict(...))

ResNet并HRNet使用不同的关键字来构造。

_base_ = '../mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'
model = dict(
    pretrained='open-mmlab://msra/hrnetv2_w32',
    backbone=dict(
        _delete_=True,#忽略某些字段
        type='HRNet',
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(32, 64)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(32, 64, 128)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(32, 64, 128, 256)))),
    neck=dict(...))

当_delete_=True时,将用新键替换骨干字段中的所有旧键。

2、在配置中使用中间变量

配置文件中使用了一些中间变量,例如数据集中的train_pipeline/ test_pipeline。需要注意的是,修改children configs中的中间变量时,需要将中间变量重新传入相应的字段。例如,我们想使用多尺度策略来训练 Mask R-CNN。train_pipeline/test_pipeline是我们要修改的中间变量。

_base_ = './mask_rcnn_r50_fpn_1x_coco.py'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
                   (1333, 768), (1333, 800)],#多尺度训练
        multiscale_mode="value",
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

我们首先定义新的train_pipeline/test_pipeline并将它们传递到数据加载器字段中。
类似地,如果我们想从SyncBN切换到BN或MMSyncBN,我们需要替换配置中的每个norm_cfg。

_base_ = './mask_rcnn_r50_fpn_1x_coco.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    backbone=dict(norm_cfg=norm_cfg),
    neck=dict(norm_cfg=norm_cfg),
    ...)

3、在 base 文件中重用变量

如果用户想重用基础文件中的变量,可以通过使用{{base.xxx}}. 例如:

_base_ = './mask-rcnn_r50_fpn_1x_coco.py'

a = {{_base_.model}} # Variable `a` is equal to the `model` defined in `_base_`

11、通过脚本参数修改配置

提交作业时使用tools/train.py或tools/test.py,您可以指定–cfg-options就地修改配置信息。

一些配置指令在您的配置中组成一个列表。例如,训练管道train_dataloader.dataset.pipeline通常是一个列表,例如。如果您想在管道中更改为,您可以指定.[dict(type=‘LoadImageFromFile’), …]‘LoadImageFromFile’‘LoadImageFromNDArray’–cfg-options data.train.pipeline.0.type=LoadImageFromNDArray

更新列表/元组的值。

如果要更新的值是列表或元组。例如,配置文件通常设置. 如果你想改变平均值,你可以指定。请注意,引号是支持列表/元组数据类型所必需的,并且在指定值的引号内不允许有空格。model.data_preprocessor.mean=[123.675, 116.28, 103.53]–cfg-options model.data_preprocessor.mean=“[127,127,127]”"

1、更新config字典链的配置键。

可以按照原始配置中字典键的顺序指定配置选项。例如,将模型主干中的所有 BN 模块更改

--cfg-options model.backbone.norm_eval=False

将模型骨干中的所有BN模块改为训练模式。

2、更新配置列表中的键

一些配置字典在你的配置中组成一个列表,例如,培训管道”data.train.pipeline is normally a list e.g. [dict(type=‘LoadImageFromFile’), …]
If you want to change ‘LoadImageFromFile’ to ‘LoadImageFromWebcam’ in the pipeline, you may specify

[dict(type='LoadImageFromFile'), ...]'LoadImageFromFile''LoadImageFromNDArray'--cfg-options data.train.pipeline.0.type=LoadImageFromNDArray
 --cfg-options data.train.pipeline.0.type=LoadImageFromWebcam.

3、 更新列表/元组的值

如果要更新的值是列表或元组。例如,配置文件通常设置. 如果你想改变平均值,你可以指定。请注意,引号是支持列表/元组数据类型所必需的,并且在指定值的引号内不允许有空格

--cfg-options model.data_preprocessor.mean="[127,127,127]"

If the value to be updated is a list or a tuple. For example, the config file normally sets workflow=[(‘train’, 1)]. If you want to change this key, you may specify

 --cfg-options workflow="[(train,1),(val,1)]".

12、配置名称的风格

文件名分为五个部分。所有部件和部件都用连接_,每个部件或部件的单词应该用连接-。

{algorithm name}_{model component names [component1]_[component2]_[...]}_{training settings}_{training dataset information}_{testing dataset information}.py

{algorithm name}:算法的名称。它可以是检测器名称,例如faster-rcnn、mask-rcnn等。也可以是半监督或知识蒸馏算法,例如soft-teacher、lad。ETC。

{model component names}: 算法中使用的组件名称,如backbone、neck等。例如,r50-caffe_fpn_gn-head表示在算法中使用caffe风格的ResNet50、FPN和detection head with Group Norm。

{training settings}:训练设置的信息,例如批量大小、增强、损失技巧、调度程序和时期/迭代。例如:4xb4-mixup-giou-coslr-100e意味着使用 8-gpus x 4-images-per-gpu,mixup augmentation,GIoU loss,cosine annealing learning rate,训练 100 个 epochs。一些缩写:

{gpu x batch_per_gpu}:GPU 和每个 GPU 的样本。bN表示每个 GPU 的 N 个批量大小。例如4xb4是 4-GPU x 4-images-per-GPU 的短期术语。如果8xb2未提及,则默认使用。

{schedule}: training schedule,选项有1x、2x、20e等 1x,2x分别表示12个epoch和24个epoch。 20e在级联模型中采用,表示20个epochs。对于1x/ 2x,初始学习率在第 8/16 和 11/22 时期衰减了 10 倍。对于20e,初始学习率在第 16 和第 19 个时期衰减了 10 倍。

{training dataset information}: 训练数据集名称,如coco, coco-panoptic, cityscapes, voc-0712, wider-face.

{testing dataset information}(可选):在一个数据集上训练但在另一个数据集上测试的模型的测试数据集名称。如果未提及,则表示该模型是在相同的数据集类型上训练和测试的

我们遵循下面的风格命名配置文件。建议贡献者遵循相同的风格

{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset}

{xxx}为必填字段,[yyy]为可选字段。

{model}: model type like faster_rcnn, mask_rcnn, etc.模型类型 ,例如 faster_rcnn, mask_rcnn
[model setting]: specific setting for some model, like without_semantic for htc, moment for reppoints, etc.特定型号的设置
{backbone}: backbone type like r50 (ResNet-50), x101 (ResNeXt-101).
{neck}: neck type like fpn, pafpn, nasfpn, c4.
[norm_setting]: bn (Batch Normalization) is used unless specified, other norm layer type could be gn (Group Normalization), syncbn (Synchronized Batch Normalization). gn-head/gn-neck indicates GN is applied in head/neck only, while gn-all means GN is applied in the entire model, e.g. backbone, neck, head.
[misc]: miscellaneous setting/plugins of model(模型的其他设置/插件), e.g. dconv, gcb, attention, albu, mstrain.
[gpu x batch_per_gpu]: GPUs and samples per GPU, 8x2 is used by default.
{schedule}: training schedule, options are 1x, 2x, 20e, etc. 1x and 2x means 12 epochs and 24 epochs ,3x means 36 epochs respectively. 20e is adopted in cascade models, which denotes 20 epochs. For 1x/2x, initial learning rate decays by a factor of 10 at the 8/16th and 11/22th epochs(初始学习速率在8/16和11/22时衰减10倍). For 20e, initial learning rate decays by a factor of 10 at the 16th and 19th epochs.
{dataset}: dataset like coco, cityscapes, voc_0712, wider_face.

13、 弃用train_cfg / test_cfg

train_cfg和test_cfg在配置文件中已弃用,请在模型配置中指定它们。原始配置结构如下所示。

# deprecated
model = dict(
   type=...,
   ...
)
train_cfg=dict(...)
test_cfg=dict(...)

迁移示例如下所示。

 recommended
model = dict(
   type=...,
   ...
   train_cfg=dict(...),
   test_cfg=dict(...),
)

14、配置文件的载入、查看和更新

1、 载入修改好的配置文件

from mmcv import Config
import albumentations as albu
cfg = Config.fromfile('./configs/dcn/cascade_rcnn_r101_fpn_dconv_c3-c5_20e_coco.py')

2、检查几个重要参数

可以使用以下的命令检查几个重要参数:
cfg.data.train
cfg.total_epochs
cfg.data.samples_per_gpu
cfg.resume_from
cfg.load_from
cfg.data
...

3、改变config中某些参数

from mmdet.apis import set_random_seed
 
# Modify dataset type and path
 
# cfg.dataset_type = 'Xray'
# cfg.data_root = 'Xray'
 
cfg.data.samples_per_gpu = 4
cfg.data.workers_per_gpu = 4
 
# cfg.data.test.type = 'Xray'
cfg.data.test.data_root = '../mmdetection_torch_1.5'
# cfg.data.test.img_prefix = '../mmdetection_torch_1.5'
 
# cfg.data.train.type = 'Xray'
cfg.data.train.data_root = '../mmdetection_torch_1.5'
# cfg.data.train.ann_file = 'instances_train2014.json'
# # cfg.data.train.classes = classes
# cfg.data.train.img_prefix = '../mmdetection_torch_1.5'
 
# cfg.data.val.type = 'Xray'
cfg.data.val.data_root = '../mmdetection_torch_1.5'
# cfg.data.val.ann_file = 'instances_val2014.json'
# # cfg.data.train.classes = classes
# cfg.data.val.img_prefix = '../mmdetection_torch_1.5'
 
# modify neck classes number
# cfg.model.neck.num_outs
# modify num classes of the model in box head
# for i in range(len(cfg.model.roi_head.bbox_head)):
#     cfg.model.roi_head.bbox_head[i].num_classes = 10
 
 
# cfg.data.train.pipeline[2].img_scale = (1333,800)
 
cfg.load_from = '../mmdetection_torch_1.5/coco_exps/latest.pth'
# cfg.resume_from = './coco_exps_v3/latest.pth'
 
# Set up working dir to save files and logs.
cfg.work_dir = './coco_exps_v4'
 
# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
# cfg.lr_config.warmup = None
# cfg.lr_config = dict(
#     policy='step',
#     warmup='linear',
#     warmup_iters=500,
#     warmup_ratio=0.001,
#     # [7] yields higher performance than [6]
#     step=[7])
# cfg.lr_config = dict(
#     policy='step',
#     warmup='linear',
#     warmup_iters=500,
#     warmup_ratio=0.001,
#     step=[36,39])
cfg.log_config.interval = 10
 
# # Change the evaluation metric since we use customized dataset.
# cfg.evaluation.metric = 'mAP'
# # We can set the evaluation interval to reduce the evaluation times
# cfg.evaluation.interval = 12
# # We can set the checkpoint saving interval to reduce the storage cost
# cfg.checkpoint_config.interval = 12
 
# # Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
# cfg.total_epochs = 40
 
# # We can initialize the logger for training and have a look
# # at the final config used for training
print(f'Config:\n{cfg.pretty_text}')
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

MMdetection系列之Config配置文件(V3更新后) 的相关文章

随机推荐

  • PCRE正则表达式

    PCRE正则表达式的定义 用于描述字符排列和匹配模式的一种语法规则 它主要用于字符串的模式分割 匹配 查找及替换操作 PHP中的正则函数 PHP中有两套正则函数 两者功能差不多 分别为 一套是由PCRE Perl Compatible Re
  • 黑盒测试中的等价类和边界值

    黑盒测试时基于程序规格说明书 找出程序不符合格则说明书的地方 也就是我们常说的点点点 在进行黑盒测试时 我们可以利用一下几种方法来写出测试用例 尽量用科学的方法来找出更多的缺陷 等价类划分 等价类划分是一种重要的 常用的黑盒测试方法 不需要
  • js打开新的窗口window_前端必备基础知识:window.location 详解

    作者简介 李中凯 八年多工作经验 前端负责人 擅长JavaScript Vue 掘金文章专栏 KaysonLi 的个人主页 专栏 掘金 前端开发人员对 window location对象应该不陌生 通过它不但可以获取当前页面的地址信息 还可
  • android mtk6732 camera otp 加载流程

    在android的hal层获取属性节点信息值 Get Property char value PROPERTY VALUE MAX 0 property get camcaldrv log value 0 MINT32 dumpEnable
  • Java垃圾回收【GC】详解

    什么是垃圾 垃圾是内存中没有任何引用指向它 并且没有引用的对象无法使用 所以就可以认为这个对象是垃圾 需要被回收释放其占用的内存空间 垃圾定位 1 引用计数法 RefrenceCount 堆中每个对象实例都有一个引用计数 每当有一个引用指向
  • 六轴机器人轨迹规划(直线轨迹规划,弧线轨迹规划)——C#实现+ABB为例(规划直接下发离线程序运动)

    机器人直线插补算法 弧线插补算法 离线编程转换 空间直线插补规划 空间弧线插补规划 离线编程 ABB二次开发 六轴机器人控制 C 一 通过对空间点的插补 形成空间点轨迹 1 插补算法原理简述 1 直线插补时 指定起止坐标与速度 2 确定要插
  • 安装多版本java

    系统变量 JAVA HOME JAVA8 HOME JAVA8 HOME C java jdk8 JAVA11 HOME C java jdk11 JAVA17 HOME C java jdk17 Path 新增 移到最上面 重新打开Pat
  • 像素和毫米怎么换算

    像素和毫米怎么换算 像素和毫米是不能直接转换的 只有在分辩率 dpi 下才能进行转换 因此 像素与毫米的转换 需要知道参数 DPI 每英寸多少点 象素数 DPI 英寸数 英寸数 25 4 毫米数 对于显示设备 不管是打印机还是屏幕 都有一种
  • Linux中如何使用命令修改文件所属用户组?

    先来了解一下文件属性 在shell环境里输入 ls l 可以查看当前目录文件 如 drwxr xr x 2 nsf users 1024 12 10 17 37 下载文件备份 分别对应的是 文件属性 连接数 文件拥有者 所属群组 文件大小
  • Docker部署Nginx(window)

    安装Nginx docker pull nginx 运行Nginx docker run p 80 80 name nginx d nginx 进入Nginx docker exec it nginx bash 配置Nginx cd etc
  • 手机有什么副业做?用手机能做啥副业?

    随着手机移动互联网的越来越流行 用户越来越多 手机赚钱也成为现在的一大优势 很多企业开始开发手机赚钱的软件和应用 当你还用手机娱乐的时候 别人已经用手机赚钱了 你知道手机有什么副业可以做吗 1 苹果APP试玩 就是用苹果手机下载一个试玩AP
  • libcore.io.ErrnoException: open failed: EINVAL (Invalid argument)

    出现异常 04 16 17 58 52 714 W System err 23703 Caused by libcore io ErrnoException open failed EINVAL Invalid argument 04 16
  • 什么是游戏测试?

    在很多外行甚至行内非测试岗位人的眼里 游戏测试是很没有技术含量的活儿 门槛低 不过就是不断玩游戏 反馈问题 纯粹是体力活 如果真是这样 那我们到网吧随便拉几个玩家都可以做 早期的一些游戏测试确实是这样做的 但如今这样做已经不可能保证游戏的品
  • 算法应该怎样学习

    算法背景 了解该算法的背景知识 该算法要解决的问题 算法流程 了解该算法的流程 包括输入和输出 算法核心部分的具体实现 时间复杂度 了解该算法的时间复杂度 说明算法的执行效率 空间复杂度 了解该算法的空间复杂度 说明算法所需的存储空间 优点
  • MySQL提权篇

    一 Mysql提权必备条件 1 服务器安装Mysql数据库 利用Mysql提权的前提就是服务器安装了mysql数据库 且mysql的服务没有降权 Mysql数据库默认安装是以系统权限继承的 并且需要获取Mysql root账号密码 2 判断
  • 14:00面试,14:06就出来了,问的问题有点变态。。。

    从小厂出来 没想到在另一家公司又寄了 到这家公司开始上班 加班是每天必不可少的 看在钱给的比较多的份上 就不太计较了 没想到5月一纸通知 所有人不准加班 加班费不仅没有了 薪资还要降40 这下搞的饭都吃不起了 还在有个朋友内推我去了一家互联
  • 【生成式网络】入门篇(二):GAN的 代码和结果记录

    GAN非常经典 我就不介绍具体原理了 直接上代码 感兴趣的可以阅读 里面有更多变体 https github com rasbt deeplearning models tree master pytorch ipynb gan GAN 在
  • Kafka消费者Relance机制和分区机制

    kafka消费者Relance rebalance就是说如果消费组里的消费者数量有变化或消费的分区数有变化 kafka会重新分配消费者消费分区的关系 比如consumer group中某个消费者挂了 此时会自动把分配给他的分区交给其他的消费
  • MySQL如何删除#sql开头的临时表

    1 现象 巡检时发现服务器磁盘空间不足 通过查看大文件进行筛选是发现有几个 sql开头的文件 且存在超过100G及10G以上的文件 2 原因 如果MySQL在一个 ALTER TABLE操作 ALGORITHM INPLACE 的中间退出
  • MMdetection系列之Config配置文件(V3更新后)

    1 了解配置 MMDetection 和其他 OpenMMLab 存储库使用MMEngine 的配置系统 模块化 继承性设计 便于进行各种实验 2 配置文件的查看 在配置系统中采用模块化和继承性设计 便于进行各种实验 如果你想查看配置文件