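Prerequisites
${CONFIG_FILE}: a config file under configs/
Example: configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
${CHECKPOINT_FILE}: path to the model weights
Example: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
[--out ${RESULT_FILE}]: where to write the result file produced by testing
[--eval ${EVAL_METRICS}]: which evaluation metrics to compute
${GPU_NUM}: number of GPUs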
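Because the placeholders use shell variable syntax, one way to use the command templates in these notes is to assign the variables once and then paste the templates as they are. The sketch below does that and warns if an input file is missing. The values of RESULT_FILE and EVAL_METRICS are assumptions for illustration (a COCO detection model evaluated on bounding boxes), not values taken from these notes.

```shell
# Example values for the placeholders above; adjust them to your own setup.
CONFIG_FILE=configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
CHECKPOINT_FILE=checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
RESULT_FILE=results.pkl   # assumed output file name
EVAL_METRICS=bbox         # assumed metric for a COCO detection model
GPU_NUM=4

# Warn early if an input file is missing (run this from the MMDetection root).
for f in "$CONFIG_FILE" "$CHECKPOINT_FILE"; do
    [ -f "$f" ] || echo "warning: $f not found" >&2
done
```

The warnings go to stderr and do not stop the script, so the assignments stay usable even when the checkpoint has not been downloaded yet.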
Testing on a dataset
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
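As an illustration, the two templates above could be filled in as follows. The output name results.pkl, the bbox metric, and the GPU count of 4 are assumptions chosen for the example. The script only prints the commands so they can be reviewed first.

```shell
CONFIG_FILE=configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
CHECKPOINT_FILE=checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

# Single GPU: save raw results to results.pkl and evaluate bounding boxes.
SINGLE_GPU_CMD="python tools/test.py $CONFIG_FILE $CHECKPOINT_FILE --out results.pkl --eval bbox"

# Multiple GPUs: the same evaluation spread over 4 GPUs (assumed count).
MULTI_GPU_CMD="./tools/dist_test.sh $CONFIG_FILE $CHECKPOINT_FILE 4 --out results.pkl --eval bbox"

# Print both commands; paste one into a shell at the MMDetection root to run it.
echo "$SINGLE_GPU_CMD"
echo "$MULTI_GPU_CMD"
```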
Model training
Single-machine, single-GPU training
python tools/train.py ${CONFIG_FILE}
Example:
python tools/train.py ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
To specify a working directory, append the argument: --work_dir ${WORK_DIR}
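The spelling of this flag can differ between MMDetection releases: these notes use --work_dir, while some releases of tools/train.py define it as --work-dir. A minimal sketch that reads the script's own help text to pick the spelling is shown below. The working directory name is hypothetical, and the fallback to --work_dir is simply the spelling used in these notes.

```shell
CONFIG_FILE=configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
WORK_DIR=work_dirs/faster_rcnn_r50_fpn_1x_coco   # hypothetical output directory

# Detect the flag spelling from the help text; if the help text is not
# available, fall back to the spelling used in these notes.
if python tools/train.py --help 2>/dev/null | grep -q -e '--work-dir'; then
    WORK_DIR_FLAG=--work-dir
else
    WORK_DIR_FLAG=--work_dir
fi

# Print the resulting command for review before running it.
TRAIN_CMD="python tools/train.py $CONFIG_FILE $WORK_DIR_FLAG $WORK_DIR"
echo "$TRAIN_CMD"
```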
Single-machine, multi-GPU training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
Example:
./tools/dist_train.sh ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py 4
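Optional arguments:
--validate: run an evaluation every k epochs during training (k defaults to 1)
--work_dir ${WORK_DIR}: specify the working directory
--resume_from ${CHECKPOINT_FILE}: resume from a previous checkpoint file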
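A sketch of how these options might be combined for a 4-GPU run. It appends --resume_from only when a checkpoint already exists in the working directory. The working directory and the checkpoint name latest.pth are assumptions for illustration, so check what your own runs actually write.

```shell
CONFIG_FILE=configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
GPU_NUM=4
WORK_DIR=work_dirs/faster_rcnn_r50_fpn_1x_coco   # hypothetical output directory
RESUME_CKPT=$WORK_DIR/latest.pth                 # assumed checkpoint file name

# Base command: evaluate during training and write outputs to WORK_DIR.
DIST_CMD="./tools/dist_train.sh $CONFIG_FILE $GPU_NUM --validate --work_dir $WORK_DIR"

# Resume only if an earlier run left a checkpoint behind.
if [ -f "$RESUME_CKPT" ]; then
    DIST_CMD="$DIST_CMD --resume_from $RESUME_CKPT"
fi

# Print the command for review before running it.
echo "$DIST_CMD"
```

On a first run no checkpoint exists, so the printed command trains from scratch. Rerunning the same script after an interruption adds the resume flag.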
Multi-machine, multi-GPU training
Using the Slurm cluster manager:
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [${GPUS}]
Example: train Faster R-CNN with 16 GPUs on the test partition
./tools/slurm_train.sh test Faster_r50_1x configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py /home/xxx/faster_rcnn_r50_fpn_1x 16
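To see why a 16-GPU job counts as multi-machine, the arithmetic below works out how many machines it spans. The figure of 8 GPUs per machine is an assumption about the cluster and is not stated in these notes.

```shell
GPUS=16            # total GPUs requested, as in the example above
GPUS_PER_NODE=8    # assumed number of GPUs on each machine

# Round up so that a partially used machine still counts as one node.
NODES=$(( (GPUS + GPUS_PER_NODE - 1) / GPUS_PER_NODE ))
echo "nodes needed: $NODES"   # prints "nodes needed: 2"
```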
Reference
MMDetection Chinese documentation, 2. Getting Started