Getting started with Mask R-CNN -> Applied Mask-RCNN research methods (continuously updated)
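This article covers data annotation and the training process, and provides reference code
The annotation and training steps are streamlined to keep the workflow simple for beginners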
Complete: 2020/08/14 - article content finished
Update: 2020/08/24 - article content supplemented and corrected
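Preface
The previous article covered installing Mask-RCNN on Win10. Building on it, this article annotates your own images, generates a dataset from them, and trains on it
Annotating images and generating the files needed for training
1. Prepare the annotation tool labelme
- Install labelme inside anaconda. If the installation is slow, switch to the Douban or Aliyun mirror as described in the previous article to speed it up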
(tensorflow_gpu) C:\User$ pip install labelme
- After installation, start labelme and open the folder that contains the images to annotate
(tensorflow_gpu) C:\User$ labelme
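2. Annotate the images according to the required classes
- As shown in the figure, the classes are box and bottle. After annotation, a matching json file is generated in the folder for each image
Notes:
Different objects of the same class must be distinguished by a suffix, e.g. box1, box2
labelme annotates the solid interior of a region, so the regions of the objects in an image must not overlap or contain one another; otherwise the affected area can only belong to the larger region and the result will be wrong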
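Because instances of one class carry a numeric suffix (box1, box2), the class name has to be recovered later by dropping that suffix. A minimal sketch, assuming every label is a class name followed by optional digits; `label_to_class` is a hypothetical helper and not part of labelme:

```python
import re


def label_to_class(label):
    """Strip a trailing instance number, e.g. 'box2' -> 'box'."""
    return re.sub(r"\d+$", "", label)


labels = ["box1", "box2", "bottle1"]
classes = [label_to_class(name) for name in labels]
print(classes)
```

This only works if class names themselves do not end in a digit, which is worth keeping in mind when choosing names.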
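3. [Batch conversion possible] Convert to training files
- After all images are annotated, the json files must be converted into the files needed for training. The conversion generates a folder named "xxx_json" that contains the info.yaml and label.png needed for training
Notes:
Newer versions of labelme cannot generate info.yaml directly. Either modify the labelme code to support this, or reinstall an older version, e.g. labelme==3.20.0
## Convert the json file of a single image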
(tensorflow_gpu) C:\User$ labelme_json_to_dataset 'xxx\Image0.json'
## Batch-convert all json files on Windows (PowerShell)
(tensorflow_gpu) C:\User> $json = Get-ChildItem -Filter *.json
(tensorflow_gpu) C:\User> $json | ForEach-Object { labelme_json_to_dataset $_.Name}
## Batch processing on Linux
https://www.cnblogs.com/happyamyhope/p/14974116.html
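As a platform-independent alternative to the shell loops above, the same batch conversion can be driven from Python. This is only a sketch: it assumes `labelme_json_to_dataset` is on the PATH of the active environment, and `build_commands` / `convert_all` are hypothetical helper names.

```python
import glob
import os
import subprocess


def build_commands(folder):
    """Return one labelme_json_to_dataset command per .json file in folder."""
    json_files = sorted(glob.glob(os.path.join(folder, "*.json")))
    return [["labelme_json_to_dataset", path] for path in json_files]


def convert_all(folder):
    """Run the conversions one by one and stop at the first failure."""
    for command in build_commands(folder):
        subprocess.run(command, check=True)


# Example call (the folder is the image folder used later in this article):
# convert_all("C:/Tensorflow/Mask_RCNN/Images")
```

Building the command list separately makes it easy to print and inspect the commands before anything is actually run.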
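Training on your own dataset
1. Split the dataset into train and val
The input folder here is the folder produced by the annotation step. Besides RGB images, the author's captured data also includes 3D XYZ data; to prepare for later training with RGB-D, the split script also handles XYZ data in TIF format. If you only use RGB images, this part can be modified
- The script below splits the data into a training set and a validation set at the ratio you choose (train_rate). The usual recommendation is train_rate=0.33; note that the script itself defaults to 3/4
- After the split, ./train and ./val are created in the folder given by 'dir of result images' (res_dir); each contains ./pic and ./mask, which hold the images and the annotation data respectively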
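Before the full script, the core of the split can be sketched in a few lines. This is an illustration rather than the author's code: `split_images` and the `seed` parameter are assumptions added here so that the split is reproducible.

```python
import random


def split_images(images, train_rate, seed=0):
    """Return (train, val); train receives int(len(images) * train_rate) items."""
    rng = random.Random(seed)
    train = rng.sample(images, int(len(images) * train_rate))
    train_set = set(train)
    val = [name for name in images if name not in train_set]
    return train, val


images = ["Image%d.png" % i for i in range(8)]
train, val = split_images(images, train_rate=3 / 4)
print(len(train), len(val))  # 6 2
```

With a fixed seed the same images end up in the same subset on every run, which helps when comparing training runs.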
#======================================================================
#
# Copyright (C) 2020
# All rights reserved
#
# description :
#
# created by 天木青(https://blog.csdn.net/qq_15615505) at 08/27/2020 18:41:28
#
#======================================================================
import os
import shutil
import time
import random
import tensorflow as tf

# tf.app.flags is the TensorFlow 1.x API used by this Mask_RCNN setup
tf.app.flags.DEFINE_string('src_dir', os.path.abspath("C:/Tensorflow/Mask_RCNN/Images"), 'dir of source images')
tf.app.flags.DEFINE_string('res_dir', os.path.abspath("C:/Tensorflow/Mask_RCNN/dataset"), 'dir of result images')
tf.app.flags.DEFINE_float('train_rate', 3/4, 'random rate to arrange images')
FLAGS = tf.app.flags.FLAGS

srcimg_floder_val = FLAGS.src_dir
resimg_floder_val = FLAGS.res_dir

# Result layout: <res_dir>/train/{pic,mask} and <res_dir>/val/{pic,mask}
train_dataset_dir = os.path.join(resimg_floder_val, "train")
train_dataset_pic_dir = os.path.join(train_dataset_dir, "pic")
train_dataset_mask_dir = os.path.join(train_dataset_dir, "mask")
val_dataset_dir = os.path.join(resimg_floder_val, "val")
val_dataset_pic_dir = os.path.join(val_dataset_dir, "pic")
val_dataset_mask_dir = os.path.join(val_dataset_dir, "mask")

# Create any folder that does not exist yet (parents included)
for folder in (train_dataset_pic_dir, train_dataset_mask_dir,
               val_dataset_pic_dir, val_dataset_mask_dir):
    os.makedirs(folder, exist_ok=True)

def print_message(message):
    # Print the message with a millisecond timestamp in front
    ct = time.time()
    local_time = time.localtime(ct)
    data_head = time.strftime("%Y-%m-%d %H:%M:%S", local_time)
    data_secs = (ct - int(ct)) * 1000
    time_stamp = "%s.%03d" % (data_head, data_secs)
    print(time_stamp, ":", message)


# Collect the RGB images (png) in the source folder
rgb_images = []
for file in os.listdir(srcimg_floder_val):
    if file.endswith('png'):
        rgb_images.append(file)

# Randomly pick the training images; everything else goes to validation
train_rgb_images = random.sample(rgb_images, int(len(rgb_images)*FLAGS.train_rate))
val_rgb_images = []
for image in rgb_images:
    if not (image in train_rgb_images):
        val_rgb_images.append(image)

def copy_split(images, pic_dir, mask_dir):
    # Copy each RGB image, its optional TIF/TIFF XYZ file, and its "<name>_json" folder
    for image in images:
        shutil.copyfile(os.path.join(srcimg_floder_val, image), os.path.join(pic_dir, image))
        print_message("Copy " + image + " success!")
        name = os.path.splitext(image)[0]
        for file in os.listdir(srcimg_floder_val):
            src_path = os.path.join(srcimg_floder_val, file)
            stem, ext = os.path.splitext(file)
            if stem == name and ext.lower() in ('.tif', '.tiff'):
                # XYZ data stored next to the RGB image
                shutil.copyfile(src_path, os.path.join(pic_dir, file))
                print_message("Copy " + file + " success!")
            elif file == name + "_json" and os.path.isdir(src_path):
                # Folder generated by labelme_json_to_dataset (label.png, info.yaml, ...)
                mask_subdir = os.path.join(mask_dir, file)
                os.makedirs(mask_subdir, exist_ok=True)
                for val in os.listdir(src_path):
                    shutil.copyfile(os.path.join(src_path, val), os.path.join(mask_subdir, val))
                    print_message("Copy " + val + " success!")


copy_split(train_rgb_images, train_dataset_pic_dir, train_dataset_mask_dir)
copy_split(val_rgb_images, val_dataset_pic_dir, val_dataset_mask_dir)
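After the split it can be worth checking that every image really has its annotation files, since the training script silently skips images without them. A minimal sketch; `find_unpaired` is a hypothetical helper, and it assumes the `<name>_json` folder layout described above:

```python
import os


def find_unpaired(pic_dir, mask_dir, exts=(".png", ".jpg")):
    """Return image names whose "<name>_json" folder lacks label.png or info.yaml."""
    missing = []
    for file in sorted(os.listdir(pic_dir)):
        stem, ext = os.path.splitext(file)
        if ext.lower() not in exts:
            continue  # skip TIF/XYZ files and anything else
        json_dir = os.path.join(mask_dir, stem + "_json")
        needed = [os.path.join(json_dir, n) for n in ("label.png", "info.yaml")]
        if not all(os.path.isfile(p) for p in needed):
            missing.append(file)
    return missing


# Example with the paths used in this article:
# print(find_unpaired("C:/Tensorflow/Mask_RCNN/dataset/train/pic",
#                     "C:/Tensorflow/Mask_RCNN/dataset/train/mask"))
```

An empty list means every image in the folder is paired with both files.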
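2. Write the training program
Some parts of the script must be adapted to your own setup:
[class BoxConfig(Config)] -> see the comments in the script
[load_shapes] -> see the comments in the script
The parts that need attention are marked in the script: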
import os
import sys
import random
import math
import re
import time

# Each third-party package is installed with pip if its import fails
try:
    import numpy as np
except ImportError:
    os.system('python -m pip install --upgrade pip & pip install numpy')
    import numpy as np
try:
    import cv2
except ImportError:
    os.system('pip install opencv-python')
    import cv2
try:
    import matplotlib
except ImportError:
    os.system('pip install matplotlib')
    import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf
import yaml
try:
    from PIL import Image
except ImportError:
    os.system('pip install Pillow')
    from PIL import Image

tf.app.flags.DEFINE_integer('epoch_num', 10, 'epoch number')
FLAGS = tf.app.flags.FLAGS

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn.config import Config
try:
    from mrcnn import utils
except ImportError:
    os.system('pip install scikit-image')
    from mrcnn import utils
import mrcnn.model as modellib
try:
    from mrcnn import visualize
except ImportError:
    os.system('pip install IPython')
    from mrcnn import visualize
from mrcnn.model import log

#get_ipython().magic('matplotlib inline')

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "model/mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# ## Configurations

# For these settings, see the author's other article 《Mask-RCNN学习 - 结合源码详细解析Config.py》
class BoxConfig(Config):
    """Configuration for training on the box/bottle dataset.
    Derives from the base Config class and overrides the values
    specific to this dataset.
    """
    # Give the configuration a recognizable name
    ## Change this to the model name you want
    NAME = "model"

    # Train on 1 GPU with 2 images per GPU.
    # Batch size is 2 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 2  # background + 2 classes (box, bottle)

    # Limits for the small side and the large side of the resized image;
    # together they determine the image shape (1024 x 1024 here).
    IMAGE_MIN_DIM = 128*8
    IMAGE_MAX_DIM = 128*8

    # Anchor side in pixels: 64, 96, 256, 512, 1024
    # (16*6 is kept from the original; 16*8 would give a uniform x8 scaling)
    RPN_ANCHOR_SCALES = (8*8, 16*6, 32*8, 64*8, 128*8)

    # Reduce training ROIs per image because the images have few objects.
    # Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 32

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 100

    # Use small validation steps since the epoch is small
    VALIDATION_STEPS = 5


config = BoxConfig()
config.display()

def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.
    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

# Dataset
class BoxDataset(utils.Dataset):
    # Number of instances (masks) annotated in one image. In the mask image the
    # first mask has pixel value 1, the second 2, and so on, so the largest
    # pixel value in the image (as a NumPy array) equals the number of masks.
    def get_obj_index(self, image):
        n = np.max(image)
        return n

    # Parse the info.yaml produced by labelme to get the label of each mask.
    # The info.yaml for the given image_id is read into a dict; the value of the
    # key label_names is a list, from which labels[0] (the background) is removed.
    # The returned list holds the instance labels, e.g. labels=['box1', 'bottle1'].
    def from_yaml_get_class(self, image_id):
        info = self.image_info[image_id]
        with open(info['yaml_path']) as f:
            # safe_load avoids the missing-Loader error of yaml.load in newer PyYAML
            temp = yaml.safe_load(f)
            labels = temp['label_names']
            del labels[0]
        return labels

    # Rewritten draw_mask: draw the masks for the given image_id.
    # For the pixel value x at each point: if x == index + 1, the pixel belongs
    # to the mask of object number index (pixel value 0 is background), so
    # mask[j, i, index] is set to 1, i.e. the point is drawn on channel index.
    # The returned mask is a 3D array with one finished mask per channel.
    def draw_mask(self, num_obj, mask, image, image_id):
        info = self.image_info[image_id]
        for index in range(num_obj):
            for i in range(info['width']):
                for j in range(info['height']):
                    at_pixel = image.getpixel((i, j))
                    if at_pixel == index + 1:
                        mask[j, i, index] = 1
        return mask

    # Rewritten load_shapes (the image-loading function) with our own classes;
    # it also stores path, mask_path and yaml_path in self.image_info
    def load_shapes(self, count, height, width, img_floder, mask_floder, imglist, dataset_root_path):
        # Add classes; the arguments are (source, class_id, class_name)
        # Change these to match your own classes
        self.add_class("shapes", 1, "box")
        self.add_class("shapes", 2, "bottle")
        # Loop over the count images:
        #   take the file name stem of image i and derive the mask path from it,
        #   derive the info.yaml path the same way,
        #   then register all of it with add_image
        for i in range(count):
            filestr = imglist[i].split(".")[0]
            mask_path = mask_floder + "/" + filestr + "_json/label.png"
            yaml_path = mask_floder + "/" + filestr + "_json/info.yaml"
            if not os.path.exists(mask_path) or not os.path.exists(yaml_path):
                continue
            cv_img = cv2.imread(img_floder + "/" + filestr + ".png")
            if cv_img is None:
                cv_img = cv2.imread(img_floder + "/" + filestr + ".jpg")
            self.add_image("shapes", image_id=i, path=img_floder + "/" + imglist[i],
                           width=cv_img.shape[1], height=cv_img.shape[0], mask_path=mask_path, yaml_path=yaml_path)

    # Rewritten load_mask: load the mask data for the given image_id
    def load_mask(self, image_id):
        info = self.image_info[image_id]  # image info for this id
        print(info)
        # With count = 1 the occlusion loop below never runs; label.png stores one
        # value per pixel, so the instance masks cannot overlap in the first place
        count = 1
        img = Image.open(info['mask_path'])  # open the mask file of the image
        num_obj = self.get_obj_index(img)  # mask i has pixel value i, so the maximum is the instance count
        mask = np.zeros([info['height'], info['width'], num_obj], dtype=np.uint8)  # one channel per instance
        mask = self.draw_mask(num_obj, mask, img, image_id)  # fill in the masks
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count - 2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        # Instance labels from info.yaml, e.g. labels=['box1', 'bottle1']
        labels = self.from_yaml_get_class(image_id)
        labels_form = []
        for i in range(len(labels)):
            if labels[i].find("box") != -1:     # the label contains "box"
                labels_form.append("box")
            if labels[i].find("bottle") != -1:  # the label contains "bottle"
                labels_form.append("bottle")
        # Build class_ids by looking up the index of each name in class_names
        class_ids = np.array([self.class_names.index(s) for s in labels_form])
        # Return the masks and the mapped class ids
        return mask, class_ids.astype(np.int32)

## Load the training set
# The trailing slash is required because "pic" and "mask" are appended directly
dataset_root_path = "C:/Tensorflow/Mask_RCNN/dataset/train/"
img_floder = dataset_root_path + "pic"
mask_floder = dataset_root_path + "mask"
imglist = os.listdir(img_floder)  # list of image file names
count = len(imglist)
dataset_train = BoxDataset()
dataset_train.load_shapes(count, 1000, 1000, img_floder, mask_floder, imglist, dataset_root_path)
dataset_train.prepare()

## Load the validation set
dataset_root_path_val = "C:/Tensorflow/Mask_RCNN/dataset/val/"
img_floder_val = dataset_root_path_val + "pic"
mask_floder_val = dataset_root_path_val + "mask"
imglist_val = os.listdir(img_floder_val)
count_val = len(imglist_val)
dataset_val = BoxDataset()
dataset_val.load_shapes(count_val, 1000, 1000, img_floder_val, mask_floder_val, imglist_val, dataset_root_path_val)
dataset_val.prepare()

# In[37]:
# image_ids = np.random.choice(dataset_train.image_ids, 4)
# print(image_ids)
# for image_id_example in image_ids:
# image = dataset_train.load_image(image_id_example)
# mask, class_ids = dataset_train.load_mask(image_id_example)
# visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

# ## Create Model

model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

init_with = "coco"  # imagenet, coco, or last
if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights pre-trained on MS COCO, but skip the layers that differ
    # because the number of classes is different. See the README for details.
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last(), by_name=True)

# ## Training
#
# Train in two stages:
# 1. Only the heads. Here we're freezing all the backbone layers and training only the randomly initialized layers (i.e. the ones that we didn't use pre-trained weights from MS COCO). To train only the head layers, pass `layers='heads'` to the `train()` function.
#
# 2. Fine-tune all layers. For this simple example it's not necessary, but we're including it to show the process. Simply pass `layers="all"` to train all layers.

# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=FLAGS.epoch_num,
            layers='heads')

# Fine tune all layers
# Passing layers="all" trains all layers. You can also
# pass a regular expression to select which layers to
# train by name pattern.
# Note: `epochs` is the total epoch count reached so far, not the number of
# extra epochs, so it must exceed FLAGS.epoch_num for this stage to run.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=FLAGS.epoch_num + 2,
            layers="all")

# In[45]:
# Save weights
# Typically not needed because callbacks save after every epoch
# Uncomment to save manually
# model_path = os.path.join(MODEL_DIR, "mask_rcnn_box.h5")
# model.keras_model.save_weights(model_path)

# ## Detection

class InferenceConfig(BoxConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


inference_config = InferenceConfig()

# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference",
                          config=inference_config,
                          model_dir=MODEL_DIR)

# Get path to saved weights
# Either set a specific path or find last trained weights
# model_path = os.path.join(ROOT_DIR, ".h5 file name here")
model_path = model.find_last()

# Load trained weights
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# Test on a random image
image_id = random.choice(dataset_val.image_ids)
original_image, image_meta, gt_class_id, gt_bbox, gt_mask = modellib.load_image_gt(
    dataset_val, inference_config, image_id, use_mini_mask=False)

log("original_image", original_image)
log("image_meta", image_meta)
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)

# Show the ground truth first
visualize.display_instances(original_image, gt_bbox, gt_mask, gt_class_id,
                            dataset_train.class_names, figsize=(8, 8))

# Then run detection on the same image and show the result
results = model.detect([original_image], verbose=1)
r = results[0]
visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'],
                            dataset_val.class_names, r['scores'], ax=get_ax())

# ## Evaluation

# Compute VOC-Style mAP @ IoU=0.5
# Running on 10 images. Increase for better accuracy.
image_ids = np.random.choice(dataset_val.image_ids, 10)
APs = []
for image_id in image_ids:
    # Load image and ground truth data
    image, image_meta, gt_class_id, gt_bbox, gt_mask = modellib.load_image_gt(
        dataset_val, inference_config, image_id, use_mini_mask=False)
    molded_images = np.expand_dims(modellib.mold_image(image, inference_config), 0)
    # Run object detection
    results = model.detect([image], verbose=0)
    r = results[0]
    # Compute AP
    AP, precisions, recalls, overlaps = utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
                                                         r["rois"], r["class_ids"], r["scores"], r['masks'])
    APs.append(AP)

print("mAP: ", np.mean(APs))
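The mask construction in the script (pixel value i in label.png marks instance i) can be illustrated on a tiny array. The sketch below is a vectorized variant of that idea and not the author's code; `label_to_masks` is a hypothetical name, and it assumes the label image has already been converted to a NumPy array.

```python
import numpy as np


def label_to_masks(label):
    """Turn an (H, W) label image into an (H, W, N) stack of binary masks."""
    num_obj = int(label.max())          # largest pixel value = number of instances
    ids = np.arange(1, num_obj + 1)     # instance values 1..N, 0 is background
    # Broadcasting compares every pixel against every instance value at once
    return (label[:, :, None] == ids).astype(np.uint8)


label = np.array([[0, 1, 1],
                  [0, 2, 0]])
masks = label_to_masks(label)
print(masks.shape)     # (2, 3, 2)
print(masks[:, :, 0])  # mask of instance 1
```

On large images this kind of comparison is typically much faster than a per-pixel Python loop, at the cost of holding the whole comparison result in memory.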
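3. View the training process with TensorBoard
- Once training has started, run tensorboard on the folder where the model is saved (the script in this article saves to the logs folder). After tensorboard starts, the address it serves is usually "http://localhost:6006/"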
(tensorflow_gpu) C:\Tensorflow\Mask_RCNN\logs> tensorboard --logdir ./
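If the logs folder holds several training runs, pointing TensorBoard at only the newest one keeps the charts readable. A small sketch, assuming each run lives in its own subfolder whose name starts with the config NAME ("model" in this article) followed by a timestamp, so that the names sort chronologically; `latest_run_dir` is a hypothetical helper:

```python
import os


def latest_run_dir(log_root, prefix="model"):
    """Return the run folder whose name sorts last, or None if there is none."""
    runs = [d for d in sorted(os.listdir(log_root))
            if d.startswith(prefix) and os.path.isdir(os.path.join(log_root, d))]
    return os.path.join(log_root, runs[-1]) if runs else None


# Example: print the folder to pass to `tensorboard --logdir`
# print(latest_run_dir("C:/Tensorflow/Mask_RCNN/logs"))
```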