BEVDet
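BEVDet inherits from CenterPoint, which in turn inherits from MVXTwoStageDetector. The model is implemented on the OpenMMLab MMDetection3D framework. The algorithm builds on CenterPoint point-cloud detection: it estimates depth from multi-view images to form a layered, frustum-shaped point cloud, then turns that into pillar-like features in the BEV view and runs point-cloud-style detection on them.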
Model
bevdet-r50
| Module | type | Channels (in -> out) |
| --- | --- | --- |
| img_backbone | ResNet | |
| img_neck | CustomFPN | [1024, 2048] -> 512 |
| img_view_transformer | LSSViewTransformer | 512 -> 80 |
| img_bev_encoder_backbone | CustomResNet | 80 -> [80x2, 80x4, 80x8] |
| img_bev_encoder_neck | FPN_LSS | 80x8 + 80x2 -> 256 |
| pts_bbox_head | CenterHead | 256 -> |
| bbox_coder | CenterPointBBoxCoder | |
| separate_head | SeparateHead | |
| loss_cls | GaussianFocalLoss | |
| loss_bbox | L1Loss | |
# grid_config, data_config, numC_Trans, voxel_size and point_cloud_range
# must be defined before this dict in the config file.
model = dict(
    type='BEVDet',
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,  # the network has 4 stages
        out_indices=(2, 3),  # output the feature maps of stages 2 and 3 (0-based)
        frozen_stages=-1,  # -1 freezes no stage, so all backbone weights are trained
        norm_cfg=dict(type='BN', requires_grad=True),
        # norm_eval=False: norm layers stay in training mode and normalize with the
        # mean/variance of the current batch. norm_eval=True: norm layers run in
        # eval mode and use the stored running mean/variance.
        norm_eval=False,
        with_cp=True,  # gradient checkpointing: saves memory at the cost of speed
        style='pytorch'),
    img_neck=dict(
        type='CustomFPN',
        in_channels=[1024, 2048],
        out_channels=512,
        num_outs=1,
        start_level=0,  # build the pyramid starting from input feature map 0
        out_ids=[0]),  # output only pyramid level 0
    img_view_transformer=dict(
        type='LSSViewTransformer',
        grid_config=grid_config,
        input_size=data_config['input_size'],
        in_channels=512,
        out_channels=numC_Trans,
        downsample=16),
    img_bev_encoder_backbone=dict(
        type='CustomResNet',
        numC_input=numC_Trans,
        num_channels=[numC_Trans * 2, numC_Trans * 4, numC_Trans * 8]),
    img_bev_encoder_neck=dict(
        type='FPN_LSS',
        in_channels=numC_Trans * 8 + numC_Trans * 2,
        out_channels=256),
    pts_bbox_head=dict(
        type='CenterHead',  # BEVDet reuses the CenterPoint detection head
        in_channels=256,
        tasks=[
            dict(num_class=1, class_names=['car']),
            dict(num_class=2, class_names=['truck', 'construction_vehicle']),
            dict(num_class=2, class_names=['bus', 'trailer']),
            dict(num_class=1, class_names=['barrier']),
            dict(num_class=2, class_names=['motorcycle', 'bicycle']),
            dict(num_class=2, class_names=['pedestrian', 'traffic_cone']),
        ],
        common_heads=dict(
            reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2)),
        share_conv_channel=64,
        bbox_coder=dict(
            type='CenterPointBBoxCoder',
            pc_range=point_cloud_range[:2],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            max_num=500,
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            code_size=9),
        separate_head=dict(
            type='SeparateHead', init_bias=-2.19, final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
        loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
        norm_bbox=True),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            point_cloud_range=point_cloud_range,
            grid_size=[1024, 1024, 40],
            voxel_size=voxel_size,
            out_size_factor=8,
            dense_reg=1,
            gaussian_overlap=0.1,
            max_objs=500,
            min_radius=2,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2])),
    test_cfg=dict(
        pts=dict(
            pc_range=point_cloud_range[:2],
            post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            max_per_img=500,
            max_pool_nms=False,
            min_radius=[4, 12, 10, 1, 0.85, 0.175],
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            pre_max_size=1000,
            post_max_size=83,
            # Scale-NMS
            nms_type=[
                'rotate', 'rotate', 'rotate', 'circle', 'rotate', 'rotate'
            ],
            nms_thr=[0.2, 0.2, 0.2, 0.2, 0.2, 0.5],
            nms_rescale_factor=[
                1.0, [0.7, 0.7], [0.4, 0.55], 1.1, [1.0, 1.0], [4.5, 9.0]
            ])))
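As a quick sanity check on the head settings in this config, the sketch below counts the classes across the six tasks and the regression channels declared in `common_heads`, where each tuple is read as (output channels, number of conv layers). The variable names are only for illustration; the values are copied from the config.

```python
# Values copied from pts_bbox_head and train_cfg in the config.
tasks = [
    ['car'],
    ['truck', 'construction_vehicle'],
    ['bus', 'trailer'],
    ['barrier'],
    ['motorcycle', 'bicycle'],
    ['pedestrian', 'traffic_cone'],
]
common_heads = dict(
    reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2))
code_weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2]

# Total number of detection classes over all task heads.
num_classes = sum(len(names) for names in tasks)

# Total number of regression channels: first element of each tuple.
reg_channels = sum(out_channels for out_channels, _ in common_heads.values())

print(num_classes, reg_channels, len(code_weights))
```

Under this reading, the regression targets have one weight each in `code_weights`, with the last two (velocity) weighted at 0.2.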
Training configuration
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
| train_pipeline | test_pipeline |
| --- | --- |
| PrepareImageInputs | PrepareImageInputs |
| LoadAnnotationsBEVDepth | LoadAnnotationsBEVDepth |
| ObjectRangeFilter | LoadPointsFromFile |
| ObjectNameFilter | MultiScaleFlipAug3D |
| DefaultFormatBundle3D | (DefaultFormatBundle3D |
| Collect3D | Collect3D) |
Scale NMS
# Scale-NMS
nms_type=['rotate', 'rotate', 'rotate', 'circle', 'rotate', 'rotate'],
nms_thr=[0.2, 0.2, 0.2, 0.2, 0.2, 0.5],
nms_rescale_factor=[
    1.0, [0.7, 0.7], [0.4, 0.55], 1.1, [1.0, 1.0], [4.5, 9.0]
]
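The settings above only list per-task factors and thresholds. As a rough sketch of the idea behind Scale-NMS (enlarge or shrink each box by a class-specific factor before ordinary NMS, so that small objects such as traffic cones can overlap at all), the following uses NumPy with axis-aligned BEV boxes. The helper names, the `[x, y, w, l]` layout, and the axis-aligned IoU are assumptions made for illustration; the real head works on rotated 3D boxes.

```python
import numpy as np

def scale_boxes(boxes, labels, factors):
    """Copy [x, y, w, l] boxes and multiply w, l by factors[label]."""
    scaled = boxes.astype(float).copy()
    scaled[:, 2:4] *= np.asarray(factors, dtype=float)[labels][:, None]
    return scaled

def iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as [x_center, y_center, w, l]."""
    ix = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2)
             - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    iy = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2)
             - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = ix * iy
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def scale_nms(boxes, scores, labels, factors, thr):
    """Greedy NMS on the rescaled boxes; returns indices of kept boxes."""
    scaled = scale_boxes(boxes, labels, factors)
    keep = []
    for i in np.argsort(-scores):
        if all(iou_axis_aligned(scaled[i], scaled[j]) <= thr for j in keep):
            keep.append(int(i))
    return keep

# Two small boxes, 0.5 m apart, that do not overlap at their true size.
boxes = np.array([[0.0, 0.0, 0.4, 0.4], [0.5, 0.0, 0.4, 0.4]])
scores = np.array([0.9, 0.6])
labels = np.array([0, 0])

kept_plain = scale_nms(boxes, scores, labels, factors=[1.0], thr=0.5)
kept_scaled = scale_nms(boxes, scores, labels, factors=[4.5], thr=0.5)
print(kept_plain, kept_scaled)
```

With a factor of 1.0 the two boxes never overlap, so plain NMS keeps both; after scaling by 4.5 they overlap enough for the lower-scoring one to be suppressed.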
Optimizer configuration
| optimizer | lr | lr_config |
| --- | --- | --- |
| AdamW | 2e-4 | policy=step |
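Inference record
| Module | Sub-module | Sub-module | x size | Meaning |
| --- | --- | --- | --- | --- |
| extract_img_feat | image_encoder | img_backbone `ResNet` | [1, 1024, 16, 44], [1, 2048, 8, 22] | feature maps of stages 2 and 3 |
| | | img_neck `CustomFPN` | [1, 512, 16, 44] | fused feature |
| | img_view_transformer | | [1, 59, 16, 44] | depth |
| | bev_encoder | `CustomResNet` `FPN_LSS` | [1, 256, 128, 128] | BEV feature |
| pts_bbox_head | CenterHead | `SeparateHead` | Loss | multi-task detection |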
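The recorded shapes can be cross-checked against the config with a little arithmetic. The sketch below is only a consistency check under stated assumptions: the voxel size and input resolution are not quoted in these notes, so they are derived here from `point_cloud_range`, `grid_size`, `out_size_factor`, `downsample`, and the 16 x 44 image feature map.

```python
# Values copied from the config and from the recorded shapes.
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
grid_size = [1024, 1024, 40]   # train_cfg.pts.grid_size
out_size_factor = 8            # train_cfg.pts.out_size_factor
downsample = 16                # img_view_transformer.downsample
img_feat_hw = (16, 44)         # spatial size of the img_neck output

def implied_voxel_size(pc_range, grid):
    """Voxel edge lengths implied by the range and the grid resolution."""
    return [round((pc_range[i + 3] - pc_range[i]) / grid[i], 6)
            for i in range(3)]

voxel_size = implied_voxel_size(point_cloud_range, grid_size)

# Head output resolution: grid cells divided by the output stride.
bev_hw = (grid_size[0] // out_size_factor, grid_size[1] // out_size_factor)

# Metric size of one BEV feature cell.
bev_cell = round(voxel_size[0] * out_size_factor, 6)

# Input image size implied by the feature map and the downsample factor.
input_hw = (img_feat_hw[0] * downsample, img_feat_hw[1] * downsample)

print(voxel_size, bev_hw, bev_cell, input_hw)
```

The derived 128 x 128 grid agrees with the recorded BEV feature shape `[1, 256, 128, 128]`.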
Registration
The registry mechanism uses the `type` keyword in cfg to look up the class registered under that name and instantiate it.
[Figure: training flowchart. Recoverable node labels: start, seed, load data, cfg.data.train, cfg.data.test, type, registry, build_dataset, pipeline, data processing, load model, cfg.model, type, register, build_model, model, training, end.]
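import inspect

def build_from_cfg(cfg, registry):
    # Core of an mmcv-style build_from_cfg (default_args handling omitted).
    args = cfg.copy()  # work on a copy so the caller's cfg keeps its 'type' key
    obj_type = args.pop('type')
    if isinstance(obj_type, str):
        # Look the class up by name in the registry.
        obj_cls = registry.get(obj_type)
        if obj_cls is None:
            raise KeyError(
                f'{obj_type} is not in the {registry.name} registry')
    elif inspect.isclass(obj_type) or inspect.isfunction(obj_type):
        # A class or function may also be passed directly as `type`.
        obj_cls = obj_type
    else:
        raise TypeError(
            f'type must be a str or valid type, but got {type(obj_type)}')
    try:
        return obj_cls(**args)
    except Exception as e:
        # Prefix the error with the class name to make failures easier to trace.
        raise type(e)(f'{obj_cls.__name__}: {e}')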
Note: deepcopy is used so that parameters are passed on while staying isolated from the original cfg.
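To make the lookup-by-`type` idea and the deepcopy note concrete, here is a minimal, self-contained sketch. `MiniRegistry`, `NECKS`, and `ToyFPN` are hypothetical names invented for this example; this is not mmcv's `Registry`, only a toy with the same shape.

```python
import copy

class MiniRegistry:
    """Toy registry mapping class names to classes (not mmcv's Registry)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: store the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._modules.get(key)

    def build(self, cfg):
        # Deep-copy so popping 'type' or mutating args never touches cfg.
        args = copy.deepcopy(cfg)
        obj_type = args.pop('type')
        obj_cls = self.get(obj_type)
        if obj_cls is None:
            raise KeyError(f'{obj_type} is not in the {self.name} registry')
        return obj_cls(**args)

NECKS = MiniRegistry('neck')

@NECKS.register_module
class ToyFPN:
    def __init__(self, in_channels, out_channels):
        self.in_channels = in_channels
        self.out_channels = out_channels

cfg = dict(type='ToyFPN', in_channels=[1024, 2048], out_channels=512)
neck = NECKS.build(cfg)
print(type(neck).__name__, neck.out_channels, cfg['type'])
```

Because of the deep copy, `cfg` still contains its `type` key after the build, and the list held by `neck.in_channels` is a separate object from the one in `cfg`.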
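Random seed
With the same seed, the generated random numbers are identical; in other words, numbers produced this way are pseudo-random. The generator behaves like a single-variable function: the same input yields the same random value. Note that producing a number also advances the generator's internal state (in effect a new seed), so repeated calls to the random function return different values, because by the second call the "seed x" is no longer the same.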
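A small demonstration with Python's standard `random` module (chosen here only for illustration; a training script would typically also seed NumPy and PyTorch):

```python
import random

# Same seed -> same sequence.
random.seed(42)
first_run = [random.random() for _ in range(3)]

random.seed(42)
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)

# Within one run the state advances after every draw,
# so consecutive values differ.
print(first_run[0] != first_run[1])
```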
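Summary
The MMLab framework already encapsulates the basic modules and decouples a number of functional modules. When using it, there is no need to dig into every detail. ==Do not reinvent the wheel!==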