
BEVFormer Model Processing Pipeline

news 2025/7/5 14:51:32

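1. Main code structure

BEVFormer
- data: custom dataset
- ckpts: pretrained weights
- mmdetection3d: implementation files of the MMDetection3D framework
- projects: model configs and custom models that live outside the MMDetection3D framework
-- configs: model config files, including bevformer, bevformer v2 ...
--- _base_: base config files, e.g. default_runtime.py: cfg.workflow
--- bevformer: custom bevformer config files
-- mmdet3d_plugin: bevformer's custom model files, dataset generation files, etc.
-- tools: dataset conversion tools, benchmark tools, model training and test scripts, custom dataset generation tools, ROS interface scripts, etc.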

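2. Model training flow

2.1 Model creation

Taking the bevformer_tiny.py config file as an example, Config.fromfile() parses the config file into a dict-like Python structure. MMDetection3D then builds the model automatically from that config. The rough flow is: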

Config File (e.g. bevformer_tiny.py)
↓
cfg = Config.fromfile('projects/configs/bevformer/bevformer_tiny.py')
↓
model = build_model(cfg.model)
↓
model = MODELS.build(cfg.model)
↓
Final call: class BEVFormer(MVXTwoStageDetector)
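To make the first step concrete, here is a minimal standard-library sketch of what loading a Python config file into a dictionary amounts to. It is not mmcv's implementation: `load_config`, `CONFIG_SOURCE` and the values inside it are hypothetical stand-ins, and the real Config.fromfile() also handles features such as `_base_` inheritance that are ignored here.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical, heavily shortened stand-in for a config file such as
# bevformer_tiny.py. The values are illustrative only.
CONFIG_SOURCE = """
bev_h_ = 50
bev_w_ = 50
model = dict(
    type='BEVFormer',
    use_grid_mask=True,
    img_backbone=dict(type='ResNet', depth=50),
)
"""


def load_config(path):
    """Execute a Python config file and return its public top-level variables."""
    namespace = runpy.run_path(str(path))
    # Drop module internals such as __name__ and __builtins__.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "toy_config.py"
    cfg_path.write_text(CONFIG_SOURCE)
    cfg = load_config(cfg_path)

print(cfg["model"]["type"])
print(sorted(cfg))
```

The point is that a config file is ordinary Python: every top-level assignment becomes a key, and nested `dict(...)` calls become nested dictionaries that later stages can walk.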

(1). The config file declares the model type. A snippet in the config file names the model to use, here the custom class BEVFormer as the detector:

model = dict(
    type='BEVFormer',   # <-- this 'type' field is the key
    use_grid_mask=True,
    img_backbone=dict(...),
    pts_bbox_head=dict(...),
    ...
)

(2). Registration mechanism. In BEVFormer/projects/mmdet3d_plugin/bevformer/detectors/bevformer.py, a decorator on the custom model registers the BEVFormer class in the global module dictionary:

@DETECTORS.register_module()
class BEVFormer(MVXTwoStageDetector):
    """BEVFormer.

    Args:
        video_test_mode (bool): Decide whether to use temporal information
            during inference.
    """

    def __init__(self,
                 ...
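The decorator pattern above can be illustrated with a toy registry. This is a simplified sketch, not the mmcv `Registry` class: the `Registry` and `ToyDetector` classes below are hypothetical and keep only the idea of mapping a class name to a class object and instantiating it from a config dict.

```python
class Registry:
    """Toy registry: maps a class name (string) to the class object."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self):
        """Return a decorator that records the decorated class by its name."""
        def decorator(cls):
            self._module_dict[cls.__name__] = cls
            return cls
        return decorator

    def get(self, key):
        """Look up a registered class; returns None if the name is unknown."""
        return self._module_dict.get(key)

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the remaining fields."""
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls = self.get(args.pop("type"))
        if cls is None:
            raise KeyError(f"{cfg['type']} is not in the {self.name} registry")
        return cls(**args)


DETECTORS = Registry("detector")


@DETECTORS.register_module()
class ToyDetector:
    """Hypothetical stand-in for a detector class such as BEVFormer."""

    def __init__(self, use_grid_mask=False):
        self.use_grid_mask = use_grid_mask


model = DETECTORS.build(dict(type="ToyDetector", use_grid_mask=True))
print(type(model).__name__, model.use_grid_mask)
```

Because registration happens when the class definition is executed, the module that defines the class must be imported before the config is built. Otherwise the lookup by `type` fails.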

(3). The build function is invoked automatically. MMDetection3D builds the whole model through the build_model() function.

from mmdet3d.models import build_model

model = build_model(
    cfg.model,
    train_cfg=cfg.get('train_cfg'),
    test_cfg=cfg.get('test_cfg'))

The function recursively resolves the submodules in cfg.model (backbone, neck, head, etc.) and conceptually ends up calling:

model_class = DETECTORS.get(model_type)   # look up the class registered under cfg.model['type']
model = model_class(**kwargs)             # kwargs: the remaining fields of cfg.model

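This is roughly equivalent to model = BEVFormer(**cfg.model), with the 'type' key removed. At this point the following have been completed:
a. construction of the image img_backbone (ResNet50);
b. construction of img_neck (FPN);
c. construction of pts_bbox_head (BEVFormerHead);
d. the model's training and inference settings.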
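The nested construction described above can be sketched as follows. This is an illustrative assumption about the general pattern, not code from BEVFormer: the top-level class receives its children as config dicts and builds each one in its own constructor. Every name here (`MODULES`, `register`, `build`, and the `Toy*` classes) is hypothetical.

```python
MODULES = {}  # name -> class; stands in for the framework's registries


def register(cls):
    """Record a class under its own name so configs can refer to it."""
    MODULES[cls.__name__] = cls
    return cls


def build(cfg):
    """Instantiate the class named by cfg['type'] with the remaining fields."""
    args = dict(cfg)  # copy so the original config stays intact
    return MODULES[args.pop("type")](**args)


@register
class ToyBackbone:
    def __init__(self, depth):
        self.depth = depth


@register
class ToyNeck:
    def __init__(self, out_channels):
        self.out_channels = out_channels


@register
class ToyHead:
    def __init__(self, num_classes):
        self.num_classes = num_classes


@register
class ToyBEVDetector:
    def __init__(self, img_backbone, img_neck, pts_bbox_head, use_grid_mask=False):
        # Children arrive as config dicts and are built one level down.
        self.img_backbone = build(img_backbone)
        self.img_neck = build(img_neck)
        self.pts_bbox_head = build(pts_bbox_head)
        self.use_grid_mask = use_grid_mask


cfg_model = dict(
    type="ToyBEVDetector",
    use_grid_mask=True,
    img_backbone=dict(type="ToyBackbone", depth=50),
    img_neck=dict(type="ToyNeck", out_channels=256),
    pts_bbox_head=dict(type="ToyHead", num_classes=10),
)
model = build(cfg_model)
print(type(model.img_backbone).__name__, model.img_backbone.depth)
```

A single `build(cfg_model)` call therefore produces the whole object tree, which is why editing one `type` string in the config is enough to swap a component.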

2.2 Dataset construction

datasets = [build_dataset(cfg.data.train)]

This reads the data.train field of the config (see mmcv Config for how fields are referenced):

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=8,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'robotruck_infos_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        bev_size=(bev_h_, bev_w_),
        queue_length=queue_length,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    ...)

Then the following is called:

build_dataset(cfg.data.train)

This creates a RoboTruckDataset instance and attaches the following preprocessing pipeline to it:

train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.3]),
    dict(type='NormalizeMultiviewImage', ...),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]

These steps ensure that the input images and annotations are loaded, augmented, and formatted correctly.
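How a list of `dict(type=...)` entries becomes a processing chain can be sketched with a toy version of the pattern. This is a simplified illustration, not the MMDetection implementation: the transforms below (`LoadToyImages`, `ScaleToyPixels`, `CollectKeys`) are hypothetical and operate on tiny nested lists instead of real images.

```python
TRANSFORMS = {}  # name -> class; stands in for the pipeline registry


def register(cls):
    TRANSFORMS[cls.__name__] = cls
    return cls


@register
class LoadToyImages:
    """Hypothetical loader: fabricates one 2x2 'image' per camera view."""

    def __init__(self, num_views=2):
        self.num_views = num_views

    def __call__(self, results):
        results["img"] = [[[10, 20], [30, 40]] for _ in range(self.num_views)]
        return results


@register
class ScaleToyPixels:
    """Multiply every pixel value by a constant (a stand-in for normalization)."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, results):
        results["img"] = [
            [[px * self.scale for px in row] for row in img]
            for img in results["img"]
        ]
        return results


@register
class CollectKeys:
    """Keep only the keys that the model needs."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        return {k: results[k] for k in self.keys}


class Compose:
    """Build each transform from its config dict and apply them in order."""

    def __init__(self, pipeline_cfg):
        self.transforms = []
        for cfg in pipeline_cfg:
            args = dict(cfg)
            self.transforms.append(TRANSFORMS[args.pop("type")](**args))

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
            if results is None:  # a transform may reject a sample
                return None
        return results


toy_pipeline = [
    dict(type="LoadToyImages", num_views=2),
    dict(type="ScaleToyPixels", scale=0.5),
    dict(type="CollectKeys", keys=["img"]),
]
sample = Compose(toy_pipeline)({"sample_idx": 0})
print(sample)
```

Each transform takes and returns the same results dictionary, so order matters: a collect step placed before the loader would fail because the `img` key does not exist yet.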

2.3 Training flow

train.py calls custom_train_model() from mmdet3d_plugin/bevformer/apis/train.py, and custom_train_model() in turn delegates the actual work to custom_train_detector():

from projects.mmdet3d_plugin.bevformer.apis.train import custom_train_model
...
custom_train_model(
    model,
    datasets,
    cfg,
    distributed=distributed,
    validate=(not args.no_validate),
    timestamp=timestamp,
    meta=meta)

custom_train_detector() is defined in mmdet3d_plugin/bevformer/apis/mmdet_train.py. It wraps the MMDetection training flow and adapts and extends it.

def custom_train_model(model,
                       dataset,
                       cfg,
                       distributed=False,
                       validate=False,
                       timestamp=None,
                       eval_model=None,
                       meta=None):
    """A function wrapper for launching model training according to cfg.

    Because we need different eval_hook in runner. Should be deprecated in the
    future.
    """
    if cfg.model.type in ['EncoderDecoder3D']:
        assert False
    else:
        custom_train_detector(
            model,
            dataset,
            cfg,
            distributed=distributed,
            validate=validate,
            timestamp=timestamp,
            eval_model=eval_model,
            meta=meta)

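custom_train_detector() is responsible for launching a complete training run from the config: data preparation, model parallelization, optimizer construction, hook registration, and finally the call to runner.run() that starts training. The steps in detail:
1). Build the data loaders. The training and validation DataLoaders are built from the config.
2). Multi-GPU support. Depending on whether training is distributed, the model is wrapped in MMDataParallel or MMDistributedDataParallel.
3). Initialize the model and optimizer. The model and optimizer are built and the runner is set up.
4). Register the standard training hooks. These cover learning-rate scheduling, logging, checkpoint saving, and so on.
5). Validation support. If validation is enabled, the validation set and the evaluation hook are built.
6). Custom hook support. User-defined training hooks can be registered.
7). Resume or load. Training can resume from a checkpoint or start from pretrained weights.
8). Start training. runner.run() is called to begin training.
The overall flow is shown in the chart below: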

graph TD
A[config file] --> B["data_loaders: build_dataloader()"]
B --> C[model: MMDataParallel / MMDistributedDataParallel]
C --> D["optimizer: build_optimizer()"]
D --> E[runner: EpochBasedRunner]
E --> F[register hooks]
F --> G[learning rate, optimizer, loggers, checkpointing]
G --> H["validation hook (optional)"]
H --> I["custom hooks (optional)"]
I --> J[resume or load pretrained weights]
J --> K["runner.run() → start training"]
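The interplay between the runner and its hooks can be sketched with a toy epoch-based loop. This is a simplified illustration under stated assumptions, not mmcv's EpochBasedRunner: `ToyRunner`, `ToyLoggerHook`, `ToyCheckpointHook` and the `train_step` callable are all hypothetical, and the toy counters are incremented before the hooks fire, which may differ from the real runner.

```python
class Hook:
    """Base hook: every callback is a no-op unless a subclass overrides it."""

    def before_run(self, runner):
        pass

    def after_train_iter(self, runner):
        pass

    def after_train_epoch(self, runner):
        pass


class ToyLoggerHook(Hook):
    """Record (epoch, iteration, loss) after every training iteration."""

    def __init__(self):
        self.records = []

    def after_train_iter(self, runner):
        self.records.append((runner.epoch, runner.iter, runner.last_loss))


class ToyCheckpointHook(Hook):
    """Pretend to save a checkpoint at the end of every epoch."""

    def __init__(self):
        self.saved_epochs = []

    def after_train_epoch(self, runner):
        self.saved_epochs.append(runner.epoch)


class ToyRunner:
    def __init__(self, train_step, max_epochs):
        self.train_step = train_step  # callable: batch -> scalar loss
        self.max_epochs = max_epochs
        self.hooks = []
        self.epoch = 0      # number of completed epochs
        self.iter = 0       # number of completed iterations
        self.last_loss = None

    def register_hook(self, hook):
        self.hooks.append(hook)

    def call_hook(self, name):
        for hook in self.hooks:
            getattr(hook, name)(self)

    def run(self, data_loader):
        self.call_hook("before_run")
        while self.epoch < self.max_epochs:
            for batch in data_loader:
                self.last_loss = self.train_step(batch)
                self.iter += 1
                self.call_hook("after_train_iter")
            self.epoch += 1
            self.call_hook("after_train_epoch")


runner = ToyRunner(train_step=lambda batch: sum(batch) / len(batch), max_epochs=2)
logger = ToyLoggerHook()
checkpointer = ToyCheckpointHook()
runner.register_hook(logger)
runner.register_hook(checkpointer)
runner.run(data_loader=[[1, 2], [3, 4]])
print(logger.records)
print(checkpointer.saved_epochs)
```

Keeping logging, checkpointing, and evaluation in hooks leaves the loop itself unchanged when a project such as BEVFormer needs a different evaluation hook.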

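2.4 Model forward pass

During training the model's forward() method is called. forward() dispatches to forward_train() or forward_test() depending on whether the model is training or testing, and the training path is defined mainly by forward_train(). The call chain is: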

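model.forward_train()
│
├── obtain_history_bev(): get the BEV features of history frames (for temporal modeling)
├── extract_feat(): extract image features of the current frame
│   └── extract_img_feat():
│       ├── img_backbone(): CNN extracts image features
│       └── img_neck(): FPN neck (top-down upsampling and fusion)
│
└── forward_pts_train():
    └── pts_bbox_head():
        ├── get_bev_features(): build BEV features (using spatial cross-attention)
        ├── decoder(bev_embed): DETR-style decoder
        └── loss():
            ├── HungarianMatcher: match predictions to ground truth
            ├── FocalLoss: classification loss
            ├── L1Loss: box coordinate loss
            └── GIoULoss: box overlap loss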
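The dispatch from forward() to the two branches can be sketched as below. This is a toy illustration: `ToyDetector` is hypothetical, the `return_loss` flag is assumed to be the switch (as is common in MMDetection-style detectors), and the "features", "loss" and "predictions" are placeholder arithmetic rather than anything BEVFormer computes.

```python
class ToyDetector:
    """Hypothetical detector showing only the train/test dispatch."""

    def forward(self, return_loss=True, **kwargs):
        # Assumption: a boolean flag selects the training or the test branch.
        if return_loss:
            return self.forward_train(**kwargs)
        return self.forward_test(**kwargs)

    def extract_feat(self, img):
        # Placeholder "feature": the mean pixel value of each camera view.
        return [sum(view) / len(view) for view in img]

    def forward_train(self, img, gt_labels):
        feats = self.extract_feat(img)
        # Placeholder "loss": mean absolute error between features and labels.
        loss = sum(abs(f - g) for f, g in zip(feats, gt_labels)) / len(feats)
        return {"loss": loss}

    def forward_test(self, img):
        # Placeholder "prediction": the rounded feature of each view.
        return [round(f) for f in self.extract_feat(img)]


detector = ToyDetector()
images = [[1, 3], [4, 6]]  # two fake camera views with two pixels each

losses = detector.forward(return_loss=True, img=images, gt_labels=[2, 4])
preds = detector.forward(return_loss=False, img=images)
print(losses)
print(preds)
```

The training branch returns a dictionary of losses for the optimizer step, while the test branch returns predictions, so the same model object serves both loops.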

The overall execution flow of the BEVFormer model is:

graph TD
A[config file: bevformer_tiny.py] --> B["Config.fromfile()"]
B --> C["build_model()"]
C --> D["BEVFormer.__init__()"]
D --> E[build img_backbone, img_neck, pts_bbox_head]
A --> F[data.train config]
F --> G["build_dataset()"]
G --> H[DataLoader loads data]
E --> I["custom_train_model()"]
H --> I
I --> J[train loop]
J --> K["model.train()"]
K --> L["dataloader.next()"]
L --> M["model(inputs) → losses"]
M --> N["loss.backward() → optimizer.step()"]
N --> O[log and save checkpoint]
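The last steps of the chart (forward pass, loss, backward pass, optimizer step) can be illustrated with a dependency-free toy. This is only a sketch of the pattern on a one-parameter model with a hand-written gradient. It is not how BEVFormer or PyTorch implement it, and `train` and its arguments are hypothetical.

```python
def train(samples, lr=0.1, epochs=50):
    """Fit y = w * x with plain SGD and return the weight and loss history."""
    w = 0.0
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in samples:
            pred = w * x                 # forward pass: model(inputs)
            loss = (pred - y) ** 2       # scalar loss
            grad = 2 * (pred - y) * x    # what loss.backward() would compute for w
            w -= lr * grad               # what optimizer.step() would do (plain SGD)
            epoch_loss += loss
        history.append(epoch_loss / len(samples))  # value a logger hook would report
    return w, history


w, history = train([(1.0, 2.0), (2.0, 4.0)])
print(round(w, 4), history[0], history[-1])
```

In the real training run the same four steps are repeated per batch inside the runner, with the gradient computed by automatic differentiation and the update rule taken from the optimizer config.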