BEV: Implicit Camera View Transformation - BEVFormer

1. Background

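Camera view transformation based on IPM (inverse perspective mapping) projection depends heavily on the camera intrinsics and extrinsics, and because the BEV grid is fused in a fixed way, it may not be sensitive enough to small objects. An alternative is to perform the view transformation with a transformer: the BEV queries can adaptively attend to key regions, which may improve small-object detection, and the attention mechanism samples image features flexibly. This post therefore walks through demo code for BEVFormer to understand how it works.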
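
To make the dependence on camera parameters concrete, here is a minimal pure-Python sketch of the explicit projection that an IPM-style approach relies on: a point on the ground plane is mapped to a pixel through the extrinsics and intrinsics. The function name `project_ground_point` and all numeric values (focal length, principal point, camera height, axis conventions) are assumptions made up for illustration, not taken from BEVFormer or from any dataset.

```python
def project_ground_point(x, y, K, R, t, z=0.0):
    """Project one ego-frame point (x, y, z) to pixel coordinates (u, v).

    K: 3x3 intrinsics; R (3x3) and t (3,) are the ego -> camera extrinsics.
    Returns None when the point lies behind the camera.
    """
    p = (x, y, z)
    # Extrinsics: ego frame -> camera frame
    cam = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 1e-6:
        return None  # behind the image plane, no valid pixel
    # Intrinsics followed by the perspective division
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


# Hypothetical camera: 500 px focal length, principal point (320, 240)
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
# Assumed axes: ego x forward / y left / z up; camera x right / y down / z forward
R = [[0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0],
     [1.0, 0.0, 0.0]]
t = [0.0, 1.5, 0.0]  # camera mounted 1.5 m above the ego origin

ahead = project_ground_point(10.0, 0.0, K, R, t)   # ground point 10 m ahead
behind = project_ground_point(-5.0, 0.0, K, R, t)  # ground point behind the camera
print(ahead, behind)  # (320.0, 315.0) None
```

Every pixel location here is fixed by `K`, `R` and `t`, so a calibration error shifts where each BEV cell reads its image features. The transformer approach replaces this fixed lookup with learned attention.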

2. Code

import torch
import torch.nn as nn

# -------------------------
# Parameters
# -------------------------
B, C, H, W = 2, 64, 16, 16        # camera feature map (batch, channels, height, width)
bev_H, bev_W = 8, 8                # BEV grid
num_cameras = 7
num_classes = 10
num_det_queries = 32                # number of detection queries

# -------------------------
# 1. Multi-camera features
# -------------------------
camera_feats = [torch.randn(B, C, H*W) for _ in range(num_cameras)]  # B x C x N (N=H*W)
for i in range(num_cameras):
    camera_feats[i] = camera_feats[i].permute(0, 2, 1)  # B x N x C

# -------------------------
# 2. BEV query + Transformer projection
# -------------------------
num_bev_queries = bev_H * bev_W
bev_queries = nn.Parameter(torch.randn(num_bev_queries, B, C))

class BEVProjectionTransformer(nn.Module):
    def __init__(self, C, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=C, num_heads=num_heads)

    def forward(self, bev_queries, camera_feats):
        """
        bev_queries: num_bev_queries x B x C
        camera_feats: list of B x N x C
        """
        # Concatenate the features of all cameras along the token dimension
        feats = torch.cat(camera_feats, dim=1)      # B x (num_cameras*N) x C
        feats = feats.permute(1,0,2)               # (num_cameras*N) x B x C
        # Each BEV query attends over every token of every camera
        bev_out, _ = self.attn(bev_queries, feats, feats)
        return bev_out

bev_proj_transformer = BEVProjectionTransformer(C)
bev_features = bev_proj_transformer(bev_queries, camera_feats)
bev_features_grid = bev_features.permute(1,0,2).reshape(B, bev_H, bev_W, C)

# -------------------------
# 3. Detection query + Transformer
# -------------------------
det_queries = nn.Parameter(torch.randn(num_det_queries, B, C))

class DetectionDecoderTransformer(nn.Module):
    def __init__(self, C, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=C, num_heads=num_heads)

    def forward(self, det_queries, bev_features_grid):
        B, H, W, C = bev_features_grid.shape
        # Flatten the BEV grid into a token sequence: (H*W) x B x C
        bev_flat = bev_features_grid.reshape(B, H*W, C).permute(1,0,2)
        out, _ = self.attn(det_queries, bev_flat, bev_flat)
        return out

decoder = DetectionDecoderTransformer(C)
det_features = decoder(det_queries, bev_features_grid)

# -------------------------
# 4. Detection head
# -------------------------
class SimpleDetectionHead(nn.Module):
    def __init__(self, C, num_classes):
        super().__init__()
        self.cls_head = nn.Linear(C, num_classes)
        self.bbox_head = nn.Linear(C, 7)   # 7 box parameters per query

    def forward(self, det_features):
        cls_logits = self.cls_head(det_features)
        bbox_preds = self.bbox_head(det_features)
        return cls_logits, bbox_preds

detection_head = SimpleDetectionHead(C, num_classes)
cls_logits, bbox_preds = detection_head(det_features)

print("class logits shape:", cls_logits.shape)     # num_det_queries x B x num_classes
print("3D bbox preds shape:", bbox_preds.shape)   # num_det_queries x B x 7
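
To show what each attention call in the demo computes for a single BEV query, here is a minimal pure-Python sketch of single-head scaled dot-product attention. It is a simplification: `nn.MultiheadAttention` additionally applies learned input and output projections and splits the channels across several heads. The helper name `attend` and the toy vectors are hypothetical, chosen only to make the weighting visible.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the weighted sum of the value vectors
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


# Toy, hypothetical values: one BEV query and three camera tokens (C = 2)
bev_query = [1.0, 0.0]
camera_tokens = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
bev_feature, weights = attend(bev_query, camera_tokens, camera_tokens)
print(weights)
print(bev_feature)
```

The token most aligned with the query gets the largest weight, which is the sense in which a BEV query "adaptively attends" to image regions. In the demo this weighting runs over all `num_cameras * H * W` tokens at once, with no camera geometry involved.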