当前位置: 首页 > news >正文

(a)Mask RCNN总体流程

(a)Mask RCNN总体流程

一.Mask RCNN 架构

自己整理了一份Mask RCNN架构图如下,其中绿色模块只有推理过程才会涉及。

Mask RCNN网络架构

核心模块包括:数据预处理,骨干网络,区域提议网络,FastRCNN分支,Mask分支,数据后处理等。

二.网络核心流程

class FasterRCNNBase(nn.Module):def __init__(self, backbone, rpn, roi_heads, transform):super(FasterRCNNBase, self).__init__()self.transform = transformself.backbone = backboneself.rpn = rpnself.roi_heads = roi_heads# used only on torchscript modeself._has_warned = False@torch.jit.unuseddef eager_outputs(self, losses, detections):# type: (Dict[str, Tensor], List[Dict[str, Tensor]]) -> Union[Dict[str, Tensor], List[Dict[str, Tensor]]]if self.training:return lossesreturn detectionsdef forward(self, images, targets=None):# type: (List[Tensor], Optional[List[Dict[str, Tensor]]]) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]if self.training and targets is None:raise ValueError("In training mode, targets should be passed")if self.training:assert targets is not Nonefor target in targets:         # 进一步判断传入的target的boxes参数是否符合规定boxes = target["boxes"]if isinstance(boxes, torch.Tensor):if len(boxes.shape) != 2 or boxes.shape[-1] != 4:raise ValueError("Expected target boxes to be a tensor""of shape [N, 4], got {:}.".format(boxes.shape))else:raise ValueError("Expected target boxes to be of type ""Tensor, got {:}.".format(type(boxes)))original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])for img in images:val = img.shape[-2:]assert len(val) == 2  # 防止输入的是个一维向量original_image_sizes.append((val[0], val[1]))# original_image_sizes = [img.shape[-2:] for img in images]images, targets = self.transform(images, targets)  # 对图像进行预处理# print(images.tensors.shape)features = self.backbone(images.tensors)  # 将图像输入backbone得到特征图if isinstance(features, torch.Tensor):  # 若只在一层特征层上预测,将feature放入有序字典中,并编号为‘0’features = OrderedDict([('0', features)])  # 若在多层特征层上预测,传入的就是一个有序字典# 将特征层以及标注target信息传入rpn中# proposals: List[Tensor], Tensor_shape: [num_proposals, 4],# 每个proposals是绝对坐标,且为(x1, y1, x2, y2)格式proposals, proposal_losses = self.rpn(images, features, targets)# 将rpn生成的数据以及标注target信息传入fast rcnn后半部分detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)# 对网络的预测结果进行后处理(主要将bboxes还原到原图像尺度上)detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)losses = {}losses.update(detector_losses)losses.update(proposal_losses)if torch.jit.is_scripting():if not self._has_warned:warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")self._has_warned = Truereturn losses, detectionselse:return self.eager_outputs(losses, detections)# if self.training:#     return losses## return detections

FasterRCNNBase是RCNN检测算法的基类,FasterRCNN类要继承FasterRCNNBase类,而MaskRCNN类又要继承FasterRCNN类,所以当实例化一个model并传入数据x时,会调用FasterRCNNBase的forward函数:

model = MaskRCNN(backbone,num_classes)
model(images,targets)

FasterRCNNBase的 init() 函数:

    def __init__(self, backbone, rpn, roi_heads, transform):super(FasterRCNNBase, self).__init__()self.transform = transformself.backbone = backboneself.rpn = rpnself.roi_heads = roi_heads# used only on torchscript modeself._has_warned = False

传入参数包括:
(1)backbone:
resnet50
resnet101
resnet50+fpn
resnet101+fpn
(2)rpn:
区域提议网络
(3)roi_haeds:
box roi pooling/align
two MLP head
box predictor
mask roi pool
mask head
mask predictor
(4)transforms:
GeneraRCNNtransforms类的实例,用于数据预处理

FasterRCNNBase的 forward() 函数:

    def forward(self, images, targets=None):# type: (List[Tensor], Optional[List[Dict[str, Tensor]]]) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]if self.training and targets is None:raise ValueError("In training mode, targets should be passed")if self.training:assert targets is not Nonefor target in targets:         # 进一步判断传入的target的boxes参数是否符合规定boxes = target["boxes"]if isinstance(boxes, torch.Tensor):if len(boxes.shape) != 2 or boxes.shape[-1] != 4:raise ValueError("Expected target boxes to be a tensor""of shape [N, 4], got {:}.".format(boxes.shape))else:raise ValueError("Expected target boxes to be of type ""Tensor, got {:}.".format(type(boxes)))original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])for img in images:val = img.shape[-2:]assert len(val) == 2  # 防止输入的是个一维向量original_image_sizes.append((val[0], val[1]))# original_image_sizes = [img.shape[-2:] for img in images]images, targets = self.transform(images, targets)  # 对图像进行预处理# print(images.tensors.shape)features = self.backbone(images.tensors)  # 将图像输入backbone得到特征图if isinstance(features, torch.Tensor):  # 若只在一层特征层上预测,将feature放入有序字典中,并编号为‘0’features = OrderedDict([('0', features)])  # 若在多层特征层上预测,传入的就是一个有序字典# 将特征层以及标注target信息传入rpn中# proposals: List[Tensor], Tensor_shape: [num_proposals, 4],# 每个proposals是绝对坐标,且为(x1, y1, x2, y2)格式proposals, proposal_losses = self.rpn(images, features, targets)# 将rpn生成的数据以及标注target信息传入fast rcnn后半部分detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)# 对网络的预测结果进行后处理(主要将bboxes还原到原图像尺度上)detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)losses = {}losses.update(detector_losses)losses.update(proposal_losses)if torch.jit.is_scripting():if not self._has_warned:warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")self._has_warned = Truereturn losses, detectionselse:return self.eager_outputs(losses, detections)

首先增加一些容错机制,保住输入数据格式符合模型要求,然后将images和targets输入transforms中进行数据格式的预处理;然后将images输入到backbone中,得到特征图features;将features,images,targets输入rpn网络中,得到proposals和proposals_loss;然后将proposals,images,features等输入到roi_heads得到detections和detector_loss;如果在训练模式下,则返回loss(proposals_loss和detection_loss),在推理模式下,则返回detections。

http://www.lryc.cn/news/223485.html

相关文章:

  • 浅谈数据中心机房末端配电技术与产品监控选型-安科瑞黄安南
  • 红包算法 java实现
  • MVCC中的可见性算法
  • Leetcode73矩阵置零
  • linux重要的目录之proc和dev目录
  • 【组件自定义事件+全局事件总线+消息订阅与发布+TodoList案例——编辑+过度与动画】
  • 单独封装export default .js 在引入
  • 【带头学C++】----- 三、指针章 ---- 3.11 补充重要指针知识(二,拓展基础知识)
  • Jmeter分布式性能测试细节+常见问题解决,资深老鸟带你避坑...
  • 动态表单获取某一项值
  • 短路表达式
  • 风力发电场集中监控系统解决方案
  • SpringDataJpa(二)
  • 软件测评中心▏软件功能测试和非功能测试的区别和联系简析
  • 打卡系统有什么用?如何通过日常管理系统提高企业员工的效率?
  • png怎么转jpg?这款图片转格式工具一学就会用
  • 万界星空科技MES系统软件体系架构及应用
  • uniapp h5实现Excel、Word、PDF文件在线预览,而不是跳转下载,也不需要下载
  • 台式电脑一键重装Win10系统详细教程
  • 图像相机-相机属性SDK汇总设置
  • 使用ffmpeg调用电脑自带的摄像头和扬声器录制音视频
  • 工业物联网模块应用之砂芯库桁架机器人远程无线控制
  • Ubuntu安装.Net SDK
  • 相交链表~
  • 跨境电商API接口如何通过API数据接口进行选品
  • ArrayList集合方法(自写)
  • sql注入学习笔记
  • 企业涉密文件怎么加密?企业重要文件加密方法
  • 经典猜数游戏(python类封装)
  • 环形链表~