当前位置: 首页 > news >正文

(self-supervised learning)Event Camera Data Pre-training

Publisher: ICCV 2023 

MOTIVATION OF READING: 自监督学习、稀疏事件 = NILM

link: https://arxiv.org/pdf/2301.01928.pdf

Code: GitHub - Yan98/Event-Camera-Data-Pre-training


1. Overview

Contributions are summarized as follows:

1. A self-supervised framework for event camera data pre-training. The pre-trained model can be transferred to diverse downstream tasks;

2. A family of event data augmentations, generating meaningful event images;

3. A conditional masking strategy, sampling informative event patches for network training;

4. An embedding projection loss, using paired RGB embeddings to regularize event embeddings to avoid model collapse;

5. A probability distribution alignment loss for aligning embeddings from the paired event and RGB images.

6. We achieve state-of-the-art performance in standard event benchmark datasets.

2. Related work

The SSL frameworks can be generally divided into two categories: contrastive learning and masked modeling.

2.1 Contrastive learning

This approach generally assumes augmentation invariance of images. one notable drawback
of contrastive learning is suffering from model collapse and training instability.

2.2 Masked modeling

Reconstructing masked inputs from the (i. e., unmasked) visible ones is a popular selfsupervised
learning objective motivated by the idea of autoencoding. (Bert, GPT)

3. Methodology

For pre-training, our method takes event data E and its paired natural RGB image I as inputs, and outputs a pre-trained network fe.

Firstly, consecutively perform data augmentations, event image generation, and conditional masking to obtain two patch sets (xq, xk).

Secondly, fe extracts features from event patch set xq, and he_img and he_evt separately project features from fe to latent embeddings q_img and q_evt.

fm and hm_evt are the momentum of fe and he_evt, and are updated by the exponential moving average (EMA). (momentum的含义可以参考MOCO论文)

The momentum network takes patch set xk as input and generates an embedding k_evt.

At the same time, the natural RGB image I is embeded into y = f1(h1(I)).

Finally, we perform event discrimination, and event and natural RGB image discrimination to train our model. 这里不用INFONCE直接对q_evt和k_evt进行相似度计算是因为这么做会导致embedding collapse使得embedding过于相似。原因是事件图像是稀疏离散。因此使用RGB图像的映射。

L_evt is an event embedding projection loss aiming to pull together paired event embeddings qevt and kevt, for event discrimination.

L_RGB aims to pull together paired event and RGB embeddings q_evt and y, for event and natural RGB image discrimination.

L_k1 aims to drive fe learning discriminative event embeddings, towards well-structured embedding space of natural RGB images.


InfoNCE loss Contrastive learning aims to pull together embeddings q and k+, and pushes away embeddings q and {k−}.

Event embedding projection loss

ζ(v1, v2) is the projection function.

Event and RGB image discrimination

Considering the sparsity of the event image, a single event image is less informative than an RGB image, possessing difficulty for self-supervised event network training.

We pull together embeddings of paired event and RGB images, xq and I.

we first compute the pairwise embedding similarity and then fit an exponential kernel to the similarities to compute probability scores. The probability score of the (i, j)-th pair is given by,

Our probability distribution alignment loss is given by,

Total Loss

where λ1 is a hyper-parameter for balancing the losses.

4. Experiment

We evaluate our method on three downstream tasks: object recognition, optical flow estimation, and semantic segmentation.

http://www.lryc.cn/news/269591.html

相关文章:

  • 关于个人Git学习记录及相关
  • 【eclipse】eclipse开发springboot项目使用入门
  • Android 13 默认关闭 快速打开相机
  • pytest pytest-html优化样式
  • Visual Studio 配置DLL
  • C/C++转WebAssembly及微信小程序调用
  • 【WPF.NET开发】弱事件模式
  • [Angular] 笔记 16:模板驱动表单 - 选择框与选项
  • Webpack基础使用
  • 扭蛋机小程序搭建:打造互联网“流量池”
  • 解决VNC连接Ubuntu服务器打开终端出现闪退情况
  • flutter是什么
  • GET和POST请求
  • 基于电商场景的高并发RocketMQ实战-Broker写入读取流程性能优化总结、Broker基于Pull模式的主从复制原理
  • 前端DApp开发利器,Ant Design Web3 正式发布 1.0
  • [RoarCTF 2019]Easy Java(java web)
  • Abaqus许可管理策略
  • 对采集到的温湿度数据,使用python进行数据清洗,并使用预测模型进行预测未来一段时间的温湿度数据。
  • 嵌入式SOC之通用图像处理之OSD文字信息叠加的相关实践记录
  • Java日期工具类LocalDateTime
  • 从C到C++1
  • [Angular] 笔记 18:Angular Router
  • 微服务全链路灰度方案介绍
  • 低代码开发OA系统 低代码平台如何搭建OA办公系统
  • 构建Python的Windows整合包教程
  • 《整机柜服务器通用规范》由OCTC正式发布!浪潮信息牵头编制
  • Linux:修改和删除已有变量
  • 【23.12.29期--Spring篇】Spring的 IOC 介绍
  • 【Python排序算法系列】—— 选择排序
  • 会议室占用时间段 - 华为OD统一考试