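A Visual Analysis of the June 2025 Hot Topics in IJCV, a Top Computer Vision Journal
Tracking frontier hot topics in computer vision is key to understanding where the technology is heading and to turning innovation into practice. Analyzing these topics not only reveals technical trends but also provides an important reference for choosing research topics and for engineering practice. This article presents a visual analysis of the June 2025 hot topics in the top computer vision journal International Journal of Computer Vision. You are welcome to read and share it.
This article was written by 韩煦 and reviewed by 邓镝.
I. Journal Overview
The International Journal of Computer Vision (IJCV) is a top journal in the field of computer vision. It is currently published monthly (12 issues per year) and is dedicated to publishing high-quality, original research papers that advance the science and engineering of computer vision. Its impact factor is 11.6 (2023), its five-year impact factor is 14.5 (2023), and the median time from submission to first decision is 96 days. Table 1 shows the number of articles IJCV published and its impact factor (IF) over the last five years.
Table 1 IJCV article counts and impact factor by year
Year | Articles per year | IF |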
2023 | 198 | 11.6 |
2022 | 187 | 19.5 |
2021 | 130 | 13.3 |
2020 | 187 | 7.4 |
2019 | 90 | 5.7 |
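The journal's scope centers on computer vision, specifically: image formation, processing, analysis, and interpretation; machine learning techniques; statistical approaches; sensors; image-based rendering, computer graphics, robotics, photo interpretation, image retrieval, video analysis and annotation, multimedia, and similar areas; and computational models of vision and the visual architecture of the human brain.
Journal website: Home | International Journal of Computer Vision
II. Hot Topic Analysis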
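Table 2 High-frequency keywords in paper titles
Keyword | Occurrences | Chinese translation | Topic directions |
learning | 12 | 学习 | deep learning, transfer learning, generative learning, etc. |
image | 12 | 图像 | image segmentation, image generation, image understanding, etc. |
detection | 8 | 检测 | object detection, anomaly detection, action detection, etc. |
generation | 8 | 生成 | image generation, text generation, multimodal generation, etc. |
3d | 7 | 三维 | 3D object detection, 3D reconstruction, 3D segmentation, etc. |
object | 7 | 目标 | object detection, object tracking, object segmentation, etc. |
segmentation | 7 | 分割 | image segmentation, instance segmentation, semantic segmentation, etc. |
vision | 6 | 视觉 | computer vision, vision models, visual reasoning, etc. |
multi | 6 | 多(模态) | multimodal, multi-domain, multi-task, multi-source data, etc. |
text | 6 | 文本/语义 | text generation, text guidance, text-to-image, etc. |
generative | 6 | 生成式 | generative models (e.g., diffusion, GAN, VAE) |
language | 5 | 语言 | vision-language models, language understanding, information fusion, etc. |
deep | 5 | 深度 | deep neural networks, deep learning methods, etc. |
Modal/Multimodal | 4 | 模态 | multimodal learning, cross-modal learning, modality fusion, etc. |
Self/supervised | 4 | 自监督/监督 | self-supervised, semi-supervised, and fully supervised learning, etc. |
video | 4 | 视频 | video understanding, video generation, event detection, etc. |
diffusion | 3 | 扩散 | diffusion generative models, diffusion process modeling |
matching | 3 | 匹配 | image, feature, and voxel matching, etc. |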
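Figure 1 Word cloud of research hot topics
Table 2 lists the 18 high-frequency keywords found in the titles of the 58 papers published in this issue. Figure 1 shows a word cloud generated from the IJCV research hot topics, covering areas such as semantic segmentation, diffusion models, and consistency.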
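The keyword counts in Table 2 can be approximated with a simple token count over the paper titles. The following is a minimal sketch, not the procedure the authors actually used: the function name `title_keyword_counts`, the stopword list, and the tokenization rule are all assumptions made for illustration, and only four titles from this issue are used as sample input.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the article's actual list).
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "via", "with"}


def title_keyword_counts(titles):
    """Count how often each lowercase token appears across all titles."""
    counts = Counter()
    for title in titles:
        # Split on anything that is not a letter or digit, so that
        # "Text-to-Video" yields the tokens "text", "to", "video".
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


# Four titles taken from this issue, used only as sample input.
sample_titles = [
    "DiffuVolume: Diffusion Model for Volume based Stereo Matching",
    "Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation",
    "LaneCorrect: Self-Supervised Lane Detection",
    "Investigating Self-Supervised Methods for Label-Efficient Learning",
]

counts = title_keyword_counts(sample_titles)
for word, n in counts.most_common(5):
    print(word, n)
```

This sketch does no stemming, so "diffusion" and "diffusions" are counted as different tokens; a real analysis would likely merge such variants. A frequency dictionary like `counts` is also the usual input for drawing a word cloud, for example with the third-party `wordcloud` package.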
The topics of the published papers show that this issue's research hot topics concentrate in the following directions:
1. Multimodal and fusion techniques
Core concept: fusion and collaborative learning across data of different modalities.
Technical directions: vision-language models, multi-source data fusion, cross-modal learning, multimodal generation.
Example papers:
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Authors: Jiawei Liang, Siyuan Liang, Aishan Liu & Xiaochun Cao
Download: https://link.springer.com/article/10.1007/s11263-025-02368-9
Multi-Source Domain Adaptation by Causal-Guided Adaptive Multimodal Diffusion Networks
Authors: Ziyun Cai, Yawen Huang, Tengfei Zhang, Yefeng Zheng & Dong Yue
Download: https://link.springer.com/article/10.1007/s11263-025-02401-x
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement
Authors: Zijie Yue, Miaojing Shi, Hanli Wang, Shuai Ding, Qijun Chen & Shanlin Yang
Download: https://link.springer.com/article/10.1007/s11263-025-02388-5
2. Generative models and content creation
Core concept: automatic content generation based on deep learning.
Technical directions: diffusion models; conditional generation; editing and manipulation (image editing, style transfer, content replacement); multimodal generation.
Example papers:
DiffuVolume: Diffusion Model for Volume based Stereo Matching
Authors: Dian Zheng, Xiao-Ming Wu, Zuhao Liu, Jingke Meng & Wei-Shi Zheng
Download: https://link.springer.com/article/10.1007/s11263-025-02362-1
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
Authors: Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu & Jiaying Liu
Download: https://link.springer.com/article/10.1007/s11263-025-02349-y
Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation
Authors: Yin Wang, Mu Li, Jiapeng Liu, Zhiying Leng, Frederick W. B. Li, Ziyao Zhang & Xiaohui Liang
Download: https://link.springer.com/article/10.1007/s11263-025-02392-9
3. 3D vision and spatial understanding
Core concept: visual perception and understanding in three-dimensional space.
Technical directions: 3D object detection, 3D reconstruction, point cloud processing, spatial reasoning.
Example papers:
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection
Authors: Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Junchi Yan & Hongyang Li
Download: https://link.springer.com/article/10.1007/s11263-025-02351-4
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer
Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao & Jieping Ye
Download: https://link.springer.com/article/10.1007/s11263-025-02404-8
PointSea: Point Cloud Completion via Self-structure Augmentation
Authors: Zhe Zhu, Honghua Chen, Xing He & Mingqiang Wei
Download: https://link.springer.com/article/10.1007/s11263-025-02400-y
4. Fine-grained image analysis and understanding
Core concept: analysis and recognition at fine levels of detail within images.
Technical directions: object detection and recognition, image segmentation, feature matching, scene understanding.
Example papers:
Instance-Level Moving Object Segmentation from a Single Image with Events
Authors: Zhexiong Wan, Bin Fan, Le Hui, Yuchao Dai & Gim Hee Lee
Download: https://link.springer.com/article/10.1007/s11263-025-02380-z
Camouflaged Object Detection with Adaptive Partition and Background Retrieval
Authors: Bowen Yin, Xuying Zhang, Li Liu, Ming-Ming Cheng, Yongxiang Liu & Qibin Hou
Download: https://link.springer.com/article/10.1007/s11263-025-02406-6
Informative Scene Graph Generation via Debiasing
Authors: Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen & Jingkuan Song
Download: https://link.springer.com/article/10.1007/s11263-025-02365-y
5. Innovation and optimization of learning paradigms
Core concept: improving how deep learning models are trained and how their architectures are designed.
Technical directions: self-supervised learning, continual learning, few-shot learning, lightweight models, transfer learning.
Example papers:
Investigating Self-Supervised Methods for Label-Efficient Learning
Authors: Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler & Muhammad Awais
Download: https://link.springer.com/article/10.1007/s11263-025-02397-4
Exemplar-Free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation
Authors: Marco Cotogni, Fei Yang, Claudio Cusano, Andrew D. Bagdanov & Joost van de Weijer
Download: https://link.springer.com/article/10.1007/s11263-025-02374-x
Parameter Efficient Fine-Tuning for Multi-modal Generative Vision Models with Möbius-Inspired Transformation
Authors: Haoran Duan, Shuai Shao, Bing Zhai, Tejal Shah, Jungong Han & Rajiv Ranjan
Download: https://link.springer.com/article/10.1007/s11263-025-02398-3
6. Domain applications and real-world scenarios
Core concept: putting computer vision into practice in specific application domains.
Technical directions: medical imaging, autonomous driving, human body analysis.
Example papers:
FlowSDF: Flow Matching for Medical Image Segmentation Using Distance Transforms
Authors: Lea Bogensperger, Dominik Narnhofer, Alexander Falk, Konrad Schindler & Thomas Pock
Download: https://link.springer.com/article/10.1007/s11263-025-02373-y
LaneCorrect: Self-Supervised Lane Detection
Authors: Ming Nie, Xinyue Cai, Hang Xu & Li Zhang
Download: https://link.springer.com/article/10.1007/s11263-025-02417-3
Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation
Authors: Tianyang Xu, Jiyong Rao, Xiaoning Song, Zhenhua Feng & Xiao-Jun Wu
Download: https://link.springer.com/article/10.1007/s11263-025-02355-0
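III. Paper List
To analyze in depth the June 2025 hot topics of the top computer vision journal International Journal of Computer Vision, this article systematically catalogs the 58 papers included in the issue. Table 3 lists the title of every paper together with its Chinese title, as a reference on research directions for researchers in computer vision, artificial intelligence, and related fields.
Table 3 Papers published in IJCV, June 2025
Title | Chinese title |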
DiffuVolume: Diffusion Model for Volume based Stereo Matching | DiffuVolume:基于扩散模型的体积立体匹配 |
Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation | 对比解耦表示学习与正则化用于语音保留的面部表情操控 |
Towards Boosting Out-of-Distribution Detection from a Spatial Feature Importance Perspective | 基于空间特征重要性的分布外检测提升方法 |
Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation | 通过关键点交互变换器学习结构支持依赖用于哺乳动物姿态估计 |
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 基于激光雷达引导的视觉中心三维目标检测几何预训练 |
Smaller But Better: Unifying Layout Generation with Smaller Large Language Models | 以更小的语言模型统一布局生成:更小更优 |
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-training | 基于掩码图像建模预训练探索强轻量级视觉Transformer的实验研究 |
Fusion4DAL: Offline Multi-modal 3D Object Detection for 4D Auto-labeling | Fusion4DAL:面向4D自动标注的离线多模态三维目标检测 |
VideoQA in the Era of LLMs: An Empirical Study | 大模型时代下的视频问答实证研究 |
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | VL-Trojan:多模态指令后门攻击自回归视觉语言模型 |
Consistent Prompt Tuning for Generalized Category Discovery | 用于广义类别发现的一致性提示调优 |
Instance-Level Moving Object Segmentation from a Single Image with Events | 基于事件的单幅图像实例级运动物体分割 |
Imbuing, Enrichment and Calibration: Leveraging Language for Unseen Domain Extension | 注入、丰富与校准:利用语言实现未知领域扩展 |
Image Matting and 3D Reconstruction in One Loop | 一体化图像抠图与三维重建 |
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 基于频率自监督远程生理测量的视觉-语言模型自举 |
Continual Test-Time Adaptation for Single Image Defocus Deblurring via Causal Siamese Networks | 通过因果孪生网络实现单幅图像散焦去模糊的持续测试时自适应 |
Deep Convolutional Neural Network Enhanced Non-uniform Fast Fourier Transform for Undersampled MRI Reconstruction | 深度卷积神经网络增强的非均匀快速傅里叶变换用于MRI欠采样重建 |
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 时空扩散中的注意力交换用于文本到视频生成 |
Informative Scene Graph Generation via Debiasing | 通过去偏见实现信息丰富的场景图生成 |
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | DustNet++:基于深度学习的粉尘密度视觉回归 |
A Survey on Deep Stereo Matching in the Twenties | 二十年代深度立体匹配综述 |
Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation | Fg-T2M++:大模型增强的细粒度文本驱动人体动作生成 |
Realistic Evaluation of Deep Active Learning for Image Classification and Semantic Segmentation | 深度主动学习在图像分类与语义分割中的真实评测 |
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 前沿生成模型可信度综述与展望 |
LMD: Light-Weight Prediction Quality Estimation for Object Detection in Lidar Point Clouds | LMD:用于激光点云目标检测的轻量级预测质量估计 |
Unknown Support Prototype Set for Open Set Recognition | 未知支持原型集用于开放集识别 |
LaMD: Latent Motion Diffusion for Image-Conditional Video Generation | LaMD:基于图像条件的视频生成的潜运动扩散模型 |
METS: Motion-Encoded Time-Surface for Event-Based High-Speed Pose Tracking | METS:基于事件的高速姿态跟踪的运动编码时间表面 |
Deep Hierarchical Learning for 3D Semantic Segmentation | 三维语义分割的深度分层学习 |
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | UMSCS:新型非配对多模态图像分割的跨模态生成与半监督方法 |
Temporal Transductive Inference for Few-Shot Video Object Segmentation | 针对少样本视频目标分割的时序直推式推断 |
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 面向多模态场景理解的部分-整体关系融合 |
Semantics-Conditioned Generative Zero-Shot Learning via Feature Refinement | 基于语义条件的生成式零样本学习的特征细化 |
Investigating Self-Supervised Methods for Label-Efficient Learning | 面向高效标注的自监督方法研究 |
UniFace++: Revisiting a Unified Framework for Face Reenactment and Swapping via 3D Priors | UniFace++:基于三维先验的人脸重演与置换统一框架 |
Attribute-Centric Compositional Text-to-Image Generation | 属性中心的组合型文本到图像生成 |
Exemplar-Free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation | 无需样本的Transformer持续学习:门控类注意力与级联漂移补偿 |
Parameter Efficient Fine-Tuning for Multi-modal Generative Vision Models with Möbius-Inspired Transformation | 参数高效的多模态生成式视觉模型微调:Möbius变换启发方案 |
Expressive Image Generation and Editing with Rich Text | 丰富文本指导下的表达性图像生成与编辑 |
Multi-Source Domain Adaptation by Causal-Guided Adaptive Multimodal Diffusion Networks | 基于因果引导的自适应多模态扩散网络实现多源域自适应 |
Multi-Text Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model | 多文本引导的重要性:基于大生成模型的多模态图像融合 |
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 并非所有像素都相同:语义分割中的像素难度学习 |
Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data | 真实监控中的可见-红外行人重识别多模态融合 |
A Solution to Co-occurrence Bias in Pedestrian Attribute Recognition: Theory, Algorithms, and Improvements | 行人属性识别中的共现偏见解决方案:理论、算法与提升 |
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 基于多域干预的跨模态图像合成泛化异质表征学习 |
LR-ASD: Lightweight and Robust Network for Active Speaker Detection | LR-ASD:轻量且鲁棒的主动说话人检测网络 |
PointSea: Point Cloud Completion via Self-structure Augmentation | PointSea:基于自结构增强的点云补全 |
Fully Decoupled End-to-End Person Search: An Approach without Conflicting Objectives | 全解耦端到端行人检索:无冲突优化目标方法 |
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | CT3D++:基于关键点引导的通道注意力Transformer提升三维目标检测 |
Preconditioned Score-Based Generative Models | 预条件分数生成模型 |
FlowSDF: Flow Matching for Medical Image Segmentation Using Distance Transforms | FlowSDF:结合距离变换的医学图像分割流匹配 |
Camouflaged Object Detection with Adaptive Partition and Background Retrieval | 自适应分区与背景检索的伪装目标检测 |
LaneCorrect: Self-Supervised Lane Detection | LaneCorrect:自监督车道检测 |
ScenarioDiff: Text-to-video Generation with Dynamic Transformations of Scene Conditions | ScenarioDiff:场景条件动态变换的文本到视频生成 |
Pre-training for Action Recognition with Automatically Generated Fractal Datasets | 基于自动生成分形数据集的动作识别预训练 |
I2MD: 3D Action Representation Learning with Inter- and Intra-Modal Mutual Distillation | I2MD:基于模态内外互蒸馏的三维动作表征学习 |
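IV. Summary
This issue of IJCV reflects current trends at the frontier of computer vision, centered on three core directions: multimodal fusion, innovation in generative AI models, and 3D visual understanding. Many of the papers show deep integration of vision-language models, including multimodal instruction learning and text-guided image and video generation. Diffusion models have become an important vehicle for generative AI, showing strong potential in stereo matching, motion generation, medical image segmentation, and other areas. 3D vision is maturing, covering key techniques such as point cloud processing, 3D object detection, and spatial reconstruction. Meanwhile, learning paradigms such as self-supervised learning, continual learning, and lightweight models offer new approaches to data scarcity and computational efficiency.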