
Expression Prompt Collaboration Transformer for universal referring video object segmentation

Publication type:
Journal article
Authors:
Chen, Jiajun;Lin, Jiacheng;Zhong, Guojin;Fu, Haolong;Nai, Ke;...
Corresponding authors:
Yang, Kailun;Li, ZY
Author affiliations:
[Li, Zhiyong; Yang, Kailun; Chen, Jiajun; Yang, KL] Hunan Univ, Sch Robot, Changsha 410082, Peoples R China.
[Li, Zhiyong; Lin, Jiacheng; Zhong, Guojin; Fu, Haolong] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China.
[Nai, Ke] Changsha Univ Sci & Technol, Sch Comp & Commun Engn, Changsha 410011, Peoples R China.
Corresponding-author affiliations:
[Li, ZY; Yang, KL]
Hunan Univ, Sch Robot, Changsha 410082, Peoples R China.
Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China.
Language:
English
Keywords:
Audio-guided video object segmentation;Referring video object segmentation;Expression-visual attention;Audio-text contrastive learning;Multi-task learning
Journal:
Knowledge-Based Systems
ISSN:
0950-7051
Year:
2025
Volume:
311
Pages:
113006
CRediT authorship contribution statement:
Jiajun Chen: Writing – original draft, Visualization, Resources, Methodology, Data curation. Jiacheng Lin: Writing – review & editing, Writing – original draft, Visualization, Formal analysis, Conceptualization. Guojin Zhong: Visualization, Validation, Software, Data curation. Haolong Fu: Visualization, Validation, Investigation. Ke Nai: Visualization, Validation. Kailun Yang: Writing – review & editing, Supervision, acquisition, Formal analysis, Conceptualization. Zhiyong Li: Writing –
Institutional attribution:
This institution listed as a non-primary ("other") institution
Department:
School of Computer and Communication Engineering
Abstract:
Audio-guided Video Object Segmentation (A-VOS) and Referring Video Object Segmentation (R-VOS) are two highly related tasks aiming to segment specific objects from video sequences according to expression prompts. However, due to the challenges of modeling representations for different modalities, existing methods struggle to balance between interaction flexibility and localization precision. In this paper, we address this problem from two perspectives: the alignment of audio and text and the deep interaction among audio, text, and visual modalities. First, we propose a universal architecture, ...
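The abstract (and the keyword "Audio-text contrastive learning") describes aligning the audio and text prompt modalities. As a minimal illustrative sketch only — not the paper's actual architecture — a symmetric InfoNCE-style contrastive loss between paired audio and text embeddings could look like the following; the function and array names are hypothetical:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize embeddings to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss that pulls paired audio/text embeddings together
    and pushes unpaired ones apart, averaged over both directions."""
    a = l2_normalize(np.asarray(audio_emb, dtype=float))
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = a @ t.T / temperature          # (N, N) cosine-similarity matrix
    idx = np.arange(len(a))                 # positives lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)        # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()             # NLL of the diagonal

    # average audio->text (rows) and text->audio (columns) directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Under this sketch, a batch where each audio embedding sits near its paired text embedding yields a lower loss than a batch of unrelated embeddings, which is the alignment signal the abstract refers to.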
