News
[2025.06]    We are excited to release the MindOmni project
[2025.05]    Two papers, LoRA-Gen and HaploVL, are accepted by ICML 2025 (CCF-A)
[2024.10]    Awarded the National Scholarship, Tsinghua University
[2024.06]    Two papers, MambaTree (Spotlight) and COVE, are accepted by NeurIPS 2024 (CCF-A)
[2024.03]    The paper UVCOM is accepted by CVPR 2024 (CCF-A)
[2023.09]    The paper SOC is accepted by NeurIPS 2023 (CCF-A)
[2023.09]    The first prize of The 5th Large-scale Video Object Segmentation Challenge, Track 3: Referring Video Object Segmentation
[2023.03]    The paper SemanticAC is accepted by ICASSP 2023 (CCF-B)
[2021.12]    Awarded the National Scholarship, Xidian University
|
Academic experience
|
2023-Present
Studying as a Master's Student at Tsinghua University
|
|
2019-2023
Studying as an Undergraduate Student at Xidian University
|
Industrial experience
|
2024.06-Present
I am a multimodal algorithm research intern supervised by Dr. Lin Song at Tencent ARC Lab
|
|
2024.01-2024.06
I am a multimodal algorithm research intern supervised by Dr. Lin Song at Tencent AI Lab
|
|
2022.12-2023.03
I am a multimodal algorithm research intern at OPPO Research Institute
|
Publications
|
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen, Yingmin Luo, Yuxin Chen, Yukang Gan, Wei Huang, Xiu Li, Xiaojuan Qi, Ying Shan
Under Review / Paper / Code
|
|
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
Yicheng Xiao*, Lin Song*, Rui Yang, Cheng Cheng, Zunnan Xu, Zhaoyang Zhang, Yixiao Ge, Xiu Li, Ying Shan (* equal contribution)
Under Review / Paper / Code
|
|
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
Rui Yang, Lin Song, Yicheng Xiao, Runhui Huang, Yixiao Ge, Ying Shan, Hengshuang Zhao
ICML 2025 (CCF-A) / Paper / Code
|
|
LoRA-Gen: Specializing Language Model via Online LoRA Generation
Yicheng Xiao*, Lin Song*, Rui Yang, Cheng Cheng, Yixiao Ge, Xiu Li, Ying Shan (* equal contribution)
ICML 2025 (CCF-A) / Paper / Code
|
|
MambaTree: Tree Topology is All You Need in State Space Model
Yicheng Xiao*, Lin Song*, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan (* equal contribution)
NeurIPS 2024 Spotlight (CCF-A) / Paper / Code
|
|
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
Jiangshan Wang*, Yue Ma*, Jiayi Guo*, Yicheng Xiao, Gao Huang, Xiu Li (* equal contribution)
NeurIPS 2024 (CCF-A) / Paper / Code
|
|
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao*, Zhuoyan Luo*, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li (* equal contribution)
CVPR 2024 (CCF-A) / Paper / Code
|
|
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
Zhuoyan Luo*, Yicheng Xiao*, Yong Liu*, Shuyan Li, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang (* equal contribution)
NeurIPS 2023 (CCF-A) / Paper / Code
|
|
Zhuoyan Luo*, Yicheng Xiao*, Yong Liu*‡, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang (* equal contribution, ‡ project lead)
ICCV Workshop 2023 / Paper / Code
|
|
SemanticAC: Semantics-Assisted Framework for Audio Classification
Yicheng Xiao*, Yue Ma*, Shuyan Li, Hantao Zhou, Ran Liao, Xiu Li (* equal contribution)
ICASSP 2023 (CCF-B) / Paper / Code
|
|