About Me

I am a second-year master's student at Tsinghua University, under the supervision of Prof. Xiu Li. I am fortunate to be collaborating closely with Dr. Lin Song on vision-language models. Before that, I obtained my BSc in Mathematics and Applied Mathematics at Xidian University in 2023. My research interests include multi-modal learning and computer vision.

News

[2025.06]    We are excited to release the project, MindOmni

[2025.05]    Two papers, LoRA-Gen and HaploVL, were accepted by ICML 2025 (CCF-A)

[2024.10]    Awarded the National Scholarship, Tsinghua University

[2024.06]    Two papers, MambaTree (Spotlight) and COVE, were accepted by NeurIPS 2024 (CCF-A)

[2024.03]    The paper UVCOM was accepted by CVPR 2024 (CCF-A)

[2023.09]    The paper SOC was accepted by NeurIPS 2023 (CCF-A)

[2023.09]    Won first prize in the 5th Large-scale Video Object Segmentation Challenge, Track 3: Referring Video Object Segmentation

[2023.03]    The paper SemanticAC was accepted by ICASSP 2023 (CCF-B)

[2021.12]    Awarded the National Scholarship, Xidian University

Academic experience

2023-Present

Master's student at Tsinghua University

2019-2023

Undergraduate student at Xidian University


Industrial experience

2024.06-Present

I am a multimodal algorithm research intern supervised by Dr. Lin Song at Tencent ARC Lab


2024.01-2024.06

I was a multimodal algorithm research intern supervised by Dr. Lin Song at Tencent AI Lab


2022.12-2023.03

I was a multimodal algorithm research intern at OPPO Research Institute


Publications

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO


Yicheng Xiao, Lin Song, Yukang Chen, Yingmin Luo, Yuxin Chen, Yukang Gan, Wei Huang, Xiu Li, Xiaojuan Qi, Ying Shan

Under Review / Paper / Code

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation


Yicheng Xiao*, Lin Song*, Rui Yang, Cheng Cheng, Zunnan Xu, Zhaoyang Zhang, Yixiao Ge, Xiu Li, Ying Shan (* equal contribution)

Under Review / Paper / Code

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding


Rui Yang, Lin Song, Yicheng Xiao, Runhui Huang, Yixiao Ge, Ying Shan, Hengshuang Zhao

ICML 2025 (CCF-A) / Paper / Code

LoRA-Gen: Specializing Language Model via Online LoRA Generation


Yicheng Xiao*, Lin Song*, Rui Yang, Cheng Cheng, Yixiao Ge, Xiu Li, Ying Shan (* equal contribution)

ICML 2025 (CCF-A) / Paper / Code

MambaTree: Tree Topology is All You Need in State Space Model


Yicheng Xiao*, Lin Song*, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan (* equal contribution)

NeurIPS 2024 Spotlight (CCF-A) / Paper / Code

COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing


Jiangshan Wang*, Yue Ma*, Jiayi Guo*, Yicheng Xiao, Gao Huang, Xiu Li (* equal contribution)

NeurIPS 2024 (CCF-A) / Paper / Code

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection


Yicheng Xiao*, Zhuoyan Luo*, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li (* equal contribution)

CVPR 2024 (CCF-A) / Paper / Code

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation


Zhuoyan Luo*, Yicheng Xiao*, Yong Liu*, Shuyan Li, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang (* equal contribution)

NeurIPS 2023 (CCF-A) / Paper / Code

The First Prize of ICCV 2023 The 5th Large-scale Video Object Segmentation Challenge Track3: Referring Video Object Segmentation


Zhuoyan Luo*, Yicheng Xiao*, Yong Liu*‡, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang (* equal contribution, ‡ project lead)

ICCV Workshop 2023 / Paper / Code

SemanticAC: Semantics-Assisted Framework for Audio Classification


Yicheng Xiao*, Yue Ma*, Shuyan Li, Hantao Zhou, Ran Liao, Xiu Li (* equal contribution)

ICASSP 2023 (CCF-B) / Paper / Code

Thanks to Jon Barron for this template.