-->

I am now a tenure-track Assistant Professor (Ph.D. supervisor) at the Wangxuan Institute of Computer Technology, Peking University, and a Peking University Boya Young Fellow. I am also a member of the MIPL Group (led by Prof. Yuxin Peng) at Peking University.
Before joining Peking University, I was a Postdoctoral Researcher in the Visual Geometry Group (VGG) at the University of Oxford, supervised by Prof. Andrew Zisserman. I received my Ph.D. and an MPhil in Advanced Computer Science from the University of Cambridge, and my B.Eng. in Telecommunication Engineering from Beijing University of Posts and Telecommunications (BUPT).
My research interests include computer vision, natural language processing, and machine learning, with an emphasis on how these areas can best work together to solve real-world tasks. Below are some of my recent research topics:
We are always actively recruiting postdocs, prospective graduate students, and interns!
You are welcome to contact me with your detailed CV, but please read this note first!
Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models. Shengli Zhou, Minghang Zheng, Feng Zheng, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026. [PDF] [Project Page] [Code] [Bibtex]
OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding. Minghang Zheng, Zihao Yin, Yi Yang, Yuxin Peng, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026. [PDF] [Project Page] [Code] [Model] [Bibtex]
Confidence-Aware Pseudo-Label Self-Correction for Weakly Supervised Visual Grounding. Yang Liu, Jiahua Zhang, Yue Wu, Zijing Zhao, Qingchao Chen, Yuxin Peng. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026. [PDF] [Code] [WeChat Article] [Bibtex]
Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement. Winner of the ACM-MM 2025 Identity-Preserving Video Generation Challenge. Jiayi Gao, Changcheng Hua, Qingchao Chen, Yuxin Peng, Yang Liu†. ACM International Conference on Multimedia (ACM-MM), 2025. [PDF] [Project Page] [Code] [Challenge] [Bibtex]
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools. Winner of the CVPR 2025 Cultural VQA Challenge. Shaofeng Yin, Ting Lei, Yang Liu†. International Conference on Computer Vision (ICCV), 2025. [PDF] [Project Page] [Code] [Challenge] [WeChat Article] [Bibtex]
Weakly and Single-Frame Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning. Minghang Zheng, Yanjie Huang, Qingchao Chen, Yuxin Peng, Yang Liu†. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025. [PDF] [Bibtex]
Large-Scale Pre-trained Models Empowering Phrase Generalization in Temporal Sentence Localization. Yang Liu, Minghang Zheng, Qingchao Chen, Shaogang Gong, Yuxin Peng. International Journal of Computer Vision (IJCV), 2025. [PDF] [Bibtex]
Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding. Minghang Zheng, Yuxin Peng, Benyuan Sun, Yi Yang, Yang Liu†. International Conference on Computer Vision (ICCV), 2025. [PDF] [Project Page] [Code] [Bibtex]
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration. Ting Lei, Shaofeng Yin, Qingchao Chen, Yuxin Peng, Yang Liu†. International Conference on Computer Vision (ICCV), 2025. [PDF] [Project Page] [Code] [WeChat Article] [Bibtex]
Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced In-domain Knowledge Transferring. Zhu Xu, Ting Lei, Zhimin Li, Guan Wang, Qingchao Chen, Yuxin Peng, Yang Liu†. International Conference on Computer Vision (ICCV), 2025. [PDF] [Project Page] [Code] [Bibtex]
AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning. Dejie Yang, Zijing Zhao, Yang Liu†. International Conference on Computer Vision (ICCV), 2025. [PDF] [Project Page] [Code] [WeChat Article] [Bibtex]
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges. Yuxuan Wang, Yiqi Song, Cihang Xie, Yang Liu, Zilong Zheng. International Conference on Computer Vision (ICCV), 2025. [PDF] [Project Page] [Code] [WeChat Article] [Bibtex]
DisTime: Distribution-based Time Representation for Video Large Language Models. Yingsen Zeng, Zepeng Huang, Yujie Zhong, Chengjian Feng, Jie Hu, Lin Ma, Yang Liu. International Conference on Computer Vision (ICCV), 2025. [PDF] [Dataset] [Code] [Bibtex]
Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing. Zhuoying Li, Zhu Xu, Yuxin Peng, Yang Liu†. International Conference on Machine Learning (ICML), 2025. [PDF] [Project Page] [Code] [Video] [Bibtex]
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer. Jiayi Gao, Zijin Yin, Changcheng Hua, Yuxin Peng, Kongming Liang, Zhanyu Ma, Jun Guo, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [PDF] [Project Page] [Code] [Video] [Bibtex]
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation. Yiming Qin, Zhu Xu, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [PDF] [Project Page] [Code] [Video] [WeChat Article] [Bibtex]
Customized Human Object Interaction Image Generation. Zhu Xu, Zhaowen Wang, Yuxin Peng, Yang Liu†. ACM International Conference on Multimedia (ACM-MM), 2025. [PDF] [Project Page] [Code] [Bibtex]
InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects. Xinhao Cai, Minghang Zheng, Xin Jin, Yang Liu†. ACM International Conference on Multimedia (ACM-MM), 2025. [PDF] [Project Page] [Code] [Bibtex]
Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset. Wentao Mo, Qingchao Chen, Yuxin Peng, Siyuan Huang, Yang Liu†. ACM International Conference on Multimedia (ACM-MM) Dataset Track, 2025. [PDF] [Project Page] [Code] [Bibtex]
Investigating Domain Gaps for Indoor 3D Object Detection. Zijing Zhao, Zhu Xu, Qingchao Chen, Yuxin Peng, Yang Liu†. ACM International Conference on Multimedia (ACM-MM) Dataset Track, 2025. [PDF] [Project Page] [Code] [Bibtex]
Learn 3D VQA Better with Active Selection and Reannotation. Shengli Zhou, Yang Liu, Feng Zheng. ACM International Conference on Multimedia (ACM-MM), 2025. [PDF] [Code] [Bibtex]
PlanLLM: Video Procedure Planning with Refinable Large Language Models. Dejie Yang, Zijing Zhao, Yang Liu†. Conference on Artificial Intelligence (AAAI), 2025. [PDF] [Project Page] [Code] [WeChat Article] [Bibtex]
Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval. Dezhao Luo, Shaogang Gong, Jiabo Huang, Hailin Jin, Yang Liu†. Conference on Artificial Intelligence (AAAI), 2025. [PDF] [Project Page] [Bibtex]
3D Weakly Supervised Visual Grounding at Category and Instance Levels. Xiaoqi Li, Jiaming Liu, Yandong Guo, Hao Dong, Yang Liu†. International Conference on Robotics and Automation (ICRA), 2025. [PDF] [Bibtex]
Zero Shot Domain Adaptive Semantic Segmentation by Synthetic Data Generation and Progressive Adaptation. Jun Luo, Zijing Zhao, Yang Liu†. International Conference on Intelligent Robots and Systems (IROS), 2025. [PDF] [Code] [Bibtex]
Hierarchical Sub-action Tree for Continuous Sign Language Recognition. Dejie Yang, Zhu Xu, Xinjie Gao, Yang Liu†. IEEE International Conference on Multimedia and Expo (ICME), 2025. [PDF] [Project Page] [Code] [Bibtex]
Semantic-Aware Human Object Interaction Image Generation. Zhu Xu, Qingchao Chen, Yuxin Peng, Yang Liu†. International Conference on Machine Learning (ICML), 2024. [PDF] [Project Page] [Video] [Code] [WeChat Article] [Bibtex]
Training Free Video Temporal Grounding using Large-scale Pre-trained Models. Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu†. European Conference on Computer Vision (ECCV), 2024. [PDF] [Project Page] [Video] [Code] [Bibtex]
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection. Ting Lei, Shaofeng Yin, Yuxin Peng, Yang Liu†. European Conference on Computer Vision (ECCV), 2024. [PDF] [Project Page] [Video] [Code] [WeChat Article] [Bibtex]
Active Object Detection with Knowledge Aggregation and Distillation from Large Models. Dejie Yang, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [PDF] [Project Page] [Video] [Code] [Bibtex]
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection. Ting Lei, Shaofeng Yin, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [PDF] [Project Page] [Video] [Code] [Bibtex]
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation. Guan Wang, Zhimin Li, Qingchao Chen, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [PDF] [Project Page] [Video] [Code] [Bibtex]
Diff-BGM: A Diffusion Model for Video Background Music Generation. Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu†. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [PDF] [Project Page] [Video] [Code] [Bibtex]
TeachText: CrossModal Text-Video Retrieval through Generalized Distillation. Ioana Croitoru, Simion-Vlad Bogolin, Marius Leordeanu, Hailin Jin, Andrew Zisserman, Yang Liu†, Samuel Albanie. Artificial Intelligence Journal (AIJ), 2024. [PDF] [Project Page] [Code]
3D Vision and Language Pretraining with Large-Scale Synthetic Data. Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu†. International Joint Conference on Artificial Intelligence (IJCAI), 2024. [PDF] [Project Page] [Video] [Code] [Bibtex]
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA. Wentao Mo, Yang Liu†. Conference on Artificial Intelligence (AAAI), 2024. [PDF] [Project Page] [Code] [Bibtex]
Semantic-Guided Novel Category Discovery Weis |