Publications

* indicates equal contribution.

2026

Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints
Tian Xu, Chenyang Wang, Xiaochen Zhai, Ziniu Li, Yi-Chen Li, Lei Yuan, Yang Yu
In Proceedings of the Forty-third International Conference on Machine Learning (ICML), 2026

Efficient Policy-Reward Co-Pretraining for Adversarial Imitation Learning
Tian Xu, Zexuan Chen, Zhilong Zhang, Yi-Chen Li, Chenyang Wang, Lei Yuan, Yang Yu
In Proceedings of the Forty-third International Conference on Machine Learning (ICML), 2026

Provably Efficient Offline Adversarial Imitation Learning with General Function Approximation
Tian Xu, Wenqi Lai, Lei Yuan, Yang Yu
SCIENCE CHINA Information Sciences, 2026

Off-Policy Value-Based Reinforcement Learning for Large Language Models
Peng-Yuan Wang*, Ziniu Li*, Tian Xu*, Bohan Yang, Tian-Shuo Liu, Chenyang Wang, Xiong-Hui Chen, Yi-Chen Li, Tianyun Yang, Chongliang Chen, Yang Yu
arXiv preprint: 2603.23355.

Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis [Slides]
Tian Xu*, Ziniu Li*, Yang Yu, Zhi-Quan Luo
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026

Adversarial Imitation Learning with General Function Approximation: Theoretical Analysis and Practical Algorithms.
Tian Xu*, Zhilong Zhang*, Zexuan Chen, Ruishuo Chen, Yihao Sun, Yang Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026

Can Reinforcement Learning Achieve Expert-level Placement?
Ruo-Tong Chen, Ke Xue, Chengrui Gao, Yunqi Shi, Tian Xu, Peng Xie, Siyuan Xu, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou
In Proceedings of the Design Automation Conference (DAC), 2026

2025

Generalist Reward Models: Found Inside Large Language Models
Yi-Chen Li*, Tian Xu*, Yang Yu*, Xuqin Zhang, Xiong-Hui Chen, Zhongxiang Ling, Ningjing Chao, Lei Yuan, Zhi-Hua Zhou.
arXiv preprint: 2506.23235.

Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization
Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2025

Improving Reward Model Generalization from Adversarial Process Enhanced Preferences.
Zhilong Zhang*, Tian Xu*, Xinghao Du*, Xingchen Cao, Yihao Sun, Yang Yu
In Proceedings of the Forty-second International Conference on Machine Learning (ICML), 2025

Preserving Diversity in Supervised Fine-Tuning of Large Language Models.
Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo
In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025

2024

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation.
Tian Xu*, Zhilong Zhang*, Ruishuo Chen, Yihao Sun, Yang Yu
In Advances in Neural Information Processing Systems 37 (NeurIPS), 2024

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.
Ziniu Li, Tian Xu, Yushun Zhang, Zhi-han Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

Preference Aided Imitation Learning from Imperfect Demonstrations.
Xingchen Cao, Fanming Luo, Junjin Ye, Tian Xu, Zhilong Zhang, Yang Yu
In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

Policy Optimization in RLHF: The Impact of Out-of-preference Data.
Ziniu Li*, Tian Xu*, Yang Yu
In Tiny Paper Track of the 12th International Conference on Learning Representations (ICLR), 2024

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning.
Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
In Proceedings of the 12th International Conference on Learning Representations (ICLR, spotlight, acceptance rate < 5%), 2024

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning
Chengxing Jia, Chenxiao Gao, Hao Yin, Fuxiang Zhang, Xiong-Hui Chen, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu, Zhi-Hua Zhou
In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024

2023

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms.
Ziniu Li*, Tian Xu*, Zeyu Qin, Yang Yu, Zhi-Quan Luo
In Advances in Neural Information Processing Systems 36 (NeurIPS, spotlight, acceptance rate < 5%), 2023.
An early version was accepted at the DMLR Workshop (Data-centric Machine Learning Research) at ICML 2023.

Model Gradient: Unified Model and Policy Learning in Model-based Reinforcement Learning.
Chengxing Jia*, Fuxiang Zhang*, Tian Xu, Jing-Cheng Pang, Zongzhang Zhang, Yang Yu.
Frontiers of Computer Science, 2023.

Provably Efficient Adversarial Imitation Learning with Unknown Transitions [Poster] [Video] [Oral Slides]
Tian Xu*, Ziniu Li*, Yang Yu, Zhi-Quan Luo
In Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI, oral presentation, acceptance rate < 3%), 2023

A Survey on Model-based Reinforcement Learning
Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu
SCIENCE CHINA Information Sciences, 2023

2022

Rethinking ValueDice: Does It Really Improve Performance?
Ziniu Li*, Tian Xu*, Yang Yu, Zhi-Quan Luo
In Proceedings of the 10th International Conference on Learning Representations (ICLR) (Blog Track), 2022

A Note on Target Q-learning for Solving Finite MDPs with A Generative Oracle
Ziniu Li*, Tian Xu*, Yang Yu
Submitted to Artificial Intelligence

2021

More Efficient Adversarial Imitation Learning Algorithms With Known and Unknown Transitions
Tian Xu*, Ziniu Li*, Yang Yu
Ecological Theory of RL Workshop at NeurIPS 2021.

A Concise Introduction to Imitation Learning (In Chinese)
Tian Xu, Ziniu Li, Yang Yu

Error Bounds of Imitating Policies and Environments for Reinforcement Learning [Appendix]
Tian Xu, Ziniu Li, Yang Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

2020

Error Bounds of Imitating Policies and Environments
Tian Xu, Ziniu Li, Yang Yu
In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.

2019

On Value Discrepancy of Imitation Learning
Tian Xu, Ziniu Li, Yang Yu
arXiv:1911.07027