Research

I specialize in reinforcement learning with a focus on imitation learning (IL). My research centers on improving the sample efficiency of IL, drawing on ideas from statistical learning theory and optimization theory to analyze and design IL methods.

Background:

Imitation learning (IL) trains policies from expert demonstrations and has been applied in domains such as robotics and recommendation systems.
Behavioral cloning (BC) and adversarial imitation learning (AIL) are two representative IL algorithms that have been extensively studied in experiments; a minimal sketch of BC follows this paragraph.
However, despite this empirical progress, the theoretical foundations of these algorithms are not well established.
Our research aims to close this gap and establish theoretical foundations for IL. Our work toward this goal, carried out with great collaborators, is summarized below.
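
To make the problem setup concrete, here is a minimal behavioral cloning sketch: BC reduces imitation to supervised learning on expert state-action pairs. This is an illustrative sketch only; the function and variable names are hypothetical, and the code is not taken from the papers referenced below.

```python
# Minimal behavioral cloning (BC) sketch: fit a policy to expert
# demonstrations by maximum likelihood. Illustrative only; all names
# here are hypothetical, not from the referenced papers.
import torch
import torch.nn as nn

def behavioral_cloning(expert_states, expert_actions, num_actions, epochs=50):
    """expert_states:  float tensor of shape (N, state_dim)
    expert_actions: long tensor of shape (N,) with discrete action labels
    """
    state_dim = expert_states.shape[1]
    policy = nn.Sequential(
        nn.Linear(state_dim, 64),
        nn.ReLU(),
        nn.Linear(64, num_actions),  # logits over discrete actions
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()  # negative log-likelihood of expert actions
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(policy(expert_states), expert_actions)
        loss.backward()
        optimizer.step()
    return policy

# Usage with synthetic data:
# states = torch.randn(128, 4)
# actions = torch.randint(0, 2, (128,))
# policy = behavioral_cloning(states, actions, num_actions=2)
```

For continuous actions, the same recipe applies with a regression loss (e.g., mean squared error) in place of the cross-entropy.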

Contributions:

  • In [1, 2], we develop a general analysis framework for studying error bounds of imitating policies and environments (a stylized comparison of such bounds follows this list).

  • In [3], we present the first reduction of offline AIL methods to BC, suggesting that AIL cannot outperform BC in the offline setting.

  • In [4], we provide the first horizon-free imitation gap bound for AIL, which helps explain the superior performance of online AIL methods.

  • In [5], we develop MB-TAIL, a provably efficient online AIL method that is the first to achieve the minimax-optimal expert sample complexity in the unknown-transition setting. In addition, MB-TAIL enjoys the best-known environment interaction complexity.

  • In [6], we present a theoretical analysis of offline imitation learning with a supplementary dataset.
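
As a rough guide to how these analyses compare BC and AIL, the stylized bounds below summarize the qualitative message of [1, 2], with constants and logarithmic factors omitted; the precise statements in the papers differ. Here $\pi^{E}$ is the expert policy, $\hat{\pi}$ the learned policy, $H$ the (effective) horizon, and $\epsilon$ the single-step imitation error.

```latex
% Stylized summary (assumption-laden; not verbatim from [1, 2]).
% Imitation gap: the value lost by the learned policy relative to the expert.
\[
  \operatorname{Gap}(\hat{\pi}) \;=\; V(\pi^{E}) - V(\hat{\pi})
\]
% BC suffers compounding errors (quadratic in the horizon), while AIL's
% distribution-matching objective yields linear horizon dependence.
\[
  \text{BC:}\ \operatorname{Gap}(\hat{\pi}) \lesssim H^{2}\,\epsilon ,
  \qquad
  \text{AIL:}\ \operatorname{Gap}(\hat{\pi}) \lesssim H\,\epsilon .
\]
```

The horizon-free bound in [4] sharpens this picture further, removing the explicit dependence on $H$ under the conditions studied there.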

References:

[1] Xu, T., Li, Z., and Yu, Y. Error Bounds of Imitating Policies and Environments. NeurIPS 2020.

[2] Xu, T., Li, Z., and Yu, Y. Error Bounds of Imitating Policies and Environments for Reinforcement Learning. TPAMI 2021.

[3] Li, Z., Xu, T., Yu, Y., and Luo, Z.-Q. Rethinking ValueDice: Does It Really Improve Performance? ICLR Blog Track 2022.

[4] Xu, T., Li, Z., Yu, Y., and Luo, Z.-Q. Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis. arXiv:2208.01899.

[5] Xu, T., Li, Z., and Yu, Y. Provably Efficient Adversarial Imitation Learning with Unknown Transitions. UAI 2023.

[6] Li, Z., Xu, T., Yu, Y., and Luo, Z.-Q. Theoretical Analysis of Offline Imitation With Supplementary Dataset. arXiv:2301.11687.