Fan-Ming Luo @ LAMDA


Fan-Ming Luo
Ph.D. Student, LAMDA Group
School of Artificial Intelligence
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210023, China

Supervisor: Prof. Yang Yu

Email: luofm {AT}
Laboratory: Shaoyifu Building, Xianlin Campus of Nanjing University


I am currently a third-year Ph.D. student in the School of Artificial Intelligence at Nanjing University and a member of the LAMDA Group, which is led by Prof. Zhi-Hua Zhou.

I received my B.Sc. degree from the School of Physics at Nanjing University in June 2019. In September 2019, I was admitted, without entrance examination, to the M.Sc. program at Nanjing University under the supervision of Prof. Yang Yu. In September 2021, I began my Ph.D. studies, also under the supervision of Prof. Yang Yu.

Research Interests

My research interest is reinforcement learning. In particular, I am interested in


Preprints

Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu. Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate, CoRR abs/2405.15384, 2024. [paper] [code]

Fan-Ming Luo, Xingchen Cao, Yang Yu. Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble, CoRR abs/2206.00238, 2022. [paper]

Rong-Jun Qin, Fan-Ming Luo, Hong Qian, Yang Yu. Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games, CoRR abs/2208.09452, 2022. [paper]

Conference Papers

Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu. Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning. In: Proceedings of the 12th International Conference on Learning Representations (ICLR'24), Vienna, Austria, 2024. [paper] (Spotlight)

Fan-Ming Luo, Shengyi Jiang, Yang Yu, Zongzhang Zhang, Yi-Feng Zhang. Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI'22), virtual, 2022. [paper] (Oral)

Xingchen Cao, Fan-Ming Luo, Junyin Ye, Tian Xu, Zhilong Zhang, Yang Yu. Limited Preference Aided Imitation Learning from Imperfect Demonstrations. In: Proceedings of the 40th International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024. [paper]

Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye. Offline Model-based Adaptable Policy Learning. In: Advances in Neural Information Processing Systems 34 (NeurIPS'21), virtual, 2021. [paper]

Journal Papers

Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu. A Survey on Model-based Reinforcement Learning, SCIENCE CHINA Information Sciences (SCIS), 67(2):121101, 2024. [paper]

Xiong-Hui Chen*, Fan-Ming Luo*, Yang Yu, Qingyang Li, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye. Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 45(12):15260–15274, 2023. [paper]

Yi-Feng Zhang, Fan-Ming Luo, Yang Yu. Improve Generated Adversarial Imitation Learning with Reward Variance Regularization, Machine Learning, 111(3):977–995, 2022. [paper]

Teaching Assistant

  • Introduction to Reinforcement Learning (with Prof. Yang Yu; for both undergraduate and graduate students), Fall 2021.

Awards & Honors

  • National Scholarship for Doctoral Students 2021 (2021年度博士国家奖学金)
  • LAMDA Elite Award 2021 (2021年度LAMDA英才奖)

Correspondence

Email: luofm {AT}

Laboratory: Shaoyifu Building, Xianlin Campus of Nanjing University

Address: National Key Laboratory for Novel Software Technology, Nanjing University, Xianlin Campus Mailbox 603, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China
(南京市栖霞区仙林大道163号, 南京大学仙林校区603信箱, 软件新技术国家重点实验室, 210023.)