Peng-Yuan Wang (王鹏远)
Currently, I am a first-year Ph.D. student at the School of Artificial Intelligence, Nanjing University, and a member of the LAMDA Group led by Professor Zhi-Hua Zhou.
I received my B.Sc. degree from the School of Computer Science, Northwestern Polytechnical University (NPU), China, in June 2022. In September 2022, I was admitted to pursue an M.Sc. degree at Nanjing University under the supervision of Professor Yang Yu.
My research focuses on algorithm design in reinforcement learning. Currently, I am working on large language models and reinforcement learning.
Feel free to contact me if you would like to discuss ideas.
Peng-Yuan Wang*, Tian-Shuo Liu*, Chenyang Wang, Ziniu Li, Yi-Di Wang, Shu Yan, Cheng-Xing Jia, Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Yang Yu. A Survey on Large Language Models for Mathematical Reasoning. arXiv:2506.08446 [PDF]
Chengxing Jia, Peng-Yuan Wang, Ziniu Li, Yi-Chen Li, Nan Tang, Yang Yu. BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation. arXiv:2405.17039 [PDF]
Zhilong Zhang*, Ruifeng Chen*, Junyin Ye*, Yihao Sun, Peng-Yuan Wang, Jingcheng Pang, Kaiyuan Li, Tianshuo Liu, Haoxin Lin, Yang Yu, Zhi-Hua Zhou. WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making. arXiv:2411.05619 [PDF]
Peng-Yuan Wang*, Jing-Cheng Pang*, Chen-Yang Wang*, Xuhui Liu, Tian-Shuo Liu, Si-Hang Yang, Hong Qian, Yang Yu. InCLET: Large Language Model In-context Learning can Improve Embodied Instruction-following. In Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'25), 2025 [PDF]
Chengxing Jia, Ziniu Li, Peng-Yuan Wang, Yi-Chen Li, Zhenyu Hou, Yuxiao Dong, Yang Yu. Controlling Large Language Model with Latent Actions. In Proceedings of the 42nd International Conference on Machine Learning (ICML'25), 2025 [PDF]
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen, Lixuan Jin, Peng-Yuan Wang, Zhilong Zhang, Yang Yu. Semantic Skill Extraction via Vision-Language Model Guidance for Efficient Reinforcement Learning. In Proceedings of the 42nd International Conference on Machine Learning (ICML'25), 2025 [PDF]
Jing-Cheng Pang*, Peng-Yuan Wang*, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu. Language model self-improvement by reinforcement learning contemplation. In Proceedings of the 12th International Conference on Learning Representations (ICLR'24), 2024 [PDF]
Jing-Cheng Pang*, Heng-Bo Fan*, Peng-Yuan Wang*, Jia-Hao Xiao*, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu. Interactive Large Language Models for Reliable Answering under Incomplete Context. Transactions on Machine Learning Research (TMLR), 2025 [PDF]
* Equal Contribution
Email:
wangpy@lamda.nju.edu.cn
Laboratory:
Shaoyifu Building, Xianlin Campus of Nanjing University
Address:
Peng-Yuan Wang, National Key Laboratory for Novel Software Technology, Nanjing University, Xianlin Campus Mailbox 603, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China
(南京市栖霞区仙林大道163号, 南京大学仙林校区603信箱, 软件新技术国家重点实验室, 210023.)