Yucen Wang (汪钰岑) Email: wangyc@lamda.nju.edu.cn
Sept. 2023 - Present : Ph.D. student in Computer Science and Technology, School of Artificial Intelligence, Nanjing University, under the supervision of Prof. De-Chuan Zhan.
Sept. 2020 - Jun. 2023 : M.Sc. in Computer Science and Technology, School of Artificial Intelligence, Nanjing University, under the supervision of Prof. De-Chuan Zhan.
Sept. 2016 - Jun. 2020 : B.Sc. in Information Management and Information System, School of Information Management, Nanjing University.
Apr. 2025 - Present : Research Intern (Reinforcement Learning), Machine Learning Group, Microsoft Research Asia
My research interests include Reinforcement Learning and World Models. Currently, I mainly focus on:
Advanced world model learning and related topics, e.g., latent action modeling and video foundation models.
RL and decision making with LLMs and VLMs.
Embodied AI and Vision-Language-Action models.
I have also worked on Model-Based RL and representation learning in Visual RL.
Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian. Co-Evolving Latent Action World Models. [Paper] [Demo]
Xiaoyu Chen, Hangxing Wei, Pushi Zhang, Chuheng Zhang, Kaixin Wang, Yanjiang Guo, Rushuai Yang, Yucen Wang, Xinquan Xiao, Li Zhao, Jianyu Chen, Jiang Bian. villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models. [Paper] [Website]
We propose FOUNDER, a framework that integrates the generalizable knowledge embedded in foundation models (FMs) with the dynamics modeling capabilities of world models (WMs) to enable open-ended, reward-free decision-making in embodied environments. FOUNDER demonstrates superior performance on various multi-task offline visual control benchmarks, excelling at capturing the deep-level semantics of tasks specified by text or videos, particularly in scenarios involving complex observations or domain gaps where prior methods struggle.
In this survey, we provide a comprehensive review of reward modeling techniques in the RL literature. We present an overview of recent reward modeling approaches, categorizing them by their source, mechanism, and reward learning paradigm. The survey covers both established and emerging methods, filling a gap in the current literature, which lacks a systematic review of reward models.
We propose the Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present AD3, which leverages the actions inferred by IAG to train separate world models. The implicit actions effectively help distinguish task-irrelevant components, allowing the agent to optimize its policy in the task-relevant space. AD3 achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors.
Existing model-based imitation learning algorithms are easily misled by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose SeMAIL, which decouples the environment dynamics into two parts according to their dependency on agent actions (i.e., task relevance) and models each part separately. Our method achieves near-expert performance on various visual imitation tasks with complex observations.
LAMDA Excellent Student Award, 2022.
Advanced Machine Learning. (For undergraduate students, Autumn 2023)
Computing Methods. (For undergraduate students, Spring 2024)
Email: wangyc@lamda.nju.edu.cn
Office: Room 201, Yi-fu Building, Xianlin Campus of Nanjing University
Address: National Key Laboratory for Novel Software Technology
                 Nanjing University, Xianlin Campus
                 163 Xianlin Avenue, Qixia District, Nanjing 210023, China