汪钰岑 Email: wangyc@lamda.nju.edu.cn
Sept. 2023 - Present : Ph.D. student in Computer Science and Technology, School of Artificial Intelligence, Nanjing University, under the supervision of Prof. De-Chuan Zhan.
Sept. 2020 - Jun. 2023 : M.Sc. in Computer Science and Technology, School of Artificial Intelligence, Nanjing University, under the supervision of Prof. De-Chuan Zhan.
Sept. 2016 - Jun. 2020 : B.Sc. in Information Management and Information System, School of Information Management, Nanjing University.
Apr. 2025 - Present : Research Intern (Reinforcement Learning), Machine Learning Group, Microsoft Research Asia.
My research interests include Machine Learning and Data Mining, especially Reinforcement Learning and World Models.
We propose the Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present AD3, which uses the actions inferred by IAG to train separated world models. Implicit actions help distinguish task-irrelevant components, so the agent can optimize its policy in the task-relevant space. AD3 achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors.
Existing model-based imitation learning algorithms are easily misled by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose SeMAIL, which decouples the environment dynamics into two parts according to task-relevant dependency, as determined by agent actions, and trains the two parts separately. Our method achieves near-expert performance on various visual imitation tasks with complex observations.
We propose FOUNDER, a framework that integrates the generalizable knowledge embedded in foundation models (FMs) with the dynamics modeling capabilities of world models (WMs) to enable open-ended decision-making in embodied environments in a reward-free manner. FOUNDER demonstrates superior performance on various multi-task offline visual control benchmarks and excels at capturing the deep-level semantics of tasks specified by text or videos, particularly in scenarios involving complex observations or domain gaps where prior methods struggle.
In this survey, we provide a comprehensive review of reward modeling techniques within the RL literature. We present an overview of recent reward modeling approaches, categorizing them by the source, the mechanism, and the reward learning paradigm. The survey covers both established and emerging methods, addressing the lack of a systematic review of reward models in the current literature.
LAMDA Excellent Student Award, 2022.
Advanced Machine Learning. (For undergraduate students, Autumn, 2023)
Computing Methods. (For undergraduate students, Spring, 2024)
Email: wangyc@lamda.nju.edu.cn
Office: Room 201, Yi-fu Building, Xianlin Campus of Nanjing University
Address: National Key Laboratory for Novel Software Technology
                 Nanjing University, Xianlin Campus
                 163 Xianlin Avenue, Qixia District, Nanjing 210023, China