Sheng-Hua Wan @ LAMDA, NJU-AI


万盛华
Sheng-Hua Wan
Ph.D. candidate, LAMDA Group
School of Artificial Intelligence
Nanjing University, Nanjing 210023, China

Email: wansh [at] lamda.nju.edu.cn
Github: https://github.com/yixiaoshenghua
Google Scholar: personal page


Short Biography

I received my B.Sc. degree in GIS from Nanjing University in June 2021. In the same year, I was admitted to the Ph.D. program at Nanjing University without entrance examination, joining the LAMDA Group led by Professor Zhi-Hua Zhou, under the supervision of Prof. De-Chuan Zhan.

Research Interests

My research interests include Reinforcement Learning and its real-world applications, with a main focus on sim2real problems.

Publications - Conference

  • Sheng-hua Wan, Haihang Sun, Le Gan, De-chuan Zhan. MOSER: Learning Sensory Policy for Task-specific Viewpoint via View-conditional World Model. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI-2024), Jeju, South Korea, 2024. [Paper] [Code]

  • Existing visual RL algorithms mostly rely on a single observation from a well-designed fixed camera that requires human knowledge. Recent studies learn from different viewpoints with multiple fixed cameras, but this incurs high computation and storage costs and does not guarantee the coverage of the optimal viewpoint. To address these issues, we propose the View-conditional Markov Decision Process (VMDP) assumption and develop a new method, the MOdel-based SEnsor controlleR (MOSER), based on VMDP. MOSER jointly learns a view-conditional world model (VWM) to simulate the environment, a sensory policy to control the camera, and a motor policy to complete tasks. We design intrinsic rewards from the VWM without additional modules to guide the sensory policy to adjust the camera parameters.

  • Sheng-hua Wan, Yu-cen Wang, Ming-hao Shao, Ru-ying Chen, De-chuan Zhan. SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models. In Proceedings of the 40th International Conference on Machine Learning (ICML-2023), Honolulu, Hawaii, USA, 2023. [Paper] [Code]

  • Existing model-based imitation learning algorithms are easily misled by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose a new algorithm, Separated Model-based Adversarial Imitation Learning (SeMAIL), which decouples the environment dynamics into two parts according to task-relevant dependency, determined by agent actions, and trains them separately.

Publications - Journal

  • Wen-ye Wang, Sheng-hua Wan, Peng-feng Xiao, Xue-liang Zhang. A Novel Multi-Training Method for Time-Series Urban Green Cover Recognition From Multitemporal Remote Sensing Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2022), 15, 9531-9544. [Paper] [Code] (Completed during undergraduate studies)

  • We designed a general multitemporal framework to extract urban green cover using multi-training, a novel semi-supervised learning method for land cover classification on multitemporal remote sensing images.

Preprints

  • Shenghua Wan, Ziyuan Chen, Shuai Feng, Le Gan, De-Chuan Zhan. SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets. [Paper] [Code]

  • Model-based offline reinforcement learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the distribution shift issue in offline RL, existing model-based methods rely heavily on the uncertainty of learned dynamics. However, model uncertainty estimation becomes significantly biased when observations contain complex distractors with non-trivial dynamics. To address this challenge, we propose a new approach, Separated Model-based Offline Policy Optimization (SeMOPO), which decomposes states into endogenous and exogenous parts via conservative sampling and estimates model uncertainty on the endogenous states only. We provide theoretical guarantees on model uncertainty and the performance bound of SeMOPO, and construct the Low-Quality Vision Deep Data-Driven Datasets for RL (LQV-D4RL).

  • Kaichen Huang*, Hai-Hang Sun*, Shenghua Wan, Minghao Shao, Shuai Feng, Le Gan, De-Chuan Zhan. DIDA: Denoised Imitation Learning based on Domain Adaptation. [Paper] [Code]

  • Imitating skills from low-quality datasets, such as sub-optimal demonstrations and observations with distractors, is common in real-world applications. In this work, we focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise that often occurs during the processes of data collection or transmission. To alleviate the above problems, we propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data, facilitating a feature encoder to learn task-related but domain-agnostic representations.

  • Kaichen Huang*, Minghao Shao*, Shenghua Wan, Hai-Hang Sun, Shuai Feng, Le Gan, De-Chuan Zhan. SENSOR: Imitate Third-Person Expert's Behaviors via Active Sensoring. [Paper] [Code]

  • In many real-world visual Imitation Learning (IL) scenarios, there is a misalignment between the agent’s and the expert’s perspectives, which might lead to the failure of imitation. Previous methods have generally solved this problem by domain alignment, which incurs extra computation and storage costs, and these methods fail to handle the hard cases where the viewpoint gap is too large. To alleviate the above problems, we introduce active sensoring in the visual IL setting and propose a model-based SENSory imitatOR (SENSOR) to automatically change the agent's perspective to match the expert's. SENSOR jointly learns a world model to capture the dynamics of latent states, a sensor policy to control the camera, and a motor policy to control the agent.

  • Shaowei Zhang, Dian Cheng, Shenghua Wan, Xiaolong Yin, Lu Han, Shuai Feng, Le Gan, De-Chuan Zhan. Efficient Online Reinforcement Learning with Cross-Modality Offline Data. [Paper] [Code]

  • Observation space variation poses great challenges in transfer reinforcement learning and is common in many real-world scenarios, such as software updates and sensor replacement. Existing approaches primarily focus on policy transfer across similar observation spaces while neglecting the potential of utilizing cross-modality offline data. To address this limitation, we propose the CROss-MOdality Shared world model (CROMOS), a co-modality framework that trains an environmental dynamics model in the latent space by simultaneously aligning the source and target modality data to it for subsequent policy training. We provide a theoretical proof of the data utilization effectiveness and a practical implementation of our framework.

  • Yucen Wang*, Shenghua Wan*, Le Gan, Shuai Feng, De-Chuan Zhan. AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. [Paper] [Code]

  • Model-based methods have significantly contributed to distinguishing task-irrelevant distractors for visual control. However, prior research has primarily focused on heterogeneous distractors like noisy background videos, leaving homogeneous distractors that closely resemble controllable agents largely unexplored, which poses significant challenges to existing methods. To tackle this problem, we propose the Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present a new algorithm named implicit Action-informed Diverse visual Distractors Distinguisher (AD3), which leverages the actions inferred by IAG to train separated world models. Implicit actions effectively capture the behavior of background distractors, aiding in distinguishing the task-irrelevant components, so the agent can optimize its policy within the task-relevant state space.

Selected Honors

Presidential Special Scholarship for first-year Ph.D. students at Nanjing University, 2021.

Outstanding Graduate of Nanjing University, 2021.

Winner of the Ping An Insurance Data Mining Competition, 2021.

2nd place in the ZhongAn Cup Insurance Data Mining Competition, 2020.

Teaching Assistant

Introduction to Machine Learning (for undergraduate students, Spring 2022).

Correspondence

Email: wansh [at] lamda.nju.edu.cn
Office: Yifu Building, Xianlin Campus of Nanjing University
Address: National Key Laboratory for Novel Software Technology, Nanjing University, Xianlin Campus Mailbox 603, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China

(南京市栖霞区仙林大道163号, 南京大学仙林校区603信箱, 软件新技术国家重点实验室, 210023.)
