Yu-Ren Liu @ LAMDA, NJU-CS


Yu-Ren Liu
Ph.D. student, LAMDA Group
Department of Computer Science and Technology
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210023, China

Supervisor: Prof. Yang Yu
Co-supervisor: Prof. Kun Zhang from CMU & MBZUAI

Email: liuyr@lamda.nju.edu.cn
Laboratory: Computer Science Building, Xianlin Campus of Nanjing University

About Me

I am a Ph.D. student in the Department of Computer Science and Technology and a member of the LAMDA Group, led by Professor Zhi-Hua Zhou. Before starting my Ph.D., I was an undergraduate at the Kuang Yaming Honors School, Nanjing University (admitted by recommendation, exempt from the college entrance examination), and received my B.Sc. degree in Computer Science and Technology in June 2018. In the same year, I was admitted to the Ph.D. program at Nanjing University. I currently focus on reinforcement learning (RL), including subfields such as model-based (online/offline) RL, inverse RL, and causal RL. I am also strongly interested in the intersection of machine learning (ML) and quantitative finance.


Research Experience

Causal Reinforcement Learning. 2021.9 – present
Causal reinforcement learning is an area of research that combines ideas from causal inference and reinforcement learning to improve decision-making in sequential environments. Along this line, I focus on learning causal representations for reinforcement learning. In "Learning World Models with Identifiable Factorization", we propose a novel method for learning world models with disentangled latent processes. Our work extends earlier theoretical results to establish block-wise identifiability of four categories of latent variables in the general nonlinear case. Our method achieves state-of-the-art performance on variants of the DeepMind Control Suite and RoboDesk with noisy distractors. In "Learning De-biased Environment Model for Delivery Incentive Policy Optimization in Food Delivery Platforms", we propose learning a de-biased environment model for policy optimization in food delivery platforms. Our policy optimization framework significantly reduces the customer complaint rate in A/B tests at Meituan. Currently, we are exploring how to learn and use causal representations in nonstationary/heterogeneous environments where the reward function, observation function, or transition dynamics change.

Derivative-free Optimization. 2018.9 – 2021.6
Derivative-free optimization (DFO) is a class of optimization methods that aim to find the minimum or maximum of a function without using explicit derivatives. In "Asynchronous Classification-Based Optimization", we propose accelerating the classification-based optimization method through asynchronous parallelization. Our experiments show that the method can achieve almost linear speedup while preserving good solution quality. In "ZOOpt: Toolbox for derivative-free optimization", we open-source a toolbox that implements a series of classification-based optimization methods and Pareto optimization methods. In "COVID-19 Asymptomatic Infection Estimation", we design a fine-grained infectious disease transmission simulator whose parameters are learned with derivative-free optimization methods.
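
As a rough illustration of the derivative-free setting described above, the following minimal sketch minimizes a toy function using only function evaluations. It is a plain random local search, not the classification-based methods implemented in ZOOpt. The names `sphere` and `random_search_minimize`, along with the budget and step-size values, are assumptions made for illustration.

```python
import random


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def random_search_minimize(objective, dim, budget=500, step=0.5, seed=0):
    """Minimize `objective` using only function evaluations (no gradients).

    Keeps the best point found so far and proposes Gaussian perturbations
    of it; a proposal is accepted only if it lowers the objective value.
    """
    rng = random.Random(seed)
    best_x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    best_f = objective(best_x)
    for _ in range(budget - 1):
        candidate = [v + rng.gauss(0.0, step) for v in best_x]
        f = objective(candidate)
        if f < best_f:
            best_x, best_f = candidate, f
            step *= 1.5  # illustrative choice: widen the search after a success
        else:
            step *= 0.9  # illustrative choice: narrow the search after a failure
    return best_x, best_f


if __name__ == "__main__":
    x, f = random_search_minimize(sphere, dim=3)
    print("best point:", x)
    print("best value:", f)
```

The optimizer never queries a gradient, so the same loop would apply to a black-box objective such as a simulator whose parameters need tuning.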


Github: https://github.com/AlexLiuyuren?tab=repositories

Publication list

Conference Paper

Journal Paper



Teaching Assistant

Awards & Honors



Address: Yu-Ren Liu, National Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China
(南京市栖霞区仙林大道163号, 南京大学仙林校区, 软件新技术国家重点实验室, 210023.)