Yu-Ren Liu @ LAMDA, NJU-CS

刘驭壬
Yu-Ren Liu
Ph.D. student, LAMDA Group
Department of Computer Science and Technology
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210023, China

Supervisor: Prof. Yang Yu
Co-supervisor: Prof. Kun Zhang from CMU & MBZUAI

Email: liuyr@lamda.nju.edu.cn
Laboratory: Computer Science Building, Xianlin Campus of Nanjing University

About Me

I am a Ph.D. student in the Department of Computer Science and Technology and a member of the LAMDA Group, led by Professor Zhi-Hua Zhou. Before my Ph.D. studies, I was an undergraduate at Kuang Yaming Honors School, Nanjing University (admitted by recommendation, without taking the college entrance examination), and received my B.Sc. degree in Computer Science and Technology in June 2018. In the same year, I was admitted to the Ph.D. program at Nanjing University. My current focus is Reinforcement Learning (RL), including subfields such as Model-based (Online/Offline) RL, Inverse RL, and Causal RL. I am also strongly interested in the intersection of ML and quantitative finance.

Education

2022.5~2023.5
Research Assistant: Machine Learning Department, MBZUAI, UAE (One-year joint-supervision Ph.D. program, supported by CSC Funding)
2018.9~present
Ph.D. student: Computer Science and Technology, Department of Computer Science and Technology, Nanjing University, China
2014.9~2018.6
B.Sc. degree: Computer Science and Technology, Kuang Yaming Honors School, Nanjing University, China.

Research Experience

Causal Reinforcement Learning. 2021.9 – present
Causal reinforcement learning combines ideas from causal inference and reinforcement learning to improve decision-making in sequential environments. Along this line, I focus on learning causal representations for reinforcement learning. In "Learning World Models with Identifiable Factorization", we propose a novel method for learning world models with disentangled latent processes. Our work extends earlier theoretical results to enable block-wise identifiability of four categories of latent variables in the general nonlinear case. Our method achieves state-of-the-art performance on variants of the DeepMind Control Suite and RoboDesk with noisy distractors. In "Learning De-biased Environment Model for Delivery Incentive Policy Optimization in Food Delivery Platforms", we propose learning a de-biased environment model for policy optimization in food delivery platforms. Our policy optimization framework significantly reduces the customer complaint rate in A/B tests at Meituan. Currently, we are exploring how to learn and use causal representations in nonstationary/heterogeneous environments where the reward function, observation function, or transition dynamics change.

Derivative-free Optimization. 2018.9 – 2021.6
Derivative-free optimization (DFO) is a class of optimization methods that aim to find the minimum or maximum of a function without using explicit derivatives. In "Asynchronous Classification-Based Optimization", we propose accelerating the classification-based optimization method through asynchronous parallelization. Our experiments show that the method achieves almost linear speedup while preserving good solution quality. In "ZOOpt: Toolbox for derivative-free optimization", we open-source a toolbox that implements a series of classification-based optimization methods and Pareto optimization methods. In "COVID-19 Asymptomatic Infection Estimation", we design a fine-grained infectious disease transmission simulator whose parameters are learned with derivative-free optimization methods.
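As a rough illustration of what "without using explicit derivatives" means in practice, here is a minimal sketch of a simple (1+1)-style random local search. It is not the classification-based method from the papers above and does not use the ZOOpt API; the function names (`one_plus_one_search`, `sphere`), the step-size constants, and the test problem are all assumptions chosen for illustration.

```python
import random


def one_plus_one_search(objective, lower, upper, budget=2000, step=0.3, seed=0):
    """Minimize `objective` over a box using only function evaluations.

    Hypothetical helper for illustration: keeps one incumbent solution and
    proposes Gaussian perturbations of it, with no gradient information.
    """
    rng = random.Random(seed)
    # Start from a uniformly random point inside the box.
    best_x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    best_f = objective(best_x)
    for _ in range(budget - 1):
        # Perturb the incumbent and clip the candidate back into the box.
        cand = [
            min(hi, max(lo, x + rng.gauss(0.0, step)))
            for x, lo, hi in zip(best_x, lower, upper)
        ]
        cand_f = objective(cand)
        if cand_f < best_f:
            # Success: accept the candidate and widen the search step.
            best_x, best_f = cand, cand_f
            step *= 1.5
        else:
            # Failure: keep the incumbent and narrow the search step.
            step *= 0.9
    return best_x, best_f


def sphere(x):
    """Toy objective (assumed for the demo): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    x, fx = one_plus_one_search(sphere, [-1.0] * 5, [1.0] * 5)
    print("best value found:", fx)
```

The search only ever calls `objective`, so it applies equally to black-box functions such as simulator outputs, which is the setting the paragraph above describes.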

Codes

GitHub: https://github.com/AlexLiuyuren?tab=repositories

ZOOpt: I am a core developer of the open-source Python package ZOOpt, which provides efficient derivative-free solvers as well as their parallel implementations. The ZOOpt toolbox is designed to be easy to use and focuses on optimization problems in machine learning, addressing high-dimensional, noisy, and large-scale problems.

Publication list

Conference Paper

Yu-Ren Liu, Biwei Huang, Zhengmao Zhu, Honglong Tian, Mingming Gong, Yang Yu, Kun Zhang. Learning World Models with Identifiable Factorization. In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), New Orleans, Louisiana, 2023. (PDF)
Zheng-Mao Zhu, Shengyi Jiang, Yu-Ren Liu, Yang Yu, Kun Zhang. Invariant Action Effect Model for Reinforcement Learning. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI'22), Vancouver, Canada, 2022.
Yu-Ren Liu, Yi-Qi Hu, Hong Qian, Yang Yu. Asynchronous Classification-Based Optimization. In: Proceedings of the 1st International Conference on Distributed Artificial Intelligence (DAI'19), Beijing, China, 2019 (PDF).

Journal Paper

Yu-Ren Liu, Yi-Qi Hu, Hong Qian, Yang Yu, and Chao Qian. ZOOpt: Toolbox for derivative-free optimization. In: Science China Information Sciences. (PDF)

Manuscripts

Yu-Ren Liu, Xiong-Hui Chen, Xinyu Yang, Siyuan Xiao, Xintong Qi, Linjun Zhou, Yang Yu, Fangsheng Huang. Learning De-biased Environment Model for Delivery Incentive Policy Optimization in Food Delivery Platforms. (submitted to CIKM2024)
Zhengmao Zhu, Yu-Ren Liu, Honglong Tian, Yang Yu, Kun Zhang. Beware of Instantaneous Dependence in Reinforcement Learning. (PDF)
Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu. Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization. (PDF)
Yang Yu, Yu-Ren Liu, Fan-Ming Luo, Wei-Wei Tu, De-Chuan Zhan, Guo Yu, Zhi-Hua Zhou. COVID-19 Asymptomatic Infection Estimation. (PDF)

Internship

2024.1~present: Zhuoshi Fund (卓识基金)
Quantitative Researcher
2023.7~present: Meituan (美团)
Machine Learning Engineer
My work aims to identify the causal latent variables that influence transitions in order-delivery scenarios. This can help delivery service providers infer the current situation from observed decision trajectories, ultimately leading to improved policy optimization. Our method shows a substantial advantage over baseline models in both the identifiability of the latent variables and transition prediction accuracy.
2018.3~2018.7: Meridian Global Inc (子午投资)
Quantitative Researcher
My work centered on the automated identification of effective factors in the Chinese A-share market. During this internship, I reformulated the factor search problem as a derivative-free optimization problem and then developed a distributed optimization system in the Julia programming language to automate the search process.

Teaching Assistant

Introduction to Artificial Intelligence (with Prof. Yang Yu; for undergraduate students), Fall, 2019
Introduction to Machine Learning (with Prof. Zhi-Hua Zhou, Prof. De-Chuan Zhan and Prof. Han-Jia Ye; for undergraduate students), Spring, 2020

Awards & Honors

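Silver Award in the National Finals (Teacher-Student Co-creation Track), 7th "Internet+" College Students Innovation and Entrepreneurship Competition, 2021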
Second-Class Academic Scholarship for Ph.D. Students, Nanjing University, 2021
Postgraduate Elite Scholarship, Nanjing University, 2019
First-Class Academic Scholarship for Master's Students, Nanjing University, 2018, 2019, 2020
Undergraduate Elite Scholarship, Nanjing University, 2015, 2016, 2017
Second-Class People's Scholarship, Nanjing University, 2015
Citi Cup Innovation and Application Contest Top 20 in China, Xi'an, 2016

Certificate

February, 2021: I passed the CFA exam level 1 on the first attempt with 8 As and 2 Bs. [Performance]
November, 2020: I passed the FRM exam part 1 on the first attempt with excellent grades in all four subjects. [Certificate] [Performance]
August, 2019: I completed the courses at the Machine Learning Summer School held at Skoltech, Moscow, Russia. [Certificate]

Correspondence

Email: liuyr@lamda.nju.edu.cn

Laboratory: Computer Science Building, Xianlin Campus of Nanjing University

Address: Yu-Ren Liu, National Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China
(南京市栖霞区仙林大道163号, 南京大学仙林校区, 软件新技术国家重点实验室, 210023.)