Yu-Ren Liu (刘驭壬)
I am a Ph.D. student in the Department of Computer Science and Technology at Nanjing University and a member of the LAMDA Group, led by Professor Zhi-Hua Zhou. Before my Ph.D., I was an undergraduate student at the Kuang Yaming Honors School, Nanjing University (admitted by recommendation, exempt from the college entrance examination), and received my B.Sc. degree in Computer Science and Technology in June 2018. In the same year, I was admitted to the Ph.D. program at Nanjing University. I currently focus on Reinforcement Learning (RL), including subfields such as Model-based (Online/Offline) RL, Inverse RL, and Causal RL. I am also strongly interested in the intersection of machine learning and quantitative finance.
2022.5~2023.5
Research Assistant: Machine Learning Department, MBZUAI, UAE (One-year joint-supervision Ph.D. program, supported by CSC Funding)
2018.9~present
Ph.D. student: Computer Science and Technology, Department of Computer Science and Technology, Nanjing University, China
2014.9~2018.6
B.Sc. degree: Computer Science and Technology, Kuang Yaming Honors School, Nanjing University, China.
Causal Reinforcement Learning. 2021.9 – present
Causal reinforcement learning combines ideas from causal inference and reinforcement learning to improve decision-making in sequential environments. Along this line, I focus on learning causal representations for reinforcement learning. In "Learning World Models with Identifiable Factorization", we propose a novel method for learning world models with disentangled latent processes. Our work extends earlier theoretical results to enable block-wise identifiability of four categories of latent variables in the general nonlinear case. Our method achieves state-of-the-art performance on variants of the DeepMind Control Suite and RoboDesk with noisy distractors. In "Learning De-biased Environment Model for Delivery Incentive Policy Optimization in Food Delivery Platforms", we propose learning a de-biased environment model for policy optimization in food delivery platforms. Our policy optimization framework significantly reduces the customer complaint rate in A/B tests on the Meituan platform. Currently, we are exploring how to learn and use causal representations in nonstationary/heterogeneous environments where the reward function, observation function, or transition dynamics change.
Derivative-free Optimization. 2018.9 – 2021.6
Derivative-free optimization (DFO) is a class of optimization methods that aim to find the minimum or maximum of a function without using explicit derivatives. In “Asynchronous Classification-Based Optimization”, we accelerate the classification-based optimization method through asynchronous parallelization. Our experiments show that the method achieves almost linear speedup while preserving good solution quality. In “ZOOpt: Toolbox for derivative-free optimization”, we open-source a toolbox that implements a series of classification-based optimization methods and Pareto optimization methods. In “COVID-19 Asymptomatic Infection Estimation”, we design a fine-grained infectious disease transmission simulator whose parameters are learned with derivative-free optimization methods.
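As a rough illustration of the derivative-free setting, the sketch below minimizes a toy function using only function evaluations, with no gradients. It is a simple random local search with step-size adaptation, not the classification-based method from the papers above and not the ZOOpt API; the names `sphere` and `random_local_search`, and all parameter values, are hypothetical and chosen only for this example.

```python
import random


def sphere(x):
    """Toy objective (an assumption for this sketch): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def random_local_search(objective, dim, budget, step=0.5, seed=0):
    """Minimize `objective` using only function evaluations.

    Each iteration perturbs the best point found so far with Gaussian noise.
    The step size grows after an improvement and shrinks after a failure,
    which keeps the success rate near one in five.
    """
    rng = random.Random(seed)
    best_x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    best_val = objective(best_x)
    for _ in range(budget - 1):
        candidate = [v + rng.gauss(0.0, step) for v in best_x]
        value = objective(candidate)
        if value < best_val:
            best_x, best_val = candidate, value
            step *= 1.5  # improvement: try larger moves
        else:
            step *= 0.9  # no improvement: search closer to the current best
    return best_x, best_val


if __name__ == "__main__":
    x, val = random_local_search(sphere, dim=5, budget=2000)
    print("best value found:", val)
```

The optimizer treats `objective` as a black box, so the same loop would work for a noisy simulator or a non-differentiable score, which is the setting DFO methods target.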
ZOOpt: I am a core developer of the open-source Python package ZOOpt, which provides efficient derivative-free solvers as well as their parallel implementations. The ZOOpt toolbox is designed to be easy to use and focuses particularly on optimization problems in machine learning, addressing high-dimensional, noisy, and large-scale problems.
2024.1~present: Zhuoshi Fund (卓识基金)
Quantitative Researcher
2023.7~present: Meituan (美团)
Machine Learning Engineer
My work aims to identify the causal latent variables that influence transitions in order-delivery scenarios. This can help delivery service providers infer the current situation from observed decision trajectories, ultimately leading to improved policy optimization. Our method demonstrates a substantial advantage over baseline models in both the identifiability of the latent variables and transition prediction accuracy.
2018.3~2018.7: Meridian Global Inc (子午投资)
Quantitative Researcher
My work centered on the automated identification of effective factors in the Chinese A-share market. During this internship, I reformulated the factor search problem as a derivative-free optimization problem and then developed a distributed optimization system in the Julia programming language to automate the search process.
November 2020: I passed the FRM Exam Part I on my first attempt with excellent grades in all four subjects. [Certificate] [Performance]
August 2019: I completed the courses at the Machine Learning Summer School held at Skoltech, Moscow, Russia. [Certificate]
Email: liuyr@lamda.nju.edu.cn
Laboratory: Computer Science Building, Xianlin Campus of Nanjing University
Address: Yu-Ren Liu, National Key Laboratory for Novel Software Technology, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China