Yu-Yang Qian (钱宇阳)
Supervisor: Professor Zhi-Hua Zhou
Email: qianyy@lamda.nju.edu.cn
Laboratory: Computer Science Building, Xianlin Campus of Nanjing University
[ Google Scholar, Github ]
Sep 2016 - Jun 2020: Received my B.Sc. degree from the School of Information and Communication Engineering, University of Electronic Science and Technology of China.
Sep 2020: Admitted to the M.Sc. program at Nanjing University (exempt from the entrance examination), under the guidance of Professor Zhi-Hua Zhou.
Sep 2023 - Now: Currently a Ph.D. student in the School of Artificial Intelligence at Nanjing University and a member of the LAMDA Group, where I am very fortunate to be advised by Prof. Zhi-Hua Zhou and Prof. Yuan Jiang.
My research interests include Machine Learning and Data Mining. Recently, I have been particularly interested in:
Efficient Machine Learning in Non-stationary and Open World Environments;
Efficient Post-training for LLMs, including efficient Continual Fine-tuning, RLHF, and Agentic RL;
Diffusion Large Language Models (dLLMs).
When Drafts Evolve: Speculative Decoding Meets Online Learning. [arXiv] [code] [bibtex]
Yu-Yang Qian, Hao-Cong Wu, Yichao Fu, Hao Zhang, and Peng Zhao.
In: Proceedings of the 14th International Conference on Learning Representations (ICLR 2026) Workshop on Lifelong Agents: Learning, Aligning, Evolving, Rio de Janeiro, Brazil, 2026.
d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation. [arXiv] [code] [🤗model] [leaderboard] [bibtex]
Yu-Yang Qian, Junda Su, Lanxiang Hu, Peiyuan Zhang, Zhijie Deng, Peng Zhao, and Hao Zhang.
Internalizing Agency from Reflective Experience. [arXiv] [bibtex]
Rui Ge, Yichao Fu, Yu-Yang Qian, Junda Su, Yiming Zhao, Peng Zhao, and Hao Zhang.
Provably Efficient Online RLHF with One-Pass Reward Modeling. [paper] [code] [bibtex]
Long-Fei Li*, Yu-Yang Qian*, Peng Zhao, and Zhi-Hua Zhou. (* indicates equal contribution)
In: Advances in Neural Information Processing Systems 38 (NeurIPS 2025), San Diego, California, 2025.
Handling New Class in Online Label Shift. [paper] [code] [bibtex]
Yu-Yang Qian, Yong Bai, Zhen-Yu Zhang, Peng Zhao, and Zhi-Hua Zhou.
IEEE Transactions on Knowledge and Data Engineering (TKDE 2025), 37(9):5257-5270, 2025.
TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree. [paper] [code] [arXiv] [bibtex]
Yu-Yang Qian, Yuan-Ze Xu, Zhen-Yu Zhang, Peng Zhao, and Zhi-Hua Zhou.
In: Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), Vancouver, Canada, 2025.
Adapting to Generalized Online Label Shift by Invariant Representation Learning. [paper] [code] [bibtex]
Yu-Yang Qian, Yi-Han Wang, Zhen-Yu Zhang, Yuan Jiang, and Zhi-Hua Zhou.
In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025), Toronto, Canada, 2025.
Efficient Non-stationary Online Learning by Wavelets with Applications to Online Distribution Shift Adaptation. [paper] [code] [StreamingWavelet Package] [Pip Install] [bibtex]
Yu-Yang Qian, Peng Zhao, Yu-Jie Zhang, Masashi Sugiyama, and Zhi-Hua Zhou.
In: Proceedings of the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, 2024.
Learning with Asynchronous Labels. [paper] [code] [bibtex]
Yu-Yang Qian, Zhen-Yu Zhang, Peng Zhao, and Zhi-Hua Zhou.
ACM Transactions on Knowledge Discovery from Data (TKDD 2024), 18(8):1-27, 2024.
Handling New Class in Online Label Shift. [paper] [code] [bibtex]
Yu-Yang Qian*, Yong Bai*, Zhen-Yu Zhang, Peng Zhao, and Zhi-Hua Zhou.
In: Proceedings of the 23rd IEEE International Conference on Data Mining (ICDM 2023), Shanghai, China, 2023.
Adaptive Learning for Weakly Labeled Streams. [paper] [code] [bibtex]
Zhen-Yu Zhang, Yu-Yang Qian, Yu-Jie Zhang, Yuan Jiang, and Zhi-Hua Zhou.
In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2022), Washington, DC, 2022.
🫧 d3LLM — Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation 🚀 [GitHub] [Blog]
We introduce a novel recipe for building an ultra-fast diffusion language model, named d3LLM (pseuDo-Distilled Diffusion LLM). A PyTorch implementation of the paper "d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation".
Key Features: up to 10× speedup over vanilla LLaDA / Dream and 5× speedup over AR models (Qwen-2.5-7B-it) on an H100 GPU, with negligible accuracy degradation.
🧑🍳 Online RLHF Pipeline — Cook your own RLHF recipe! [GitHub]
This repository provides a flexible and modular code framework for Reinforcement Learning from Human Feedback (RLHF). A PyTorch implementation of the paper "Provably Efficient Online RLHF with One-Pass Reward Modeling".
Key Features: cook your own online RLHF recipe with this repo! Customizable RLHF recipes, modular training components (SFT, RM, PPO, DPO), and online learning capabilities let you quickly build up your own RLHF pipeline.
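To give a flavor of the "one-pass" idea, here is a minimal, self-contained sketch of online reward modeling under a Bradley-Terry preference model: each incoming preference pair is used for exactly one update and then discarded, so per-round compute and memory stay constant over the stream. Everything below (the RewardModel class, the simulated feature stream) is illustrative only and is not the repository's actual API.

```python
# Minimal sketch: one-pass online reward modeling with a Bradley-Terry model.
# All names here are hypothetical; see the GitHub repo for the real pipeline.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Linear reward head over fixed response features (a stand-in for an LLM backbone)."""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 1, bias=False)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats).squeeze(-1)

def preference_stream(dim: int, steps: int):
    """Simulated stream of (chosen, rejected) feature pairs under a hidden true reward."""
    w_star = torch.randn(dim)
    for _ in range(steps):
        a, b = torch.randn(dim), torch.randn(dim)
        yield (a, b) if a @ w_star >= b @ w_star else (b, a)

dim = 128
rm = RewardModel(dim)
opt = torch.optim.SGD(rm.parameters(), lr=1e-2)

# One-pass: each preference pair is processed exactly once as it arrives.
for chosen, rejected in preference_stream(dim, steps=1000):
    margin = rm(chosen) - rm(rejected)
    loss = -nn.functional.logsigmoid(margin)  # Bradley-Terry negative log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()
```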
🌳 TreeLoRA — Efficient Continual Fine-tuning of LLMs [GitHub]
Hierarchical tree-structured LoRA adapters for efficient continual fine-tuning of large models. A PyTorch implementation of the paper TreeLoRA (ICML'25).
Key Features: efficiently explores task structure using bandits and optimizes parameters through sparse gradient updates; validated on models of various sizes, with up to 3.2× speedup for ViTs and 2.4× speedup for LLMs.
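The core mechanism can be pictured as a routing problem, sketched below: past tasks' gradient directions are summarized as prototypes in a tree, and a UCB-style bandit rule decides which branch (and hence which LoRA adapters) a new task should start from. This is an assumption-laden illustration of that idea, not the paper's exact algorithm; the Node class and ucb_descend function are hypothetical.

```python
# Illustrative sketch of bandit-guided routing over a gradient-similarity tree.
# Structure and names are assumptions, not the TreeLoRA paper's exact algorithm.
import math
import torch

class Node:
    """A tree node summarizing a group of past tasks by a gradient prototype."""
    def __init__(self, prototype: torch.Tensor):
        self.prototype = prototype
        self.children = []
        self.visits = 1  # bandit pull count

def ucb_descend(root: Node, task_grad: torch.Tensor, c: float = 1.0) -> Node:
    """Walk down the tree; at each level, pick the child maximizing cosine
    similarity to the new task's gradient plus a UCB exploration bonus."""
    node = root
    while node.children:
        total = sum(child.visits for child in node.children)

        def score(child: Node) -> float:
            sim = torch.cosine_similarity(task_grad, child.prototype, dim=0).item()
            return sim + c * math.sqrt(math.log(total) / child.visits)

        node = max(node.children, key=score)
        node.visits += 1
    return node  # leaf: its adapters are the most promising starting point

# Toy usage: two clusters of past tasks; the new task is routed to the closer one.
d = 64
root = Node(torch.zeros(d))
root.children = [Node(torch.randn(d)), Node(torch.randn(d))]
leaf = ucb_descend(root, task_grad=torch.randn(d))
```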
📈 Streaming Wavelet Operator [Pip Install] [GitHub]
Sequentially applies the wavelet transform to a sequence in an online manner, efficiently updating the coefficients each round instead of recomputing them from scratch. A PyTorch implementation of the paper "Efficient Non-stationary Online Learning by Wavelets with Applications to Online Distribution Shift Adaptation" (ICML'24).
Can be used to detect environmental changes, efficiently identify change points, and analyze variations in sequences.
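To make the "online instead of recalculation" point concrete, here is a from-scratch sketch of a level-1 Haar transform maintained in a streaming fashion: each arriving sample triggers an O(1) coefficient update rather than a full transform over the whole sequence. It illustrates the concept only and does not use the StreamingWavelet package's actual API.

```python
# From-scratch sketch of a streaming level-1 Haar wavelet transform.
# Conceptual illustration only, not the StreamingWavelet package API.
import math

class StreamingHaar:
    def __init__(self):
        self.pending = None  # buffered sample waiting for its pair
        self.approx = []     # approximation coefficients (local averages)
        self.detail = []     # detail coefficients (local differences)

    def update(self, x: float) -> None:
        """Consume one observation in O(1): each pair (a, b) yields one
        approximation (a + b) / sqrt(2) and one detail (a - b) / sqrt(2)."""
        if self.pending is None:
            self.pending = x
        else:
            a, b = self.pending, x
            self.approx.append((a + b) / math.sqrt(2))
            self.detail.append((a - b) / math.sqrt(2))
            self.pending = None

# Toy usage: a mean shift in the stream shows up as a spike in the detail
# coefficient at the change point.
sh = StreamingHaar()
for t in range(100):
    sh.update(0.0 if t < 51 else 5.0)
print(max(abs(d) for d in sh.detail))  # ~3.54 at the change point, 0 elsewhere
```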
2025, National Scholarship for Ph.D. Candidates
2024, Ruli Scholarship, Nanjing University
2024, Excellent rating in the Doctoral Qualifying Examination (top 15% of all Ph.D. candidates)
2021, Dongliang Scholarship Excellent Award, Nanjing University
2021, First-Class Academic Scholarship, Nanjing University
2020, Outstanding Undergraduate Thesis Award
2018, Gold Medal, ACM-CCPC (China Collegiate Programming Contest) National Invitational Contest
2018, First Prize, National Mathematical Contest in Modeling
Email: qianyy@lamda.nju.edu.cn
Office: Room 912, Computer Science Building, Xianlin Campus of Nanjing University
Address: National Key Laboratory for Novel Software Technology, Nanjing University, Xianlin Campus, Mailbox 603, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China
(南京市栖霞区仙林大道163号, 南京大学仙林校区603信箱, 软件新技术国家重点实验室, 210023.)