Ke Zhu @ LAMDA, NJU-AI

朱可
Ke Zhu (K. Zhu)

M.Sc. Student, LAMDA Group
School of Artificial Intelligence
Nanjing University, Nanjing 210023, China

Email: zhuk@lamda.nju.edu.cn
Laboratory: Computer Science Building, Xianlin Campus of Nanjing University

Supervisor

Professor Jianxin Wu

Biography

I am currently a third-year PhD student in the School of Artificial Intelligence at Nanjing University and a member of the LAMDA Group, led by Professor Zhi-Hua Zhou. I received my B.Sc. degree in Automation Science and Technology from the Department of Electronics and Information, Xi'an Jiaotong University (XJTU), China, in June 2020.

Research Interests

My research interests include multimodal LLMs and general computer vision tasks. Currently, I'm focused on:

VLM Post-training (RLHF), Data Synthesis, Reasoning.
Object detection, Multi-Label SSL, Compression.

Project

Projects: 2023.6~2024.5
Model compression for Radio User Allocation
Cut over 95% of parameters without an accuracy drop
Brought about a 5x inference speedup over HUAWEI's original architecture.

Internships

2023.11~2024.5
Supervised by: Xiangyu Zhang (Chief Scientist, StepFun)
Autoregressive LLM for comprehension and generation
Multimodal LLM Foundation: Pre-/Post-training, RLHF.

2024.6~2025.5
Supervised by: Jingdong Wang (Chief Scientist for Computer Vision, Baidu)
Multimodal LLM Post-training: RLHF, SFT.
LLM reasoning, Data Synthesis (CoT)

2025.5~2026.2
Supervised by: Shuai Bai
Qwen-VL Foundation Model Group
Task: Post-Training For Qwen-VL

Technical Report

Qwen3-VL Technical Report (Core Contributors)
Shuai Bai, Yuxuan Cai, ..., Jingren Zhou, Fan Zhou, Jing Zhou, Yuanzhi Zhu, Ke Zhu
[paper] [code] [huggingface] [blog]

Publications

VLM Pre-/Post-training, RLHF, Data Synthesis:

Perception and Reasoning Scaling Laws: The Role of RLHF (Work in Progress)
Ke Zhu, et al.
In Submission.

On Data Synthesis and Post-training for Visual Abstract Reasoning. [arxiv]
Ke Zhu, Yu Wang, et al., Jingdong Wang.
In Submission.

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception. [arxiv]
Y. Sun, Jing Hao, Ke Zhu, Jiangjiang Liu, et al., Jingdong Wang.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2026). To appear.

Continual SFT Matches Multimodal RLHF with Negative Supervision. [arxiv]
Ke Zhu*, Yu Wang*, Yanpeng Sun, Qiang Chen, Jiangjiang Liu, Gang Zhang, Jingdong Wang.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (CVPR2025) To appear.

Self-Supervised Visual Preference Alignment. [paper][arxiv] (The First Multimodal RLHF w/o GPT-4 or Human)
Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang.
In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia. 2024 (MM2024 oral, 3.97%).

Object detection, multi-label SSL, Compression:

Bias Mitigation for Long-Tailed Detection
Ke Zhu, Minghao Fu, Jie Shao, Tianyu Liu, Jianxin Wu.
Submitted to IJCV.

Coarse Is Better? A New Pipeline Towards Self-Supervised Learning with Uncurated Images [arxiv]
Ke Zhu, Yin-Yin He, Jianxin Wu.
Pattern Recognition Journal (PR).

All You Need in Knowledge Distillation Is a Tailored Coordinate System [arxiv]
Junjie Zhou, Ke Zhu, Jianxin Wu.
In Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025).

DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition [arxiv]
Jie Shao, Ke Zhu, Hanxiao Zhang, Jianxin Wu.
In Advances in Neural Information Processing Systems 37 (NeurIPS 2024).

Rectify the Regression Bias in Long-Tailed Object Detection. [paper][arxiv]
Ke Zhu, Minghao Fu, Jie Shao, Tianyu Liu, Jianxin Wu.
In Proceedings of the 18th European Conference on Computer Vision, Milan, Italy, Sep.-Oct. 2024 (ECCV2024).

Instance-based Max-margin for Practical Few-shot Recognition. [arxiv]
Minghao Fu, Ke Zhu* (Corresponding author)
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2024).

DTL: Disentangled Transfer Learning for Visual Recognition.
Minghao Fu, Ke Zhu, Jianxin Wu.
In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI2024). To appear.

Multi-Label Self-Supervised Learning with Scene Images. [paper][arxiv]
Ke Zhu, Minghao Fu, Jianxin Wu.
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV2023). pp. 6694-6703.

Quantized Feature Distillation for Network Quantization. [paper][arxiv]
Ke Zhu, Yin-Yin He, Jianxin Wu.
In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI2023). pp. 11452-11460.

Residual Attention: A Simple but Effective Method for Multi-Label Recognition. [paper][arxiv]
Ke Zhu, Jianxin Wu.
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV2021). pp. 184-193.

Services

Reviewer

Journals: TPAMI, TCSVT, TIP.
Conferences: CVPR2023, ICCV2023, ECAI2023, AAAI2024, CVPR2024, ECCV2024, CVPR2025, ICCV2025, ...

Teaching Assistant

Pattern Recognition, Spring, 2022
Pattern Recognition, Spring, 2023

Correspondence

Email: zhuk@lamda.nju.edu.cn

Address: Ke Zhu, National Key Laboratory for Novel Software Technology, Nanjing University, Xianlin Campus Mailbox 113, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China
(南京市栖霞区仙林大道163号, 南京大学仙林校区113信箱, 软件新技术国家重点实验室, 210023.)