Minghao Fu (付明浩)
Ph.D. Student, LAMDA Group
Expected graduation: June 2027. I am currently preparing for the 2026 autumn recruitment.
Professor Jianxin Wu
I am currently a final-year Ph.D. student in the School of Artificial Intelligence at Nanjing University and a member of the LAMDA Group, led by Professor Zhi-Hua Zhou.
I received my B.Sc. degree in Computer Science and Technology from Yingcai Honors College, University of Electronic Science and Technology of China (UESTC), in June 2021.
My research interests include Machine Learning and Computer Vision. Currently, I'm focused on:
Text-to-Image Generation.
Image Editing.
Compositional Generation.
Visual Language Modeling.
Seedream Model Pre-training, ByteDance, Nov. 2025 - Present.
Core contributor to the pre-training of Seedream models, with a focus on core capabilities such as interactive image editing and dense captioning.
I also contribute to aligning training and evaluation for the prompt enhancer, as well as to infrastructure development for data transformation.
Multimodal Understanding and Generation, Alibaba Group, Nov. 2024 - Nov. 2025.
Research on DPO, RLHF, and distillation for text-to-image models.
This work contributed to the Ovis series models, including Ovis-U1 and Ovis-Image, and led to papers accepted by ICML 2025 and ICCV 2025.
Large-Scale Recommendation Model Compression, Huawei, Aug. 2023 - Aug. 2024.
Integrated our AFM algorithm into Huawei's recommendation system.
This deployment reduced inference latency by 25% on average and improved overall online revenue metrics, including RPM and CPM, by 2%.
DTL: Parameter- and Memory-Efficient Disentangled Vision Learning.
[paper]
Minghao Fu, Ke Zhu, Zonghao Ding, Jianxin Wu
In IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI 2025).
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance.
[arXiv]
[paper]
[code]
Minghao Fu, Guo-Hua Wang, Xiaohao Chen, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
In IEEE/CVF International Conference on Computer Vision (ICCV 2025).
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation.
[arXiv]
[paper]
[code]
[demo]
Minghao Fu, Guo-Hua Wang, Liangfu Cao, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
In Forty-Second International Conference on Machine Learning (ICML 2025).
Quantization without Tears.
[arXiv]
[paper]
[code]
[poster]
Minghao Fu, Hao Yu, Jie Shao, Junjie Zhou, Ke Zhu, Jianxin Wu
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025).
Minimal Interaction Separated Tuning: A New Paradigm for Visual Adaptation.
[arXiv]
[paper]
Ningyuan Tang, Minghao Fu, Jianxin Wu
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025).
DTL: Disentangled Transfer Learning for Visual Recognition.
[arXiv]
[paper]
[appendix]
[code]
[poster]
[video]
Minghao Fu, Ke Zhu, Jianxin Wu
In AAAI Conference on Artificial Intelligence (AAAI 2024).
Instance-based Max-margin for Practical Few-shot Recognition.
[arXiv]
[paper]
[code]
[poster]
[video]
Minghao Fu, Ke Zhu
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024).
Unified Low-rank Compression Framework for Click-through Rate Prediction.
[arXiv]
[paper]
[code]
Hao Yu, Minghao Fu, Jiandong Ding, Yusheng Zhou, Jianxin Wu
In 2024 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2024).
Rectify the Regression Bias in Long-Tailed Object Detection.
[arXiv]
[paper]
Ke Zhu, Minghao Fu, Jie Shao, Tianyu Liu, Jianxin Wu
In European Conference on Computer Vision (ECCV 2024).
Multi-Label Self-Supervised Learning with Scene Images.
[arXiv]
[paper]
Ke Zhu, Minghao Fu, Jianxin Wu
In IEEE/CVF International Conference on Computer Vision (ICCV 2023).
Worst Case Matters for Few-Shot Recognition.
[arXiv]
[paper]
[code]
[poster]
[video]
Minghao Fu, Yun-Hao Cao, Jianxin Wu
In European Conference on Computer Vision (ECCV 2022).
Ovis-Image Technical Report.
[arXiv]
Ovis Team, Alibaba Group
In arXiv:2511.22982, 2025.
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models.
[arXiv]
Minghao Fu, Guo-Hua Wang, Tianyu Cui, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
In arXiv:2511.03317, 2025.
Images Speak Louder Than Scores: Failure Mode Escape for Enhancing Generative Quality.
[arXiv]
Jie Shao, Ke Zhu, Minghao Fu, Guo-Hua Wang, Jianxin Wu
In arXiv:2508.09598, 2025.
QwT-v2: Practical, Effective and Efficient Post-Training Quantization.
[arXiv]
Ningyuan Tang, Minghao Fu, Hao Yu, Jianxin Wu
In arXiv:2505.20932, 2025.
Ovis-U1 Technical Report.
[arXiv]
Ovis Team, Alibaba Group
In arXiv:2506.23044, 2025.
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities.
[arXiv]
Xinjie Zhang, Jintao Guo, Shanshan Zhao, Minghao Fu, Lunhao Duan, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
In arXiv:2505.02567, 2025.
Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning.
[arXiv]
Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu
In arXiv:2402.04009, 2024.
Pattern Recognition. Spring, 2023.
Email: fumh@lamda.nju.edu.cn