Fan Wenshu @ LAMDA, NJU-CS

Fan Wenshu

LAMDA Group
School of Artificial Intelligence
Nanjing University, Nanjing 210023, China.

Email: fanws [at] lamda.nju.edu.cn

Short Bio

Main Research Interests

Wenshu mainly focuses on machine learning, especially:

Knowledge Distillation

Publications - Conference Papers

  • Wen-Shu Fan, Su Lu, Shangyu Xing, Xin-Chun Li, De-Chuan Zhan. Maximizing the Effectiveness of Larger BERT Models for Compression. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL'25), Vienna, Austria, 2025.

  • We find that student BERT models taught by larger BERT teachers exhibit limited linear differences among their intermediate layers, leading to suboptimal distillation performance. Building on this insight, we enhance the distillation method accordingly, improving the effectiveness of BERT compression. (A toy probe of linear layer similarity is sketched below.)
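
A minimal, hypothetical sketch (not the paper's code) of how one might probe the "linear difference" between two intermediate layers, using linear CKA on made-up feature matrices; the shapes and data are illustrative assumptions:

    # Toy probe (illustrative assumptions, not the paper's method):
    # linear CKA between two feature matrices; values near 1 mean the
    # two layers differ by little more than a linear transform.
    import numpy as np

    def linear_cka(X, Y):
        # Center each feature matrix so CKA ignores mean shifts.
        X = X - X.mean(axis=0, keepdims=True)
        Y = Y - Y.mean(axis=0, keepdims=True)
        num = np.linalg.norm(X.T @ Y, "fro") ** 2
        den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
        return num / den

    rng = np.random.default_rng(0)
    layer_a = rng.normal(size=(512, 64))                  # stand-in for one layer's features
    layer_b = layer_a + 0.1 * rng.normal(size=(512, 64))  # a nearly linear copy of it
    print(linear_cka(layer_a, layer_b))                   # close to 1: small linear difference
    print(linear_cka(layer_a, rng.normal(size=(512, 64))))  # clearly lower for unrelated features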

  • Wen-Shu Fan, Su Lu, Xin-Chun Li, De-Chuan Zhan, Le Gan. Revisit the Essence of Distilling Knowledge through Calibration. In: Proceedings of the 41st International Conference on Machine Learning (ICML'24), Vienna, Austria, 2024. [Paper]

  • Knowledge distillation exhibits a counter-intuitive phenomenon known as capacity mismatch: a student taught by a stronger teacher may perform worse than one taught by a weaker teacher. In this paper, we propose a unifying analytical framework based on calibration to pinpoint the core of capacity mismatch. (A minimal calibration measurement is sketched below.)
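
As a hedged illustration of the kind of calibration statistic such an analysis rests on (this is generic expected calibration error, not the paper's framework, and the data below are fabricated):

    # Generic expected calibration error (ECE) over a teacher's softmax
    # outputs; not the paper's framework, just the underlying statistic.
    import numpy as np

    def ece(probs, labels, n_bins=15):
        conf = probs.max(axis=1)                        # confidence of the predicted class
        acc = (probs.argmax(axis=1) == labels).astype(float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        total = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (conf > lo) & (conf <= hi)
            if in_bin.any():
                # Bin weight times the |accuracy - confidence| gap in the bin.
                total += in_bin.mean() * abs(acc[in_bin].mean() - conf[in_bin].mean())
        return total

    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(10), size=1000)       # fake teacher predictions
    labels = rng.integers(0, 10, size=1000)             # fake ground-truth labels
    print(ece(probs, labels))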

  • Xin-Chun Li, Wen-Shu Fan, Shaoming Song, Yinchuan Li, Bingshuai Li, Yunfeng Shao, De-Chuan Zhan. Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again. In: Advances in Neural Information Processing Systems 35 (NeurIPS'22), New Orleans, Louisiana, USA, 2022. [Paper]

  • We find that traditional temperature scaling limits the class discriminability of soft labels and hinders large teachers from teaching well. We propose Asymmetric Temperature Scaling, which enriches class discriminability by applying a higher temperature to the correct class and a lower one to the wrong classes; see the sketch below.
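
A minimal sketch of that idea, assuming softmax soft targets; the temperature values here are illustrative assumptions, and the exact formulation in the paper may differ:

    # Sketch of asymmetric temperature scaling as summarized above
    # (illustrative temperatures; see the paper for the exact method).
    import numpy as np

    def asymmetric_soft_targets(logits, target, t_correct=4.0, t_wrong=2.0):
        scaled = logits / t_wrong                      # lower temperature for wrong classes
        scaled[target] = logits[target] / t_correct    # higher temperature for the correct class
        exp = np.exp(scaled - scaled.max())            # numerically stable softmax
        return exp / exp.sum()

    teacher_logits = np.array([9.0, 2.0, 1.5, 0.5])    # an over-confident large teacher
    print(asymmetric_soft_targets(teacher_logits, target=0))

Compared with a single shared temperature, the lower temperature on the wrong classes keeps their relative gaps wider, which is the "enriched class discriminability" the summary refers to.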

Publications - Journal Articles

  • Wen-Shu Fan, Xin-Chun Li, De-Chuan Zhan. Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch. Frontiers of Computer Science (FCS). [Paper]

  • We find that the variance of non-target class probabilities correlates with distillation performance, and that teachers of different capacities exhibit similar relative class affinity. We propose a simple method that amplifies wrong-class variance more effectively, improving the distillation performance of large teachers. (A toy illustration of the variance statistic follows.)
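
A toy illustration of the statistic in question, with fabricated probability vectors (the numbers are assumptions for illustration, not results from the paper):

    # Variance of the wrong-class probabilities in a teacher's output;
    # the probability vectors below are made up for illustration.
    import numpy as np

    def wrong_class_variance(probs, target):
        return np.delete(probs, target).var()   # drop the target class, take the variance

    modest_teacher = np.array([0.70, 0.20, 0.06, 0.04])  # spread-out dark knowledge
    large_teacher = np.array([0.94, 0.02, 0.02, 0.02])   # confident, flat wrong classes
    print(wrong_class_variance(modest_teacher, 0))       # larger variance
    print(wrong_class_variance(large_teacher, 0))        # near zero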

Academic Service

Teaching Assistant

Correspondence

Email: fanws [at] lamda.nju.edu.cn, dz1937001 [at] smail.nju.edu.cn, or 1005448166 [at] qq.com
Office: Room 113, Computer Science Building, Xianlin Campus of Nanjing University
Address: Fan Wenshu
         National Key Laboratory for Novel Software Technology
         Nanjing University, Xianlin Campus Mailbox 603
         163 Xianlin Avenue, Qixia District, Nanjing 210046, China