Fan Wenshu, LAMDA Group, School of Artificial Intelligence, Nanjing University, Nanjing 210023, China. Email: fanws [at] lamda.nju.edu.cn
Fan Wenshu received his B.Sc. degree from Northwestern Polytechnical University, China, in June 2019.
After that, he became a first-year Ph.D. student in the LAMDA Group led by Professor Zhi-Hua Zhou at Nanjing University, under the supervision of Prof. De-Chuan Zhan.
Wenshu mainly focuses on machine learning, especially:
Knowledge Distillation
Knowledge distillation, which uses a well-trained teacher model to assist a student model, often accelerates the training process or improves model performance. It can also be used for model compression and other applications.
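Below is a minimal sketch of the standard knowledge-distillation loss described above, assuming PyTorch; the function name, temperature, and weighting are illustrative choices, not taken from any specific paper of mine.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Combine soft-target distillation with the usual cross-entropy loss."""
    # Soften both distributions with temperature T, then match them via KL divergence.
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard-label cross-entropy keeps the student anchored to the ground truth.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * ce
```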
We find that student BERT models taught by larger BERT teachers exhibit limited linear differences among their intermediate layers, leading to suboptimal distillation performance. Building on this insight, we enhance the distillation method, thereby improving the effectiveness of BERT compression.
A counter-intuitive phenomenon known as capacity mismatch has been identified in knowledge distillation, wherein KD performance can degrade when a stronger teacher instructs the student. In this paper, we propose a unifying analytical framework based on calibration to pinpoint the core of capacity mismatch.
We find that traditional temperature scaling limits the efficacy of class discriminability and hinders large teachers from teaching well. We propose a method that enriches class discriminability by applying a higher temperature to the correct class and a lower temperature to the wrong classes, as sketched below.
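The following is a minimal sketch of the idea just described: softening the teacher's output with different temperatures for the correct class and the wrong classes. It assumes PyTorch; the temperatures and function name are illustrative assumptions, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def asymmetric_soft_targets(teacher_logits, labels, t_correct=8.0, t_wrong=2.0):
    """Scale the target-class logit by t_correct and all other logits by t_wrong."""
    target_mask = F.one_hot(labels, teacher_logits.size(1)).bool()
    scaled = torch.where(target_mask,
                         teacher_logits / t_correct,
                         teacher_logits / t_wrong)
    # The softened distribution keeps more variance among the wrong classes,
    # preserving the class-discriminability signal the student is meant to learn.
    return F.softmax(scaled, dim=1)
```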
We find that variance of non-target classes correlates with distillation performance, and that different teacher models exhibit relative class affinity. We propose a simple method that more effectively amplifies wrong-class variance, improving the distillation performance of large teachers. |
Conference PC Member/Reviewer: CCML'21, ACML'22, ICML'23, KDD'23
Journal Reviewer: IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)
Advanced Machine Learning. (For undergraduate and graduate students, Autumn, 2021)
Discrete Mathematics. (For undergraduate students, Autumn, 2019)
Email: fanws [at] lamda.nju.edu.cn or dz1937001@smail.nju.edu.cn or 1005448166 [at] qq.com
Office: Room 113, Computer Science Building, Xianlin Campus of Nanjing University
Address: Fan Wenshu, National Key Laboratory for Novel Software Technology, Nanjing University, Xianlin Campus Mailbox 603, 163 Xianlin Avenue, Qixia District, Nanjing 210046, China