Wednesday, May 27, 2009, 10:30-11:30, Meeting Room 404, Mong Man Wai Building (蒙民伟楼)
Weakly Supervised Learning of Topic Model
Dr. Hang Li
Microsoft Research Asia
Abstract:
Topic modeling is a powerful technology for data mining and search. In this talk, I will first give a brief survey of topic modeling techniques such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation). Then, I will explain a novel topic modeling technique that we have developed recently, called Weakly Supervised Latent Dirichlet Allocation (WS-LDA). WS-LDA is unique in that it leverages weak supervision from humans in learning. We have applied WS-LDA to the task of named entity recognition in search queries. I will use this task as an example to explain the motivation, technical details, and advantages of WS-LDA. Related papers will be presented at KDD and SIGIR this year.
Bio:
Hang Li is a senior researcher and research manager in the Information Retrieval and Mining Group at Microsoft Research Asia. He is also an adjunct professor at Peking University, Nanjing University, Xi'an Jiaotong University, and Nankai University. His research areas include natural language processing, information retrieval, statistical machine learning, and data mining. He graduated from Kyoto University and earned his PhD from the University of Tokyo. Hang has about 60 publications in international journals and conferences. He is an associate editor of ACM Transactions on Asian Language Information Processing and serves on the editorial boards of the Journal of Computer Science and Technology and the Journal of Chinese Information Processing, among others.
His recent academic activities include serving as program committee co-chair of AIRS’08, poster and demo committee co-chair of SIGIR’08, and program committee area chair of PAKDD’08. Hang has also worked on the development of several products, including NEC TopicScope, Microsoft SQL Server 2005, Microsoft Office 2007, and Microsoft Live Search 2008.