
Seminar abstract

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach

Rong Jin
Associate Professor
Department of Computer Science and Engineering, Michigan State University


Abstract: Crowdsourcing utilizes human ability by distributing tasks to a large number of workers. It is especially suitable for data clustering because it measures similarity between objects based on manual annotations, capturing the human perception of similarity among objects. This is in contrast to most clustering algorithms, which face the challenge of finding an appropriate similarity measure for the given dataset. Although several algorithms have been developed for crowdclustering, they require a large number of annotations because human annotations are noisy, leading to a high computational cost in addition to the large cost of annotation. We address this problem by developing a novel approach for crowdclustering that exploits the technique of matrix completion. The key idea is to first construct a partially observed similarity matrix based on a subset of pairwise annotation labels that are agreed upon by most annotators. The approach then applies a matrix completion algorithm to fill in the similarity matrix and obtains the final data partition by applying a spectral clustering algorithm to the completed matrix. We show, both theoretically and empirically, that the proposed approach needs only a small number of manual annotations to obtain an accurate data partition. In effect, we highlight the trade-off between a large number of noisy crowdsourced labels and a small number of high-quality labels.
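The three steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the speaker's implementation: the function names, the 0.8 agreement threshold, and the vote format are assumptions made here, and a simple iterative truncated-SVD imputation stands in for whichever matrix completion solver the talk actually uses.

```python
import numpy as np

def build_partial_similarity(votes, n, agree=0.8):
    """Keep only pairs on which most annotators agree (hypothetical helper).

    votes maps (i, j) -> list of 0/1 worker labels, 1 meaning "same cluster".
    Returns the similarity matrix S and a boolean mask of observed entries.
    """
    S = np.zeros((n, n))
    mask = np.zeros((n, n), dtype=bool)
    for (i, j), labels in votes.items():
        frac = np.mean(labels)
        if frac >= agree or frac <= 1 - agree:  # assumed agreement rule
            S[i, j] = S[j, i] = 1.0 if frac >= agree else 0.0
            mask[i, j] = mask[j, i] = True
    np.fill_diagonal(S, 1.0)   # every object is similar to itself
    np.fill_diagonal(mask, True)
    return S, mask

def complete_matrix(S, mask, rank, n_iter=200):
    """Stand-in completion: alternate a rank-`rank` projection with
    restoring the observed entries (requires n_iter >= 1)."""
    X = np.where(mask, S, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, S, L)
    return (L + L.T) / 2       # symmetrize the low-rank estimate

def spectral_partition(A, k, n_iter=50):
    """Cluster the rows of the top-k eigenvectors of A with a small k-means."""
    _, V = np.linalg.eigh(A)   # eigenvalues in ascending order
    E = V[:, -k:]
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    centers = [E[0]]           # deterministic farthest-point initialization
    for _ in range(1, k):
        d = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d)])
    C = np.array(centers)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for c in range(k):
            if np.any(labels == c):
                C[c] = E[labels == c].mean(axis=0)
    return labels
```

The sketch assumes the number of clusters is known and is used both as the completion rank and as the number of eigenvectors, since an ideal same-cluster similarity matrix with k clusters has rank k. Pairs on which annotators disagree are simply left unobserved, which mirrors the trade-off the abstract highlights: fewer labels, but cleaner ones.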

Bio: Rong Jin focuses his research on statistical machine learning and its application to information retrieval. He has worked on a variety of machine learning algorithms and their application to information retrieval, including retrieval models, collaborative filtering, cross-lingual information retrieval, document clustering, and video/image retrieval. He has published over 160 conference and journal articles on related topics. Dr. Jin received his Ph.D. in Computer Science from Carnegie Mellon University in 2003. He received the NSF Career Award in 2006.
PoweredBy © LAMDA, 2022