Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach
Rong Jin
Associate Professor
Department of Computer Science and Engineering, Michigan State University
Abstract: Crowdsourcing utilizes human ability by distributing tasks to a
large number of workers. It is especially suitable for data clustering
because it measures similarity between objects based on manual annotations,
capturing the human perception of similarity among objects. This is in
contrast to most clustering algorithms that face the challenge of finding an
appropriate similarity measure for the given dataset. Although several
algorithms have been developed for crowdclustering, they require a large
number of annotations, due to the noisy nature of human annotations, leading
to a high computational cost in addition to the large cost associated with
annotation. We address this problem by developing a novel approach for
crowdclustering that exploits the technique of matrix completion. The key
idea is to first construct a partially observed similarity matrix based on a
subset of pairwise annotation labels that are agreed upon by most
annotators. The approach then applies a matrix completion algorithm to fill
in the unobserved entries and obtains the final data partition by applying a
spectral clustering algorithm to the completed similarity matrix. We show,
both theoretically and empirically, that the proposed approach needs only a
small number of manual annotations to obtain an accurate data partition. In
effect, we highlight the trade-off between a large number of noisy
crowdsourced labels and a small number of high quality labels.
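
The three steps named in the abstract (filter pairwise labels by annotator
agreement, complete the similarity matrix, cluster spectrally) can be sketched
in Python with NumPy. This is an illustration under stated assumptions, not the
speaker's algorithm: the function names, the agreement thresholds hi and lo,
and the toy votes are hypothetical; the completion step is a simple iterative
truncated-SVD imputation standing in for whichever matrix completion solver the
talk uses; and the spectral step is a bare eigenvector embedding followed by a
small k-means loop.

```python
import numpy as np

def build_partial_similarity(n, votes, hi=0.8, lo=0.2):
    """votes: iterable of (i, j, same), one entry per annotator judgment.
    Keeps only pairs on which annotators largely agree (assumed thresholds)."""
    same = np.zeros((n, n))
    total = np.zeros((n, n))
    for i, j, s in votes:
        same[i, j] += s; same[j, i] += s
        total[i, j] += 1; total[j, i] += 1
    frac = np.divide(same, total, out=np.full((n, n), 0.5), where=total > 0)
    mask = (total > 0) & ((frac >= hi) | (frac <= lo))  # observed entries
    S = np.where(frac >= hi, 1.0, 0.0) * mask
    np.fill_diagonal(S, 1.0)       # every object is similar to itself
    np.fill_diagonal(mask, True)
    return S, mask

def complete_low_rank(S, mask, rank, n_iter=200):
    """Stand-in completion: repeatedly project onto rank-`rank` matrices
    and restore the observed entries."""
    X = np.where(mask, S, 0.5)     # neutral initial guess for missing entries
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(X, full_matrices=False)
        low = (U[:, :rank] * sig[:rank]) @ Vt[:rank]
        X = np.where(mask, S, low)
    return X

def spectral_partition(A, k, n_iter=50):
    """Embed objects with the k leading eigenvectors, then run k-means."""
    A = (A + A.T) / 2
    _, V = np.linalg.eigh(A)       # eigenvalues in ascending order
    E = V[:, -k:]
    centers = [E[0]]               # deterministic farthest-point initialisation
    for _ in range(1, k):
        d = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    return labels

# Toy data (hypothetical): 10 objects, 2 clusters, 3 votes per shown pair.
truth = np.array([0] * 5 + [1] * 5)
votes = []
for i in range(10):
    for j in range(i + 1, 10):
        if (i + j) % 3 == 0:
            continue                       # pair never shown to annotators
        s = int(truth[i] == truth[j])
        if (i * j) % 5 == 1:               # annotators disagree: pair is dropped
            votes += [(i, j, s), (i, j, s), (i, j, 1 - s)]
        else:
            votes += [(i, j, s)] * 3

S, mask = build_partial_similarity(10, votes)
completed = complete_low_rank(S, mask, rank=2)
labels = spectral_partition(completed, k=2)
```

In this toy setup only pairs with unanimous votes enter the partial matrix;
the remaining entries are left for the completion step to fill in, which is
the trade-off the abstract describes between many noisy labels and a few
reliable ones.
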
Bio: Rong Jin focuses his research on statistical machine learning and its
application to information retrieval. He has worked on a variety of machine
learning algorithms and their application to information retrieval,
including retrieval models, collaborative filtering, cross lingual
information retrieval, document clustering, and video/image retrieval. He
has published over 160 conference and journal articles on related topics.
Dr. Jin earned his Ph.D. in Computer Science from Carnegie Mellon University
in 2003 and received the NSF CAREER Award in 2006.