Large Scale Clustering using Approximate Kernel K-means
Rong Jin
Associate Professor
Department of Computer Science and Engineering, Michigan State University
Abstract :
Digital data explosion in recent times mandates the development of
scalable clustering algorithms to organize the data in a meaningful and
easily accessible form. Most clustering algorithms proposed in the
literature to handle large datasets assume linear separability. Kernel based
clustering algorithms, on the other hand, capture the non-linear structure
of data and have been found to be more effective on real world datasets.
However, kernel based algorithms are not scalable to large data sets because
their running-time complexity and memory requirements are quadratic in the
number of data instances. In this paper, we propose an approximation scheme
for kernel K-means, termed approximate kernel K-means, that reduces both the
computational complexity and the memory requirements by employing a
randomized approach. We prove analytically and demonstrate empirically that
the proposed algorithm's clustering performance is similar to that of the
classical kernel K-means algorithm, but with dramatically reduced running
time and memory requirements.
Bio:
Dr. Jin is an Associative professor of the Computer and Science
Engineering Dept. of Michigan State University. His research interest is
statistical machine learning and its application to information retrieval.
He is currently an associative editor of ACM Transactions on Knowledge
Discovery from Data. Dr. Jin received his Ph.D. degree from Carnegie Mellon
University in 2003, and is a recipient of NSF Career Award in 2006 .