Data Mining (Fall, 2014)

Modified: 2015/01/21 16:54 by admin - Uncategorized

Information

  • Course Number: 081202B03
  • To: M. Sc. students of Department of Computer Science and Technology, Nanjing University.
  • Classroom: 106 Xian-I, Xianlin Campus
  • Time: 10:10 - 12:00, Thursday
  • Office Hour: 13:00 - 16:00, Thursday (Rm 311, Computer Science Building)
  • Main Reference Books:
    • D. Hand, H. Mannila, P. Smyth. Principles of Data Mining. Cambridge, MA: MIT Press, 2001.
    • J. Han, M. Kamber. Data Mining: Concepts and Techniques, 2nd edition. Morgan Kaufmann Publishers, 2006
    • I. H. Witten, E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition. Morgan Kaufmann Publishers, 2011
    • P.-N. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining, Addison-Wesley, 2006.
    • E. Alpaydin. Introduction to Machine Learning, 2nd edition. MIT Press, 2010.
    • C. M. Bishop. Pattern Recognition and Machine Learning, Springer, 2007.
  • Grading: Final exam (30%) + assignments (70%)
  • TA: Mr. Qing Da and Mr. Yue Zhu

  • Final Exam: Yi-B212, 205, 207, 9:00-11:00, Jan. 5.

Assignments

Please read the assignments at http://lamda.nju.edu.cn/daq/dm14.ashx carefully and complete them on time.

Video Lectures

Before the next class on Oct. 16, please watch the video lectures up to Chapter 5.

Schedule and Lecture slides

              
Sep. 18: Introduction (Download PDF) Reading material:
Z.-H. Zhou. Three perspectives of data mining. Artificial Intelligence, 2003, 143(1): 139-146.
H.-P. Kriegel, et al. Future trends in data mining. Data Mining and Knowledge Discovery, 2007, 15(1): 87-97.
Q. Yang and X. Wu. 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, 2006, 5(4): 597-604.
 
Sep. 25: Data, Measurements, and Visualization (Download PDF) Reading material:
M. C. F. de Oliveira and H. Levkowitz. From visual data exploration to visual data mining: A survey. IEEE TVCG, 2003, 9(3): 378-394.
H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization: An enabling technique. DMKD, 2002, 6(4): 393-423.
J. Dougherty, R. Kohavi, M. Sahami. Supervised and unsupervised discretization of continuous features. In Proceedings of ICML'95, 194-202, Tahoe City, CA.
X. Zhu and X. Wu. Class noise vs. attribute noise: A qualitative study of their impacts. AI Review, 2004, 22(3-4): 177-210.
Link: A javascript for simple data visualization
 
Oct. 9: Machine Learning I: Supervised Learning and Basic Algorithms (Download PDF) Reading material:
Chapter 9 of Introduction to Machine Learning (E. Alpaydin, MIT Press, 2010).
R. Quinlan. Induction of decision trees. MLJ, 1:81-106, 1986.
 
Oct. 16: Machine Learning II: Principle of Learning (Download PDF)   Reading material:
Chapter 2 of Introduction to Machine Learning (E. Alpaydin, MIT Press, 2010).
L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
D. Heckerman. Bayesian networks for data mining. DMKD, 1997, 1(1): 79-119.
H. Zhang. The Optimality of Naive Bayes. FLAIRS Conference 2004.
F. Zheng and G. I. Webb. A Comparative Study of Semi-naive Bayes Methods in Classification Learning. In AusDM'05, 141-156.
 
Oct. 23 Oct. 22: Machine Learning III: Nearest Neighbors and Neural Networks (Download PDF) Reading material:
A. Roy. Artificial neural networks - A science in trouble. SIGKDD Explorations, 2000, 1(2): 33-38.
A. Andoni and P. Indyk. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. CACM, 2008, 51(1): 117-121.
 
Oct. 30: Machine Learning IV: Linear Models (Download PDF) Reading material:
Chapters 3, 4, 6, and 7 of Pattern Recognition and Machine Learning (C. M. Bishop, Springer, 2007)
C. J. C. Burges. A tutorial on support vector machines for pattern recognition. DMKD, 1998, 2(2): 121-167.
K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE TNN, 2001, 12(2): 181-201.
 
Nov. 6: TA class
 
Nov. 13: Machine Learning V: Ensemble Methods (Download PDF) Reading material:
L. Breiman. Random forests. Machine Learning, 2001, 45(1): 5-32.
Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms, Boca Raton, FL: Chapman & Hall/CRC, 2012. (Chapter 2: Boosting).
E. Bauer and R. Kohavi. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning, 1999, 36(1): 105-139.
M. Fernández-Delgado et al. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? JMLR, 2014, 15: 3133-3181. http://jmlr.org/papers/v15/delgado14a.html
 
Nov. 20: Machine Learning VI: Unsupervised Learning (Download PDF) Reading material:
Chapters 7 and 8 of Introduction to Machine Learning (E. Alpaydin, MIT Press, 2010).
V. Estivill-Castro. Why so many clustering algorithms - a position paper. SIGKDD Explorations, 2002, 4(1): 65-75.
R. Xu and D. Wunsch II. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.
C. Elkan. Using the Triangle Inequality to Accelerate k-Means. ICML'03, 147-153.
 
Nov. 27: Data Mining I: Feature Processing A (Download PDF) Reading material:
A. L. Blum and P. Langley. Selection of relevant features and examples in machine learning. AIJ, 1997, 97(1-2): 245-271.
I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003, 3: 1157-1182.
Chapter 6 of Introduction to Machine Learning (E. Alpaydin, MIT Press, 2010).
 
Dec. 4: Data Mining II: Feature Processing B (Download PDF) Reading material:
J. B. Tenenbaum, V. de Silva and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 2000, 290:2319-2322.
G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504-507, 2006.
 
Data Mining III: Handling Large-scale Data (Download PDF)
 
Dec. 11: Data Mining IV: In Computer Vision Systems (Download PDF)

Data Mining V: Information Retrieval Systems (Download PDF)
Reading material:
Chapter 14 of the text book (Principles of Data Mining)
M. Mitra, B. Chaudhuri. Information retrieval from documents: A survey. Information Retrieval 2000.
M. Lew, N. Sebe, C. Djeraba, R. Jain. Content-based multimedia information retrieval: State of the art and challenges. TOMCCAP 2006.
D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV 2004.
Y. Freund, R. Iyer, R. E. Schapire, Y. Singer. An Efficient Boosting Algorithm for Combining Preferences. JMLR.
 
Dec. 18: Data Mining V: Mining Linkage Data (Download PDF) Reading material:
L. Getoor and C. Diehl. Link mining: A survey. SIGKDD Explorations, 7(2):3-12, 2005.
L. Page, et al. The PageRank citation ranking: Bringing order to the web. Technical report, 1997.
 
Dec. 25: Guest Lecture by C.-J. Lin
 

Links



  • Scikit-Learn: An open source machine learning package for Python.


  • KDnuggets: A website for data mining resources.

Major academic venues


The end