Page History: Data Mining (Fall, 2012)

Compare Page Revisions

« Older Revision - Back to Page History - Newer Revision »

Page Revision: 2012/11/15 15:38

Information

Course Number: 081202B3
To: M. Sc. students of Department of Computer Science and Technology, Nanjing University.
Classroom: 221, Computer Science and Technology Building, Xianlin Campus
Time: 16:00 -- 17:50, Wednesday
Office Hour: 14:30 - 15:30, Wednesday (Rm 919)
Text Book: D. Hand, H. Mannila, P. Smyth. Principles of Data Mining. MIT Press, MA:Cambridge, 2001.
Main Reference Books:
- J. Han, M. Kamber. Data Mining: Concepts and Techniques, 2nd edition. Morgan Kaufmann Publishers, 2006
- I. H. Witten, E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition. Morgan Kaufmann Publishers, 2011
- P.-N. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining, Addison-Wesley, 2006.
- E. Alpaydin. Introduction to Machine Learning, 2nd edition. MIT Press, 2010.
- C. M. Bishop. Pattern Recognition and Machine Learning, Springer, 2007.
Grading: Final exam (30%) + assignment 1 (20%) + assignment 2 (15%) + assignment 3 (15%) + assignment 4 (20%)
TA: Mr. Teng Zhang and Mr. Chao Qian

Assignments

Assignment 1: Write a report on data mining applications ~~Due on Sept. 26, 2012~~ TA page

Assignment 2: A classification task ~~Due on Oct. 17, 2012~~ TA page

Assignment 3: A clustering task ~~Due on Nov. 7, 2012~~ TA page

Assignment 4: Mining from a real-world data set Due on Dec. 5, 2012 TA page

Lecture slides

9/12: Introduction (Download PDF)		Reading material: Z.-H. Zhou. Three perspectives of data mining. Artificial Intelligence, 2003, 143(1): 139-146. H.-P. Kriegel, et al. Future trends in data mining. Data Mining and Knowledge Discovery, 2007, 15(1): 87-97. Q. Yang and X. Wu. 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, 2006, 5(4): 597-604.

9/19: Data, Measurements, and Visualization (Download PDF)		Reading material: M. C. F. de Oliveira and H. Levkowitz. From visual data exploration to visual data mining: A survey. IEEE TVCG, 2003, 9(3): 378-394. H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization: An enabling technique. DMKD, 2002, 6(4): 393-423. J. Dougherty, R. Kohavi, M. Sahami. Supervised and unsupervised discretization of continuous features. In Proceedings of ICML'95, 194-202, Tahoe City, CA. X. Zhu and X. Wu. Class noise vs. attribute noise: A qualitative study of their impacts. AI Review, 2004, 22(3-4): 177-210. Link: A javascript for simple data visualization

9/26: Supervised Learning (Download PDF)		Reading material: Chapter 2 of Introduction to Machine Learning (E. Alpaydin, MIT Press, 2010). L. Valiant. A theory of the learnable. Communication of the ACM, 27(11):1134-1142, 1984.

10/10: Decision Tree and Neural Networks (Download PDF)		Reading material: Chapters 9 and 11 of Introduction to Machine Learning (E. Alpaydin, MIT Press, 2010). R. Quinlan. Induction of decision trees. MLJ, 1:81-106, 1986. A. Roy. Artificial neural networks - A science in trouble. SIGKDD Explorations, 2000, 1(2): 33-38. G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504-507, 2006.

10/17: Linear Models and Kernel Trick (Download PDF)		Reading material: Chapters 3, 4, 6, and 7 of Pattern Recognition and Machine Learning (C. M. Bishop, Springer, 2007) (You may find the ebook to download using Baidu.com) C. J. C. Burges. A tutorial on support vector machines for pattern recognition. DMKD, 1998, 2(2): 121-167. K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE TNN, 2001, 12(2): 181-201.

10/24: Bayesian Methods and Lazy Methods (Download PDF)		Reading material: D. Heckerman. Bayesian networks for data mining. DMKD, 1997, 1(1): 79-119. H. Zhang. The Optimality of Naive Bayes. FLAIRS Conference 2004. F. Zheng and G. I. Webb. A Comparative Study of Semi-naive Bayes Methods in Classification Learning. In AusDM'05, 141-156. A. Andoni and P. Indyk. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. CACM, 2008, 51(1): 117-121.

10/31: Discussion of Assignment 1 and Assignment 2

11/7: Ensemble Methods (Download PDF)		Reading material: L. Breiman. Random Forest. Machine Learning 45 (1): 5–32. Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms, Boca Raton, FL: Chapman & Hall/CRC, 2012. (Chapter 2: Boosting). E. Bauer and R. Kohavi. An Empirical Comparison of Voting Classication Algorithms: Bagging, Boosting, and Variants. Machine Learning, 1999, 36(1):105-139.

11/14: Unsupervised learning: Density estimation and clustering (Download PDF)

(The following arrangement is tentative)

11/21: Handling Big Data

11/28: Experiment Design and Analysis / Discussion of Assignment 3

12/5: Feature Extraction

12/12: Score Functions and Optimization

12/19: Applications: Content-based Information Retrieval

12/26: Discussion of Assignment 4

1/2: Q & A

Major academic venues

Journals in data mining: DMKD, TKDD, TKDE, KAIS
Journals in machine learning: MLJ, JMLR
Journals in database: TODS, VLDBJ
Journals in information retrieval: TOIS, IP&M, IRJ
Journals in web search and mining: WWWJ, TOIT, TWeb
Conferences in data mining: KDD, ICDM, SDM, ECML/PKDD, PAKDD
Conferences in machine learning: ICML, NIPS, COLT
Conferences in database: SIGMOD, VLDB, ICDE
Conferences in information retrieval: SIGIR, CIKM
Conferences in web search and mining: WWW, WSDM