Data Mining (Fall, 2014)
([MainPage|Back to homepage])

==Information==
* '''Course Number''': 081202B03
* '''To''': M.Sc. students of the Department of Computer Science and Technology, Nanjing University.
* '''Classroom''': <font color="red">106 Xian-I</font>, Xianlin Campus
* '''Time''': 10:10 - 12:00, Thursday
* '''Office Hour''': 13:00 - 16:00, Thursday (Rm 311, Computer Science Building)
* '''Main Reference Books''':
** D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, Cambridge, MA, 2001.
** J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd edition. Morgan Kaufmann Publishers, 2006.
** I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition. Morgan Kaufmann Publishers, 2011.
** P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison-Wesley, 2006.
** E. Alpaydin. Introduction to Machine Learning, 2nd edition. MIT Press, 2010.
** C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2007.
* '''Grading''': Final exam (30%) + assignments (70%)
* '''TA''': Mr. [^http://lamda.nju.edu.cn/daq|Qing Da] and Mr. [^http://lamda.nju.edu.cn/zhuy|Yue Zhu]
* <font color="red">'''Final Exam''': Yi-B212, 205, 207, 9:00-11:00, Jan. 5.</font>

==Assignments==
Please read the assignments at [^http://lamda.nju.edu.cn/daq/dm14.ashx] carefully, and complete them on time.

==Video Lectures==
Before the next class on Oct. 16, please watch the video lectures up to <b>Chapter 5</b>.

==Schedule and Lecture slides==
<table border="0" cellspacing="10">
<tr valign="top"><td>
'''Sep. 18''': Introduction ([{UP}course_dm14ms/Lecture1.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
Z.-H. Zhou. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/aij03.pdf|Three perspectives of data mining]. Artificial Intelligence, 2003, 143(1): 139-146.<br/>
H.-P. Kriegel et al. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading01/hans_dmkd07.pdf|Future trends in data mining]. Data Mining and Knowledge Discovery, 2007, 15(1): 87-97.<br/>
Q. Yang and X. Wu. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading01/yang_ijitdm06.pdf|10 challenging problems in data mining research]. International Journal of Information Technology & Decision Making, 2006, 5(4): 597-604.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Sep. 25''': Data, Measurements, and Visualization ([{UP}course_dm14ms/Lecture2.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
M. C. F. de Oliveira and H. Levkowitz. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/deOliveira_tvcg03.pdf|From visual data exploration to visual data mining: A survey]. IEEE TVCG, 2003, 9(3): 378-394.<br/>
H. Liu, F. Hussain, C. L. Tan, and M. Dash. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/liu_dmkd02.pdf|Discretization: An enabling technique]. DMKD, 2002, 6(4): 393-423.<br/>
J. Dougherty, R. Kohavi, and M. Sahami. [^http://robotics.stanford.edu/users/sahami/papers-dir/disc.pdf|Supervised and unsupervised discretization of continuous features]. In Proceedings of ICML'95, 194-202, Tahoe City, CA.<br/>
X. Zhu and X. Wu. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/zhu_airev04.pdf|Class noise vs. attribute noise: A qualitative study of their impacts]. AI Review, 2004, 22(3-4): 177-210.<br/>
Link: [^http://datavlab.org/datavjs/|A JavaScript library for simple data visualization]<br/>
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Oct. 9''': Machine Learning I: Supervised Learning and Basic Algorithms ([{UP}course_dm14ms/Lecture3.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
[^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapter 9 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).<br/>
J. R. Quinlan. [^http://www.dmi.unict.it/~apulvirenti/agd/Qui86.pdf|Induction of decision trees]. MLJ, 1986, 1: 81-106.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Oct. 16''': Machine Learning II: Principle of Learning ([{UP}course_dm14ms/Lecture4.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
[^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapter 2 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).<br/>
L. Valiant. [^http://www.mpi-inf.mpg.de/~mehlhorn/SeminarEvolvability/ValiantLearnable.pdf|A theory of the learnable]. Communications of the ACM, 1984, 27(11): 1134-1142.<br/>
D. Heckerman. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/heckerman_dmkd97.pdf|Bayesian networks for data mining]. DMKD, 1997, 1(1): 79-119.<br/>
H. Zhang. [^http://www.aaai.org/Papers/FLAIRS/2004/Flairs04-097.pdf|The optimality of naive Bayes]. In FLAIRS'04.<br/>
F. Zheng and G. I. Webb. [^http://www.csse.monash.edu.au/~webb/Files/ZhengWebb05.pdf|A comparative study of semi-naive Bayes methods in classification learning]. In AusDM'05, 141-156.<br/>
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''<s>Oct. 23</s> Oct. 22''': Machine Learning III: Nearest Neighbors and Neural Networks ([{UP}course_dm14ms/Lecture5.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
A. Roy. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/roy_sigkddexp00.pdf|Artificial neural networks - A science in trouble]. SIGKDD Explorations, 2000, 1(2): 33-38.<br/>
A. Andoni and P. Indyk. [^http://people.csail.mit.edu/indyk/p117-andoni.pdf|Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions]. CACM, 2008, 51(1): 117-121.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Oct. 30''': Machine Learning IV: Linear Models ([{UP}course_dm14ms/Lecture6.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
Chapters 3, 4, 6, and 7 of Pattern Recognition and Machine Learning (C. M. Bishop, Springer, 2007).<br/>
C. J. C. Burges. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/burges_dmkd98.pdf|A tutorial on support vector machines for pattern recognition]. DMKD, 1998, 2(2): 121-167.<br/>
K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/muller_tnn01.pdf|An introduction to kernel-based learning algorithms]. IEEE TNN, 2001, 12(2): 181-201.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Nov. 6''': TA class
</td><td></td>
<td>
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Nov. 13''': Machine Learning V: Ensemble Methods ([{UP}course_dm14ms/Lecture7.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
L. Breiman. [^http://oz.berkeley.edu/users/breiman/randomforest2001.pdf|Random forests]. Machine Learning, 2001, 45(1): 5-32.<br/>
Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms. Boca Raton, FL: Chapman & Hall/CRC, 2012. ([^http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/emfa-ch2.pdf|Chapter 2: Boosting]).<br/>
E. Bauer and R. Kohavi. [^http://robotics.stanford.edu/~ronnyk/vote.pdf|An empirical comparison of voting classification algorithms: Bagging, boosting, and variants]. Machine Learning, 1999, 36(1): 105-139.<br/>
M. Fernández-Delgado et al. [^http://jmlr.org/papers/v15/delgado14a.html|Do we need hundreds of classifiers to solve real world classification problems?]. JMLR, 2014, 15: 3133-3181.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Nov. 20''': Machine Learning VI: Unsupervised Learning ([{UP}course_dm14ms/Lecture8.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
[^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapters 7 and 8 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).<br/>
V. Estivill-Castro. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading06/estivill-castro_sigkddexp02.pdf|Why so many clustering algorithms - a position paper]. SIGKDD Explorations, 2002, 4(1): 65-75.<br/>
R. Xu and D. Wunsch II. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading06/xu_tnn05.pdf|Survey of clustering algorithms]. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.<br/>
C. Elkan. [^http://cseweb.ucsd.edu/~elkan/kmeansicml03.pdf|Using the triangle inequality to accelerate k-means]. In Proceedings of ICML'03, 147-153.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Nov. 27''': Data Mining I: Feature Processing A ([{UP}course_dm14ms/Lecture9A.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
A. L. Blum and P. Langley. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/blum_aij97.pdf|Selection of relevant features and examples in machine learning]. AIJ, 1997, 97(1-2): 245-271.<br/>
I. Guyon and A. Elisseeff. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/guyon_jmlr03.pdf|An introduction to variable and feature selection]. Journal of Machine Learning Research, 2003, 3: 1157-1182.<br/>
[^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapter 6 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Dec. 4''': Data Mining II: Feature Processing B ([{UP}course_dm14ms/Lecture9B.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
J. B. Tenenbaum, V. de Silva, and J. C. Langford. [^http://web.mit.edu/cocosci/Papers/sci_reprint.pdf|A global geometric framework for nonlinear dimensionality reduction]. Science, 2000, 290: 2319-2322.<br/>
G. E. Hinton and R. R. Salakhutdinov. [^http://www.cs.toronto.edu/~hinton/science.pdf|Reducing the dimensionality of data with neural networks]. Science, 2006, 313: 504-507.<br/>
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
Data Mining III: Handling Large-scale Data ([{UP}course_dm14ms/Lecture10.pdf|Download PDF])
</td><td></td>
<td>
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Dec. 11''': Data Mining IV: Computer Vision Systems ([{UP}course_dm14ms/Lecture11.pdf|Download PDF])<br/><br/>
Data Mining V: Information Retrieval Systems ([{UP}course_dm14ms/Lecture12.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
Chapter 14 of the textbook (Principles of Data Mining).<br/>
M. Mitra and B. Chaudhuri. [^http://www.springerlink.com/index/g465l6775267g380.pdf|Information retrieval from documents: A survey]. Information Retrieval, 2000.<br/>
M. Lew, N. Sebe, C. Djeraba, and R. Jain. [^http://www.liacs.nl/~mlew/mir.survey16b.pdf|Content-based multimedia information retrieval: State of the art and challenges]. TOMCCAP, 2006.<br/>
D. Lowe. [^http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf|Distinctive image features from scale-invariant keypoints]. IJCV, 2004.<br/>
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. [^http://jmlr.org/papers/volume4/freund03a/freund03a.pdf|An efficient boosting algorithm for combining preferences]. JMLR, 2003, 4: 933-969.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Dec. 18''': Data Mining VI: Mining Linkage Data ([{UP}course_dm14ms/Lecture13.pdf|Download PDF])
</td><td></td>
<td>
Reading material:<br/>
L. Getoor and C. Diehl. [^http://www.cs.umd.edu/~getoor/Publications/getoor-kddexp05.pdf|Link mining: A survey]. SIGKDD Explorations, 2005, 7(2): 3-12.<br/>
L. Page et al. [^http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf|The PageRank citation ranking: Bringing order to the web]. Technical report, Stanford University, 1999.
</td>
</tr>
<tr><td> </td></tr>
<tr><td valign="top">
'''Dec. 25''': Guest lecture by C.-J. Lin
</td><td></td>
<td>
</td>
</tr>
<tr><td> </td></tr>
</table>

==Links==
* [^http://www.cs.waikato.ac.nz/ml/weka/|Weka] An open-source (Java) collection of machine learning/data mining algorithms.
** Note: in Weka 3.7, many algorithms are in separate packages ([^http://weka.sourceforge.net/packageMetaData/|http://weka.sourceforge.net/packageMetaData/]); you will need to download them as well.
* [^http://www.r-project.org/|R Project] An open-source platform for statistical computing using the R language.
** [^http://cran.r-project.org/web/views/MachineLearning.html|Machine learning packages for R]
* [^http://scikit-learn.org/stable/|Scikit-Learn] An open-source machine learning package for Python (see the usage sketch after this list).
* [^http://www.sigkdd.org/|ACM SIGKDD] The website of the ACM Special Interest Group on Knowledge Discovery and Data Mining.
** [^http://dl.acm.org/citation.cfm?id=J721&picked=prox&cfid=140024035&cftoken=61726307|ACM SIGKDD Explorations Newsletter] The magazine of SIGKDD.
* [^http://www.kdnuggets.com/|KDnuggets] A website for data mining resources.
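To give a concrete feel for these tools, here is a minimal Python sketch using the Scikit-Learn package linked above. It trains and evaluates a decision tree, one of the basic supervised algorithms from the Oct. 9 lecture; the iris dataset, the one-third hold-out split, and the choice of classifier are illustrative assumptions, not course requirements.
<pre>
# Minimal, illustrative Scikit-Learn sketch (not part of the course materials):
# train and evaluate a decision tree. Dataset, split, and model are assumptions
# made for demonstration only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small benchmark dataset: 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out one third of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

# Fit a decision tree (cf. Quinlan's decision-tree induction in the readings).
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Report accuracy on the held-out test set.
print("Test accuracy: %.3f" % accuracy_score(y_test, clf.predict(X_test)))
</pre>
Weka supports the same fit-then-evaluate workflow through its GUI or Java API, and the R packages linked above offer comparable functionality.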
==Major academic venues==
* Journals in data mining: [^http://www.informatik.uni-trier.de/~ley/db/journals/datamine/index.html|DMKD], [^http://www.informatik.uni-trier.de/~ley/db/journals/tkdd/index.html|TKDD], [^http://www.informatik.uni-trier.de/~ley/db/journals/tkde/index.html|TKDE], [^http://www.informatik.uni-trier.de/~ley/db/journals/kais/index.html|KAIS]
* Journals in machine learning: [^http://www.informatik.uni-trier.de/~ley/db/journals/ml/index.html|MLJ], [^http://www.informatik.uni-trier.de/~ley/db/journals/jmlr/index.html|JMLR]
* Journals in databases: [^http://www.informatik.uni-trier.de/~ley/db/journals/tods/index.html|TODS], [^http://www.informatik.uni-trier.de/~ley/db/journals/vldb/index.html|VLDBJ]
* Journals in information retrieval: [^http://www.informatik.uni-trier.de/~ley/db/journals/tois/index.html|TOIS], [^http://www.informatik.uni-trier.de/~ley/db/journals/ipm/index.html|IP&M], [^http://www.informatik.uni-trier.de/~ley/db/journals/ir/index.html|IRJ]
* Journals in web search and mining: [^http://www.informatik.uni-trier.de/~ley/db/journals/www/index.html|WWWJ], [^http://www.informatik.uni-trier.de/~ley/db/journals/toit/index.html|TOIT], [^http://www.informatik.uni-trier.de/~ley/db/journals/tweb/index.html|TWeb]
* Conferences in data mining: [^http://www.informatik.uni-trier.de/~ley/db/conf/kdd/index.html|KDD], [^http://www.informatik.uni-trier.de/~ley/db/conf/icdm/index.html|ICDM], [^http://www.informatik.uni-trier.de/~ley/db/conf/sdm/index.html|SDM], [^http://www.informatik.uni-trier.de/~ley/db/conf/ecml/index.html|ECML/PKDD], [^http://www.informatik.uni-trier.de/~ley/db/conf/pakdd/index.html|PAKDD]
* Conferences in machine learning: [^http://www.informatik.uni-trier.de/~ley/db/conf/icml/index.html|ICML], NIPS, [^http://www.informatik.uni-trier.de/~ley/db/conf/colt/index.html|COLT]
* Conferences in databases: [^http://www.informatik.uni-trier.de/~ley/db/conf/sigmod/index.html|SIGMOD], [^http://www.informatik.uni-trier.de/~ley/db/journals/pvldb/index.html|VLDB], [^http://www.informatik.uni-trier.de/~ley/db/conf/icde/index.html|ICDE]
* Conferences in information retrieval: [^http://www.informatik.uni-trier.de/~ley/db/conf/sigir/index.html|SIGIR], [^http://www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html|CIKM]
* Conferences in web search and mining: [^http://www.informatik.uni-trier.de/~ley/db/conf/www/index.html|WWW], [^http://www.informatik.uni-trier.de/~ley/db/conf/wsdm/index.html|WSDM]