Data Mining (Fall, 2012)
([MainPage|Back to homepage])

==Information==
* '''Course Number''': 081202B3
* '''Audience''': M.Sc. students of the Department of Computer Science and Technology, Nanjing University
* '''Classroom''': Room 221, Computer Science and Technology Building, Xianlin Campus
* '''Time''': 16:00 -- 17:50, Wednesday
* '''Office Hours''': 14:30 -- 15:30, Wednesday (Rm 919)
* '''Textbook''': D. Hand, H. Mannila, P. Smyth. Principles of Data Mining. Cambridge, MA: MIT Press, 2001.
* '''Main Reference Books''':
** J. Han, M. Kamber. Data Mining: Concepts and Techniques, 2nd edition. Morgan Kaufmann Publishers, 2006.
** I. H. Witten, E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition. Morgan Kaufmann Publishers, 2011.
** P.-N. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining. Addison-Wesley, 2006.
** E. Alpaydin. Introduction to Machine Learning, 2nd edition. MIT Press, 2010.
** C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2007.
* '''Grading''': Final exam (30%) + Assignment 1 (20%) + Assignment 2 (15%) + Assignment 3 (15%) + Assignment 4 (20%)
* '''TAs''': Mr. [^http://lamda.nju.edu.cn/zhangt|Teng Zhang] and Mr. [^http://lamda.nju.edu.cn/qianc|Chao Qian]
* '''Final Exam''': <font color="red">1/8, Yifu Building B, Room 212 (逸B212), 2:00-4:00 PM</font>

==Assignments==
'''Assignment 1: Write a report on data mining applications''' <s>Due on Sept. 26, 2012</s> <a href="http://lamda.nju.edu.cn/zhangt/dm2012/" target="_blank">TA page</a>

'''Assignment 2: A classification task''' <s>Due on Oct. 17, 2012</s> <a href="http://lamda.nju.edu.cn/zhangt/dm2012/" target="_blank">TA page</a>

'''Assignment 3: A clustering task''' <s>Due on Nov. 7, 2012</s> <a href="http://lamda.nju.edu.cn/zhangt/dm2012/" target="_blank">TA page</a>

'''Assignment 4: Mining from a real-world data set''' <s>Due on Dec. 5, 2012</s> <s>Due on Dec. 12, 2012</s> <s>Due on Dec. 19, 2012 (this is final and firm)</s> <a href="http://lamda.nju.edu.cn/zhangt/dm2012/" target="_blank">TA page</a>

==Schedule and Lecture slides==
<table border="0" cellspace="10"> <tr valign="top"><td> '''9/12''': Introduction ([{UP}course_dm12/Lecture1.pdf|Download PDF]) </td><td></td> <td> Reading material:<br/> Z.-H. Zhou. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/aij03.pdf|Three perspectives of data mining]. Artificial Intelligence, 2003, 143(1): 139-146.<br/> H.-P. Kriegel, et al. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading01/hans_dmkd07.pdf|Future trends in data mining]. Data Mining and Knowledge Discovery, 2007, 15(1): 87-97.<br/> Q. Yang and X. Wu. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading01/yang_ijitdm06.pdf|10 challenging problems in data mining research]. International Journal of Information Technology & Decision Making, 2006, 5(4): 597-604. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''9/19''': Data, Measurements, and Visualization ([{UP}course_dm12/Lecture2.pdf|Download PDF]) </td><td></td> <td> Reading material:<br/> M. C. F. de Oliveira and H. Levkowitz. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/deOliveira_tvcg03.pdf|From visual data exploration to visual data mining: A survey]. IEEE TVCG, 2003, 9(3): 378-394.<br/> H. Liu, F. Hussain, C. L. Tan, and M. Dash. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/liu_dmkd02.pdf|Discretization: An enabling technique]. DMKD, 2002, 6(4): 393-423.<br/> J. Dougherty, R. Kohavi, M. Sahami. [^http://robotics.stanford.edu/users/sahami/papers-dir/disc.pdf|Supervised and unsupervised discretization of continuous features]. In Proceedings of ICML'95, 194-202, Tahoe City, CA.<br/> X. Zhu and X. Wu. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/zhu_airev04.pdf|Class noise vs. attribute noise: A qualitative study of their impacts].
AI Review, 2004, 22(3-4): 177-210.<br/> Link: [^http://datavlab.org/datavjs/|A JavaScript library for simple data visualization] </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''9/26''': Supervised Learning ([{UP}course_dm12/Lecture3.pdf|Download PDF]) </td><td></td> <td> Reading material:<br/> [^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapter 2 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).<br/> L. G. Valiant. [^http://www.mpi-inf.mpg.de/~mehlhorn/SeminarEvolvability/ValiantLearnable.pdf|A theory of the learnable]. Communications of the ACM, 1984, 27(11): 1134-1142. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''10/10''': Decision Trees and Neural Networks ([{UP}course_dm12/Lecture4.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> [^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapters 9 and 11 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).<br/> J. R. Quinlan. [^http://www.dmi.unict.it/~apulvirenti/agd/Qui86.pdf|Induction of decision trees]. MLJ, 1986, 1(1): 81-106.<br/> A. Roy. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/roy_sigkddexp00.pdf|Artificial neural networks - A science in trouble]. SIGKDD Explorations, 2000, 1(2): 33-38.<br/> G. E. Hinton and R. R. Salakhutdinov. [^http://www.cs.toronto.edu/~hinton/science.pdf|Reducing the dimensionality of data with neural networks]. Science, 2006, 313: 504-507. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''10/17''': Linear Models and the Kernel Trick ([{UP}course_dm12/Lecture5.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> Chapters 3, 4, 6, and 7 of Pattern Recognition and Machine Learning (C. M. Bishop, Springer, 2007; an e-book copy can be found by searching on Baidu.com)<br/> C. J. C. Burges.
[^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/burges_dmkd98.pdf|A tutorial on support vector machines for pattern recognition]. DMKD, 1998, 2(2): 121-167.<br/> K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/muller_tnn01.pdf|An introduction to kernel-based learning algorithms]. IEEE TNN, 2001, 12(2): 181-201. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''10/24''': Bayesian Methods and Lazy Methods ([{UP}course_dm12/Lecture6.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> D. Heckerman. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading04/heckerman_dmkd97.pdf|Bayesian networks for data mining]. DMKD, 1997, 1(1): 79-119.<br/> H. Zhang. [^http://www.aaai.org/Papers/FLAIRS/2004/Flairs04-097.pdf|The Optimality of Naive Bayes]. FLAIRS'04.<br/> F. Zheng and G. I. Webb. [^http://www.csse.monash.edu.au/~webb/Files/ZhengWebb05.pdf|A Comparative Study of Semi-naive Bayes Methods in Classification Learning]. AusDM'05, 141-156.<br/> A. Andoni and P. Indyk. [^http://people.csail.mit.edu/indyk/p117-andoni.pdf|Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions]. CACM, 2008, 51(1): 117-121. </td> </tr> <tr><td> </td></tr> <tr><td> '''10/31''': Discussion of Assignments 1 and 2 </td><td> </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''11/7''': Ensemble Methods ([{UP}course_dm12/Lecture7.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> L. Breiman. [^http://oz.berkeley.edu/users/breiman/randomforest2001.pdf|Random forests]. Machine Learning, 2001, 45(1): 5-32.<br/> Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms. Boca Raton, FL: Chapman & Hall/CRC, 2012. ([^http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/emfa-ch2.pdf|Chapter 2: Boosting]).<br/> E. Bauer and R. Kohavi.
[^http://robotics.stanford.edu/~ronnyk/vote.pdf|An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants]. Machine Learning, 1999, 36(1): 105-139. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''11/14''': Unsupervised Learning: Density Estimation and Clustering ([{UP}course_dm12/Lecture8.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> [^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapters 7 and 8 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).<br/> V. Estivill-Castro. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading06/estivill-castro_sigkddexp02.pdf|Why so many clustering algorithms - a position paper]. SIGKDD Explorations, 2002, 4(1): 65-75.<br/> R. Xu and D. Wunsch II. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading06/xu_tnn05.pdf|Survey of clustering algorithms]. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.<br/> C. Elkan. [^http://cseweb.ucsd.edu/~elkan/kmeansicml03.pdf|Using the Triangle Inequality to Accelerate k-Means]. ICML'03, 147-153. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''11/21''': Handling Big Data ([{UP}course_dm12/Lecture9.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> M. Banko and E. Brill. [^http://research.microsoft.com/pubs/66840/acl2001.pdf|Scaling to very very large corpora for natural language disambiguation]. ACL'01.<br/> J. Dean and S. Ghemawat. [^http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf|MapReduce: Simplified data processing on large clusters]. OSDI'04.<br/> B. Panda, et al. [^http://www.bayardo.org/ps/vldb2009.pdf|PLANET: Massively parallel learning of tree ensembles with MapReduce]. VLDB'09.<br/> J. H. Friedman. [^http://www.salford-systems.com/doc/StochasticBoostingSS.pdf|Stochastic gradient boosting]. Computational Statistics & Data Analysis, 2002, 38(4): 367-378.<br/> J. Lin and A. Kolcz.
[^http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf|Large-Scale Machine Learning at Twitter]. SIGMOD'12. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''11/28''': Feature Processing ([{UP}course_dm12/Lecture10.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> A. L. Blum and P. Langley. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/blum_aij97.pdf|Selection of relevant features and examples in machine learning]. AIJ, 1997, 97(1-2): 245-271.<br/> I. Guyon and A. Elisseeff. [^http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading03/guyon_jmlr03.pdf|An introduction to variable and feature selection]. Journal of Machine Learning Research, 2003, 3: 1157-1182.<br/> [^http://www.realtechsupport.org/UB/MRIII/papers/MachineLearning/Alppaydin_MachineLearning_2010.pdf|Chapter 6 of Introduction to Machine Learning] (E. Alpaydin, MIT Press, 2010).<br/> J. B. Tenenbaum, V. de Silva, and J. C. Langford. [^http://web.mit.edu/cocosci/Papers/sci_reprint.pdf|A Global Geometric Framework for Nonlinear Dimensionality Reduction]. Science, 2000, 290: 2319-2322.<br/> S. T. Roweis and L. K. Saul. [^http://www.cs.cmu.edu/~efros/courses/AP06/Papers/roweis-science-00.pdf|Nonlinear Dimensionality Reduction by Locally Linear Embedding]. Science, 2000, 290: 2323-2326. </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''12/5''': Mining Link Data ([{UP}course_dm12/Lecture11.pdf|Download PDF]), Experiment Design and Analysis ([{UP}course_dm12/LectureA.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> L. Getoor and C. Diehl. [^http://www.cs.umd.edu/~getoor/Publications/getoor-kddexp05.pdf|Link mining: A survey]. SIGKDD Explorations, 2005, 7(2): 3-12.<br/> L. Page, et al. [^http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf|The PageRank citation ranking: Bringing order to the web]. Technical report, Stanford InfoLab, 1999.
</td> </tr> <tr><td> </td></tr> <tr><td> '''12/12''': Discussion of Assignment 3 </td><td> </td> </tr> <tr><td> </td></tr> <tr><td valign="top"> '''12/19''': Some Applications ([{UP}course_dm12/Lecture12.pdf|Download PDF]) </td><td> </td><td> Reading material:<br/> Chapter 14 of the textbook (Principles of Data Mining)<br/> M. Mitra, B. Chaudhuri. [^http://www.springerlink.com/index/g465l6775267g380.pdf|Information retrieval from documents: A survey]. Information Retrieval, 2000.<br/> M. Lew, N. Sebe, C. Djeraba, R. Jain. [^http://www.liacs.nl/~mlew/mir.survey16b.pdf|Content-based multimedia information retrieval: State of the art and challenges]. TOMCCAP, 2006.<br/> D. G. Lowe. [^http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf|Distinctive Image Features from Scale-Invariant Keypoints]. IJCV, 2004.<br/> P. Viola and M. Jones. [^http://research.microsoft.com/en-us/um/people/viola/Pubs/Detect/violaJones_IJCV.pdf|Robust Real-time Object Detection]. IJCV, 2001. </td> </tr> <tr><td> </td></tr> <tr><td> '''12/26''': Discussion of Assignment 4 </td><td> </td> </tr> <tr><td> </td></tr> <tr><td> '''12/29''': Q&A (in office, Rm 919) </td><td> </td> </tr> <tr><td> </td></tr> </table>

==Links==
* [^http://www.cs.waikato.ac.nz/ml/weka/|Weka]: Open-source machine learning and data mining software written in Java.
* [^http://www.r-project.org/|R Project]: An open-source environment for statistical computing based on the R language.
** [^http://cran.r-project.org/web/views/MachineLearning.html|Machine learning packages for R]
* [^http://www.sigkdd.org/|ACM SIGKDD]: The website of the ACM Special Interest Group on Knowledge Discovery and Data Mining.
** [^http://dl.acm.org/citation.cfm?id=J721&picked=prox&cfid=140024035&cftoken=61726307|ACM SIGKDD Explorations Newsletter]: The newsletter of SIGKDD.
* [^http://www.kdnuggets.com/|KDnuggets]: A website of data mining resources.
==Major academic venues==
* Journals in data mining: [^http://www.informatik.uni-trier.de/~ley/db/journals/datamine/index.html|DMKD], [^http://www.informatik.uni-trier.de/~ley/db/journals/tkdd/index.html|TKDD], [^http://www.informatik.uni-trier.de/~ley/db/journals/tkde/index.html|TKDE], [^http://www.informatik.uni-trier.de/~ley/db/journals/kais/index.html|KAIS]
* Journals in machine learning: [^http://www.informatik.uni-trier.de/~ley/db/journals/ml/index.html|MLJ], [^http://www.informatik.uni-trier.de/~ley/db/journals/jmlr/index.html|JMLR]
* Journals in databases: [^http://www.informatik.uni-trier.de/~ley/db/journals/tods/index.html|TODS], [^http://www.informatik.uni-trier.de/~ley/db/journals/vldb/index.html|VLDBJ]
* Journals in information retrieval: [^http://www.informatik.uni-trier.de/~ley/db/journals/tois/index.html|TOIS], [^http://www.informatik.uni-trier.de/~ley/db/journals/ipm/index.html|IP&M], [^http://www.informatik.uni-trier.de/~ley/db/journals/ir/index.html|IRJ]
* Journals in web search and mining: [^http://www.informatik.uni-trier.de/~ley/db/journals/www/index.html|WWWJ], [^http://www.informatik.uni-trier.de/~ley/db/journals/toit/index.html|TOIT], [^http://www.informatik.uni-trier.de/~ley/db/journals/tweb/index.html|TWeb]
* Conferences in data mining: [^http://www.informatik.uni-trier.de/~ley/db/conf/kdd/index.html|KDD], [^http://www.informatik.uni-trier.de/~ley/db/conf/icdm/index.html|ICDM], [^http://www.informatik.uni-trier.de/~ley/db/conf/sdm/index.html|SDM], [^http://www.informatik.uni-trier.de/~ley/db/conf/ecml/index.html|ECML/PKDD], [^http://www.informatik.uni-trier.de/~ley/db/conf/pakdd/index.html|PAKDD]
* Conferences in machine learning: [^http://www.informatik.uni-trier.de/~ley/db/conf/icml/index.html|ICML], NIPS, [^http://www.informatik.uni-trier.de/~ley/db/conf/colt/index.html|COLT]
* Conferences in databases: [^http://www.informatik.uni-trier.de/~ley/db/conf/sigmod/index.html|SIGMOD], [^http://www.informatik.uni-trier.de/~ley/db/journals/pvldb/index.html|VLDB], [^http://www.informatik.uni-trier.de/~ley/db/conf/icde/index.html|ICDE]
* Conferences in information retrieval: [^http://www.informatik.uni-trier.de/~ley/db/conf/sigir/index.html|SIGIR], [^http://www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html|CIKM]
* Conferences in web search and mining: [^http://www.informatik.uni-trier.de/~ley/db/conf/www/index.html|WWW], [^http://www.informatik.uni-trier.de/~ley/db/conf/wsdm/index.html|WSDM]