Modified: 2014/09/28 12:27 by admin - Uncategorized
We create a Google Group for the discussions. If you have any question, please post it there so everybody can see our reply, and try not to send us your questions directly via emails.


The Task: Implement the K-Nearest Neighbor Search by Locality-Sensitive Hashing algorithms

  • For the information retrieval task, given an object set {x1,x2,...,xn} and a query q, you need to return a list of K most relevant object. However, with a very large n, the brute-force search will take a very long time.
  • Locality-Sensitive Hashing (named as LSH) algorithms was proposed for this task. In this assignment, you are asked to implement the LSH method for K-nearest neighbor search.
  • Let X be the previous forum posts dataset, you need to extract the first 20 lines from each original txt file and put these posts into a query set Q (200 posts). Then for any post q in Q, you need to return the K most similar posts D={d1,d2,...,dK} in the set X-Q. Then calculate the precision


       and record the time. At the same time, record the precision and time of brute-force search. Report the mean and standard deviation of precision and time. You need to conduct the experiment multiple times with K=10,20,30,40,50. (Here I(C) is an indicator function which equals to 1 if C is true, 0 otherwise)
  • Write a brief report to show your results.


Benchmark Dataset

The same as Assignment 1.


Programming Language

  • The choice you have made in the first assignment



  • Please use this MSWord template to report your results.
  • Do NOT plagiarize, plagiarism will be seriously penalized: You should be careful on writing your report. Whenever you are using words and works of others, citations should be made clear such that one can tell which part is actually yours. Details about how to identify a plagiarism can be found in "Introduction to the Guidelines for Handling Plagiarism Complaints".
  • Do NOT falsify results, data fraud will be even more seriously penalized: You should honestly record your results in the report, NEVER EVER modify the performance results manually.
  • Pack your report and code into a zip file named with your student ID, e.g., ''. If you have multiple submissions, add an extra '_' with a number, such as ''. We will use the the version with the largest number as your final submission.

    • The file format should be zip, no other format is acceptable!
    • NO submission after the deadline is acceptable!
    • NO email submission will be accepted!

Upload your file to FTP: (please use FTP software to upload, do not use Windows Explorer or IE)
username/password: You will be informed in the first class



We will evaluate your submission according to your implementation and report.

For implementation :
  • Efficiency
  • Performance
  • Code style

For report:
  • Technique: clearly explain all the component you used in your implementation
  • Language: concise, precise, and logical.

If plagiarism is identified, no scores will be given to this report.


Contact TA

Mr. Qing Da and Mr. Yue Zhu

Back to assignment homepage
Back to course homepage