Assignment 7

Modified: 2013/09/17 02:02 by huangsj - Uncategorized

The Task: Movie Recommendation

Users can rate the movies they have watched online. From these ratings, we hope to learn each user's preferences across different movies, so that we can make accurate movie recommendations to the users. In this task, you will predict the rating a user gives to a movie based on a set of observed ratings.
  • What you have: a set of ratings given by different users to movies.
  • What you are asked to do: for unseen user-movie pairs, predict the grade the user will give the movie.
  • The target: achieve a small root mean square error (RMSE) on the predictions.
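
For reference, the root mean square error over N predictions is the square root of the mean of the squared differences between predicted and true grades. A minimal sketch in Python follows; the function name `rmse` is chosen here for illustration and is not part of the assignment.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences of grades."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one rating is required")
    # Sum of squared differences, averaged, then square-rooted.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))
```

A lower value is better; a perfect set of predictions gives 0.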



download (5.6MB)
  • The data contains 2 files: "train.txt" and "test.txt".
  • "train.txt" has 1000000 lines, each recording one rating given by a user to a movie.
  • Each rating consists of three columns: user ID, movie ID, and grade.
  • Grades are integers ranging from 1 to 5. The higher the grade, the more the user likes the movie.
  • "test.txt" has 250000 lines, each containing only a user ID and a movie ID.
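
The files described above could be read with a few lines of Python. This is only a sketch: it assumes the columns are separated by whitespace, which the description does not state, and the function names are hypothetical. IDs are kept as strings so that no assumption is made about their format.

```python
def parse_rating_line(line):
    """Parse one "train.txt" line into (user_id, movie_id, grade).

    Assumes whitespace-separated columns (an assumption, not stated
    in the assignment).
    """
    user_id, movie_id, grade = line.split()
    return user_id, movie_id, int(grade)


def parse_pair_line(line):
    """Parse one "test.txt" line into (user_id, movie_id)."""
    user_id, movie_id = line.split()
    return user_id, movie_id


def load_lines(path, parser):
    """Apply `parser` to every non-empty line of the file at `path`."""
    with open(path) as f:
        return [parser(line) for line in f if line.strip()]
```

For example, `load_lines("train.txt", parse_rating_line)` would return the list of training ratings and `load_lines("test.txt", parse_pair_line)` the list of pairs to predict.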



  • Based on the training data, design an algorithm to predict the grades for the user-movie pairs listed in the "test.txt" file.
  • Record the predicted grades in another file, "test.rate", with 250000 lines aligned with those of "test.txt".
  • Submit the "test.rate" at, where you can immediately see the performance of your algorithm and compare it with the results of your classmates.
  • Write a report to describe your method and implementation.
  • Submit your code and the report.
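
As a starting point before designing your own algorithm, a very simple baseline could predict each movie's average training grade. The sketch below is not a recommended method, only an illustration of the workflow. It assumes that "test.rate" holds one predicted grade per line and that real-valued predictions are accepted; neither is stated above. All function names are hypothetical.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Compute the global mean grade and each movie's mean grade.

    `ratings` is an iterable of (user_id, movie_id, grade) tuples.
    """
    total, count = 0.0, 0
    movie_sum = defaultdict(float)
    movie_count = defaultdict(int)
    for _user_id, movie_id, grade in ratings:
        total += grade
        count += 1
        movie_sum[movie_id] += grade
        movie_count[movie_id] += 1
    global_mean = total / count
    movie_mean = {m: movie_sum[m] / movie_count[m] for m in movie_sum}
    return global_mean, movie_mean


def predict(model, user_id, movie_id):
    """Predict the movie's mean grade, or the global mean for unseen movies."""
    global_mean, movie_mean = model
    grade = movie_mean.get(movie_id, global_mean)
    # Keep the prediction inside the valid grade range 1..5.
    return min(5.0, max(1.0, grade))


def write_predictions(path, model, pairs):
    """Write one prediction per line, in the same order as `pairs`."""
    with open(path, "w") as f:
        for user_id, movie_id in pairs:
            f.write(f"{predict(model, user_id, movie_id):.4f}\n")
```

This baseline ignores the user entirely, so any method that models user preferences should be able to improve on it.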

  • Try to improve your algorithm again and again to get a higher rank on the leaderboard. You can submit predictions as many times as you want before the deadline.
  • Please use this MSWord template to write your report in Chinese with an English abstract.
  • Do NOT plagiarize; plagiarism will be seriously penalized. Be careful when writing your report: whenever you use the words or work of others, cite them clearly so that readers can tell which parts are actually yours. Details on how plagiarism is identified can be found in "Introduction to the Guidelines for Handling Plagiarism Complaints".



  • Submit the predictions ("test.rate") at, and please remember the password you entered the first time; it will be needed for later submissions.
  • Pack all the files needed to be submitted, e.g., report.docx,
  • Name this pack using your student ID, e.g., ''.

The file format must be zip; no other format is acceptable.
NO submission after the deadline is acceptable!
NO email submission is acceptable!

Upload your file to FTP: (please use FTP software to upload, do not use Windows Explorer or IE)
username: mg_dm13
password: mg_dm13



We will evaluate your submission according to the performance of your algorithm and the report.

For the algorithm:
We will evaluate the performance of your algorithm by the root mean square error of its predictions. You can find the rank of your submission from the leaderboard at Higher rank gets higher scores. Please note that the performance shown on the leaderboard is calculated from *half* of your predictions, while the final score of your submission will be based on the results on all of your predictions (the two are usually consistent).
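
Because the leaderboard reflects only part of your predictions, it can help to estimate the error locally as well. One common approach, sketched below, is to hold out a random part of the training ratings, fit on the rest, and measure the root mean square error on the held-out part. The function name, the 10% fraction, and the fixed seed are illustrative choices, not requirements of the assignment.

```python
import random


def split_holdout(ratings, holdout_fraction=0.1, seed=0):
    """Randomly split ratings into (training part, held-out part).

    The fixed seed makes the split reproducible between runs.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

Fitting on the first returned list and scoring on the second gives an error estimate that does not depend on submitting to the leaderboard.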

For report:
Technique: clearly explain why you chose the method, how you implemented it, and how it performs on this data mining task.
Language: concise, precise, and logical.
Organization: good structure, with clearly and properly separated sections and paragraphs.
Citations: all work that is not your own must be correctly referenced.

If plagiarism is identified, the report will receive no score.


Contact TA

Mr. Sheng-Jun Huang and Mr. Qing Da
