Netflix Prize Dataset

Context:
- It can have:
  - 480,189 user IDs (or 9,500)
  - 17,770 movies
  - 100,480,507 ratings collected from October 1998 to December 2005.
- It can be a 2GB+ dataset once uncompressed (the compressed version fits on a single CD).
- …
Counter-Example(s):
- MovieLens Dataset.
See: Collaborative Filtering.

References

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Netflix_Prize Retrieved:2017-7-20.
- The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users or the films being identified except by numbers assigned for the contest.
  The competition was held by Netflix, an online DVD-rental and video streaming service, and was open to anyone who is neither connected with Netflix (current and former employees, agents, close relatives of Netflix employees, etc.) nor a resident of certain blocked countries (such as Cuba or North Korea). On September 21, 2009, the grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team, which bested Netflix's own algorithm for predicting ratings by 10.06%.

Ilya Grigorik. (2006). “Dissecting the Netflix Dataset.” Posted October 29, 2006.
- QUOTE: In case you haven't heard, Netflix announced a public competition ($1 million prize) for a general purpose machine learning algorithm to predict movie ratings based on users' history (with the assumption, that we can learn from similar users). Now, the prize is nice, but the dataset that they released on its own caused quite a stir in the Computer Science/Data Mining community - it is orders of magnitude larger than anything that was available before! Here are some quick stats:
  480,189 User ID's, 17,770 movies, 100,480,507 ratings collected from October 1998 to December 2005.
  In compressed version it fits on a single CD, once uncompressed it becomes a hefty 2GB+ dataset.
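
The quoted statistics imply a very sparse user-by-movie rating matrix. The following is a minimal sketch of that arithmetic, not part of the cited sources; the function name `rating_matrix_stats` and the dictionary keys are hypothetical, and it assumes each rating fills one distinct (user, movie) cell.

```python
# Figures taken from the statistics quoted above.
NUM_USERS = 480_189
NUM_MOVIES = 17_770
NUM_RATINGS = 100_480_507


def rating_matrix_stats(num_users, num_movies, num_ratings):
    """Summarize how full a user-by-movie rating matrix is.

    Hypothetical helper for illustration; assumes every rating
    occupies a distinct (user, movie) cell.
    """
    cells = num_users * num_movies  # size of the full matrix
    return {
        "cells": cells,
        "density": num_ratings / cells,  # fraction of cells that hold a rating
        "ratings_per_user": num_ratings / num_users,
        "ratings_per_movie": num_ratings / num_movies,
    }


stats = rating_matrix_stats(NUM_USERS, NUM_MOVIES, NUM_RATINGS)
print(f"cells:             {stats['cells']:,}")
print(f"density:           {stats['density']:.4%}")
print(f"ratings per user:  {stats['ratings_per_user']:.1f}")
print(f"ratings per movie: {stats['ratings_per_movie']:.1f}")
```

Under these assumptions only about 1.2% of the possible (user, movie) cells hold a rating, so a collaborative filtering algorithm has to predict the remaining cells from a small observed fraction.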