Be it watching a web series or shopping online, recommender systems work as time-savers for many. These systems predict and estimate a user's content preferences. Popular online platforms such as Facebook, Netflix and Myntra, among others, have been using this technology in many ways.

In this article, we list, in no particular order, ten datasets one must know to build recommender systems.

1| MovieLens 25M Dataset

About: MovieLens is a rating dataset from the MovieLens website, collected over several periods. The MovieLens 25M movie rating dataset describes 5-star rating and free-text tagging activity from MovieLens. It contains 25,000,095 ratings and 1,093,360 tag applications across 62,423 movies. These data were created by 162,541 users between 9 January 1995 and 21 November 2019.

2| Social Network Influencer

About: This dataset is provided by PeerIndex and comprises a standard, pairwise preference learning task. Each datapoint describes two individuals through pre-computed, standardized features based on their Twitter activity, such as the volume of interactions and the number of followers. With the help of this dataset, one can train a machine learning model to predict, with high accuracy, which of the two individuals is more influential.

3| Million Song Dataset

About: The Million Song Dataset is a collection of audio features and metadata for a million contemporary popular music tracks. Provided by The Echo Nest, the core of this dataset is the feature analysis and metadata for one million songs. The purpose of this dataset is to encourage research on algorithms that scale to commercial sizes, provide a reference dataset for evaluating research, help new researchers get started in the music information retrieval (MIR) field, and more.

4| Free Music Archive

About: Free Music Archive (FMA) is a collection of high-quality, legal audio downloads for music analysis. The dataset is suitable for evaluating several tasks in MIR, a field concerned with browsing, searching and organizing vast music collections. It contains 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks by 16,341 artists on 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. FMA provides full-length, high-quality audio and pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies.

5| Netflix Prize Dataset

About: The Netflix Prize dataset is the multivariate, time-series dataset that was used in the Netflix Prize competition. It consists of about 100 million movie ratings. There are over 480,000 customers in the dataset, each identified by a unique integer id. With the help of this dataset, one can predict missing entries in the movie-user rating matrix.

6| Book-Crossing Dataset

About: The Book-Crossing dataset was collected in a 4-week crawl of the Book-Crossing community.