100+ datasets found
  1. The Movies Dataset

    • www.kaggle.com
    • wb.n3ncloud.co.kr
    zip
    Updated Nov 10, 2017
  2. World Development Indicators

    • www.kaggle.com
    • wb.n3ncloud.co.kr
    zip
    Updated May 1, 2017
  3. Lego Sets

    • www.kaggle.com
    zip
    Updated May 18, 2018
  4. Framingham Heart study dataset

    • www.kaggle.com
    zip
    Updated May 25, 2021
  5. The Brackish Dataset

    • www.kaggle.com
    zip
    Updated Aug 25, 2020
  6. Solar Radiation Prediction

    • www.kaggle.com
    • wb.n3ncloud.co.kr
    zip
    Updated May 21, 2017
  7. U.S. Opiate Prescriptions/Overdoses

    • www.kaggle.com
    • wb.n3ncloud.co.kr
    zip
    Updated Nov 14, 2019
  8. Housing Dataset

    • www.kaggle.com
    zip
    Updated Mar 7, 2019
  9. Boston Weather Data Jan 2013 - Apr 2018

    • www.kaggle.com
    zip
    Updated Apr 10, 2018
  10. Spoken Language Identification

    • www.kaggle.com
    zip
    Updated Jul 5, 2018
  11. Alexa Dataset

    • www.kaggle.com
    zip
    Updated Apr 22, 2018
  12. Bengali Sign Language Dataset

    • www.kaggle.com
    zip
    Updated Mar 8, 2020
  13. World Bank Climate Change Data

    • www.kaggle.com
    zip
    Updated May 16, 2019
  14. Cryptocurrency Historical Prices

    • www.kaggle.com
    zip
    Updated Feb 28, 2021
  15. Silicon Valley Diversity Data

    • www.kaggle.com
    zip
    Updated Jun 27, 2018
  16. Kiva.DHS.v5

    • www.kaggle.com
    zip
    Updated May 11, 2018
  17. Old Newspapers

    • www.kaggle.com
    • wb.n3ncloud.co.kr
    zip
    Updated May 12, 2020
  18. Real and Fake Face Detection

    • www.kaggle.com
    zip
    Updated Jan 14, 2019
  19. Data from: Speech Accent Archive

    • www.kaggle.com
    • wb.n3ncloud.co.kr
    zip
    Updated Nov 6, 2017
  20. Europe Datasets

    • www.kaggle.com
    zip
    Updated Jun 14, 2019
Rounak Banik (2017). The Movies Dataset [Dataset]. https://www.kaggle.com/rounakbanik/the-movies-dataset

The Movies Dataset

Metadata on over 45,000 movies. 26 million ratings from over 270,000 users.

37 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (238862293 bytes)
Dataset updated Nov 10, 2017
Authors
Rounak Banik
License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Description

Context

These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.

This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website.

Content

This dataset consists of the following files:

movies_metadata.csv: The main Movies Metadata file. Contains information on 45,000 movies featured in the Full MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production countries and companies.

keywords.csv: Contains the movie plot keywords for our MovieLens movies, stored as a stringified JSON Object.

credits.csv: Contains Cast and Crew Information for all our movies, stored as a stringified JSON Object.

links.csv: The file that contains the TMDB and IMDB IDs of all the movies featured in the Full MovieLens dataset.

links_small.csv: Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.

ratings_small.csv: The subset of 100,000 ratings from 700 users on 9,000 movies.

The Full MovieLens Dataset consisting of 26 million ratings and 750,000 tag applications from 270,000 users on all the 45,000 movies in this dataset can be accessed here
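
Because keywords.csv and credits.csv store nested data as text, each cell has to be parsed before it can be used. The sketch below is one hedged way to do that with the standard library only. The column names (`id`, `keywords`), the `name` field, and the two sample rows are assumptions made for illustration, not taken from the files themselves. The fallback to `ast.literal_eval` is also an assumption: it covers the case where the text uses Python-style single quotes rather than strict JSON.

```python
import ast
import csv
import io
import json


def parse_stringified(cell):
    """Parse a stringified list/object cell; return [] for empty or unparseable text."""
    if not cell or not cell.strip():
        return []
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        pass
    try:
        # Fallback for Python-literal style text (single-quoted keys and strings).
        return ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []


# Two made-up rows standing in for keywords.csv; the column names are assumed.
sample = io.StringIO(
    'id,keywords\n'
    '1,"[{""id"": 10, ""name"": ""robot""}]"\n'
    "2,\"[{'id': 11, 'name': 'heist'}]\"\n"
)

# Map each movie id to the list of its keyword names.
keywords_by_movie = {
    row["id"]: [k["name"] for k in parse_stringified(row["keywords"])]
    for row in csv.DictReader(sample)
}
print(keywords_by_movie)
```

To run this against the real file, replace `sample` with an opened file handle, for example `open("keywords.csv", newline="", encoding="utf-8")`, and check the actual header row first.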

Acknowledgements

This dataset is an ensemble of data collected from TMDB and GroupLens. The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.

The Movie Links and Ratings have been obtained from the Official GroupLens website. The files are a part of the dataset available here

Inspiration

This dataset was assembled as part of my second Capstone Project for Springboard's Data Science Career Track. I wanted to perform an extensive EDA on Movie Data to narrate the history and the story of Cinema and use this metadata in combination with MovieLens ratings to build various types of Recommender Systems.

Both my notebooks are available as kernels with this dataset: The Story of Film and Movie Recommender Systems

Some of the things you can do with this dataset:
    • Predicting movie revenue and/or movie success based on a certain metric.
    • What movies tend to get higher vote counts and vote averages on TMDB?
    • Building Content Based and Collaborative Filtering Based Recommendation Engines.
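
As a rough illustration of the collaborative filtering idea, the sketch below ranks movies by the cosine similarity of their rating vectors. It is a minimal example under stated assumptions, not the approach used in the author's notebooks: the (user, movie, rating) triples are made up, ratings_small.csv is only assumed to have a similar shape, and treating a missing rating as zero is a simplification.

```python
import math
from collections import defaultdict

# Made-up (user, movie, rating) triples; the real ratings file is assumed to be similar.
ratings = [
    (1, "A", 5.0), (1, "B", 4.0),
    (2, "A", 4.0), (2, "B", 5.0), (2, "C", 1.0),
    (3, "A", 1.0), (3, "C", 5.0),
]

# Build one sparse rating vector per movie: movie -> {user: rating}.
by_movie = defaultdict(dict)
for user, movie, rating in ratings:
    by_movie[movie][user] = rating


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (missing entries count as 0)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(movie, top_n=2):
    """Rank the other movies by how similar their rating vectors are to `movie`."""
    scores = [
        (other, cosine(by_movie[movie], vec))
        for other, vec in by_movie.items()
        if other != movie
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(most_similar("A"))
```

In this toy data, movie "B" ranks above "C" for movie "A" because the users who rated both A and B gave them similar scores. On the full ratings data a sparse-matrix library would be a more practical choice than plain dictionaries.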
