14 datasets found
  1. User reviews of 16 movies on Rotten Tomatoes

    • kaggle.com
    Updated Apr 11, 2018
    Cite
    jonsteve (2018). User reviews of 16 movies on Rotten Tomatoes [Dataset]. https://www.kaggle.com/datasets/jonsteve/user-reviews-of-16-movies-on-rotten-tomatoes
    Dataset updated
    Apr 11, 2018
    Dataset provided by
    Kaggle
    Authors
    jonsteve
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by jonsteve

    Released under CC0: Public Domain

    Contents

  2. Netflix Series Data Rotten Tomatoes

    • kaggle.com
    Updated Feb 23, 2022
    Cite
    sachin_patel_01_01 (2022). Netflix Series Data Rotten Tomatoes [Dataset]. https://www.kaggle.com/datasets/sachinpatel0101/netflix-series-data-rotten-tomatoes
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    sachin_patel_01_01
    Description

    Scraped the Rotten Tomatoes website for Netflix series data using the Requests and BeautifulSoup libraries in Python. It contains the code and the dataset obtained from web scraping.

  3. ULMFiT for Rotten Tomatoes

    • kaggle.com
    Updated Jul 8, 2018
    Cite
    Nadja Rhodes (2018). ULMFiT for Rotten Tomatoes [Dataset]. https://www.kaggle.com/iconix/ulmfit-rt/code
    Dataset updated
    Jul 8, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nadja Rhodes
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    As part of my OpenAI Scholars summer program, I wanted to try out the ULMFiT approach to text classification: http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html.

    ULMFiT has been described as a "state-of-the-art AWD LSTM" language model backbone or encoder with a linear classifier head or decoder.

    The language model released by Jeremy Howard and Sebastian Ruder comes pre-trained with WikiText-103, and optionally one can choose to fine-tune it with a corpus more related to the downstream task.

    The general idea is to first teach the model English (Wikipedia), then teach it about more specific writing (e.g., movie reviews). With that kind of prior knowledge, sentiment analysis should be a whole lot easier.

    Approach

    I initially tried fine-tuning the WikiText-103 language model on the complete sentences provided by the Rotten Tomatoes dataset from the Movie Review Sentiment Analysis Playground Competition - however, my classification results were lackluster.

    I got better results by fine-tuning first on the larger IMDB movie reviews dataset, then fine-tuning that on sentences from Rotten Tomatoes, then finally applying the linear head and classifying sentiment. The result of this process is the pre-trained model fwd_pretrain_aclImdb_clas_1.h5. It was pre-trained with scripts provided here. I executed the scripts in this approximate order:

    # fine-tune from WikiText-103 to IMDB
    python create_toks.py data/aclImdb/imdb_lm/
    python tok2id.py data/aclImdb/imdb_lm/
    python finetune_lm.py data/aclImdb/imdb_lm/ data/wt103/ 0 50 --lm-id pretrain_wt103 --early_stopping True
    
    # fine-tune from IMDB to RT
    python create_toks.py data/rt/rt_lm/
    python tok2id.py data/rt/rt_lm/
    python finetune_lm.py data/rt/rt_lm/ data/aclImdb/imdb_lm/ 0 50 --lm-id pretrain_aclImdb --early_stopping True --pretrain_id aclImdb
    
    # classify
    python train_clas.py data/rt/rt_clas/ 0 --lm-id pretrain_aclImdb --clas-id pretrain_aclImdb --lr 0.0001 --cl=25
    

    I then zipped up all the files necessary to run the kernel for competition submission.

    Conclusion

    To be honest, I was hoping for a more impressive result - my ok-ish result in the competition is likely a testament to the challenging task of assigning the same sentiment to all "phrases" of a sentence (down to single punctuation marks). Perhaps more epochs or time spent tinkering with parameters would help.

    Acknowledgements

    All credit goes to Jeremy Howard and Sebastian Ruder. Check out "Introducing state of the art text classification with universal language models" for more explanation, plus links to the paper, video, and code.

  4. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv
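
    Because the rows are ordered by class, a shuffle along the following lines could be applied before any train/test split. This is only a minimal sketch using the Python standard library: the in-memory sample rows, the helper name load_and_shuffle, and the fixed seed are illustrative assumptions, while the column names reviews and labels follow the description above.

```python
import csv
import io
import random

# Illustrative stand-in for data_rt.csv: negatives first, then positives,
# mirroring the ordering described above. The review texts are made up.
SAMPLE_CSV = """reviews,labels
"dull and lifeless",0
"a tedious mess",0
"flat and forgettable",0
"a moving , funny film",1
"smart and engaging",1
"a real delight",1
"""


def load_and_shuffle(csv_file, seed=42):
    """Read rows from an open CSV file and return them in shuffled order."""
    rows = list(csv.DictReader(csv_file))
    # A seeded Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(rows)
    return rows


rows = load_and_shuffle(io.StringIO(SAMPLE_CSV))
labels = [int(row["labels"]) for row in rows]
print(labels)
```

    For the real file, the same helper could presumably be called on open("data_rt.csv", newline="", encoding="utf-8") instead of the in-memory sample.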

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw), and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  5. Metacritic & Rotten Tomatoes Controversial Reviews

    • kaggle.com
    Updated Jan 30, 2021
    Cite
    Ellie Lockhart (2021). Metacritic & Rotten Tomatoes Controversial Reviews [Dataset]. http://doi.org/10.34740/kaggle/dsv/1894035
    Dataset updated
    Jan 30, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ellie Lockhart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    Companies that produce mass media often seek to set metrics for performance, like any employer, which determine whether projects are successful and whether the product should be continued - as well as whether those involved in its production should be rewarded. In the field of video games, this has led to the controversial practice of tying salary bonuses for developers to user and critic reactions to the product - usually as quantified by the website Metacritic. While the link between RottenTomatoes - the equivalent of Metacritic for film - and anyone's bottom line is somewhat less clear, it is clear that in recent years, these two websites - Metacritic for video games and RottenTomatoes for movies - have become ideological battlegrounds in the case of high-profile games and movies.

    Most recently in the summer of 2020, the Playstation 4-exclusive video game The Last of Us Part II, produced by Sony and Naughty Dog, transformed its Metacritic user review page into what can only be described after some study as a battlefield of obscenity and hatred. This instance of "review bombing" echoed what happened for Disney blockbusters Captain Marvel and, previously, Star Wars: Episode VIII: The Last Jedi. In all three cases, users diverged from largely positive (at least initial) critical reactions to launch full-on assaults with the intention of lowering the scores of the products, possibly to alter the behavior of the developers/filmmakers in the future.

    In all three of these case studies, a massive amount of reviews were generated - far more than titles that received a great deal of attention but were not subject to "review bombing." (Subsequently, I will provide examples of this disparity.) If companies are going to use publicly posted user reviews as a method of judging whether a title is a success, and certainly if these reviews factor into employee pay, understanding how to identify "review bomb" reviews which may not even originate with potential or real customers is crucial. In all three cases I cite of review bombing, e-celebrities on YouTube and anonymous users on grey-web sites played a role in driving people to post reviews. While my initial survey of these reviews does not indicate that actual automation played a significant role in review bombing, it's quite likely false accounts were used to create multiple reviews, and that people in general were more motivated to post reviews than they were for other blockbuster titles. Thus, comparing these flashpoint films and games with less controversial ones could provide the opportunity to create an algorithmic way to determine the likelihood of a given review of an entertainment product having been influenced by a targeted campaign of the sort that applied in the case of The Last of Us Part II, The Last Jedi, and Captain Marvel.

    Content

    Over a period of three months (11/20-01/21), significantly after the release of the principal controversial titles contained within, I utilized Python scripting to obtain and render into a consistent schema user scores (rounded to the nearest integer in the case of RottenTomatoes; exact in the case of Metacritic), date of posting, and textual content (the review itself) of both highly contentious titles subjected to review bombing (The Last Jedi, Captain Marvel, The Last of Us Part II) as well as "control" examples illustrating the vast difference in number of reviews as well as content between even very successful or visible titles (for instance, Logan [2017] in film to contrast with Captain Marvel). In all, the following titles are included, from the following user review pages, with .csv files labeled accordingly:

    Review Bombing Targets

    • Captain Marvel - RottenTomatoes
    • The Last of Us Part II - Metacritic
    • Star Wars: The Last Jedi - RottenTomatoes

    Playstation 4 Exclusive Games Not Known to Be Significantly Subject to Review Bombing

    • Dark Souls (remake)
    • Days Gone
    • Final Fantasy VII Remake
    • Ghost of Tsushima*
    • God of War (2018)
    • Gravity Rush 2
    • Horizon: Zero Dawn
    • Killzone: Shadow Fall
    • The Order: 1886
    • Red Dead Redemption 2 [not Playstation 4 exclusive; included due to thematic similarities with The Last of Us Part II]
    • Resident Evil 7
    • Sekiro: Shadows Die Twice
    • Marvel's Spider-Man (PS4)
    • Until Dawn
    • Yakuza 0

    Control Films (all RottenTomatoes)

    • Logan (2017)
    • Inception (2010)

    * While Ghost of Tsushima was not review bombed, it was the first title released as a PS4 exclusive boxed title after The Last of Us Part II and my initial investigation has found that the controversy about the former bled directly into the latter, with ...
  6. Movies rating in 2016,2017

    • kaggle.com
    Updated Sep 9, 2020
    Cite
    Rakibul Islam (2020). Movies rating in 2016,2017 [Dataset]. https://www.kaggle.com/rislam4/movies-rating-in-20162017
    Dataset updated
    Sep 9, 2020
    Dataset provided by
    Kaggle
    Authors
    Rakibul Islam
    Description

    Movie ratings from different websites for 2016 and 2017.

    Column Information

    movie = the name of the movie
    year = the release year of the movie
    metascore = the Metacritic rating of the movie (the "Metascore" - critic score)
    imdb = the IMDB rating of the movie (user score)
    tmeter = the Rotten Tomatoes rating of the movie (the "Tomatometer" - critic score)
    audience = the Rotten Tomatoes rating of the movie (user score)
    fandango = the Fandango rating of the movie (user score)
    n_metascore = the Metascore normalized to a 0-5 scale
    n_imdb = the IMDB rating normalized to a 0-5 scale
    n_tmeter = the Tomatometer normalized to a 0-5 scale
    n_audience = the Rotten Tomatoes user score normalized to a 0-5 scale
    nr_metascore = the Metascore normalized to a 0-5 scale and rounded to the nearest 0.5
    nr_imdb = the IMDB rating normalized to a 0-5 scale and rounded to the nearest 0.5
    nr_tmeter = the Tomatometer normalized to a 0-5 scale and rounded to the nearest 0.5
    nr_ normalized to a 0-5 scale and rounded to the nearest 0.5
    nr_audience = the Rotten Tomatoes user score normalized to a 0-5 scale and rounded to the nearest 0.5
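
    The normalized (n_) and rounded (nr_) columns described above could be derived roughly as follows. The description does not state the exact formula, so this sketch assumes a simple linear rescaling from each site's maximum score and assumes that ties round up; the function names and the sample scores are hypothetical.

```python
import math


def normalize_to_five(value, source_max):
    """Linearly rescale a score from [0, source_max] to [0, 5] (assumed formula)."""
    return value * 5 / source_max


def round_to_half(value):
    """Round to the nearest 0.5, with ties rounding up (assumed tie rule)."""
    # floor(x * 2 + 0.5) avoids the round-half-to-even behaviour of round().
    return math.floor(value * 2 + 0.5) / 2


# Hypothetical scores for one movie (not taken from the dataset).
metascore = 76   # Metacritic, 0-100
imdb = 7.4       # IMDB, 0-10
tmeter = 83      # Tomatometer, 0-100

n_metascore = normalize_to_five(metascore, 100)
n_imdb = normalize_to_five(imdb, 10)
n_tmeter = normalize_to_five(tmeter, 100)

nr_metascore = round_to_half(n_metascore)
nr_imdb = round_to_half(n_imdb)
nr_tmeter = round_to_half(n_tmeter)
print(nr_metascore, nr_imdb, nr_tmeter)
```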

  7. Netflix Movies and TV shows

    • kaggle.com
    Updated Sep 30, 2020
    Cite
    Abhay Kumar (2020). Netflix Movies and TV shows [Dataset]. https://www.kaggle.com/absin7/netflix-movies-and-tv-shows/tasks
    Dataset updated
    Sep 30, 2020
    Dataset provided by
    Kaggle
    Authors
    Abhay Kumar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset consists of tv shows and movies available on Netflix as of 2019. The dataset is collected from Flixable which is a third-party Netflix search engine.

    In 2018, Flixable released an interesting report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the streaming service's number of movies has decreased by more than 2,000 titles over the same period. It will be interesting to explore what other insights can be obtained from the same dataset.

    Integrating this dataset with external datasets such as IMDB ratings or Rotten Tomatoes ratings could also provide many interesting findings.

    Inspiration

    Some of the interesting questions (tasks) which can be performed on this dataset:

    • Understanding what content is available in different countries
    • Identifying similar content by matching text-based features
    • Network analysis of actors/directors to find interesting insights
    • Has Netflix been increasingly focusing on TV rather than movies in recent years?

  8. FiveThirtyEight Fandango Dataset

    • kaggle.com
    zip
    Updated Apr 26, 2019
    Cite
    FiveThirtyEight (2019). FiveThirtyEight Fandango Dataset [Dataset]. https://www.kaggle.com/fivethirtyeight/fivethirtyeight-fandango-dataset
    Available download formats: zip (14758 bytes)
    Dataset updated
    Apr 26, 2019
    Dataset authored and provided by
    FiveThirtyEight (https://abcnews.go.com/538)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    Fandango

    This directory contains the data behind the story Be Suspicious Of Online Movie Ratings, Especially Fandango’s.

    fandango_score_comparison.csv contains every film that has a Rotten Tomatoes rating, a RT User rating, a Metacritic score, a Metacritic User score, an IMDb score, and at least 30 fan reviews on Fandango. The data from Fandango was pulled on Aug. 24, 2015.

    Column - Definition

    • FILM - The film in question
    • RottenTomatoes - The Rotten Tomatoes Tomatometer score for the film
    • RottenTomatoes_User - The Rotten Tomatoes user score for the film
    • Metacritic - The Metacritic critic score for the film
    • Metacritic_User - The Metacritic user score for the film
    • IMDB - The IMDb user score for the film
    • Fandango_Stars - The number of stars the film had on its Fandango movie page
    • Fandango_Ratingvalue - The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained.
    • RT_norm - The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system
    • RT_user_norm - The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system
    • Metacritic_norm - The Metacritic critic score for the film, normalized to a 0 to 5 point system
    • Metacritic_user_nom - The Metacritic user score for the film, normalized to a 0 to 5 point system
    • IMDB_norm - The IMDb user score for the film, normalized to a 0 to 5 point system
    • RT_norm_round - The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
    • RT_user_norm_round - The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
    • Metacritic_norm_round - The Metacritic critic score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
    • Metacritic_user_norm_round - The Metacritic user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
    • IMDB_norm_round - The IMDb user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
    • Metacritic_user_vote_count - The number of user votes the film had on Metacritic
    • IMDB_user_vote_count - The number of user votes the film had on IMDb
    • Fandango_votes - The number of user votes the film had on Fandango
    • Fandango_Difference - The difference between the presented Fandango_Stars and the actual Fandango_Ratingvalue
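
    As a rough illustration of the last column defined above, Fandango_Difference can be recomputed from Fandango_Stars and Fandango_Ratingvalue. The sketch below uses made-up rows rather than values from the real file, and the helper name fandango_difference is hypothetical; only the column names come from the definitions above.

```python
import csv
import io

# Hypothetical rows using the column names defined above; values are made up.
SAMPLE_CSV = """FILM,Fandango_Stars,Fandango_Ratingvalue
Example Film A,4.5,4.1
Example Film B,3.0,3.0
Example Film C,5.0,4.6
"""


def fandango_difference(row):
    """Presented stars minus the underlying average rating."""
    diff = float(row["Fandango_Stars"]) - float(row["Fandango_Ratingvalue"])
    # Round to one decimal place to hide floating-point noise.
    return round(diff, 1)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
differences = {row["FILM"]: fandango_difference(row) for row in rows}
print(differences)
```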

    fandango_scrape.csv contains every film we pulled from Fandango.

    Column - Definition

    • FILM - The movie
    • STARS - Number of stars presented on Fandango.com
    • RATING - The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained.
    • VOTES - Number of people who had reviewed the film at the time we pulled it.

    Context

    This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using GitHub's API and Kaggle's API.

    This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

  9. Pixar Movies

    • kaggle.com
    Updated Oct 26, 2024
    Cite
    Rummage Labs (2024). Pixar Movies [Dataset]. https://www.kaggle.com/datasets/rummagelabs/pixar-movies
    Dataset updated
    Oct 26, 2024
    Dataset provided by
    Kaggle
    Authors
    Rummage Labs
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Pixar Movies Dataset

    A comprehensive dataset of Pixar movies, including details on their release dates, directors, writers, cast, box office performance, and ratings. This dataset is gathered from official sources, including Pixar, Rotten Tomatoes, and IMDb, to provide accurate and relevant information for anyone interested in analyzing Pixar's films.

    About Pixar Movies

    Pixar Animation Studios, known for its quality animation and storytelling, has produced a series of animated movies that have captivated audiences around the world. This dataset captures key details from Pixar’s filmography, including box office earnings, critical ratings, and character information, making it a valuable resource for those analyzing trends in animation, its movie plot lines and beloved characters, and movie ratings. For more information, visit Pixar, Rotten Tomatoes, and IMDb.

    Dataset Information

    • Source: Data is compiled from public sources, including official information from Pixar, Rotten Tomatoes, IMDb, and Wikipedia. Cells are each derived from one or more sources and then selected/verified.
    • Purpose: The dataset is intended for research, educational, and analytical purposes.
    • Accuracy: Efforts have been made to ensure accuracy, though users are encouraged to verify individual data points for critical use.
    • Updates: This dataset captures information available up to the latest Pixar releases.

    Data Structure

    Dataset Columns

    Column - Description

    • movie - The title of the Pixar movie
    • date_released - The exact release date of the movie (e.g., YYYY-MM-DD)
    • year_released - The year the movie was released (e.g., YYYY)
    • length_min - Duration of the movie in minutes
    • plot_summary - A brief summary of the movie's plot
    • director - The name(s) of the director(s) of the movie
    • writer - The name(s) of the writer(s) of the movie
    • main_characters - List of main characters featured in the movie
    • type_of_characters - Description of the types of characters (e.g., human, toys, animals, vehicles)
    • main_voice_actors - List of actors who voiced the main characters
    • opening_weekend_box_office_sales - Gross box office earnings on the opening weekend in USD
    • total_worldwide_gross_sales - Total gross box office earnings worldwide in USD
    • rotten_tomatoes_rating - Rotten Tomatoes rating, typically out of 100
    • imdb_rating - IMDb rating, typically out of 10
    • movie_genre - Primary genre(s) of the movie (e.g., Animation, Adventure, Comedy)
    • movie_rating - The movie’s rating (e.g., G, PG, PG-13)

    This data was compiled, enriched, reviewed, and curated using Research by Rummage Labs. Research by Rummage Labs enables you to curate verified datasets to power your enterprise. Read more here: https://rummagelabs.com/.

  10. Netflix Movies and TV Shows cleansed

    • kaggle.com
    zip
    Updated Mar 15, 2021
    Cite
    Jack Kerschner (2021). Netflix Movies and TV Shows cleansed [Dataset]. https://www.kaggle.com/jackkerschner/netflix-movies-and-tv-shows-cleansed
    Available download formats: zip (9465254 bytes)
    Dataset updated
    Mar 15, 2021
    Authors
    Jack Kerschner
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    First and foremost credit to Shivam Bansal for posting the original dataset.

    This version addresses the issue of comma-separated values within records for cast, director, genre, and country by splitting them into separate tables. Each table can be joined using show_id as the primary/foreign key.
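
    The kind of cleansing described here can be sketched as follows. This is not the author's actual script: the helper explode_column and the sample records are hypothetical, and only show_id and the column names cast and country come from the description.

```python
def explode_column(records, column, key="show_id"):
    """Return one (key, value) pair per comma-separated value in `column`."""
    pairs = []
    for record in records:
        raw = record.get(column) or ""
        for value in raw.split(","):
            value = value.strip()
            if value:  # skip empty entries such as a missing country
                pairs.append((record[key], value))
    return pairs


# Hypothetical records in the shape of the original, uncleansed data.
records = [
    {"show_id": "s1", "cast": "Actor A, Actor B", "country": "India, United States"},
    {"show_id": "s2", "cast": "Actor C", "country": ""},
]

cast_table = explode_column(records, "cast")
country_table = explode_column(records, "country")
print(cast_table)
print(country_table)
```

    Each resulting table holds one row per show_id and value, so it can be joined back to the main table on show_id.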

    I used this version of the data to generate this viz and would love to see someone integrate it with IMDB or Rotten Tomatoes ratings data to make an improvement over mine.

  11. Netflix Ratings 2021

    • kaggle.com
    Updated Mar 6, 2021
    Cite
    Toshini (2021). Netflix Ratings 2021 [Dataset]. https://www.kaggle.com/toshini/netflix-ratings-2021/metadata
    Dataset updated
    Mar 6, 2021
    Dataset provided by
    Kaggle
    Authors
    Toshini
    Description

    Movies and TV shows on Netflix - 2021

    This dataset consists of Movies and Tv shows available on Netflix as of 2021. Ratings for Movies and Tv shows are given based on IMDB and Rotten Tomatoes.

    • IMDB ratings are between 1 - 10.
    • Rotten Tomatoes ratings are between 1 - 5.
  12. Movie Dataset

    • kaggle.com
    Updated Apr 2, 2019
    Cite
    Chen Ni (2019). Movie Dataset [Dataset]. https://www.kaggle.com/nichen301/movie-data/code
    Dataset updated
    Apr 2, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chen Ni
    Description

    Context

    This is originally the dataset for the week 4 project of the course Linear Regression and Modeling by Duke University on Coursera.

    Content

    The data set is comprised of 651 randomly sampled movies produced and released before 2016.

    Some of these variables are only there for informational purposes and do not make any sense to include in a statistical analysis. It is up to you to decide which variables are meaningful and which should be omitted. For example, information in the actor1 through actor5 variables was used to determine whether the movie casts an actor or actress who won a best actor or actress Oscar.

    You might also choose to omit certain observations or restructure some of the variables to make them suitable for answering your research questions.

    When you are fitting a model you should also be careful about collinearity, as some of these variables may be dependent on each other.
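
    One simple way to screen a pair of numeric variables for collinearity is to look at their correlation before fitting. The sketch below computes a Pearson correlation by hand; the variable names critics_score and audience_score are columns of this dataset, but the values are made up for illustration and any cutoff for "too correlated" is left to the analyst.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values standing in for two candidate predictors.
critics_score = [45, 96, 91, 80, 33, 91]
audience_score = [73, 81, 91, 76, 27, 86]

r = pearson(critics_score, audience_score)
print(f"correlation = {r:.2f}")
```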

    title: Title of movie

    title_type: Type of movie (Documentary, Feature Film, TV Movie)

    genre: Genre of movie (Action & Adventure, Comedy, Documentary, Drama, Horror, Mystery & Suspense, Other)

    runtime: Runtime of movie (in minutes)

    mpaa_rating: MPAA rating of the movie (G, PG, PG-13, R, Unrated)

    studio: Studio that produced the movie

    thtr_rel_year: Year the movie is released in theaters

    thtr_rel_month: Month the movie is released in theaters

    thtr_rel_day: Day of the month the movie is released in theaters

    dvd_rel_year: Year the movie is released on DVD

    dvd_rel_month: Month the movie is released on DVD

    dvd_rel_day: Day of the month the movie is released on DVD

    imdb_rating: Rating on IMDB

    imdb_num_votes: Number of votes on IMDB

    critics_rating: Categorical variable for critics rating on Rotten Tomatoes (Certified Fresh, Fresh, Rotten)

    critics_score: Critics score on Rotten Tomatoes

    audience_rating: Categorical variable for audience rating on Rotten Tomatoes (Spilled, Upright)

    audience_score: Audience score on Rotten Tomatoes

    best_pic_nom: Whether or not the movie was nominated for a best picture Oscar (no, yes)

    best_pic_win: Whether or not the movie won a best picture Oscar (no, yes)

    best_actor_win: Whether or not one of the main actors in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actor won an Oscar for their role in the given movie

    best_actress_win: Whether or not one of the main actresses in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actress won an Oscar for her role in the given movie

    best_dir_win: Whether or not the director of the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the director won an Oscar for the given movie

    top200_box: Whether or not the movie is in the Top 200 Box Office list on BoxOfficeMojo (no, yes)

    director: Director of the movie

    actor1: First main actor/actress in the abridged cast of the movie

    actor2: Second main actor/actress in the abridged cast of the movie

    actor3: Third main actor/actress in the abridged cast of the movie

    actor4: Fourth main actor/actress in the abridged cast of the movie

    actor5: Fifth main actor/actress in the abridged cast of the movie

    imdb_url: Link to IMDB page for the movie

    rt_url: Link to Rotten Tomatoes page for the movie
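
    The collinearity warning above can be checked before fitting a model. The following is a minimal sketch, not the course's own method: it computes pairwise Pearson correlations between numeric columns and flags pairs above a cutoff. The helper names `pearson` and `collinear_pairs`, the 0.8 threshold, and the toy values are all assumptions made for illustration; only the column names come from the variable list.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Made-up toy values, NOT taken from the dataset; only the column names are real.
toy = {
    "imdb_rating": [5.5, 6.1, 7.2, 7.9, 8.4],
    "critics_score": [30, 45, 70, 85, 95],
    "audience_score": [40, 50, 72, 80, 90],
    "runtime": [120, 95, 104, 88, 131],
}

for a, b, r in collinear_pairs(toy):
    print(f"{a} vs {b}: r = {r:.2f}")
```

    On the real data, the columns would be read from the downloaded file rather than typed in. A flagged pair is a prompt to consider dropping or combining one of the variables, not proof that either must be removed.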

    Acknowledgements

    Source: Rotten Tomatoes and IMDB APIs.

  13. Cartoon dataset

    • kaggle.com
    Updated Mar 3, 2024
    ayushparwal2026 (2024). Cartoon dataset [Dataset]. https://www.kaggle.com/datasets/ayushparwal2026/cartoon-dataset/code
    Dataset updated
    Mar 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ayushparwal2026
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Name of Cartoon: This column would contain the names of various cartoons or animated series. Examples include "SpongeBob SquarePants," "Tom and Jerry," "The Simpsons," "Pokemon," etc.

    Span Over the Years: This column would indicate the time period during which the cartoon aired or was produced. It could be represented as a range (e.g., "1999-2022") or specific years (e.g., "2001-2006, 2015-present").

    Rating: This column would contain the ratings of the cartoons. Ratings could be provided by various sources such as IMDb, Rotten Tomatoes, or specific rating agencies. Ratings could be numerical (e.g., out of 10) or categorical (e.g., G, PG, PG-13, etc.).

    Description: This column would include a brief description or summary of each cartoon. It would provide an overview of the storyline, main characters, genre, and any other relevant information about the cartoon.
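
    The "Span Over the Years" format described above (one or more year ranges, possibly ending in "present") can be turned into structured values. This is a rough sketch that assumes the column follows the two example patterns exactly; the actual file may differ, and the function name `parse_spans` and its `present_year` parameter are hypothetical.

```python
import re


def parse_spans(text, present_year=None):
    """Parse 'YYYY-YYYY, YYYY-present' into a list of (start, end) tuples.

    An open-ended run ('present') gets end=None unless present_year is given.
    """
    spans = []
    for part in text.split(","):
        match = re.fullmatch(
            r"\s*(\d{4})\s*[-–]\s*(\d{4}|present)\s*", part, re.IGNORECASE
        )
        if match is None:
            raise ValueError(f"unrecognized span: {part!r}")
        start = int(match.group(1))
        end_raw = match.group(2).lower()
        end = present_year if end_raw == "present" else int(end_raw)
        spans.append((start, end))
    return spans


print(parse_spans("1999-2022"))
print(parse_spans("2001-2006, 2015-present"))
```

    Raising on unrecognized input, rather than skipping it, makes it obvious when a row departs from the assumed format.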

  14. Fresh and Rotten Classification

    • kaggle.com
    Updated Jun 7, 2023
    Swoyam Siddharth Nayak (2023). Fresh and Rotten Classification [Dataset]. https://www.kaggle.com/datasets/swoyam2609/fresh-and-stale-classification
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Swoyam Siddharth Nayak
    License

    https://cdla.io/permissive-1-0/

    Description

    The Fresh and Rotten/Stale Fruits and Vegetables Classification Dataset is a comprehensive collection of high-quality images specifically curated for the purpose of training and evaluating classification models. This dataset is designed to aid in the development of computer vision algorithms that can accurately distinguish between fresh and rotten/stale produce.

    The dataset comprises a diverse range of fruits and vegetables commonly found in culinary settings, including apples, oranges, bananas, tomatoes, cucumbers, carrots, and more. Each item in the dataset is captured in multiple images, representing both fresh and rotten/stale states. The dataset encompasses a variety of fruit and vegetable types to ensure the generalization and robustness of the classification models.

    Key Features:

    1. Image Variety: The dataset contains a substantial number of images, with a significant variation in lighting conditions, angles, and backgrounds. This diversity helps to mimic real-world scenarios and challenges the classification models to be robust and accurate under various conditions.

    2. Freshness Levels: The dataset provides a clear distinction between fresh and rotten/stale states, allowing for the training of models capable of accurately identifying the level of decay in fruits and vegetables.

    3. Annotation: Each image in the dataset is carefully labeled with appropriate annotations indicating whether the item is fresh or rotten/stale. This enables supervised learning and facilitates the development of classification models.

    4. High-Quality Images: The dataset includes high-resolution images captured with professional-grade cameras. The images are meticulously edited to ensure clarity and eliminate noise, providing a solid foundation for training reliable classification models.

    5. Large Scale: With thousands of images available, the dataset offers a significant volume of data suitable for training deep learning models. This allows for more extensive training and validation, leading to more robust and accurate classification models.

    Potential Applications: The Fresh and Rotten/Stale Fruits and Vegetables Classification Dataset can be employed in a wide range of applications, including:

    1. Food Quality Inspection: The dataset can be used to develop computer vision systems for automated food quality inspection in production lines, enabling rapid identification and removal of rotten/stale produce.

    2. Smart Refrigeration Systems: By integrating the classification models trained on this dataset, smart refrigeration systems can automatically detect and alert users about the freshness of fruits and vegetables, helping to reduce food waste.

    3. Retail and E-commerce: Online grocery stores and retail outlets can utilize the dataset to enhance their product categorization and inventory management systems, ensuring only fresh produce is made available to customers.

    4. Agriculture and Farming: The dataset can aid in the development of computer vision systems for farmers, enabling early detection of spoilage in crops and assisting in timely intervention to minimize losses.

    By utilizing the Fresh and Rotten/Stale Fruits and Vegetables Classification Dataset, researchers and developers can advance the field of computer vision, leading to improved food quality assessment, reduced food waste, and enhanced agricultural practices.

