14 datasets found

User reviews of 16 movies on Rotten Tomatoes
kaggle.com
Updated Apr 11, 2018
Cite
jonsteve (2018). User reviews of 16 movies on Rotten Tomatoes [Dataset]. https://www.kaggle.com/datasets/jonsteve/user-reviews-of-16-movies-on-rotten-tomatoes
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 11, 2018
Dataset provided by
Kaggle
Authors
jonsteve
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by jonsteve

Released under CC0: Public Domain

Contents
Netflix Series Data Rotten Tomatoes
kaggle.com
Updated Feb 23, 2022
Cite
sachin_patel_01_01 (2022). Netflix Series Data Rotten Tomatoes [Dataset]. https://www.kaggle.com/datasets/sachinpatel0101/netflix-series-data-rotten-tomatoes
Explore at:
Croissant
Dataset updated
Feb 23, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
sachin_patel_01_01
Description
Scraped the Rotten Tomatoes website for Netflix series data using the Requests and BeautifulSoup libraries in Python. It contains the code and the dataset obtained from web scraping.
ULMFiT for Rotten Tomatoes
kaggle.com
Updated Jul 8, 2018
Cite
Nadja Rhodes (2018). ULMFiT for Rotten Tomatoes [Dataset]. https://www.kaggle.com/iconix/ulmfit-rt/code
Explore at:
Croissant
Dataset updated
Jul 8, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nadja Rhodes
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

As part of my OpenAI Scholars summer program, I wanted to try out the ULMFiT approach to text classification: http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html.

ULMFiT has been described as a "state-of-the-art AWD LSTM" language model backbone or encoder with a linear classifier head or decoder.

The language model released by Jeremy Howard and Sebastian Ruder comes pre-trained with WikiText-103, and optionally one can choose to fine-tune it with a corpus more related to the downstream task.

The general idea is to first teach the model English (Wikipedia), then teach it about more specific writing (e.g., movie reviews). With that kind of prior knowledge, sentiment analysis should be a whole lot easier.

Approach

I initially tried fine-tuning the WikiText-103 language model on the complete sentences provided by the Rotten Tomatoes dataset from the Movie Review Sentiment Analysis Playground Competition - however, my classification results were lackluster.

I got better results by fine-tuning first on the larger IMDB movie reviews dataset, then fine-tuning that on sentences from Rotten Tomatoes, then finally applying the linear head and classifying sentiment. The result of this process is the pre-trained model fwd_pretrain_aclImdb_clas_1.h5. It was pre-trained with scripts provided here. I executed the scripts in this approximate order:

# fine-tune from WikiText-103 to IMDB
python create_toks.py data/aclImdb/imdb_lm/
python tok2id.py data/aclImdb/imdb_lm/
python finetune_lm.py data/aclImdb/imdb_lm/ data/wt103/ 0 50 --lm-id pretrain_wt103 --early_stopping True

# fine-tune from IMDB to RT
python create_toks.py data/rt/rt_lm/
python tok2id.py data/rt/rt_lm/
python finetune_lm.py data/rt/rt_lm/ data/aclImdb/imdb_lm/ 0 50 --lm-id pretrain_aclImdb --early_stopping True --pretrain_id aclImdb

# classify
python train_clas.py data/rt/rt_clas/ 0 --lm-id pretrain_aclImdb --clas-id pretrain_aclImdb --lr 0.0001 --cl=25

I then zipped up all the files necessary to run the kernel for competition submission.

Conclusion

To be honest, I was hoping for a more impressive result - my ok-ish result in the competition is likely a testament to the challenging task of assigning the same sentiment to all "phrases" of a sentence (down to single punctuation marks). Perhaps more epochs or time spent tinkering with parameters would help.

Acknowledgements

All credit goes to Jeremy Howard and Sebastian Ruder. Check out "Introducing state of the art text classification with universal language models" for more explanation, plus links to the paper, video, and code.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (repository creator) (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
csv (available download format)
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as per the details listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
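Because the rows are ordered by class, a shuffle step is needed before any train/test split. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column names reviews and labels as described above; the sample rows are hypothetical stand-ins for data_rt.csv, not taken from the dataset.

```python
import csv
import io
import random


def load_and_shuffle(csv_file, seed=0):
    """Read (review, label) rows from an open CSV file and shuffle them."""
    reader = csv.DictReader(csv_file)
    rows = [(row["reviews"], int(row["labels"])) for row in reader]
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(rows)
    return rows


# Hypothetical in-memory stand-in for data_rt.csv with the same layout:
# negative samples (0) first, positive samples (1) last.
sample = io.StringIO(
    "reviews,labels\n"
    "dull and lifeless,0\n"
    "a tedious mess,0\n"
    "warm and funny,1\n"
    "a quiet triumph,1\n"
)
shuffled = load_and_shuffle(sample, seed=42)
```

For the real file, the same function could be called with open("data_rt.csv", newline="", encoding="utf-8") in place of the in-memory sample.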
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
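The description says the division column is a categorical label derived from the TextBlob polarity score, but it does not give the cut-off values or the label names. The sketch below shows one plausible mapping. The threshold of 0 and the label names positive, negative, and neutral are assumptions for illustration, not the dataset author's documented rule. TextBlob polarity scores lie in the range -1 to 1.

```python
def polarity_to_division(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    The threshold and the label names are assumed for illustration;
    the source does not state how the division column was derived.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


labels = [polarity_to_division(p) for p in (0.6, -0.2, 0.0)]
```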
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Metacritic & Rotten Tomatoes Controversial Reviews
kaggle.com
Updated Jan 30, 2021
Cite
Ellie Lockhart (2021). Metacritic & Rotten Tomatoes Controversial Reviews [Dataset]. http://doi.org/10.34740/kaggle/dsv/1894035
Explore at:
Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/1894035
Dataset updated
Jan 30, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ellie Lockhart
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

Companies that produce mass media often seek to set metrics for performance, like any employer, which determine whether projects are successful and whether the product should be continued - as well as whether those involved in its production should be rewarded. In the field of video games, this has led to the controversial practice of tying salary bonuses for developers to user and critic reactions to the product - usually as quantified by the website Metacritic. While the link between RottenTomatoes - the equivalent of Metacritic for film - and anyone's bottom line is somewhat less clear, it is clear that in recent years, these two websites - Metacritic for video games and RottenTomatoes for movies - have become ideological battlegrounds in the case of high-profile games and movies.

Most recently in the summer of 2020, the Playstation 4-exclusive video game The Last of Us Part II, produced by Sony and Naughty Dog, transformed its Metacritic user review page into what can only be described after some study as a battlefield of obscenity and hatred. This instance of "review bombing" echoed what happened for Disney blockbusters Captain Marvel and, previously, Star Wars: Episode VIII: The Last Jedi. In all three cases, users diverged from largely positive (at least initial) critical reactions to launch full-on assaults with the intention of lowering the scores of the products, possibly to alter the behavior of the developers/filmmakers in the future.

In all three of these case studies, a massive amount of reviews were generated - far more than titles that received a great deal of attention but were not subject to "review bombing." (Subsequently, I will provide examples of this disparity.) If companies are going to use publicly posted user reviews as a method of judging whether a title is a success, and certainly if these reviews factor into employee pay, understanding how to identify "review bomb" reviews which may not even originate with potential or real customers is crucial. In all three cases I cite of review bombing, e-celebrities on YouTube and anonymous users on grey-web sites played a role in driving people to post reviews. While my initial survey of these reviews does not indicate that actual automation played a significant role in review bombing, it's quite likely false accounts were used to create multiple reviews, and that people in general were more motivated to post reviews than they were for other blockbuster titles. Thus, comparing these flashpoint films and games with less controversial ones could provide the opportunity to create an algorithmic way to determine the likelihood of a given review of an entertainment product having been influenced by a targeted campaign of the sort that applied in the case of The Last of Us Part II, The Last Jedi, and Captain Marvel.

Content

Over a period of three months (11/20-01/21), significantly after the release of the principal controversial titles contained within, I used Python scripting to obtain and render into a consistent schema the user scores (rounded to the nearest integer in the case of RottenTomatoes; exact in the case of Metacritic), date of posting, and textual content (the review itself) of both highly contentious titles subjected to review bombing (The Last Jedi, Captain Marvel, The Last of Us Part II) and "control" examples illustrating the vast difference in number of reviews as well as content between even very successful or visible titles (for instance, Logan [2017] in film to contrast with Captain Marvel). In all, the following titles are included, from the following user review pages, with .csv files labeled accordingly:

Review Bombing Targets
- Captain Marvel (RottenTomatoes)
- The Last of Us Part II (Metacritic)
- Star Wars: The Last Jedi (RottenTomatoes)

Playstation 4 Exclusive Games Not Known to Be Significantly Subject to Review Bombing
- Dark Souls (remake)
- Days Gone
- Final Fantasy VII Remake
- Ghost of Tsushima*
- God of War (2018)
- Gravity Rush 2
- Horizon: Zero Dawn
- Killzone: Shadow Fall
- The Order: 1886
- Red Dead Redemption 2 [not Playstation 4 exclusive; included due to thematic similarities with The Last of Us Part II]
- Resident Evil 7
- Sekiro: Shadows Die Twice
- Marvel's Spider-Man (PS4)
- Until Dawn
- Yakuza 0

Control Films (all RottenTomatoes)
- Logan (2017)
- Inception (2010)

While Ghost of Tsushima was not review bombed, it was the first title released as a PS4 exclusive boxed title after The Last of Us Part II and my initial investigation has found that the controversy about the former bled directly into the latter, with ...
Movies rating in 2016,2017
kaggle.com
Updated Sep 9, 2020
Cite
Rakibul Islam (2020). Movies rating in 2016,2017 [Dataset]. https://www.kaggle.com/rislam4/movies-rating-in-20162017
Explore at:
Croissant
Dataset updated
Sep 9, 2020
Dataset provided by
Kaggle
Authors
Rakibul Islam
Description
Movie rating by different websites in 2016 and 2017.

Column Information

movie = the name of the movie
year = the release year of the movie
metascore = the Metacritic rating of the movie (the "Metascore" - critic score)
imdb = the IMDB rating of the movie (user score)
tmeter = the Rotten Tomatoes rating of the movie (the "Tomatometer" - critic score)
audience = the Rotten Tomatoes rating of the movie (user score)
fandango = the Fandango rating of the movie (user score)
n_metascore = the Metascore normalized to a 0-5 scale
n_imdb = the IMDB rating normalized to a 0-5 scale
n_tmeter = the Tomatometer normalized to a 0-5 scale
n_audience = the Rotten Tomatoes user score normalized to a 0-5 scale
nr_metascore = the Metascore normalized to a 0-5 scale and rounded to the nearest 0.5
nr_imdb = the IMDB rating normalized to a 0-5 scale and rounded to the nearest 0.5
nr_tmeter = the Tomatometer normalized to a 0-5 scale and rounded to the nearest 0.5
nr_ normalized to a 0-5 scale and rounded to the nearest 0.5
nr_audience = the Rotten Tomatoes user score normalized to a 0-5 scale and rounded to the nearest 0.5
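
The n_ and nr_ columns can be reproduced with a simple linear rescaling followed by rounding to the nearest half point. The sketch below assumes that normalization means dividing by the top of the original scale (100 for the Metascore and Tomatometer, 10 for IMDB) and multiplying by 5, and that ties round upward. The description does not state either detail, and the function names are hypothetical.

```python
import math


def normalize_to_five(score, scale_max):
    """Linearly rescale a score from [0, scale_max] to [0, 5] (assumed formula)."""
    return score / scale_max * 5


def round_to_half(value):
    """Round to the nearest 0.5, with ties going up.

    The built-in round() is avoided because it rounds ties to the
    nearest even number, e.g. round(6.5) gives 6.
    """
    return math.floor(value * 2 + 0.5) / 2


n_metascore = normalize_to_five(80, 100)   # Metascore of 80 on a 0-100 scale
nr_imdb = round_to_half(normalize_to_five(7.3, 10))
```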
Netflix Movies and TV shows
kaggle.com
Updated Sep 30, 2020
Cite
Abhay Kumar (2020). Netflix Movies and TV shows [Dataset]. https://www.kaggle.com/absin7/netflix-movies-and-tv-shows/tasks
Explore at:
Croissant
Dataset updated
Sep 30, 2020
Dataset provided by
Kaggle
Authors
Abhay Kumar
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset consists of TV shows and movies available on Netflix as of 2019. The dataset is collected from Flixable, which is a third-party Netflix search engine.

In 2018, they released an interesting report which shows that the number of TV shows on Netflix has nearly tripled since 2010, while the streaming service's number of movies has decreased by more than 2,000 titles over the same period. It will be interesting to explore what other insights can be obtained from the same dataset.

Integrating this dataset with other external datasets, such as IMDB ratings or Rotten Tomatoes ratings, can also provide many interesting findings.

Inspiration

Some of the interesting questions (tasks) which can be performed on this dataset:

- Understanding what content is available in different countries
- Identifying similar content by matching text-based features
- Network analysis of actors/directors to find interesting insights
- Has Netflix been increasingly focusing on TV rather than movies in recent years?

FiveThirtyEight Fandango Dataset

kaggle.com

zip

Updated Apr 26, 2019

Cite

FiveThirtyEight (2019). FiveThirtyEight Fandango Dataset [Dataset]. https://www.kaggle.com/fivethirtyeight/fivethirtyeight-fandango-dataset

Explore at:

zip (14758 bytes), available download format

Dataset updated

Apr 26, 2019

Dataset authored and provided by

FiveThirtyEight (https://abcnews.go.com/538)

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Content

Fandango

This directory contains the data behind the story Be Suspicious Of Online Movie Ratings, Especially Fandango’s.

fandango_score_comparison.csv contains every film that has a Rotten Tomatoes rating, an RT User rating, a Metacritic score, a Metacritic User score, an IMDb score, and at least 30 fan reviews on Fandango. The data from Fandango was pulled on Aug. 24, 2015.

Column	Definition
FILM	The film in question
RottenTomatoes	The Rotten Tomatoes Tomatometer score for the film
RottenTomatoes_User	The Rotten Tomatoes user score for the film
Metacritic	The Metacritic critic score for the film
Metacritic_User	The Metacritic user score for the film
IMDB	The IMDb user score for the film
Fandango_Stars	The number of stars the film had on its Fandango movie page
Fandango_Ratingvalue	The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained.
RT_norm	The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system
RT_user_norm	The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system
Metacritic_norm	The Metacritic critic score for the film, normalized to a 0 to 5 point system
Metacritic_user_nom	The Metacritic user score for the film, normalized to a 0 to 5 point system
IMDB_norm	The IMDb user score for the film, normalized to a 0 to 5 point system
RT_norm_round	The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
RT_user_norm_round	The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_norm_round	The Metacritic critic score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_user_norm_round	The Metacritic user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
IMDB_norm_round	The IMDb user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_user_vote_count	The number of user votes the film had on Metacritic
IMDB_user_vote_count	The number of user votes the film had on IMDb
Fandango_votes	The number of user votes the film had on Fandango
Fandango_Difference	The difference between the presented Fandango_Stars and the actual Fandango_Ratingvalue
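
The Fandango_Difference column can be recomputed from two other columns in the table above. The sketch below assumes the difference is taken as the presented stars minus the actual rating value, so a positive result means the displayed stars were rounded up. The function name and the sample values are hypothetical, not rows from the file.

```python
def fandango_difference(stars, rating_value):
    """Presented stars minus the actual average rating (assumed sign convention).

    The result is rounded to one decimal place to hide floating-point
    noise such as 0.40000000000000036.
    """
    return round(stars - rating_value, 1)


# Hypothetical (Fandango_Stars, Fandango_Ratingvalue) pairs for illustration.
examples = [(4.5, 4.1), (5.0, 4.5), (3.0, 3.0)]
differences = [fandango_difference(stars, value) for stars, value in examples]
```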

fandango_scrape.csv contains every film we pulled from Fandango.

Column	Definition
FILM	The movie
STARS	Number of stars presented on Fandango.com
RATING	The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained.
VOTES	Number of people who had reviewed the film at the time we pulled it.

Context

This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using GitHub's API and Kaggle's API.

This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

Pixar Movies

kaggle.com

Updated Oct 26, 2024

Cite

Rummage Labs (2024). Pixar Movies [Dataset]. https://www.kaggle.com/datasets/rummagelabs/pixar-movies

Explore at:

Croissant

Dataset updated

Oct 26, 2024

Dataset provided by

Kaggle

Authors

Rummage Labs

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Pixar Movies Dataset

A comprehensive dataset of Pixar movies, including details on their release dates, directors, writers, cast, box office performance, and ratings. This dataset is gathered from official sources, including Pixar, Rotten Tomatoes, and IMDb, to provide accurate and relevant information for anyone interested in analyzing Pixar's films.

About Pixar Movies

Pixar Animation Studios, known for its quality animation and storytelling, has produced a series of animated movies that have captivated audiences around the world. This dataset captures key details from Pixar’s filmography, including box office earnings, critical ratings, and character information, making it a valuable resource for those analyzing trends in animation, its movie plot lines and beloved characters, and movie ratings. For more information, visit Pixar, Rotten Tomatoes, and IMDb.

Dataset Information

Source: Data is compiled from public sources, including official information from Pixar, Rotten Tomatoes, IMDb, and Wikipedia. Cells are each derived from one or more sources and then selected/verified.
Purpose: The dataset is intended for research, educational, and analytical purposes.
Accuracy: Efforts have been made to ensure accuracy, though users are encouraged to verify individual data points for critical use.
Updates: This dataset captures information available up to the latest Pixar releases.

Data Structure

Dataset Columns

Column	Description
movie	The title of the Pixar movie
date_released	The exact release date of the movie (e.g., YYYY-MM-DD)
year_released	The year the movie was released (e.g., YYYY)
length_min	Duration of the movie in minutes
plot_summary	A brief summary of the movie's plot
director	The name(s) of the director(s) of the movie
writer	The name(s) of the writer(s) of the movie
main_characters	List of main characters featured in the movie
type_of_characters	Description of the types of characters (e.g., human, toys, animals, vehicles)
main_voice_actors	List of actors who voiced the main characters
opening_weekend_box_office_sales	Gross box office earnings on the opening weekend in USD
total_worldwide_gross_sales	Total gross box office earnings worldwide in USD
rotten_tomatoes_rating	Rotten Tomatoes rating, typically out of 100
imdb_rating	IMDb rating, typically out of 10
movie_genre	Primary genre(s) of the movie (e.g., Animation, Adventure, Comedy)
movie_rating	The movie’s rating (e.g., G, PG, PG-13)

This data was compiled, enriched, reviewed, and curated using Research by Rummage Labs. Research by Rummage Labs enables you to curate verified datasets to power your enterprise. Read more here: https://rummagelabs.com/.

Netflix Movies and TV Shows cleansed
kaggle.com
zip
Updated Mar 15, 2021
Cite
Jack Kerschner (2021). Netflix Movies and TV Shows cleansed [Dataset]. https://www.kaggle.com/jackkerschner/netflix-movies-and-tv-shows-cleansed
Explore at:
zip (9465254 bytes), available download format
Dataset updated
Mar 15, 2021
Authors
Jack Kerschner
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
First and foremost credit to Shivam Bansal for posting the original dataset.

This version addresses the issue of comma-separated values within records for cast, director, genre, and country. Each of these fields is split into its own table, and each table can be joined using show_id as the primary/foreign key.
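
The cleansing step described above can be sketched as follows: each comma-separated field is expanded into one row per value, keyed by show_id so the result can be joined back to the main table. This is an illustration of the general technique, not the author's actual script. The function name and the sample records are hypothetical.

```python
def split_multivalued(records, column):
    """Return (show_id, value) pairs, one pair per comma-separated value."""
    pairs = []
    for record in records:
        for value in record[column].split(","):
            value = value.strip()
            if value:  # skip empty entries such as a trailing comma
                pairs.append((record["show_id"], value))
    return pairs


# Hypothetical records in the original, un-cleansed layout.
shows = [
    {"show_id": "s1", "title": "Example Show", "country": "United States, Canada"},
    {"show_id": "s2", "title": "Another Title", "country": "India"},
]
country_table = split_multivalued(shows, "country")
```

Applying the same function to the cast, director, and genre columns would give one lookup table per field, each sharing show_id with the main table.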

I used this version of the data to generate this viz and would love to see someone integrate it with IMDB or Rotten Tomatoes ratings data to make an improvement over mine.
Netflix Ratings 2021
kaggle.com
Updated Mar 6, 2021
Cite
Toshini (2021). Netflix Ratings 2021 [Dataset]. https://www.kaggle.com/toshini/netflix-ratings-2021/metadata
Explore at:
Croissant
Dataset updated
Mar 6, 2021
Dataset provided by
Kaggle
Authors
Toshini
Description
Movies and TV shows on Netflix - 2021

This dataset consists of movies and TV shows available on Netflix as of 2021. Ratings for movies and TV shows are given based on IMDB and Rotten Tomatoes.

IMDB ratings are between 1 - 10.

Rotten Tomatoes ratings are between 1 - 5.
Movie Dataset
kaggle.com
Updated Apr 2, 2019
Cite
Chen Ni (2019). Movie Dataset [Dataset]. https://www.kaggle.com/nichen301/movie-data/code
Explore at:
Croissant
Dataset updated
Apr 2, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chen Ni
Description
Context

This is originally the dataset for the week 4 project of the course Linear Regression and Modeling by Duke University on Coursera.

Content

The data set is comprised of 651 randomly sampled movies produced and released before 2016.

Some of these variables are only there for informational purposes and do not make any sense to include in a statistical analysis. It is up to you to decide which variables are meaningful and which should be omitted. For example, information in the actor1 through actor5 variables was used to determine whether the movie casts an actor or actress who won a best actor or actress Oscar.

You might also choose to omit certain observations or restructure some of the variables to make them suitable for answering your research questions.

When you are fitting a model you should also be careful about collinearity, as some of these variables may be dependent on each other.
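
One quick way to screen for collinearity is to compute the Pearson correlation between pairs of numeric predictors before fitting. The sketch below does this with the standard library for two of the variables in this dataset, critics_score and audience_score. The values are hypothetical and chosen only to illustrate a strongly correlated pair; they are not rows from the data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five movies.
critics_score = [90, 45, 70, 20, 85]
audience_score = [85, 50, 65, 30, 80]
r = pearson(critics_score, audience_score)
```

A coefficient close to 1 or -1 suggests the two variables carry largely the same information, so including both in a linear model may make the coefficient estimates unstable.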

title: Title of movie

title_type: Type of movie (Documentary, Feature Film, TV Movie)

genre: Genre of movie (Action & Adventure, Comedy, Documentary, Drama, Horror, Mystery & Suspense, Other)

runtime: Runtime of movie (in minutes)

mpaa_rating: MPAA rating of the movie (G, PG, PG-13, R, Unrated)

studio: Studio that produced the movie

thtr_rel_year: Year the movie is released in theaters

thtr_rel_month: Month the movie is released in theaters

thtr_rel_day: Day of the month the movie is released in theaters

dvd_rel_year: Year the movie is released on DVD

dvd_rel_month: Month the movie is released on DVD

dvd_rel_day: Day of the month the movie is released on DVD

imdb_rating: Rating on IMDB

imdb_num_votes: Number of votes on IMDB

critics_rating: Categorical variable for critics rating on Rotten Tomatoes (Certified Fresh, Fresh, Rotten)

critics_score: Critics score on Rotten Tomatoes

audience_rating: Categorical variable for audience rating on Rotten Tomatoes (Spilled, Upright)

audience_score: Audience score on Rotten Tomatoes

best_pic_nom: Whether or not the movie was nominated for a best picture Oscar (no, yes)

best_pic_win: Whether or not the movie won a best picture Oscar (no, yes)

best_actor_win: Whether or not one of the main actors in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actor won an Oscar for their role in the given movie

best_actress_win: Whether or not one of the main actresses in the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the actress won an Oscar for her role in the given movie

best_dir_win: Whether or not the director of the movie ever won an Oscar (no, yes) – note that this is not necessarily whether the director won an Oscar for the given movie

top200_box: Whether or not the movie is in the Top 200 Box Office list on BoxOfficeMojo (no, yes)

director: Director of the movie

actor1: First main actor/actress in the abridged cast of the movie

actor2: Second main actor/actress in the abridged cast of the movie

actor3: Third main actor/actress in the abridged cast of the movie

actor4: Fourth main actor/actress in the abridged cast of the movie

actor5: Fifth main actor/actress in the abridged cast of the movie

imdb_url: Link to IMDB page for the movie

rt_url: Link to Rotten Tomatoes page for the movie

Acknowledgements

Source: Rotten Tomatoes and IMDB APIs.
Cartoon dataset
kaggle.com
Updated Mar 3, 2024
Cite
ayushparwal2026 (2024). Cartoon dataset [Dataset]. https://www.kaggle.com/datasets/ayushparwal2026/cartoon-dataset/code
Dataset updated
Mar 3, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ayushparwal2026
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Name of Cartoon: This column would contain the names of various cartoons or animated series. Examples include "SpongeBob SquarePants," "Tom and Jerry," "The Simpsons," "Pokemon," etc.

Span Over the Years: This column would indicate the time period during which the cartoon aired or was produced. It could be represented as a range (e.g., "1999-2022") or specific years (e.g., "2001-2006, 2015-present").

Rating: This column would contain the ratings of the cartoons. Ratings could be provided by various sources such as IMDb, Rotten Tomatoes, or specific rating agencies. Ratings could be numerical (e.g., out of 10) or categorical (e.g., G, PG, PG-13, etc.).

Description: This column would include a brief description or summary of each cartoon. It would provide an overview of the storyline, main characters, genre, and any other relevant information about the cartoon.
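
The "Span Over the Years" column is described as holding either a single range ("1999-2022") or several comma-separated ranges, possibly open-ended ("2001-2006, 2015-present"). A rough sketch of parsing such strings follows. It assumes the values match those two example shapes exactly; the function name `parse_spans` and the choice of None for an open end are illustrative, and the real column may contain other formats.

```python
import re

# One range: a four-digit start year, a hyphen or en dash, then an end year or "present".
_SPAN = re.compile(r"\s*(\d{4})\s*[-–]\s*(\d{4}|present)\s*", re.IGNORECASE)


def parse_spans(text):
    """Parse a span string into a list of (start_year, end_year) tuples.

    end_year is None when the range is still running ("present").
    Raises ValueError for any part that does not match the assumed format.
    """
    spans = []
    for part in text.split(","):
        match = _SPAN.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized span: {part!r}")
        start = int(match.group(1))
        end_text = match.group(2)
        end = None if end_text.lower() == "present" else int(end_text)
        spans.append((start, end))
    return spans
```

Raising on unrecognized input, rather than skipping it, makes unexpected formats visible early when the column is first loaded.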
Fresh and Rotten Classification
kaggle.com
Updated Jun 7, 2023
Cite
Swoyam Siddharth Nayak (2023). Fresh and Rotten Classification [Dataset]. https://www.kaggle.com/datasets/swoyam2609/fresh-and-stale-classification
Dataset updated
Jun 7, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Swoyam Siddharth Nayak
License
https://cdla.io/permissive-1-0/
Description
The Fresh and Rotten/Stale Fruits and Vegetables Classification Dataset is a comprehensive collection of high-quality images specifically curated for the purpose of training and evaluating classification models. This dataset is designed to aid in the development of computer vision algorithms that can accurately distinguish between fresh and rotten/stale produce.

The dataset comprises a diverse range of fruits and vegetables commonly found in culinary settings, including apples, oranges, bananas, tomatoes, cucumbers, carrots, and more. Each item in the dataset is captured in multiple images, representing both fresh and rotten/stale states. The dataset encompasses a variety of fruit and vegetable types to ensure the generalization and robustness of the classification models.

Key Features:

Image Variety: The dataset contains a substantial number of images, with a significant variation in lighting conditions, angles, and backgrounds. This diversity helps to mimic real-world scenarios and challenges the classification models to be robust and accurate under various conditions.

Freshness Levels: The dataset provides a clear distinction between fresh and rotten/stale states, allowing for the training of models capable of accurately identifying the level of decay in fruits and vegetables.

Annotation: Each image in the dataset is carefully labeled with appropriate annotations indicating whether the item is fresh or rotten/stale. This enables supervised learning and facilitates the development of classification models.

High-Quality Images: The dataset includes high-resolution images captured with professional-grade cameras. The images are meticulously edited to ensure clarity and eliminate noise, providing a solid foundation for training reliable classification models.

Large Scale: With thousands of images available, the dataset offers a significant volume of data suitable for training deep learning models. This allows for more extensive training and validation, leading to more robust and accurate classification models.

Potential Applications: The Fresh and Rotten/Stale Fruits and Vegetables Classification Dataset can be employed in a wide range of applications, including:

Food Quality Inspection: The dataset can be used to develop computer vision systems for automated food quality inspection in production lines, enabling rapid identification and removal of rotten/stale produce.

Smart Refrigeration Systems: By integrating the classification models trained on this dataset, smart refrigeration systems can automatically detect and alert users about the freshness of fruits and vegetables, helping to reduce food waste.

Retail and E-commerce: Online grocery stores and retail outlets can utilize the dataset to enhance their product categorization and inventory management systems, ensuring only fresh produce is made available to customers.

Agriculture and Farming: The dataset can aid in the development of computer vision systems for farmers, enabling early detection of spoilage in crops and assisting in timely intervention to minimize losses.

By utilizing the Fresh and Rotten/Stale Fruits and Vegetables Classification Dataset, researchers and developers can advance the field of computer vision, leading to improved food quality assessment, reduced food waste, and enhanced agricultural practices.