3 datasets found

Filtered IMDb Movies & TV Shows Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Filtered IMDb Movies & TV Shows Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/ca25d396-b298-4765-ab3b-8adf955bfc63
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
This dataset provides detailed information on IMDb movies and television shows, integrating descriptions sourced from Rotten Tomatoes. It contains data for approximately 7800 titles, primarily from the 1990s onwards, and has been filtered to include English language content with specific criteria for ratings and votes. The purpose of this dataset is to facilitate projects involving cross-content analysis, content-based recommendation systems, and genre prediction tasks. It offers a rich resource for understanding entertainment media consumption and developing machine learning applications.

Columns

SNo.: Serial number for each record.

index: An internal index for the record.

tconst: A unique identifier for the title.

titleType: Specifies the type of content, such as 'movie' or 'tvSeries'.

primaryTitle: The most commonly known title for the content.

originalTitle: The official original title of the content.

isAdult?: A boolean indicator for adult content.

startYear: The year the title was released or started.

endYear: The year the title concluded (for TV series) or was released.

runtimeMinutes: The duration of the content in minutes.

Genres: Categories or types of content (multiple values may be present).

Average Rating: The average rating of the title as found on IMDb.

Num. of Votes: The total number of votes received for the rating on IMDb.

Region: The geographic region associated with the title's availability or origin.

Number of Ratings Types: Details related to how ratings are categorised.

Attributes: Additional characteristics or tags associated with the title.

Description: A textual description of the title, likely from Rotten Tomatoes.
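
The column names above mix spaces and punctuation (`isAdult?`, `Num. of Votes`), so it can help to normalise each row into typed fields before analysis. The following is a minimal sketch, assuming the file is a comma-delimited CSV whose header matches the names listed above and that multi-valued `Genres` cells are comma-separated. Neither detail is confirmed by the listing, and `parse_row` and the sample row are hypothetical.

```python
import csv
import io

# Hypothetical sample using a subset of the columns listed above.
# The values are made up for illustration and are not taken from the dataset.
SAMPLE_CSV = '''tconst,titleType,primaryTitle,startYear,runtimeMinutes,Genres,Average Rating,Num. of Votes
tt0000001,movie,Example Title,1999,120,"Drama,Thriller",8.1,15000
'''


def parse_row(row):
    """Convert one csv.DictReader row into typed Python values."""
    return {
        "tconst": row["tconst"],
        "title_type": row["titleType"],
        "title": row["primaryTitle"],
        "start_year": int(row["startYear"]),
        "runtime_minutes": int(row["runtimeMinutes"]),
        # Assumption: multiple genres are separated by commas.
        "genres": [g.strip() for g in row["Genres"].split(",") if g.strip()],
        "rating": float(row["Average Rating"]),
        "votes": int(row["Num. of Votes"]),
    }


records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(records[0]["genres"])
```

Real rows may contain missing or non-numeric values, so the `int` and `float` conversions would likely need guarding in practice.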

Distribution

The dataset comprises approximately 7800 individual movie and TV show records and is typically provided as a CSV file. The data was curated by filtering the original IMDb dataset to content from the 1990s through to 2023. Only English-language ('en') titles have been retained, and rating and vote thresholds have been applied: for example, titles from the 90s-00s require a rating of 7.9 or higher, and titles from the 2000s onwards require a rating of 6.5 or higher. Titles from Canada, Great Britain, India, and the USA are represented.
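
The rating rule described above can be written as a small predicate. This is only a sketch: the listing does not state the exact boundary year between the two thresholds, nor the minimum vote count, so `CUTOFF_YEAR = 2000` and the function name `passes_rating_filter` are assumptions made for illustration.

```python
FIRST_YEAR = 1990   # coverage start stated in the listing
LAST_YEAR = 2023    # coverage end stated in the listing
CUTOFF_YEAR = 2000  # assumption: year from which the lower threshold applies


def passes_rating_filter(start_year, rating):
    """Return True if a title would pass the rating thresholds as sketched."""
    if not FIRST_YEAR <= start_year <= LAST_YEAR:
        return False
    if start_year < CUTOFF_YEAR:
        return rating >= 7.9   # stricter threshold for older titles
    return rating >= 6.5       # looser threshold for 2000 onwards


print(passes_rating_filter(1995, 8.0))
print(passes_rating_filter(1995, 7.0))
print(passes_rating_filter(2010, 7.0))
```

The vote-count threshold is left out on purpose, because its value is not given in the listing.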

Usage

This dataset is highly suitable for various analytical and machine learning tasks, including:

* Developing content-based recommendation systems using genres, descriptions, and ratings.
* Performing exploratory data analysis on movie and TV show trends.
* Implementing Natural Language Processing (NLP) techniques on title descriptions for insights or feature extraction.
* Executing multi-label classification to predict genres from description data.
* Clustering movies and shows based on their descriptions and genre attributes.
* Aiding projects that require cross-content analysis across different media types.
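
As one illustration of the content-based recommendation use case, the sketch below ranks titles by the Jaccard similarity of their genre sets. It uses only the standard library. The three titles and their genres are invented placeholders rather than rows from the dataset, and `recommend` is a hypothetical helper.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, catalogue, top_n=2):
    """Return up to top_n other titles ranked by genre overlap with target."""
    target_genres = catalogue[target]
    scored = [
        (jaccard(target_genres, genres), title)
        for title, genres in catalogue.items()
        if title != target
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Invented placeholder catalogue: title -> set of genres.
catalogue = {
    "Title A": {"Drama", "Thriller"},
    "Title B": {"Drama", "Thriller", "Crime"},
    "Title C": {"Comedy"},
}

print(recommend("Title A", catalogue))
```

Genre overlap is a coarse signal. A fuller system would likely combine it with features extracted from the `Description` text and with ratings.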

Coverage

The dataset primarily covers movies and TV shows released from 1990 to 2023. Geographically, the data includes titles relevant to Canada, Great Britain, India, and the USA. No demographic scope is specified beyond the inclusion of English-language titles. Titles are included only if they meet the filtering criteria on rating score and number of votes, which keeps the focus on well-received or highly engaged content.

License

CC0

Who Can Use It

This dataset is ideal for:

* Data Scientists and Analysts: For conducting exploratory data analysis, building predictive models, and deriving insights into media consumption.
* Machine Learning Engineers: For developing and training recommendation engines, NLP models, and classification algorithms.
* Researchers: Studying trends in film and television, cross-media analysis, and content categorisation.
* Developers: Creating applications that require rich movie and TV show data, such as content discovery platforms.
* Academics and Students: For educational purposes, coursework, and research projects in data science, AI, and media studies.

Dataset Name Suggestions

IMDb Films & Shows with Descriptions

Nineties and Beyond IMDb Data

Rotten Tomatoes-IMDb Integrated Dataset

Filtered IMDb Movies & TV Shows

Entertainment Content Analytics Dataset

Attributes

Original Data Source: IMDb Movies/Shows with Descriptions
movie review predictor
kaggle.com
Updated May 31, 2023
Cite
Mwenedata Apotre (2023). movie review predictor [Dataset]. https://www.kaggle.com/datasets/mwenedataapotre/movie-review-predictor
Dataset updated
May 31, 2023
Dataset provided by
Kaggle
Authors
Mwenedata Apotre
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset contains 4,000+ rows of movie reviews and non-movie reviews, each labelled True if it is a movie review or False if it is not. It can be used on platforms that receive reviews about movies to filter out content that is not a movie review, for a better user experience and ethics.

Data is collected from: IMDb, Yelp, Amazon.

The dataset is also split into training data (80%) and test data (20%).
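
The dataset already ships with this split, but an 80/20 partition of the same kind can be reproduced along the following lines. This is a sketch only: how the original split was produced (shuffling, seed, stratification) is not documented, so `split_rows`, the seed, and the placeholder rows are assumptions.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows: (text, is_movie_review) pairs invented for illustration.
rows = [(f"text {i}", i % 2 == 0) for i in range(10)]
train, test = split_rows(rows)
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible across runs.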
Clustering results on ACM, DBLP, IMDB.
plos.figshare.com
xls
Updated May 23, 2024
Cite
Guoyang Tang; Xueyi Zhao; Yanyun Fu; Xiaolin Ning (2024). Clustering results on ACM, DBLP, IMDB. [Dataset]. http://doi.org/10.1371/journal.pone.0297989.t002
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0297989.t002
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Guoyang Tang; Xueyi Zhao; Yanyun Fu; Xiaolin Ning
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
In light of the exponential growth in information volume, the significance of graph data has intensified. Graph clustering plays a pivotal role in graph data processing by jointly modeling the graph structure and node attributes. Notably, the practical significance of multi-view graph clustering is heightened due to the presence of diverse relationships within real-world graph data. Nonetheless, prevailing graph clustering techniques, predominantly grounded in deep learning neural networks, face challenges in effectively handling multi-view graph data. These challenges include the incapability to concurrently explore the relationships between multiple view structures and node attributes, as well as difficulties in processing multi-view graph data with varying features. To tackle these issues, this research proposes a straightforward yet effective multi-view graph clustering approach known as SLMGC. This approach uses graph filtering to filter noise, reduces computational complexity by extracting samples based on node importance, enhances clustering representations through graph contrastive regularization, and achieves the final clustering outcomes using a self-training clustering algorithm. Notably, unlike neural network algorithms, this approach avoids the need for intricate parameter settings. Comprehensive experiments validate the supremacy of the SLMGC approach in multi-view graph clustering endeavors when contrasted with prevailing deep neural network techniques.