3 datasets found
  1. Filtered IMDb Movies & TV Shows Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Datasimple (2025). Filtered IMDb Movies & TV Shows Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/ca25d396-b298-4765-ab3b-8adf955bfc63
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset provides detailed information on IMDb movies and television shows, integrating descriptions sourced from Rotten Tomatoes. It contains data for approximately 7800 titles, primarily from the 1990s onwards, and has been filtered to include English language content with specific criteria for ratings and votes. The purpose of this dataset is to facilitate projects involving cross-content analysis, content-based recommendation systems, and genre prediction tasks. It offers a rich resource for understanding entertainment media consumption and developing machine learning applications.

    Columns

    • SNo.: Serial number for each record.
    • index: An internal index for the record.
    • tconst: A unique identifier for the title.
    • titleType: Specifies the type of content, such as 'movie' or 'tvSeries'.
    • primaryTitle: The most commonly known title for the content.
    • originalTitle: The official original title of the content.
    • isAdult?: A boolean indicator for adult content.
    • startYear: The year the title was released or started.
    • endYear: The year the title concluded (for TV series) or was released.
    • runtimeMinutes: The duration of the content in minutes.
    • Genres: Categories or types of content (multiple values may be present).
    • Average Rating: The average rating of the title as found on IMDb.
    • Num. of Votes: The total number of votes received for the rating on IMDb.
    • Region: The geographic region associated with the title's availability or origin.
    • Number of Ratings Types: Details related to how ratings are categorised.
    • Attributes: Additional characteristics or tags associated with the title.
    • Description: A textual description of the title, likely from Rotten Tomatoes.
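As a rough illustration of how these columns might be read, the sketch below parses a small in-memory CSV using a subset of the column names listed above. The sample rows are made up for this example and are not taken from the dataset, and the comma-separated layout of the Genres cell is an assumption (it follows the usual IMDb convention but is not confirmed by the listing).

```python
import csv
import io

# Illustrative rows only (made up for this sketch, not taken from the dataset).
SAMPLE_CSV = """tconst,titleType,primaryTitle,startYear,runtimeMinutes,Genres,Average Rating,Num. of Votes
tt0000001,movie,Example Film,1999,120,"Drama,Thriller",8.1,150000
tt0000002,tvSeries,Example Show,2015,45,Comedy,7.2,32000
"""


def load_titles(handle):
    """Parse rows into dicts with typed fields and a list of genres."""
    titles = []
    for row in csv.DictReader(handle):
        titles.append({
            "tconst": row["tconst"],
            "titleType": row["titleType"],
            "primaryTitle": row["primaryTitle"],
            "startYear": int(row["startYear"]),
            "runtimeMinutes": int(row["runtimeMinutes"]),
            # Assumption: multiple genres share one cell, separated by commas.
            "genres": [g.strip() for g in row["Genres"].split(",") if g.strip()],
            "rating": float(row["Average Rating"]),
            "votes": int(row["Num. of Votes"]),
        })
    return titles


titles = load_titles(io.StringIO(SAMPLE_CSV))
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle. Missing values in numeric columns would need extra handling that this sketch leaves out.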

    Distribution

    The dataset comprises approximately 7800 individual movie and TV show records. It is typically provided as a CSV file. The data has been curated by filtering the original IMDb dataset to focus on content from the 1990s through to 2023. Only titles in English ('en') have been retained, and specific rating and vote thresholds have been applied, such as movies/shows from the 90s-00s with ratings of 7.9 or higher, and those from the 2000s onwards with ratings of 6.5 or higher. Titles from Canada, Great Britain, India, and the USA are represented.
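The rating thresholds described above can be sketched as a small predicate. The listing is ambiguous about where the two eras meet, so the cutoff year of 2000 is an assumption, as are the function and parameter names. Vote thresholds are not modelled because the listing does not state them.

```python
def passes_rating_filter(start_year, rating, cutoff_year=2000,
                         early_min=7.9, late_min=6.5):
    """Return True if a title meets the rating threshold for its era.

    Assumption: titles starting before `cutoff_year` need `early_min`,
    later titles need `late_min`. Titles outside 1990-2023 are rejected.
    """
    if not 1990 <= start_year <= 2023:
        return False
    minimum = early_min if start_year < cutoff_year else late_min
    return rating >= minimum


# Hypothetical (startYear, rating) pairs for illustration.
examples = [(1995, 8.0), (1995, 7.5), (2010, 6.5), (2010, 6.4), (1985, 9.0)]
kept = [pair for pair in examples if passes_rating_filter(*pair)]
```

Keeping the thresholds as parameters makes it easy to try a different reading of the era boundary.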

    Usage

    This dataset is highly suitable for various analytical and machine learning tasks, including:

    • Developing content-based recommendation systems using genres, descriptions, and ratings.
    • Performing exploratory data analysis on movie and TV show trends.
    • Implementing Natural Language Processing (NLP) techniques on title descriptions for insights or feature extraction.
    • Executing multi-label classification to predict genres from description data.
    • Clustering movies and shows based on their descriptions and genre attributes.
    • Aiding projects that require cross-content analysis across different media types.
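To make the content-based recommendation use case concrete, here is a minimal sketch that ranks titles by Jaccard similarity of their genre sets. This is one simple choice of similarity, not a method prescribed by the dataset. The catalogue below is hypothetical, and a fuller system would likely also use the descriptions and ratings.

```python
def jaccard(a, b):
    """Jaccard similarity of two genre collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(query_id, catalogue, top_n=2):
    """Return the ids of the `top_n` titles most similar to `query_id`."""
    query_genres = catalogue[query_id]
    scored = [
        (jaccard(query_genres, genres), other_id)
        for other_id, genres in catalogue.items()
        if other_id != query_id
    ]
    # Highest similarity first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _score, other_id in scored[:top_n]]


# Hypothetical catalogue mapping a title id to its genres.
catalogue = {
    "A": ["Drama", "Thriller"],
    "B": ["Drama", "Thriller", "Crime"],
    "C": ["Comedy"],
    "D": ["Drama"],
}
recommendations = recommend("A", catalogue)
```

Here "B" shares two of three genres with "A" and "D" shares one of two, so both rank above "C", which shares none.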

    Coverage

    The dataset primarily covers movies and TV shows released from 1990 to 2023. Geographically, the data includes titles relevant to Canada, Great Britain, India, and the USA. There is no specific demographic scope mentioned beyond the inclusion of English-language titles. The dataset applies filtering criteria based on rating scores and the number of votes, ensuring a focus on well-received or highly engaged content.

    License

    CC0

    Who Can Use It

    This dataset is ideal for:

    • Data Scientists and Analysts: For conducting exploratory data analysis, building predictive models, and deriving insights into media consumption.
    • Machine Learning Engineers: For developing and training recommendation engines, NLP models, and classification algorithms.
    • Researchers: Studying trends in film and television, cross-media analysis, and content categorisation.
    • Developers: Creating applications that require rich movie and TV show data, such as content discovery platforms.
    • Academics and Students: For educational purposes, coursework, and research projects in data science, AI, and media studies.

    Dataset Name Suggestions

    • IMDb Films & Shows with Descriptions
    • Nineties and Beyond IMDb Data
    • Rotten Tomatoes-IMDb Integrated Dataset
    • Filtered IMDb Movies & TV Shows
    • Entertainment Content Analytics Dataset

    Attributes

    Original Data Source: IMDb Movies/Shows with Descriptions

  2. movie review predictor

    • kaggle.com
    Updated May 31, 2023
    Mwenedata Apotre (2023). movie review predictor [Dataset]. https://www.kaggle.com/datasets/mwenedataapotre/movie-review-predictor
    Dataset provided by
    Kaggle
    Authors
    Mwenedata Apotre
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset contains more than 4,000 rows of movie reviews and non-movie reviews, each labelled True if it is a movie review or False if it is not. It can be used on platforms that receive movie reviews to filter out content that is not a movie review, for a better experience and ethics.

    Data is collected from: IMDb, Yelp, and Amazon.

    In addition, the dataset is split into training data (80%) and test data (20%).
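The dataset ships already split, and the listing does not say how the split was made. As a hedged sketch, an 80/20 split of labelled rows could be reproduced with a seeded shuffle as below. The function name, the seed, and the sample rows are all hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (text, is_movie_review) rows for illustration.
rows = [(f"text {i}", i % 2 == 0) for i in range(10)]
train, test = train_test_split(rows)
```

A plain shuffle does not guarantee the same True/False balance in both parts. A stratified split would be needed if that balance matters.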

  3. Clustering results on ACM, DBLP, IMDB.

    • plos.figshare.com
    xls
    Updated May 23, 2024
    Guoyang Tang; Xueyi Zhao; Yanyun Fu; Xiaolin Ning (2024). Clustering results on ACM, DBLP, IMDB. [Dataset]. http://doi.org/10.1371/journal.pone.0297989.t002
    Dataset provided by
    PLOS ONE
    Authors
    Guoyang Tang; Xueyi Zhao; Yanyun Fu; Xiaolin Ning
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    In light of the exponential growth in information volume, the significance of graph data has intensified. Graph clustering plays a pivotal role in graph data processing by jointly modeling the graph structure and node attributes. Notably, the practical significance of multi-view graph clustering is heightened due to the presence of diverse relationships within real-world graph data. Nonetheless, prevailing graph clustering techniques, predominantly grounded in deep learning neural networks, face challenges in effectively handling multi-view graph data. These challenges include the incapability to concurrently explore the relationships between multiple view structures and node attributes, as well as difficulties in processing multi-view graph data with varying features. To tackle these issues, this research proposes a straightforward yet effective multi-view graph clustering approach known as SLMGC. This approach uses graph filtering to filter noise, reduces computational complexity by extracting samples based on node importance, enhances clustering representations through graph contrastive regularization, and achieves the final clustering outcomes using a self-training clustering algorithm. Notably, unlike neural network algorithms, this approach avoids the need for intricate parameter settings. Comprehensive experiments validate the supremacy of the SLMGC approach in multi-view graph clustering endeavors when contrasted with prevailing deep neural network techniques.


