CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed information on IMDb movies and television shows, integrating descriptions sourced from Rotten Tomatoes. It contains data for approximately 7800 titles, primarily from the 1990s onwards, and has been filtered to include English language content with specific criteria for ratings and votes. The purpose of this dataset is to facilitate projects involving cross-content analysis, content-based recommendation systems, and genre prediction tasks. It offers a rich resource for understanding entertainment media consumption and developing machine learning applications.
The dataset comprises approximately 7800 individual movie and TV show records, typically provided as a CSV file. It was curated by filtering the original IMDb dataset to content from the 1990s through 2023. Only English-language ('en') titles have been retained, and rating and vote thresholds have been applied: for example, titles from the 90s-00s require a rating of 7.9 or higher, and those from the 2000s onwards a rating of 6.5 or higher. Titles from Canada, Great Britain, India, and the USA are represented.
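The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the curator's actual script: the column names (`title`, `year`, `language`, `rating`) and the sample rows are hypothetical, and the thresholds are read as "before 2000 needs 7.9 or higher, 2000 onwards needs 6.5 or higher", which is only one possible reading of the description. The vote threshold is left out because no value is given.

```python
import csv
import io

# Hypothetical sample data; the real CSV header and values may differ.
SAMPLE_CSV = """title,year,language,rating
Old Classic,1994,en,8.6
Old Average,1996,en,7.0
Recent Hit,2015,en,6.9
Recent Flop,2018,en,5.2
Foreign Gem,2010,fr,8.1
"""


def passes_filter(row):
    """Return True if a row meets the assumed language, year, and rating rules."""
    if row["language"] != "en":
        return False
    year = int(row["year"])
    rating = float(row["rating"])
    if not 1990 <= year <= 2023:
        return False
    # Assumed reading of the thresholds: pre-2000 titles need 7.9+, later ones 6.5+.
    minimum = 7.9 if year < 2000 else 6.5
    return rating >= minimum


def filter_titles(csv_text):
    """Parse CSV text and return the titles that pass the filter, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["title"] for row in reader if passes_filter(row)]


kept = filter_titles(SAMPLE_CSV)
print(kept)
```

With the sample rows, only the English-language titles that clear their era's assumed threshold are kept.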
This dataset is highly suitable for various analytical and machine learning tasks, including:
* Developing content-based recommendation systems using genres, descriptions, and ratings.
* Performing exploratory data analysis on movie and TV show trends.
* Implementing Natural Language Processing (NLP) techniques on title descriptions for insights or feature extraction.
* Executing multi-label classification to predict genres from description data.
* Clustering movies and shows based on their descriptions and genre attributes.
* Aiding projects that require cross-content analysis across different media types.
The dataset primarily covers movies and TV shows released from 1990 to 2023. Geographically, the data includes titles relevant to Canada, Great Britain, India, and the USA. No demographic scope is specified beyond the restriction to English-language titles. Titles were included only if they met thresholds on rating score and number of votes, so the dataset focuses on well-received or highly engaged content.
CC0
This dataset is ideal for:
* Data Scientists and Analysts: For conducting exploratory data analysis, building predictive models, and deriving insights into media consumption.
* Machine Learning Engineers: For developing and training recommendation engines, NLP models, and classification algorithms.
* Researchers: Studying trends in film and television, cross-media analysis, and content categorisation.
* Developers: Creating applications that require rich movie and TV show data, such as content discovery platforms.
* Academics and Students: For educational purposes, coursework, and research projects in data science, AI, and media studies.
Original Data Source: IMDb Movies/Shows with Descriptions
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains more than 4000 rows of movie reviews and non-movie reviews, each labelled True if it is a movie review or False if it is not. It can be used on platforms that receive movie reviews to filter out content that is not a movie review, for a better user experience and ethics.
Data is collected from: IMDb, Yelp, Amazon.
In addition, the dataset is split into training data (80%) and test data (20%).
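For readers who want to reproduce an 80/20 split on their own copy of the data, a minimal sketch follows. The function name `train_test_split`, the seed, and the toy rows are assumptions for illustration; the description does not say how the published split was produced (for example, whether it was shuffled or stratified by label).

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy (text, is_movie_review) pairs standing in for the real rows.
rows = [(f"review text {i}", i % 2 == 0) for i in range(10)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

This plain random split does not balance the True and False labels between the two parts; a stratified split would be needed for that.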
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In light of the exponential growth in information volume, the significance of graph data has intensified. Graph clustering plays a pivotal role in graph data processing by jointly modeling the graph structure and node attributes. Notably, the practical significance of multi-view graph clustering is heightened due to the presence of diverse relationships within real-world graph data. Nonetheless, prevailing graph clustering techniques, predominantly grounded in deep learning neural networks, face challenges in effectively handling multi-view graph data. These challenges include the incapability to concurrently explore the relationships between multiple view structures and node attributes, as well as difficulties in processing multi-view graph data with varying features. To tackle these issues, this research proposes a straightforward yet effective multi-view graph clustering approach known as SLMGC. This approach uses graph filtering to filter noise, reduces computational complexity by extracting samples based on node importance, enhances clustering representations through graph contrastive regularization, and achieves the final clustering outcomes using a self-training clustering algorithm. Notably, unlike neural network algorithms, this approach avoids the need for intricate parameter settings. Comprehensive experiments validate the supremacy of the SLMGC approach in multi-view graph clustering endeavors when contrasted with prevailing deep neural network techniques.
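To make the graph filtering step mentioned above more concrete, here is a minimal sketch of one common low-pass graph filter, which smooths node attributes over the graph by repeatedly applying (I + S)/2, where S is the symmetrically normalized adjacency matrix with self-loops. This is an assumption for illustration only: the abstract does not state which filter SLMGC uses, and the function name `low_pass_filter` and the parameter `k` are hypothetical.

```python
def low_pass_filter(adjacency, features, k=2):
    """Smooth node features by applying the filter (I + S) / 2 a total of k times.

    adjacency: n x n list of lists (symmetric, 0/1 entries).
    features:  n x d list of lists of node attributes.
    """
    n = len(adjacency)
    # Add self-loops so every node has a non-zero degree.
    a = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    # Symmetric normalization: S = D^{-1/2} A D^{-1/2}.
    s = [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
         for i in range(n)]
    # Low-pass filter H = I - L/2 with L = I - S, which equals (I + S) / 2.
    h = [[(s[i][j] + (1.0 if i == j else 0.0)) / 2 for j in range(n)]
         for i in range(n)]
    x = [list(row) for row in features]
    d = len(x[0])
    for _ in range(k):
        x = [[sum(h[i][m] * x[m][c] for m in range(n)) for c in range(d)]
             for i in range(n)]
    return x


# Two connected nodes with very different attribute values.
adjacency = [[0, 1], [1, 0]]
features = [[0.0], [1.0]]
smoothed = low_pass_filter(adjacency, features, k=2)
print(smoothed)
```

Each application pulls the attributes of neighbouring nodes closer together, which is the sense in which such a filter suppresses high-frequency noise.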