Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
The "IMDB Dataset of 50K Movie Reviews" dataset is a tabular dataset with listings for 50k reviews from IMDB. There are two fields: "review", containing the review text, and "sentiment", containing either the value "positive" or the value "negative".
Using HQ Data Profiler, data quality issues in the original dataset were identified and fixed and this CLEANED version prepared.
[Image: Data quality improvements]
HQ Data Profiler's comprehensive profile report showed that the original dataset contained 418 duplicated "review" values. All rows with duplicated review values were removed. The dataset was then balanced by randomly removing rows in the more populated sentiment category. Result: 24698 "positive" and 24698 "negative" reviews, with no duplicates.
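The two cleaning steps described above (dropping duplicated reviews, then balancing the classes) can be sketched in plain Python. This is only an illustration, not the HQ Data Profiler workflow: the function name `dedupe_and_balance`, the choice to keep the first occurrence of each duplicated review, and the fixed random seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def dedupe_and_balance(rows, seed=0):
    """Drop rows with a repeated "review" value, then downsample every
    sentiment class to the size of the smallest one.

    Assumption: the first occurrence of each duplicated review is kept.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        if row["review"] not in seen:
            seen.add(row["review"])
            unique_rows.append(row)

    # Group the de-duplicated rows by their sentiment label.
    by_label = defaultdict(list)
    for row in unique_rows:
        by_label[row["sentiment"]].append(row)
    if not by_label:
        return []

    # Randomly keep `target` rows per class so that all classes match.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced
```

Rows are expected as dictionaries with the "review" and "sentiment" fields, for example as produced by `csv.DictReader`.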
Original dataset link (uncleaned): https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
Dataset citation (https://ai.stanford.edu/~amaas/data/sentiment/):
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores the datasets used in some of the studies that served as research material for the thesis, as well as the datasets used in the experimental part of this work.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, along with their details as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
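Because the rows are ordered by class, a shuffle is needed before any train/test split. The following is a minimal sketch using only the standard library; the helper name `shuffle_rows`, the seed, and the toy rows are assumptions for illustration and are not taken from data_rt.csv.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows; the input list is left unchanged."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy stand-in for the file layout: negatives (0) first, then positives (1).
rows = [("bad film", 0), ("dull plot", 0), ("great cast", 1), ("loved it", 1)]
mixed = shuffle_rows(rows)
print(mixed)
```

Using a fixed seed keeps the shuffle reproducible, which makes experiments easier to compare.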
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
By Jared Fernandez [source]
This dataset contains a collection of Word2Vec embeddings for nearly 12,000 movie reviews. These embeddings allow the reviews to be represented in a meaningful way, providing insight into topics and trends present in the reviews. With this data, researchers can gain a better understanding of the language patterns that appear across various types of movie reviews. The embeddings can also be used to build or improve models for sentiment analysis and other natural language processing tasks. Each row includes the reviewer's unique ID along with their review text and the related word2vec embedding, which represents the textual relationships found in the review.
How to Use this Dataset:
- Download the dataset ‘Movie Reviews Word2Vec Embeddings’ from Kaggle.
- The embeddings in this dataset are of type word2vec, a neural network technique that creates high-dimensional vector representations of words based on their context in a training corpus.
- Before making use of these embeddings, it’s important to understand what they represent and how you can match them with other datasets for analysis purposes. The word2vec embeddings contain two columns: word (the specific word) and vec (the vector representation associated with that word).
- To use the text corpus effectively, first extract meaningful information from it, such as sentiment ratings or the topics that appear most frequently in movie reviews. Sorting through thousands of reviews requires automated processing, either with machine learning algorithms or with natural language processing that determines sentiment polarity and extracts relevant keywords or topics for each review.
- You can also combine the pre-processed word vectors (embeddings) with supervised or unsupervised approaches, such as logistic regression or BERT models, to create features for sentiment scoring or topic modelling (classifying texts into distinct categories). These may be useful for predictive analysis, such as predicting movie ratings from user reviews.
- Once you have made use of the pre-processed data, you can improve your model further by examining how words relate to one another through their vectors, for example with cosine similarity, which measures the relatedness between words. This gives additional insight into relationships among text fragments or paragraphs and helps your model capture contextual relationships when analysing movie review corpora.
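As a rough illustration of the cosine similarity measure mentioned in the last step, the sketch below compares word vectors in plain Python. The three-dimensional vectors are made-up toy values, not embeddings from this dataset, and the function name is chosen only for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values for illustration only).
embeddings = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "boring": [-0.7, 0.5, 0.1],
}
print(cosine_similarity(embeddings["good"], embeddings["great"]))
print(cosine_similarity(embeddings["good"], embeddings["boring"]))
```

Values close to 1 indicate vectors pointing in a similar direction, values near 0 indicate little relation, and negative values indicate opposing directions.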
- Automatically clustering movies with similar sentiment and themes.
- Automatically generating movie plot summaries based on sentiment analysis of reviews.
- Developing a movie recommendation system based on users’ preference in different genres or topics related to the movie in question
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors
You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.
If you use this dataset in your research, please credit Jared Fernandez.
https://ai.stanford.edu/~amaas/data/sentiment
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The providers also include an additional 50,000 unlabeled documents for unsupervised learning.
The dataset contains an equal number of positive and negative reviews. Only highly polarized reviews are included: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. See the README file contained in the release for more details.
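The score thresholds above can be expressed as a small labelling function. This is a sketch of the stated rule only, not the providers' actual preprocessing code; the function name, the string labels, and the 1 to 10 range check are assumptions.

```python
def label_from_score(score):
    """Map a review score out of 10 to a sentiment label.

    Scores <= 4 are negative and scores >= 7 are positive. Scores of 5
    and 6 fall outside both ranges and return None.
    """
    if not 1 <= score <= 10:
        raise ValueError("score is assumed to lie between 1 and 10")
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None


for score in (2, 5, 9):
    print(score, label_from_score(score))
```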
The data is split into a train (25k reviews) and test (25k reviews) set. A preview file cannot be provided - please download the data directly from the data provider's website.
When using the dataset, please cite: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Yueming
Released under Database: Open Database, Contents: Database Contents
https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
Train LLMs or chatbots on cinematic language and metadata
Build or enrich movie recommendation engines
Run cross-lingual or multi-region film analytics
Benchmark genre popularity across time periods
Power academic studies or entertainment dashboards
Feed into knowledge graphs, search engines, or NLP pipelines
This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains detailed information about movies listed on IMDb, including titles, genres, release dates, and ratings. It also includes user reviews and ratings, making it an excellent resource for sentiment analysis and trend analysis in the movie industry. This dataset can be used to gain insights into movie trends, audience preferences, and the correlation between movie attributes and ratings. The second file has an additional feature called poster_src, which is a link to the movie's poster image. The second file is larger than the first and covers a wider range of movies.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The TMDb (The Movie Database) is a comprehensive movie database that provides information about movies, including details like titles, ratings, release dates, revenue, genres, and much more.
This dataset contains a collection of 1,000,000 movies from the TMDB database.
The dataset is updated daily.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is used in the paper entitled "The Sentiment Analysis of Spider-Man: No Way Home Film Based on IMDb Reviews". Download full paper at http://jurnal.iaii.or.id/index.php/RESTI/article/view/3851.
By Himanshu Sekhar Paul [source]
This inspiring IMDB Movie Dataset is a comprehensive database of movie ratings, featuring director_name, duration, actor_2_name, genres, actor_1_name, movie title and more. Whether you're a fan of dramatic thrillers or nostalgic '90s classics from our childhoods; here you'll find information about the most voted movies from users across the world. Delve into num_voted_users trends and discover the language each movie was released in to craft your very own personal film library of country-specific titles released in any given year. With this dataset at your disposal comparing imdb scores will never be easier! Who will come out top when the votes have been tallied? Dive into data for a journey unparalleled!
This dataset offers a comprehensive overview of the movie ratings from IMDB. It includes data about director name, duration, actors, genres, movie title, number of votes, language, country of origin, year released and IMDB score.
To use this dataset to get a deeper understanding of how movies are rated on IMDB you can take the following steps:
- Look through each column of the data to get an overall understanding. This will help you identify any specific trends or correlations in the data that you can then analyze further in later steps.
- Take some time to explore relationships between different columns, such as 'Number Voted Users' and 'IMDB Score'. Looking at how these numbers relate to each other can help you better understand rating trends on IMDB.
- Analyze how particular sub-groups perform within categories such as genre or country. This could provide insight into preferences for certain types of movies, or show which countries are associated with higher scores than others.
- Through your analysis, try to answer questions about specific demographic groups on IMDB. Are there distinct preferences among age groups when it comes to what they watch? Are there any clear correlations between rating and genre within certain countries?
By using the questions above and taking an initial 'big picture' view before diving into more detailed analysis, users should be able to find value in this dataset by uncovering useful insights about movie ratings on IMDB.
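One way to quantify the relationship between vote counts and scores suggested above is a Pearson correlation. The sketch below implements it with the standard library only. The column name `imdb_score` is an assumption (the column table in this description is truncated), and the three rows are toy values rather than records from movie_data.csv.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy rows shaped like csv.DictReader output (values are illustrative only).
rows = [
    {"num_voted_users": "1000", "imdb_score": "5.5"},
    {"num_voted_users": "5000", "imdb_score": "6.8"},
    {"num_voted_users": "20000", "imdb_score": "7.9"},
]
votes = [float(r["num_voted_users"]) for r in rows]
scores = [float(r["imdb_score"]) for r in rows]
print(pearson(votes, scores))
```

A coefficient near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. It does not by itself show that votes cause higher scores.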
- Movie Recommendation System: The dataset can be used to build a movie recommendation system using machine learning algorithms like k-nearest neighbors or collaborative filtering. Based on the user's past ratings, the system can suggest relevant movies with similar genres, actors and directors.
- Movie Popularity Index: Using the data, a metric could be designed that provides an overall popularity index for movies released over the years. This index could be constructed from factors such as IMDb score, number of votes and number of reviews collected.
- Genre-based Over/Under Performance Analysis: Based on the genres of the movies released each year, this dataset can provide insight into which genres are performing well and which are not. This kind of analysis could inform decisions about allocating resources to production budgets or marketing campaigns for upcoming films in different genres, regions or markets.
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: movie_data.csv

| Column name | Description |
|:-------------------------|:---------------------------------------------------|
| director_name | Name of the director of the movie. (String) |
| duration | Length of the movie in minutes. (Integer) |
| actor_2_name | Name of the second actor in the movie. (String) |
| genres | Genre of the movie. (String) |
| actor_1_name | Name of the first actor in the movie. (String) |
| movie_title | Title of the movie. (String) |
| num_voted_users | Number of users who voted for the movie. (Integer) |
| actor_3_name | Name of the third actor in the movie. (String) |
| movie_imdb_link | Link to the movie's IMDB page. (String) |
| num_user_for_reviews |...
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The "Real Movies Dataset" offers a comprehensive repository of diverse movie information, facilitating in-depth analysis and meaningful comparisons across various cinematic attributes. With its wealth of key details, this dataset serves as an invaluable resource for researchers, enthusiasts, and industry professionals alike.
Each entry in the dataset includes the following attributes:
* Movie Name: The title of the movie.
* Year of Release: The year in which the movie was officially released to the public.
* Watch Time: The duration of the movie in terms of hours and minutes, indicating the length of time required to watch the entire film.
* Movie Rating: This refers to the rating assigned to the movie based on various criteria such as content, suitability for different age groups, and overall quality. Ratings could be numerical (e.g., out of 10).
* Meatscore of Movie: Most likely the movie's Metascore (the field name appears to be a misspelling), that is, an aggregated score based on critic reviews rather than a measure of the plot's "meatiness" or substance.
* Votes: The number of votes or ratings received by the movie from viewers or critics. This metric provides an indication of the movie's popularity or reception.
* Gross: The total box office gross earnings generated by the movie, typically measured in a specific currency (e.g., USD). This metric reflects the commercial success of the film.
* Description: The dataset includes a brief description field providing a summary or overview of the movie's plot, genre, themes, or notable aspects. This description offers context and insight into the content and style of each film, aiding in understanding and analysis.
Overall, the "Real Movies Dataset" serves as a valuable resource for researchers, analysts, and enthusiasts interested in exploring and studying the dynamics of the film industry, including trends in movie production, audience preferences, and financial performance.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
For details about the scraping process, explore the complete code repository on GitHub.
This dataset provides annual data for the most popular 500–600 movies per year from 1920 to 2025, extracted from IMDb. It includes over 60,000 movies, spanning more than 100 years of cinematic history. Each year’s data is divided into three CSV files for flexibility and ease of use:
- imdb_movies_[year].csv: Basic movie details.
- advanced_movies_details_[year].csv: Comprehensive metadata and financial details.
- merged_movies_data_[year].csv: A unified dataset combining both files.
imdb_movies_[year].csv: Essential movie information, including:
- Title: Movie title.
- Description: Movie Description.
- meta_score: Metascore shown on IMDb.
- Movie Link: IMDb URL for the movie.
- Year: Year of release.
- Duration: Runtime (in minutes).
- MPA: Motion Picture Association rating (e.g., PG, R).
- Rating: IMDb rating (scale of 1–10).
- Votes: Total user votes on IMDb.
advanced_movies_details_[year].csv: Detailed movie metadata:
- Link: IMDb URL (for linking with other data).
- budget: Production budget (in USD).
- grossWorldWide: Global box office revenue.
- gross_US_Canada: North American box office earnings.
- opening_weekend_Gross: Opening weekend revenue.
- directors: List of directors.
- writers: List of writers.
- stars: Main cast members.
- genres: Movie genres.
- countries_origin: Countries of production.
- filming_locations: Primary filming locations.
- production_companies: Associated production companies.
- Languages: Languages spoken in the movie.
- Award_information: Information about awards, nominations and wins.
- release_date: Official release date.
merged_movies_data_[year].csv: A unified dataset combining all columns from the previous two files:
- Basic Details: Title, Year, Rating, Votes.
- Advanced Features: budget, grossWorldWide, directors, genres, and awards.
Template Columns:
imdb_movies_[year].csv:
Title, Year, Duration, MPA, Rating, Votes, meta_score, description, Movie Link
advanced_movies_details_[year].csv:
link, writers, directors, stars, budget, opening_weekend_Gross, grossWorldWide, gross_US_Canada, release_date, countries_origin, filming_locations, production_company, awards_content, genres, Languages
merged_movies_data_[year].csv:
Title, Year, Duration, MPA, Rating, Votes, meta_score, description, Movie Link, writers, directors, stars, budget, opening_weekend_Gross, grossWorldWide, gross_US_Canada, release_date, countries_origin, filming_locations, production_company, awards_content, genres, Languages
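For readers who want to rebuild the merged file themselves, the join can be sketched as follows. This is an assumption about how the two files relate, based on the `Movie Link` and `link` columns both holding the IMDb URL. It is not the dataset author's script, and the function name and toy rows are hypothetical.

```python
def merge_on_link(basic_rows, advanced_rows):
    """Inner-join basic and advanced movie rows on their IMDb URL.

    basic_rows use the "Movie Link" column and advanced_rows use "link".
    The redundant "link" column is dropped from the merged rows.
    """
    advanced_by_link = {row["link"]: row for row in advanced_rows}
    merged = []
    for basic in basic_rows:
        advanced = advanced_by_link.get(basic["Movie Link"])
        if advanced is None:
            continue  # no advanced details available for this movie
        combined = dict(basic)
        combined.update({k: v for k, v in advanced.items() if k != "link"})
        merged.append(combined)
    return merged


# Toy rows with placeholder URLs (not real records from the dataset).
basic_rows = [
    {"Title": "Movie A", "Year": "1999", "Movie Link": "https://example.org/a"},
    {"Title": "Movie B", "Year": "1999", "Movie Link": "https://example.org/b"},
]
advanced_rows = [
    {"link": "https://example.org/a", "budget": "1000000", "genres": "Drama"},
]
print(merge_on_link(basic_rows, advanced_rows))
```

Rows can be read from the yearly CSV files with `csv.DictReader`. Movies without a matching advanced record are skipped here, which is a design choice for the sketch and may differ from how the published merged files were built.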
The dataset is updated annually in December to include the latest data.
This dataset is ideal for:
- Trend Analysis: Explore changes in the movie industry over more than a century.
- Predictive Modeling: Build models to forecast box office revenue, ratings, or awards.
- Recommendation Systems: Use attributes like genres, cast, and ratings for personalized recommendations.
- Comparative Analysis: Study differences across eras, genres, or regions.
RSVP Movies is an Indian film production company which has produced many super-hit movies. They have usually released movies for the Indian audience but for their next project, they are planning to release a movie for the global audience in 2022.
The production company wants to plan every move analytically, based on data. We took the last three years of IMDB movie data and carried out the analysis using SQL. We analysed the data set and drew meaningful insights that could help them start their new project.
For our convenience, the entire analytics process has been divided into four segments, where each segment leads to significant insights from different combinations of tables. The questions in each segment with business objectives are written in the script given below. We have written the solution code below every question.
Description:
This dataset contains movie reviews and their sentiment labels. All texts were scraped from various websites in 2020. Reviews are available in several languages: cs, de, es, fr, pl, sk. A split into training and testing data is provided. There are three sentiment labels:
- pos - for positive sentiment,
- neg - for negative sentiment,
- n\a - not assigned, can be used for some unsupervised learning.
[Image: distribution of training data]
[Image: distribution of testing data]
License and copyright: The Movie Reviews Dataset is distributed under the CC BY-NC 4.0. The copyright remains with the original owners of the texts.
Notice and take down policy: Should you consider that the data contains material that is owned by you and should therefore not be reproduced here, please:
- Identify yourself, with contact data such as an email address at which you can be contacted.
- Identify the copyrighted work claimed to be infringed.
- Identify the material that is claimed to be infringing and information reasonably sufficient to allow me to locate the material.
- Send the request to me.
I will comply with legitimate requests by removing the affected sources from the corpus.
I've collected these reviews for scientific purposes. It has been more than 2 years since publication date of any of these reviews. That's why I've decided to share this collection. This way other people will also be able to use it for educational purposes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Football Game Film Angle is a dataset for classification tasks - it contains Film Angles annotations for 595 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains fMRI movie-watching and category localiser data in 28 developmental prosopagnosics and 45 neurologically healthy controls. Participants are additionally grouped by their familiarity with the Game of Thrones television series.
In movie-watching scans, participants passively viewed a series of short audiovisual clips (ranging from 50 to 117 s duration; total duration = 12 min 58 s) taken from the Game of Thrones television series.
In category localiser scans, participants viewed images of faces, scenes, and phase scrambled versions of the face images. These can be used to define face and scene selective regions of interest.
Please refer to the following paper when using this dataset:
Noad, K., Watson, D.M., Andrews, T.J. (In review). Natural viewing reveals an extended network of regions for familiar faces that is disrupted in developmental prosopagnosia.
participants.tsv - List of subject IDs in control and developmental
prosopagnosic groups, along with whether they were familiar or unfamiliar
with Game of Thrones.
slice_timings.tsv, fsl_slice_timings.txt - Slice timings for functional
scans. The TSV file gives the times in milliseconds, and the text file gives
the times in normalised units of the TR suitable for entering into FEAT.
Scans were acquired with the HCP/CMRR multiband sequence. More information on slice timings can be found at: https://wiki.humanconnectome.org/download/attachments/40534057/CMRR_MB_Slice_Order.pdf
behavioural_measures.tsv - Scores on PI20, CFMT, and Game of Thrones quiz
tasks (see below for more details). PI20 scores are out of 100. CFMT scores
are given as percentage accuracies. Quiz scores are given as percentage
accuracies over all questions as well as broken down by face, scene, and
narrative questions.
Subject Directories - MRI data directories for each subject:
anat - T1 anatomical images
fmap - Magnitude and phase difference fieldmap images
func - Movie-watching (Game of Thrones) and category localiser data
We provide two measures of face processing ability (PI20 and CFMT) and a quiz assessing familiarity with the Game of Thrones TV series. All participants completed the Game of Thrones quiz, and all developmental prosopagnosics completed the PI20 and CFMT assessments. Approximately half of the control subjects also completed the CFMT.
PI20 - 20-item prosopagnosia index, used as initial screening for
developmental prosopagnosia. All developmental prosopagnosic participants
completed this.
Reference: Shah et al. (2015), Royal Society Open Science, 2(150305), 1-6.
CFMT - Cambridge Face Memory Test, used as secondary screening for
developmental prosopagnosia. All developmental prosopagnosic participants
and approximately half of the control participants completed this.
Reference: Duchaine & Nakayama (2006), Neuropsychologia, 44(4), 576-585.
Game of Thrones Quiz - We developed this quiz to assess familiarity with
the Game of Thrones television series. All participants completed this quiz.
The quiz comprised 3 types of questions:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The MOTHER OF ALL MOVIE REVIEW DATASETS for all your NLP, research, and learning needs!
I wrote my own scripts to get data from Rotten Tomatoes
I'm looking forward to the community creating and generating analyses, content, and insights from this MOTHER OF ALL MOVIE REVIEW DATASETS! @bwandowando
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This data repository can be used as benchmark data for material characterization, particularly for investigating nanostructures and chemical properties in materials using SEHI (Secondary Electron Hyperspectral Imaging), as well as for research in Scanning Electron Microscopy and Secondary Electron (SE) spectroscopy, and in advanced image processing and data analysis (computer vision and machine learning) techniques.
This work is supported by the UK EPSRC grant EP/V012126/1, ‘‘SEE MORE, MAKE MORE: Secondary Electron Energy Measurement Optimisation for Reliable Manufacturing of Key Materials’’. Contact: SM3 (SEE MORE MAKE MORE) project PI, Professor Cornelia Rodenburg, c.rodenburg@shefield.ac.uk. We also acknowledge the support from Insigneo Institute for In Silico Medicine in Sheffield.
The complex metal alloy (palladium silver, abbreviated as Pd-Ag) and carbon films were printed by University of Liverpool, and a Helios Nanolab G3 UC microscope was used to acquire the raw image stacks [1]. More information on the sample preparation and experimental conditions can be found in [1]. This dataset contains four processed SEHI stacks (cropped and aligned) collected from different regions of interest, and the associated metadata.
[1] Abrams, K.J., Dapor, M., Stehling, N., Azzolini, M., Kyle, S.J., Schäfer, J., Quade, A., Mika, F., Kratky, S., Pokorna, Z., et al., 2019. Making sense of complex carbon and metal/carbon systems by secondary electron hyperspectral imaging. Advanced Science 6, 1900719.