This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
https://brightdata.com/license
Utilize our Amazon reviews dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset can aid in understanding customer behavior, product performance, and market trends, empowering organizations to refine their product and marketing strategies. Access the entire dataset or tailor a subset to fit your requirements. Popular use cases include:
* Product Performance Analysis: Analyze Amazon reviews to assess product performance, uncovering customer satisfaction levels, common issues, and highly praised features to inform product improvements and marketing messages.
* Customer Behavior Insights: Gain insights into customer behavior, purchasing patterns, and preferences, enabling more personalized marketing and product recommendations.
* Demand Forecasting: Leverage Amazon reviews to predict future product demand by analyzing historical review data and identifying trends, helping to optimize inventory management and sales strategies.
Accessing and analyzing the Amazon reviews dataset supports market strategy optimization by leveraging insights to analyze key market trends and customer preferences, enhancing overall business decision-making.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Air travel is one of the most widely used modes of transport in our daily lives, so it is no wonder that more and more people share their experiences with airlines and airports through online surveys. This dataset is intended for topic modeling and sentiment analysis of postings on Skytrax (airlinequality.com) and Tripadvisor (tripadvisor.com), sites that attract a lot of interest and engagement from people who have flown with, or plan to fly with, particular airlines.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 4 million synthetic e-commerce product reviews across 8 popular categories, including:
Each row includes:
- product_id: Synthetic product identifier
- product_title: Product name (e.g., “Wireless Bluetooth Earbuds”)
- category: One of 8 categories
- review_text: Realistic user review
- rating: Integer (1 to 5 stars)
- sentiment: Sentiment derived from review text (Positive, Neutral, Negative)
CSV format (UTF-8 encoded)
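As a minimal sketch of loading this file, the Python snippet below parses the CSV with the standard library and drops rows whose rating falls outside 1 to 5 or whose sentiment is not one of the three documented values. The column names come from the list above; the function names are hypothetical, and the eight category names are not checked because they are not listed here.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["product_id", "product_title", "category",
                    "review_text", "rating", "sentiment"]
SENTIMENTS = {"Positive", "Neutral", "Negative"}


def validate_row(row):
    """Return a list of problems found in one CSV row (empty if the row looks valid)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in row]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    try:
        rating = int(row["rating"])
        if not 1 <= rating <= 5:
            problems.append(f"rating out of range: {rating}")
    except (TypeError, ValueError):
        problems.append(f"rating is not an integer: {row['rating']!r}")
    if row["sentiment"] not in SENTIMENTS:
        problems.append(f"unexpected sentiment: {row['sentiment']!r}")
    return problems


def load_reviews(text):
    """Parse CSV text into dicts, keeping only rows that pass validation."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if not validate_row(row)]
```

For the real file, opening it with `encoding="utf-8", newline=""` and iterating over `csv.DictReader` directly (rather than reading the whole text first) avoids holding all 4 million rows in memory.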
Public Domain – CC0 1.0 Universal
Yearly data of Quality Review ratings from 2005 to 2017
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides user reviews for ChatGPT, offering valuable qualitative feedback, satisfaction ratings, and submission dates. It captures a diverse array of user sentiments, from concise remarks to more detailed feedback. The ratings are provided on a scale of 1 to 5, indicating different levels of user satisfaction. Because each review includes a timestamp and the collection period covers more than a year, the dataset supports temporal analysis of sentiment trends. This data is ideal for gaining insights into user characteristics and for improving application features and services.
The dataset is provided as a free resource. A sample file will be uploaded to the platform separately. The data quality is rated 5 out of 5, and the current version is 1.0. It was listed on 08/06/2025, with 1 view and 0 downloads recorded so far. The dataset contains approximately 193,154 unique reviews.
This dataset is particularly useful for various analytical applications, including:
* Sentiment Analysis: Developing models to predict the emotional tone or sentiment conveyed in user reviews.
* Customer Feedback Analysis: Extracting actionable insights that can inform and guide improvements to application features and services.
* Review Classification: Building machine learning models to categorise user reviews, for instance, as positive or negative.
* Data Visualisation: Creating visual representations of review patterns and trends.
* Exploratory Data Analysis: Investigating the characteristics and underlying patterns within the review data.
* Natural Language Processing (NLP): Applying NLP techniques to understand and process the textual feedback.
* Text Mining: Discovering patterns and insights from the large collection of text reviews.
* Time-Series Analysis: Examining how sentiment and ratings evolve over time based on review timestamps.
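For the time-series use case, a common first step is to aggregate ratings by month. The sketch below assumes the reviews have already been read into (timestamp, rating) pairs with ISO-8601 timestamps; the description does not name the columns or the timestamp format, so both are assumptions, and `monthly_mean_rating` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime


def monthly_mean_rating(reviews):
    """Group (timestamp, rating) pairs by calendar month and average the ratings.

    Timestamps are assumed to be ISO-8601 strings such as "2023-07-25 10:00:00".
    Returns a dict ordered by month, e.g. {"2023-07": 4.0, ...}.
    """
    totals = defaultdict(lambda: [0, 0])  # month -> [sum of ratings, number of reviews]
    for ts, rating in reviews:
        month = datetime.fromisoformat(ts).strftime("%Y-%m")
        totals[month][0] += rating
        totals[month][1] += 1
    return {m: s / n for m, (s, n) in sorted(totals.items())}
```

Plotting the resulting monthly means (or the monthly share of 1-star reviews, computed the same way) gives a quick view of how satisfaction changed over the collection period.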
This dataset comprises user reviews for ChatGPT collected from 25th July 2023 to 24th August 2024. The data collection is global, reflecting feedback from users worldwide.
CC0
This dataset is ideal for a range of users interested in understanding user feedback and sentiment, including:
* Data Scientists and Machine Learning Engineers for building and training sentiment analysis and classification models.
* Product Managers and App Developers to gain actionable insights for product improvement and feature development.
* Market Researchers to understand user satisfaction and market perception of AI applications.
* Academic Researchers studying human-computer interaction, natural language processing, or user behaviour.
Original Data Source: ChatGPT Users Reviews
6,000 French user reviews from three applications on Google Play (Garmin Connect, Huawei Health, Samsung Health) were labelled manually. We selected four labels: rating, bug report, feature request and user experience.
Ratings are short texts expressing an overall evaluation of the app, including praise, criticism, or dissuasion. Bug reports describe problems users encountered while using the app, such as data loss, app crashes, or connection errors. Feature requests reflect users' demands for new functions, new content, new interfaces, etc. In user experience reviews, users describe their experience with the app's functionality and how certain functions have been helpful.
The following table gives the number of reviews carrying each label for each app. Because each review belongs to one or more categories, the label counts for an app add up to more than its total number of reviews.
App | Total | Rating | Bug report | Feature request | User experience |
---|---|---|---|---|---|
Garmin Connect | 2000 | 1260 | 757 | 170 | 493 |
Huawei Health | 2000 | 1068 | 819 | 384 | 289 |
Samsung Health | 2000 | 1324 | 491 | 486 | 349 |
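The multi-label property can be checked directly from the counts in the table. This short sketch (the helper name is hypothetical) divides the total number of labels for each app by its 2,000 reviews:

```python
# Label counts from the table: (rating, bug report, feature request, user experience).
COUNTS = {
    "Garmin Connect": (1260, 757, 170, 493),
    "Huawei Health": (1068, 819, 384, 289),
    "Samsung Health": (1324, 491, 486, 349),
}
REVIEWS_PER_APP = 2000


def labels_per_review(counts, n_reviews):
    """Average number of labels per review for each app; values above 1 imply multi-label reviews."""
    return {app: sum(c) / n_reviews for app, c in counts.items()}


for app, avg in labels_per_review(COUNTS, REVIEWS_PER_APP).items():
    print(f"{app}: {avg:.3f} labels per review")
```

Running it prints 1.340, 1.280 and 1.325 labels per review for Garmin Connect, Huawei Health and Samsung Health respectively (2,680, 2,560 and 2,650 labels in total), so every app has reviews that belong to more than one category.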
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
The Instant Data Scraper crawler crawls the Amazon review dataset.
The State Review Framework is a primary means by which EPA conducts oversight of three core federal statutes: Clean Air Act, Clean Water Act, and Resource Conservation and Recovery Act. The routine, nationwide review provides a consistent process for evaluating the performance of state, local and EPA compliance and enforcement programs. The overarching goal of the reviews is to ensure fair and consistent enforcement necessary to protect human health and the environment.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Code review is an important practice that improves the overall quality of a proposed patch (i.e., code changes). While much research has focused on tool-based code review (e.g., the Gerrit code review tool or GitHub), many traditional open-source software (OSS) projects still conduct code review over email. However, because email-based data is unstructured, mining email-based code reviews is challenging, which hinders researchers from studying the code review practices of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews from 167 projects across three OSS communities (i.e., Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated it by grouping each submitted patch with its revised versions and by grouping email aliases. Our datasets include a total of 4.2M patches in 2.1M patch groups, and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly study the email-based code review practices of large OSS projects without additional effort in data collection and curation.
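The description does not specify how email aliases were grouped. As an illustration only, the sketch below groups (display name, email address) pairs with a union-find (disjoint-set) structure under a hypothetical heuristic: addresses used with the same normalized display name are treated as one individual. This is not presented as the authors' method, and real identity merging usually needs extra rules to avoid merging different people who share a common name.

```python
class UnionFind:
    """Minimal disjoint-set (union-find) structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's set, registering x if it is new."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_aliases(identities):
    """Group (display_name, email) pairs into sets of addresses, one set per individual.

    Hypothetical heuristic: addresses seen with the same lowercased,
    whitespace-normalized display name are merged.
    """
    uf = UnionFind()
    first_email_for_name = {}
    for name, email in identities:
        email = email.strip().lower()
        uf.find(email)  # register the address even if it is never merged
        key = " ".join(name.lower().split())
        if not key:
            continue
        if key in first_email_for_name:
            uf.union(first_email_for_name[key], email)
        else:
            first_email_for_name[key] = email
    groups = {}
    for email in uf.parent:
        groups.setdefault(uf.find(email), set()).add(email)
    return list(groups.values())
```

Union-find fits this task because merges are transitive: if address A shares a name with B, and B later appears with C under another shared name, all three end up in one group without re-scanning earlier pairs.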
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset comprises user reviews and associated data for Overwatch 2, a popular video game title, sourced from the official Steam store. Overwatch 2 is the highly anticipated sequel to the original Overwatch game, developed by Blizzard Entertainment. The game is notorious for its unfavorable reviews on Steam.
I did not scrape many reviews because doing so would have taken a great deal of time and resources.
Disclaimer: All data belongs to Valve Corporation and is not mine.
CC0
Original Data Source: Overwatch 2 - Steam Review Dataset
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Felipe Navarro
Released under Database: Open Database, Contents: Database Contents
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a collection of user reviews for various Google Apps available on the Play Store. It provides detailed insights into user feedback, ratings, and engagement with different applications. The dataset's primary purpose is to offer a rich resource for understanding user sentiment, identifying app performance issues, and tracking user satisfaction over time. It is a valuable asset for analytics and natural language processing tasks related to app reviews.
The dataset contains over 90,000 app reviews. The score column shows a distribution across ratings, with substantial counts in bins such as 1.00-1.20, 2.00-2.20, 3.00-3.20, 4.00-4.20, and 4.80-5.00. For thumbsUpCount, most reviews have relatively few likes (0-720), but some have significantly more, reaching over 14,000 likes. The reviewCreatedVersion column shows a variety of app versions, some reviewed more frequently than others. Review creation dates span April 2014 to February 2021, with review volume increasing notably in the later years, particularly between May 2020 and February 2021.
This dataset is ideal for:
* Sentiment analysis of app reviews.
* Natural Language Processing (NLP) tasks, such as topic modelling, text classification, and entity recognition.
* App performance monitoring and identifying user pain points.
* Market research on user satisfaction and trends in app usage.
* Developing AI and Machine Learning models for predicting app ratings or automatically classifying feedback.
The dataset offers global coverage for app reviews. Review creation dates span 10th April 2014 to 4th February 2021. Developer replies are included, but most repliedAt values are null, and the non-null values are concentrated on a single latest date (4th February 2021). This suggests that developer reply timestamps are far less broadly distributed across the dataset than review creation times.
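To check the date range and the share of missing reply timestamps on your own copy, a sketch like the following could be used. It assumes the review creation time is stored in a column named `at` (the description does not name this column) and that null repliedAt values appear as empty strings in a CSV export; `summarize_reviews` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime


def summarize_reviews(text):
    """Return (earliest review time, latest review time, share of rows with no repliedAt).

    Assumptions: review creation time is in an ISO-8601 column named "at",
    and a missing reply timestamp is an empty string.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    created = [datetime.fromisoformat(r["at"]) for r in rows]
    null_replies = sum(1 for r in rows if not (r["repliedAt"] or "").strip())
    return min(created), max(created), null_replies / len(rows)
```

A share close to 1.0 would match the description's observation that most repliedAt values are null.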
CC0
Original Data Source: Google Apps Playstore Reviews
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Muaz Tahir
Released under CC0: Public Domain
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 1,000 text reviews gathered from various restaurants, with each review clearly marked as either positive or negative. It has been created with beginners in mind, particularly for those delving into the fields of sentiment analysis and natural language processing (NLP). The dataset serves as an excellent starting point for understanding how to process and classify textual data.
The dataset is provided as a CSV (Comma-Separated Values) file named Beginner_Reviews_dataset.csv. It has a file size of approximately 66.84 kB. The dataset consists of 1,000 records (rows), each representing a single restaurant review and its corresponding sentiment label.
This dataset is designed to be user-friendly for those new to data science. It can be utilised to train and evaluate sentiment analysis models, making it ideal for binary classification tasks. It is well-suited for educational purposes, assisting learners in developing skills in text preprocessing, feature extraction, and various classification algorithms within the NLP domain.
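As a beginner-level example of the text preprocessing, feature extraction, and classification steps mentioned above, the sketch below implements multinomial Naive Bayes with add-one (Laplace) smoothing using only the Python standard library. The file's column names are not given here, so the snippet takes plain lists of texts and 0/1 labels (the label encoding is an assumption) rather than reading Beginner_Reviews_dataset.csv directly.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Preprocessing: lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        # Log prior probability of each class.
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Feature extraction: word counts per class.
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        """Return the class with the highest log posterior; unseen words are ignored."""
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(text) if w in self.vocab
            )
        return max(self.classes, key=log_posterior)
```

In practice you would hold out part of the 1,000 reviews as a test set and compare the model's predictions with the true labels to measure accuracy.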
The reviews included in this dataset come from various restaurants, implying a global scope rather than a specific geographic region. The provided information specifies no time range for the reviews and no demographic focus beyond their being restaurant reviews.
CC0
This dataset is primarily intended for beginners in sentiment analysis and natural language processing. It is suitable for:
* Students learning text analytics and machine learning.
* New practitioners looking for simple datasets to practise building classification models.
* Anyone interested in educational projects involving text data and sentiment classification.
Original Data Source: ❤️ vs 😡: Sentiment Analysis 📝
Amazon Customer Reviews Dataset is a dataset of user-generated product reviews on the shopping website Amazon. It contains over 130 million product reviews.
This dataset contains a tiny fraction of that dataset processed and prepared specifically for language generation.
For details on how the dataset was prepared, see its GitHub repository: https://github.com/imdeepmind/AmazonReview-LanguageGenerationDataset
The dataset is stored in an SQLite database containing one table, reviews, with two columns: sequence and next.
The sequence column contains character sequences; each sequence is 40 characters long.
The next column contains the next character after the sequence.
The dataset contains about 200 million samples.
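A minimal sketch for reading the samples with Python's built-in sqlite3 module follows. The table and column names come from the description above; the helper names are hypothetical. Rows are fetched in batches so the roughly 200 million samples never have to fit in memory at once.

```python
import sqlite3


def iter_samples(conn, batch_size=10_000):
    """Yield (sequence, next) pairs from the reviews table, fetching in batches."""
    cur = conn.execute('SELECT "sequence", "next" FROM reviews')
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield from rows


def build_vocab(pairs):
    """Map every character seen in sequences or targets to an integer index."""
    chars = set()
    for seq, nxt in pairs:
        chars.update(seq)
        chars.add(nxt)
    return {ch: i for i, ch in enumerate(sorted(chars))}
```

Usage would look like `conn = sqlite3.connect("reviews.db")` followed by `for seq, nxt in iter_samples(conn): ...`, where "reviews.db" is a placeholder for wherever the downloaded database is stored. Building the vocabulary requires a full pass over the data, so trying it on a subset first may be worthwhile.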
Thanks to Amazon for making this awesome dataset. Here is the link for the dataset: https://s3.amazonaws.com/amazon-reviews-pds/readme.html
This dataset can be used for language generation. Because it contains about 200 million samples, it is large enough to train complex deep learning models.
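Before training a deep model, a count-based baseline can help sanity-check the (sequence, next) pairs. The sketch below swaps in a simple character n-gram model, not a deep learning model: it predicts the next character from the last `k` characters of each sequence and generates text greedily. `CharNGram` and the default `k=3` are illustrative choices.

```python
from collections import Counter, defaultdict


class CharNGram:
    """Count-based next-character model conditioned on the last k characters."""

    def __init__(self, k=3):
        self.k = k
        self.table = defaultdict(Counter)  # context -> counts of following characters

    def fit(self, pairs):
        """Count how often each next character follows each k-character context."""
        for seq, nxt in pairs:
            self.table[seq[-self.k:]][nxt] += 1
        return self

    def generate(self, seed, length):
        """Greedily append the most frequent next character; stop at an unseen context."""
        out = seed
        for _ in range(length):
            counts = self.table.get(out[-self.k:])
            if not counts:
                break
            out += counts.most_common(1)[0][0]
        return out
```

A neural model trained on the same pairs should comfortably beat this baseline's next-character accuracy; if it does not, the data pipeline is worth rechecking.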
About the Dataset
This dataset contains a list of School Food Authorities (SFAs) that have recently undergone Administrative Reviews with TDA, including:
* Types of school nutrition programs operated
* Special provision programs utilized
* Whether or not there were Findings
This report can be found on SquareMeals Compliance for NSLP for the current program year and will be posted to ODP within three months after the end of the program year.
The White paper on Telecommunication markets review
AutoTrain Dataset for project: imdb-sentiment-analysis
Dataset Description
This dataset has been automatically processed by AutoTrain for project imdb-sentiment-analysis.
Languages
The BCP-47 code for the dataset's language is en.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: [ { "text": "Me neither, but this flick is unfortunately one of those movies that are too bad to be good and… See the full description on the dataset page: https://huggingface.co/datasets/linktimecloud/autotrain-data-imdb-sentiment-analysis.
https://creativecommons.org/publicdomain/zero/1.0/
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words.
This dataset was converted to a CSV file from the data described in the paper by Pang and Lee (2005), which is frequently used as a benchmark for text classification tasks. Special thanks also to Product School for making the photo used in the banner available for public use.
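Because the two classes are exactly balanced (5,331 sentences each, 10,662 in total), a stratified split keeps that balance within every fold when evaluating a classifier. The sketch below assigns example indices to k folds separately for each class; `k=10` and the helper name are illustrative assumptions, and since the CSV's column names are not given here, loading the labels is left out.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds with (near-)equal class proportions.

    Indices of each class are shuffled and dealt round-robin into the folds.
    Returns a list of k lists of indices.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds
```

Each fold in turn serves as the test set while the remaining folds are used for training. With the 5,331/5,331 split and k=10, fold 0 receives 534 sentences of each class and every other fold 533 of each, so all folds stay exactly balanced.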
Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics.