This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
Yearly data of Quality Review ratings from 2005 to 2017
https://creativecommons.org/publicdomain/zero/1.0/
I have always been a binge watcher and with so many movies and series to watch, a sentiment analysis of movie reviews is a good start to know more about them.
This dataset contains the text of the reviews, together with a label that indicates whether a review is “positive” or “negative.” The IMDb website itself contains ratings from 1 to 10. To simplify the modeling, this annotation is summarized as a two-class classification dataset where reviews with a score of 6 or higher are labeled as positive, and the rest as negative.
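The labelling rule described above can be sketched as a small function. This is only an illustration, assuming integer scores from 1 to 10; the function name `label_review` is hypothetical and not part of the dataset.

```python
def label_review(score: int) -> str:
    """Map an IMDb score (1-10) to a binary sentiment label."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    # Scores of 6 or higher are positive; everything else is negative.
    return "positive" if score >= 6 else "negative"


# Example scores on either side of the threshold.
labels = [label_review(s) for s in (1, 5, 6, 10)]
print(labels)
```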
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher}, title = {Learning Word Vectors for Sentiment Analysis}, booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}
One of the simplest, most effective, and most commonly used ways to represent text for machine learning is the bag-of-words representation. Classify the dataset with the highest cross-validation accuracy you can achieve, with or without bag-of-words.
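As a rough illustration of the bag-of-words idea, the sketch below counts how often each vocabulary word occurs in a review, ignoring word order. It uses only the Python standard library; the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the two sample reviews are made up for this example and do not come from the dataset.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Assign a column index to every distinct token, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector for one document; unknown tokens are skipped."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        idx = vocabulary.get(tok)
        if idx is not None:
            vector[idx] = count
    return vector


# Hypothetical sample reviews, for illustration only.
reviews = ["A great, great movie", "Not a great plot"]
vocabulary = build_vocabulary(reviews)
vectors = [bag_of_words(review, vocabulary) for review in reviews]
print(sorted(vocabulary), vectors)
```

In practice a library implementation such as scikit-learn's `CountVectorizer` would normally be used instead; the vectors can then be passed to any classifier and scored with cross-validation.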
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides user reviews for ChatGPT, offering valuable qualitative feedback, satisfaction ratings, and submission dates. It captures a diverse array of user sentiments, from concise remarks to more detailed feedback. The ratings are provided on a scale of 1 to 5, indicating different levels of user satisfaction. The dataset spans several months, which allows for temporal analysis of sentiment trends, as each review includes a timestamp. This data is ideal for gaining insights into user characteristics and for improving application features and services.
The dataset is provided as a free resource. A sample file will be uploaded separately to the platform. The data quality is assessed as 5 out of 5, and the current version is 1.0. It was listed on 08/06/2025, with 1 view and 0 downloads recorded so far. The dataset contains approximately 193,154 unique reviews.
This dataset is particularly useful for various analytical applications, including:
* Sentiment Analysis: Developing models to predict the emotional tone or sentiment conveyed in user reviews.
* Customer Feedback Analysis: Extracting actionable insights that can inform and guide improvements to application features and services.
* Review Classification: Building machine learning models to categorise user reviews, for instance, as positive or negative.
* Data Visualisation: Creating visual representations of review patterns and trends.
* Exploratory Data Analysis: Investigating the characteristics and underlying patterns within the review data.
* Natural Language Processing (NLP): Applying NLP techniques to understand and process the textual feedback.
* Text Mining: Discovering patterns and insights from the large collection of text reviews.
* Time-Series Analysis: Examining how sentiment and ratings evolve over time based on review timestamps.
This dataset comprises user reviews for ChatGPT collected from 25th July 2023 to 24th August 2024. The data collection is global, reflecting feedback from users worldwide.
CC0
This dataset is ideal for a range of users interested in understanding user feedback and sentiment, including:
* Data Scientists and Machine Learning Engineers for building and training sentiment analysis and classification models.
* Product Managers and App Developers to gain actionable insights for product improvement and feature development.
* Market Researchers to understand user satisfaction and market perception of AI applications.
* Academic Researchers studying human-computer interaction, natural language processing, or user behaviour.
Original Data Source: ChatGPT Users Reviews
This data is from the California Department of Managed Health Care (DMHC). It contains all decisions from Independent Medical Reviews (IMR) administered by the DMHC since January 1, 2001. An IMR is an independent review of a denied, delayed, or modified health care service that the health plan has determined to be not medically necessary, experimental/investigational, or non-emergent/urgent. If the IMR is decided in an enrollee's favor, the health plan must authorize the service or treatment requested.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research focused on tool-based code reviews (e.g. a Gerrit code review tool, GitHub), many traditional open-source software (OSS) projects still conduct code reviews through emails. However, due to the nature of unstructured email-based data, it can be challenging to mine email-based code reviews, hindering researchers from delving into the code review practice of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews of 167 projects across three OSS communities (i.e. Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated the data by grouping a submitted patch and its revised versions and grouping email aliases. Our datasets include a total of 4.2M patches with 2.1M patch groups and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly delve into an email-based code review practice of large OSS projects without additional effort in data collection and curation.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset comprises user reviews and associated data for Overwatch 2, a popular video game title, sourced from the official Steam store. Overwatch 2 is the highly anticipated sequel to the original Overwatch game, developed by Blizzard Entertainment. As we know, it's renowned for its unfavorable reviews on Steam.
I didn't scrape many reviews because it would take a wicked amount of time and resources to do so.
Disclaimer: All data belongs to Valve Corporation and is not mine.
CC0
Original Data Source: Overwatch 2 - Steam Review Dataset
https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of 2,300+ action movie reviews. It includes most movie-related information, including ratings. The reviews were collected from the most active critics, which makes the dataset helpful for NLP tasks and ML operations.
According to a survey conducted in December 2022 in India, 25 percent of Indians referred to Google reviews 50 to 75 percent of the time when using the search engine to find out more about a business. By comparison, less than 10 percent of respondents never referred to Google reviews. In response to increasing complaints about fake online reviews, the Bureau of Indian Standards (BIS) guidelines were brought into effect in November 2022.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Felipe Navarro
Released under Database: Open Database, Contents: Database Contents
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
ashishkgpian/reviews dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset contains 1,000 rows of reviews of different products on Etsy (https://www.etsy.com/?ref=lgo). Etsy is a global online marketplace where people come together to make, sell, buy, and collect unique items.
There are 2 columns in the data set, the first column is the review text and the second is the number of stars out of 5.
Thank you to all members of DSI7, instructors, Ais, and students.
sentiment analysis.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a collection of user reviews for various Google Apps available on the Play Store. It provides detailed insights into user feedback, ratings, and engagement with different applications. The dataset's primary purpose is to offer a rich resource for understanding user sentiment, identifying app performance issues, and tracking user satisfaction over time. It is a valuable asset for analytics and natural language processing tasks related to app reviews.
The dataset contains over 90,000 app reviews. The score column shows a distribution across ratings, with substantial counts for scores like 1.00-1.20, 2.00-2.20, 3.00-3.20, 4.00-4.20, and 4.80-5.00. For thumbsUpCount, the majority of reviews have a relatively low number of likes (0-720), but there are instances with significantly higher counts, reaching over 14,000 likes. The reviewCreatedVersion column shows a variety of app versions, with some being more frequently reviewed than others. Review creation dates span a period from April 2014 to February 2021, with a notable increase in review volume towards the later years, particularly between May 2020 and February 2021.
This dataset is ideal for:
* Sentiment analysis of app reviews.
* Natural Language Processing (NLP) tasks, such as topic modelling, text classification, and entity recognition.
* App performance monitoring and identifying user pain points.
* Market research on user satisfaction and trends in app usage.
* Developing AI and Machine Learning models for predicting app ratings or automatically classifying feedback.
The dataset offers global coverage for app reviews. The time range for review creation spans from 10th April 2014 to 4th February 2021. While developer replies are included, the data on repliedAt primarily indicates a single latest date (4th February 2021) with the majority being null, suggesting that developer reply timestamps are not as broadly distributed across the dataset as review creation times.
CC0
Original Data Source: Google Apps Playstore Reviews
The State Review Framework is a primary means by which EPA conducts oversight of three core federal statutes: Clean Air Act, Clean Water Act, and Resource Conservation and Recovery Act. The routine, nationwide review provides a consistent process for evaluating the performance of state, local and EPA compliance and enforcement programs. The overarching goal of the reviews is to ensure fair and consistent enforcement necessary to protect human health and the environment.
Younger U.S. online users were more likely to spend a longer time reading online reviews than older internet users. During the November 2019 survey, seven percent of respondents aged 18 to 34 years stated that on average they spend at least an hour reading reviews before making a purchase decision. Only two percent of respondents aged 35 to 54 years stated the same.
6000 French user reviews from three applications on Google Play (Garmin Connect, Huawei Health, Samsung Health) were labelled manually. We selected four labels: rating, bug report, feature request and user experience.
Ratings are simple texts that express an overall evaluation of the app, including praise, criticism, or dissuasion. Bug reports describe problems that users have met while using the app, such as loss of data, app crashes, or connection errors. Feature requests reflect users' demands for new functions, new content, a new interface, etc. In user experience reviews, users describe their experience with the functionality of the app and how certain functions are helpful.
As the label counts in the following table show, each review belongs to one or more categories.
App | Total | Rating | Bug report | Feature request | User experience |
---|---|---|---|---|---|
Garmin Connect | 2000 | 1260 | 757 | 170 | 493 |
Huawei Health | 2000 | 1068 | 819 | 384 | 289 |
Samsung Health | 2000 | 1324 | 491 | 486 | 349 |
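The multi-label nature of the annotation can be checked with simple arithmetic on the table: if every review carried exactly one label, the four label counts would sum to the per-app total of 2000. The sketch below does that check using the figures copied from the table; the dictionary layout and variable names are choices made for this example.

```python
# Label counts per app, copied from the table above, in the order:
# rating, bug report, feature request, user experience.
label_counts = {
    "Garmin Connect": {"total": 2000, "labels": [1260, 757, 170, 493]},
    "Huawei Health": {"total": 2000, "labels": [1068, 819, 384, 289]},
    "Samsung Health": {"total": 2000, "labels": [1324, 491, 486, 349]},
}

# Number of label assignments beyond one per review, for each app.
extra_labels = {}
for app, row in label_counts.items():
    assigned = sum(row["labels"])
    extra_labels[app] = assigned - row["total"]
    print(f"{app}: {assigned} labels for {row['total']} reviews "
          f"({extra_labels[app]} more than one per review)")
```

Each app has several hundred more label assignments than reviews, which is consistent with reviews carrying more than one label.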
https://webtechsurvey.com/terms
A complete list of live websites using the Google Reviews Widget technology, compiled through global website indexing conducted by WebTechSurvey.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A small but rich dataset of Indian street food reviews, categorized by dish, location, sentiment, and star rating. Perfect for projects involving sentiment analysis, regional food preference prediction, NLP, and recommendation systems.
This dataset contains short user-generated reviews (1–2 sentences) of popular Indian street foods from different regions across India. Each review includes:
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Muaz Tahir
Released under CC0: Public Domain
hugginglearners/amazon-all-categories-best-sellers-reviews dataset hosted on Hugging Face and contributed by the HF Datasets community