100+ datasets found

Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores the datasets used in several of the studies that served as research material for the thesis, as well as the datasets used in the experimental part of this work.
The datasets are listed below, along with their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file has 3 columns: id (the unique id), polarity (the polarity index of the text), and tweet (the tweet text).
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
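A quick first check on a labelled corpus like this is its class balance. The sketch below, using only the standard library, counts the values in the polarity column, assuming the file has a header row named id, polarity, tweet as described. The delimiter is an assumption (pass ";" if the file turns out to be semicolon-separated), and since the polarity codes are not spelled out here, the sketch counts whatever values appear rather than guessing their meaning.

```python
import csv
import io
from collections import Counter


def polarity_counts(lines, delimiter=","):
    """Count how many tweets carry each polarity value.

    `lines` is any iterable of CSV text lines, such as an open file.
    The delimiter is an assumption; adjust it to match the actual file.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    return Counter(row["polarity"] for row in reader)


if __name__ == "__main__":
    # With the real file (path assumed to be in the working directory):
    # with open("sts_gold_tweet.csv", newline="", encoding="utf-8") as f:
    #     print(polarity_counts(f))
    toy = io.StringIO("id,polarity,tweet\n1,0,bad day\n2,4,great day\n3,4,love it\n")
    print(polarity_counts(toy))  # toy rows with made-up values
```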
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on Amazon's official website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description of the Product
user_id - ID of the user who wrote the review for the Product
user_name - Name of the user who wrote the review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
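Because discount_percentage should follow from discounted_price and actual_price, the three columns can be cross-checked. The sketch below assumes that the scraped values are stored as text that may carry a currency symbol, thousands separators, or a trailing "%". That storage format is an assumption, so inspect a few rows first. The helper names are hypothetical.

```python
import re


def parse_amount(text):
    """Turn a scraped value such as '₹1,099' or '64%' into a float.

    Assumption: the raw strings may include a currency symbol, thousands
    separators, or '%'; everything except digits and '.' is dropped.
    """
    cleaned = re.sub(r"[^0-9.]", "", str(text))
    return float(cleaned) if cleaned else float("nan")


def implied_discount(discounted_price, actual_price):
    """Discount percentage implied by the two price columns, to one decimal.

    Raises ZeroDivisionError if actual_price parses to 0.
    """
    discounted = parse_amount(discounted_price)
    actual = parse_amount(actual_price)
    return round(100 * (actual - discounted) / actual, 1)


def discount_matches(row, tolerance=1.0):
    """True if discount_percentage agrees with the prices within `tolerance` points."""
    stated = parse_amount(row["discount_percentage"])
    implied = implied_discount(row["discounted_price"], row["actual_price"])
    return abs(stated - implied) <= tolerance
```

A tolerance of about one percentage point absorbs rounding in the stated percentage. Rows that fail the check are worth inspecting before the prices are used as features.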
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contain only negative samples and the last 5331 rows contain only positive samples, so the data should be shuffled before use.
The data was collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file has 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
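Because the file is sorted by label, taking the first N rows as a test set would produce a single-class split. A minimal sketch of a seeded shuffle before splitting is shown below. It assumes the reviews/labels column names given above. The seed and the 20% test fraction are arbitrary illustrative choices.

```python
import csv
import random


def load_rows(path):
    """Read data_rt.csv into (review, label) pairs using the described column names."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["reviews"], int(row["labels"])) for row in csv.DictReader(f)]


def shuffled_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` with a fixed seed, then cut off a test split.

    Shuffling matters here because the file is sorted: all negative
    sentences come first and all positive ones last.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded, so the split is reproducible
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)
```

Usage: `train, test = shuffled_split(load_rows("data_rt.csv"))`. Using a fixed seed keeps experiments comparable across runs.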
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data of Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
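The division column is described only as a categorical label derived from the TextBlob polarity score, and the cut-offs are not stated. The sketch below shows one plausible mapping from a score in [-1, 1] to three classes. The thresholds (a symmetric neutral band that is zero-width by default) are an assumption, not the dataset's documented rule. With the textblob package installed, a score of this kind comes from `TextBlob(text).sentiment.polarity`. The sketch starts from the score so that it runs without that dependency.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a TextBlob-style polarity score in [-1, 1] to a coarse label.

    Assumed rule: scores above `neutral_band` are 'positive', scores
    below -neutral_band are 'negative', and everything else is 'neutral'.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

Widening `neutral_band` (e.g. to 0.1) moves weakly polarized reviews into the neutral class. Compare the resulting counts with the shipped division column to see which rule matches.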
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
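The added division column is generated from ReviewStar, but the exact cut-offs are not given. A common convention, assumed here rather than taken from the thesis, treats 1-2 stars as negative, 3 as neutral, and 4-5 as positive. The sketch also counts the resulting classes, because star ratings on product pages tend to be skewed toward high values.

```python
from collections import Counter

# Assumed cut-offs; the description does not state how `division` was derived.
STAR_TO_DIVISION = {1: "negative", 2: "negative", 3: "neutral", 4: "positive", 5: "positive"}


def star_to_division(review_star):
    """Map a 1-5 ReviewStar value (int or integer string) to a sentiment label."""
    return STAR_TO_DIVISION[int(review_star)]


def division_counts(stars):
    """Class distribution after mapping, useful for spotting imbalance."""
    return Counter(star_to_division(s) for s in stars)
```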
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, unix time), reviewTime (time of the review, raw), and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
amazon-reviews-sentiment-analysis
huggingface.co
fastai X Hugging Face Group 2022, amazon-reviews-sentiment-analysis [Dataset]. https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis
Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
fastai X Hugging Face Group 2022
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for amazon reviews for sentiment analysis

Dataset Summary

One of the most important problems in e-commerce is the correct calculation of the points given to after-sales products. Solving this problem provides greater customer satisfaction for the e-commerce site, product prominence for sellers, and a seamless shopping experience for buyers. Another problem is the correct ordering of the comments given to the products. The prominence of misleading… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis.
Mobile review dataset for aspect level sentiment analysis
ieee-dataport.org
Updated Sep 17, 2024
Piyush Soni (2024). Mobile review dataset for aspect level sentiment analysis [Dataset]. https://ieee-dataport.org/documents/mobile-review-dataset-aspect-level-sentiment-analysis
Dataset updated
Sep 17, 2024
Authors
Piyush Soni
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
camera
course-review-multilabel-sentiment-analysis
huggingface.co
Updated May 28, 2024
Nguyen Minh Chi (2024). course-review-multilabel-sentiment-analysis [Dataset]. https://huggingface.co/datasets/chillies/course-review-multilabel-sentiment-analysis
Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
May 28, 2024
Authors
Nguyen Minh Chi
Description
chillies/course-review-multilabel-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community
Product Review Datasets for User Sentiment Analysis
datarade.ai
Updated Sep 28, 2018
Oxylabs (2018). Product Review Datasets for User Sentiment Analysis [Dataset]. https://datarade.ai/data-products/product-review-datasets-for-user-sentiment-analysis-oxylabs
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Sep 28, 2018
Dataset authored and provided by
Oxylabs
Area covered
Libya, Barbados, Sudan, Italy, Canada, Antigua and Barbuda, Egypt, Argentina, South Africa, Hong Kong
Description
Product Review Datasets: Uncover user sentiment

Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.

Data sources:

Trustpilot: datasets encompassing general consumer reviews and ratings across various businesses, products, and services.

Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:

Product name;

Product category;

Number of ratings;

Ratings average;

Review title;

Review body;

Choose from multiple data delivery options to suit your needs:

Receive data in easy-to-read formats like spreadsheets or structured JSON files.

Select your preferred data storage solutions, including SFTP, Webhooks, Google Cloud Storage, AWS S3, and Microsoft Azure Storage.

Tailor data delivery frequencies, whether on-demand or per your agreed schedule.

Why choose Oxylabs?

Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.

Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.

Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.

Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.
Sentiment Analysis for Movie Reviews
gts.ai
json
Updated Nov 20, 2023
GTS (2023). Sentiment Analysis for Movie Reviews [Dataset]. https://gts.ai/case-study/sentiment-analysis-for-movie-reviews/
Available download formats: json
Dataset updated
Nov 20, 2023
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The objective of sentiment analysis for movie reviews is to automatically analyze and categorize the sentiments expressed in reviews, providing insights into audience opinions, emotions, and reactions towards films.
Sentiment analysis in Galaxy with IMDB movie review dataset
data.niaid.nih.gov
Updated Aug 4, 2022
Kaivan Kamali (2022). Sentiment analysis in Galaxy with IMDB movie review dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4477880
Dataset updated
Aug 4, 2022
Dataset authored and provided by
Kaivan Kamali
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IMDB movie review sentiment classification dataset (Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011)). For more information please refer to: https://ai.stanford.edu/~amaas/data/sentiment/

The IMDB dataset was modified as follows to prepare it for use in a Galaxy Training Tutorial (https://training.galaxyproject.org/):

The 50 most frequent words (mostly stop words) are excluded, and the next 10,000 most frequent words are included. Reviews are limited to a maximum of 500 words (longer reviews are trimmed and shorter reviews are padded). 25,000 reviews are used for training and 25,000 for testing. Files are in TSV (tab-separated values) format so that they can be consumed by Galaxy (www.usegalaxy.org).
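The preprocessing steps above can be sketched in a few lines of plain Python. This is an illustration of the technique, not the code used for the Galaxy tutorial. Some choices here are assumptions: ids start at 1 with 0 reserved for padding, out-of-vocabulary words are dropped, long reviews keep their beginning, and padding is appended at the end.

```python
from collections import Counter


def build_vocab(tokenized_reviews, skip_top=50, num_words=10_000):
    """Rank words by frequency, drop the `skip_top` most common, keep the next `num_words`.

    Returns a word -> integer id mapping; ids start at 1 so 0 can mark padding.
    Ties in frequency keep first-seen order (Counter.most_common behaviour).
    """
    counts = Counter(word for review in tokenized_reviews for word in review)
    ranked = [word for word, _ in counts.most_common()]
    kept = ranked[skip_top:skip_top + num_words]
    return {word: i + 1 for i, word in enumerate(kept)}


def encode(review, vocab, maxlen=500, pad_id=0):
    """Map tokens to ids (dropping words outside the vocabulary), then trim or pad to `maxlen`."""
    ids = [vocab[word] for word in review if word in vocab]
    ids = ids[:maxlen]  # trim long reviews, keeping the beginning (an assumption)
    return ids + [pad_id] * (maxlen - len(ids))  # pad short reviews at the end
```

Fixing every review to the same length gives a rectangular matrix, which is what the tabular TSV files and the downstream models expect.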
FABSA Dataset
paperswithcode.com
Updated Mar 24, 2024
(2024). FABSA Dataset [Dataset]. https://paperswithcode.com/dataset/fabsa
Dataset updated
Mar 24, 2024
Description
FABSA is an aspect-based sentiment analysis (ABSA) dataset in the customer feedback space (Trustpilot, Google Play and Apple Store reviews).

It is a professionally annotated dataset released by Chattermill AI, a company with 8 years of experience applying advanced ML analytics to customer feedback for high-profile clients such as Amazon and Uber.

Of the three annotators, two have extensive experience developing human-labeled ABSA datasets for commercial companies, and the third holds a PhD in computational linguistics.

There has been a lack of high-quality ABSA datasets that cover broad domains and address real-world applications. Academic progress has been confined to benchmarking on domain-specific toy datasets such as restaurants and laptops, which are limited in size (e.g., SemEval Task ABSA or SentiHood).

This dataset accompanies the FABSA paper. We release it in the hope of advancing academic progress: tools for ingesting and analyzing customer feedback at scale have improved significantly, yet evaluation datasets continue to lag behind. FABSA is a new, large-scale, multi-domain ABSA dataset of feedback reviews, consisting of approximately 10,500 reviews spanning 10 domains (Fashion, Consulting, Travel Booking, Ride-hailing, Banking, Trading, Streaming, Price Comparison, Information Technology, and Groceries).

Academic Paper

@article{KONTONATSIOS2023126867, title = {FABSA: An aspect-based sentiment analysis dataset of user reviews}, journal = {Neurocomputing}, volume = {562}, pages = {126867}, year = {2023}, issn = {0925-2312}, doi = {https://doi.org/10.1016/j.neucom.2023.126867}, url = {https://www.sciencedirect.com/science/article/pii/S0925231223009906}, author = {Georgios Kontonatsios and Jordan Clive and Georgia Harrison and Thomas Metcalfe and Patrycja Sliwiak and Hassan Tahir and Aji Ghose}, keywords = {ABSA, Multi-domain dataset, Deep learning}, }
Arabic Companies Reviews For Sentiment Analysis
kaggle.com
Updated May 23, 2023
mohamed ali salama (2023). Arabic Companies Reviews For Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/mohamedalisalama/arabic-companies-reviews-for-sentiment-analysis
Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
May 23, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
mohamed ali salama
Description
Context

The data contains 67K+ Arabic reviews for sentiment analysis. It was collected by web scraping reviews of many companies, such as talabat, kabiter, nasla, swifil, alsiwidiu, kilubatra, dumati, etc.

Content

Columns

Reviews: the review text
rating: 1 positive, 0 neutral, -1 negative
Company: the company name for each review
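Many classifiers expect integer targets starting at 0, so the -1/0/1 rating scheme above is often shifted before training. The sketch below applies that shift and reports the class distribution using the label names stated in the description. The function names are hypothetical.

```python
from collections import Counter

# Label scheme as stated in the dataset description.
RATING_NAMES = {1: "positive", 0: "neutral", -1: "negative"}


def to_class_index(rating):
    """Shift ratings -1/0/1 to class indices 0/1/2, rejecting any other value."""
    value = int(rating)
    if value not in RATING_NAMES:
        raise ValueError(f"unexpected rating: {rating!r}")
    return value + 1


def label_distribution(ratings):
    """Count reviews per named sentiment class."""
    return Counter(RATING_NAMES[int(r)] for r in ratings)
```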
Preprocessed Amazon Review Sentiment
opendatabay.com
Updated Jul 4, 2025
Datasimple (2025). Preprocessed Amazon Review Sentiment [Dataset]. https://www.opendatabay.com/data/ai-ml/d9407519-5dc6-4a8d-a031-917714147912
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
This dataset contains preprocessed Amazon product review data for the Gen3EcoDot, primarily scraped from amazon.in. It is designed to facilitate the training and testing of classification models, particularly for sentiment analysis. The reviews have been stemmed and lemmatised using NLTK, and sentiment labels are generated using TextBlob polarity scores, making it ready for direct use in machine learning and natural language processing tasks.

Columns

Index: A unique identifier for each record within the dataset.

Review: The original, raw text of the customer review.

Stemmed and Lemmatised review using nltk: The preprocessed version of the review text, optimised for text analysis and model training.

Polarity: The numerical polarity score derived from the TextBlob analysis, indicating the sentiment expressed in the review (ranging from -1.00 for negative to 1.00 for positive sentiment).

Division: A categorical label generated based on the polarity score, providing discrete sentiment categories.

Distribution

The dataset is provided in a tabular format, typically a CSV file, and is preprocessed for immediate use. While specific row counts are not explicitly stated as a single number, the 'division' column includes approximately 4156 records, categorised into various ranges. The 'polarity' column also contains counts for sentiment ranges, summing to approximately 4084 records, indicating positive, neutral, and negative sentiments.

Usage

This dataset is ideal for a variety of applications, including:
* Training and testing sentiment classification models.
* Developing and evaluating Natural Language Processing (NLP) algorithms.
* Conducting sentiment analysis on product reviews.
* Academic research in text analytics and machine learning.
* Building applications that require pre-classified text data.

Coverage

The data primarily covers product reviews from amazon.in (Amazon's Indian storefront). The listing date for the dataset is noted as 16/06/2025. No specific historical time range for the reviews themselves or demographic details are provided.

License

CC0

Who Can Use It

This dataset is suitable for:
* Data scientists and machine learning engineers looking for preprocessed text data to train classification models.
* Researchers and academics in NLP, sentiment analysis, and text mining.
* Students learning about text preprocessing and sentiment modelling.
* Developers building applications that require sentiment understanding from review data.

Dataset Name Suggestions

Preprocessed Amazon Review Sentiment

Gen3EcoDot Sentiment Analysis Dataset

E-commerce Product Review Polarity

NLTK TextBlob Sentiment Data

Attributes

Original Data Source: Preprocessed Dataset Sentiment Analysis
MR Dataset
paperswithcode.com
Updated Apr 28, 2021
(2021). MR Dataset [Dataset]. https://paperswithcode.com/dataset/mr
Dataset updated
Apr 28, 2021
Description
MR Movie Reviews is a dataset for use in sentiment-analysis experiments. Available are collections of movie-review documents labeled with respect to their overall sentiment polarity (positive or negative) or subjective rating (e.g., "two and a half stars") and sentences labeled with respect to their subjectivity status (subjective or objective) or polarity.
AWARE: Dataset for Aspect-Based Sentiment Analysis of Apps Reviews
data.niaid.nih.gov
explore.openaire.eu
Updated Jan 25, 2022
Hamoud Aljamaan (2022). AWARE: Dataset for Aspect-Based Sentiment Analysis of Apps Reviews [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5528480
Dataset updated
Jan 25, 2022
Dataset provided by
Malak Baslyman
Nouf Alturaief
Hamoud Aljamaan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The peer-reviewed paper of AWARE dataset is published in ASEW 2021, and can be accessed through: http://doi.org/10.1109/ASEW52652.2021.00049. Kindly cite this paper when using AWARE dataset.

Aspect-Based Sentiment Analysis (ABSA) aims to identify the opinion (sentiment) with respect to a specific aspect. Since there is a lack of smartphone apps reviews dataset that is annotated to support the ABSA task, we present AWARE: ABSA Warehouse of Apps REviews.

AWARE contains apps reviews from three different domains (Productivity, Social Networking, and Games), as each domain has its distinct functionalities and audience. Each sentence is annotated with three labels, as follows:

Aspect Term: a term that exists in the sentence and describes an aspect of the app that is expressed by the sentiment. A term value of “N/A” means that the term is not explicitly mentioned in the sentence.

Aspect Category: one of the pre-defined set of domain-specific categories that represent an aspect of the app (e.g., security, usability, etc.).

Sentiment: positive or negative.

Note: games domain does not contain aspect terms.

We provide a comprehensive dataset of 11323 sentences from the three domains, where each sentence is additionally annotated with a Boolean value indicating whether the sentence expresses a positive/negative opinion. In addition, we provide three separate datasets, one for each domain, containing only sentences that express opinions. The file named “AWARE_metadata.csv” contains a description of the dataset’s columns.

How can AWARE be used?

We designed AWARE such that it can be used to serve various tasks. The tasks can be, but are not limited to:

Sentiment Analysis.

Aspect Term Extraction.

Aspect Category Classification.

Aspect Sentiment Analysis.

Explicit/Implicit Aspect Term Classification.

Opinion/Not-Opinion Classification.

Furthermore, researchers can experiment with and investigate the effects of different domains on users' feedback.
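The Explicit/Implicit Aspect Term Classification task follows directly from the annotation rule that an aspect term of "N/A" is not mentioned in the sentence. The sketch below derives that label and adds a sanity check that explicit terms really occur in their sentence. The field names "sentence" and "term" are hypothetical placeholders. AWARE_metadata.csv documents the actual column names. Recall also that the games domain has no aspect terms.

```python
def is_explicit(aspect_term):
    """True if an aspect term is given; "N/A" marks an implicit aspect."""
    return aspect_term.strip() != "N/A"


def term_in_sentence(row, sentence_field="sentence", term_field="term"):
    """Sanity check: an explicit aspect term should occur in its sentence.

    Field names are hypothetical; substitute the ones in AWARE_metadata.csv.
    The comparison is a case-insensitive substring match.
    """
    term = row[term_field]
    if not is_explicit(term):
        return True  # nothing to check for implicit aspects
    return term.lower() in row[sentence_field].lower()


def explicit_implicit_counts(rows, term_field="term"):
    """Return (explicit, implicit) counts for an iterable of annotated rows."""
    rows = list(rows)
    explicit = sum(1 for row in rows if is_explicit(row[term_field]))
    return explicit, len(rows) - explicit
```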
Review Dataset [Cross-Industry] – Public consumer feedback for sentiment and...
datarade.ai
WiserBrand.com, Review Dataset [Cross-Industry] – Public consumer feedback for sentiment and experience mapping [Dataset]. https://datarade.ai/data-products/review-dataset-cross-industry-public-consumer-feedback-fo-wiserbrand-com
Available download formats: .json, .csv, .xls, .txt
Dataset provided by
WiserBrand.com
Area covered
Malta, Finland, Germany, Portugal, El Salvador, Ireland, Austria, Gibraltar, San Marino, Denmark
Description
"This dataset includes consumer-submitted reviews from over 160 industries, covering both product- and service-based businesses. It’s built to support CX, AI, and analytics teams seeking structured insight into what real customers say, feel, and expect — across sectors like finance, healthcare, travel, telecom, retail, and more.

Each review includes:

Authentic customer reviews (text, rating, pros and cons)

Labeled sentiment and tone (positive, neutral, negative)

Service context across industries: purchase, delivery, support, return, usage

Industry and company filters (fully customizable per buyer request)

Optional metadata: platform, review length, timestamp, geo-location

The list may vary based on the industry and can be customized as per your request.

Use this dataset to:

Track public perception trends across specific brands or verticals

Segment sentiment insights by industry, region, or company

Power NLP pipelines that require diverse tone, emotion, and domain specificity

Build dashboards or LLM prompts grounded in real user language

Train review summarization, classification, or escalation engines

This dataset offers flexibility for custom delivery by industry, domain, or company, making it ideal for teams needing scalable consumer voice data tailored to specific strategic goals."
Unlocking User Sentiment: The App Store Reviews Dataset
crawlfeeds.com
json, zip
Updated Jun 20, 2025
Crawl Feeds (2025). Unlocking User Sentiment: The App Store Reviews Dataset [Dataset]. https://crawlfeeds.com/datasets/app-store-reviews-dataset
Available download formats: json, zip
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.

Dataset Specifications:

Investment: $45.0

Status: Published and immediately available.

Category: Ratings and Reviews Data

Format: Compressed ZIP archive containing JSON files, ensuring easy integration into your analytical tools and platforms.

Volume: Comprises 10,000 unique app reviews, providing a robust sample for qualitative and quantitative analysis of user feedback.

Timeliness: The last-crawled date is not specified, so the recency of the data is unknown.

Richness of Detail (11 Comprehensive Fields):

Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:

Review Content:

review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.

title: The title given to the review by the user, often summarizing their main point.

isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.

Reviewer & Rating Information:

username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).

rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.

App & Origin Context:

app_name: The name of the application being reviewed.

app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.

country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.

Metadata & Timestamps:

_id: A unique identifier for the specific review record in the dataset.

crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).

date: The original date the review was posted by the user on the App Store.
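The fields above support simple aggregations such as the country-level segmentation mentioned below. The sketch reads the archive with the standard library and averages rating per country. It assumes that each .json member of the ZIP holds a JSON array of review objects. The internal layout is not documented here, so adjust the loader if the files turn out to be JSON Lines.

```python
import json
import zipfile
from collections import defaultdict


def load_records(zip_path):
    """Read every .json member of the archive into one list of review dicts.

    Assumption: each member is a JSON array of review objects.
    """
    records = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                records.extend(json.loads(archive.read(name)))
    return records


def mean_rating_by_country(records):
    """Average star rating per App Store storefront, keyed by the `country` field."""
    totals = defaultdict(lambda: [0.0, 0])  # country -> [sum of ratings, count]
    for record in records:
        entry = totals[record["country"]]
        entry[0] += float(record["rating"])
        entry[1] += 1
    return {country: s / n for country, (s, n) in totals.items()}
```

Usage, with a hypothetical file name: `mean_rating_by_country(load_records("app_store_reviews.zip"))`. With only 10,000 reviews, small storefronts may have very few records, so report counts alongside the averages.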

Expanded Use Cases & Analytical Applications:

This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:

Product Development & Improvement:

Bug Detection & Prioritization: Analyze negative review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.

Feature Requests & Roadmap Prioritization: Extract feature suggestions from positive and neutral review text to inform future product roadmap decisions and develop features users actively desire.

User Experience (UX) Enhancement: Understand pain points related to app design, navigation, and overall usability by analyzing common complaints in the review field.

Version Impact Analysis: If integrated with app version data, track changes in rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.

Market Research & Competitive Intelligence:

Competitor Benchmarking: Analyze reviews of competitor apps (if included or combined with similar datasets) to identify their strengths, weaknesses, and user expectations within a specific app category.

Market Gap Identification: Discover unmet user needs or features that users desire but are not adequately provided by existing apps.

Niche Opportunities: Identify specific use cases or user segments that are underserved based on recurring feedback.

Marketing & App Store Optimization (ASO):

Sentiment Analysis: Perform sentiment analysis on the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.

Keyword Optimization: Identify frequently used keywords and phrases in reviews to optimize app store listings, improving discoverability and search ranking.

Messaging Refinement: Understand how users describe and use the app in their own words, which can inform marketing copy and advertising campaigns.

Reputation Management: Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.

Academic & Data Science Research:

Natural Language Processing (NLP): The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.

User Behavior Analysis: Study patterns in rating distribution, isEdited status, and date to understand user engagement and feedback cycles.

Cross-Country Comparisons: Analyze country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.

This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
IMDb Movie Reviews Dataset
paperswithcode.com
Updated Dec 20, 2013
Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts (2013). IMDb Movie Reviews Dataset [Dataset]. https://paperswithcode.com/dataset/imdb-movie-reviews
Dataset updated
Dec 20, 2013
Authors
Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts
Description
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb), each labeled as positive or negative. It contains equal numbers of positive and negative reviews. Only highly polarized reviews are included: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset also contains additional unlabeled data.
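The labeling rules above can be written directly as code. The sketch below applies the score thresholds and the 30-reviews-per-movie cap to hypothetical (movie_id, text, score) tuples. The source does not say which reviews were kept when a movie had more than 30, so keeping the first ones encountered is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def imdb_label(score: int) -> Optional[str]:
    """Label a 1-10 score: <= 4 negative, >= 7 positive, 5-6 excluded (None)."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None


def select_reviews(
    reviews: Iterable[Tuple[str, str, int]], per_movie_cap: int = 30
) -> List[Tuple[str, str, str]]:
    """Keep polarized reviews, at most per_movie_cap per movie (first ones encountered)."""
    kept_per_movie = Counter()
    selected = []
    for movie_id, text, score in reviews:
        label = imdb_label(score)
        if label is None or kept_per_movie[movie_id] >= per_movie_cap:
            continue  # neutral score, or this movie already hit its cap
        kept_per_movie[movie_id] += 1
        selected.append((movie_id, text, label))
    return selected


# Hypothetical input rows.
sample = [("tt001", "Loved it", 9), ("tt001", "It was fine", 6), ("tt002", "Awful", 2)]
print(select_reviews(sample))
```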
BDFoodSent: A Large-Scale Sentiment-Labeled Restaurant Review Dataset from...
data.mendeley.com
Updated Dec 2, 2024
Ehsanur Rahman Rhythm (2024). BDFoodSent: A Large-Scale Sentiment-Labeled Restaurant Review Dataset from Bangladesh [Dataset]. http://doi.org/10.17632/532fxhnwbb.2
Unique identifier
https://doi.org/10.17632/532fxhnwbb.2
Dataset updated
Dec 2, 2024
Authors
Ehsanur Rahman Rhythm
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh
Description
BDFoodReview is a large-scale dataset containing 334,119 restaurant reviews collected from "Foodpanda Bangladesh". The dataset includes customer reviews in mixed languages (Bangla, English, and Banglish), translated into English, along with their corresponding ratings and sentiment labels.

Dataset Statistics
Total Reviews: 334,119
Features/Columns: 19

Potential Applications
Sentiment Analysis
Restaurant Review Classification
Customer Satisfaction Analysis
Opinion Mining
Natural Language Processing Research
Food Service Industry Analysis
IMDb Movie Review Sentiment
kaggle.com
Updated Dec 2, 2023
The Devastator (2023). IMDb Movie Review Sentiment [Dataset]. https://www.kaggle.com/datasets/thedevastator/imdb-movie-review-sentiment-dataset/suggestions?status=pending&yourSuggestions=true
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 2, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
CC0 1.0 Universal (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/
Description
IMDb Movie Review Sentiment

Movie Review Sentiment

By imdb (from Hugging Face)

About this dataset

The IMDb Large Movie Review Dataset is a collection of movie reviews used for sentiment classification. Each review is paired with a sentiment label indicating whether it is positive or negative. The dataset is intended to support sentiment analysis and classification tasks in natural language processing.

The main purpose of the train.csv file within this dataset is to provide a curated collection of movie reviews, each accompanied by its respective sentiment label. This file proves particularly useful for training machine learning models to accurately predict sentiment and classify reviews based on their emotional tone.

Similarly, the test.csv file contains another set of movie reviews with sentiment labels. It is meant for testing and validating trained models, letting researchers and developers measure how well their models generalize to reviews they were not trained on.

Additionally, the unsupervised.csv file offers an alternative subset within the dataset. Unlike train.csv and test.csv, unsupervised.csv does not include any associated sentiment labels for individual movie reviews. This specific subset serves as a valuable resource for exploring unsupervised learning techniques within the domain of sentiment classification.

With labeled examples of both positive and negative sentiment across a wide range of film reviews, this dataset lets researchers and data scientists study many aspects of sentiment in text and develop machine learning models that assess subjective opinions in text data.

How to use the dataset

Introduction:

Dataset Overview:
- train.csv: This file contains a set of movie reviews along with their sentiment labels. It is intended for training your sentiment analysis models.
- test.csv: This file provides another set of movie reviews along with their sentiment labels. You can use it to evaluate the performance of your trained models.
- unsupervised.csv: This file contains movie reviews without sentiment labels. It can be used for unsupervised sentiment classification tasks.

Columns in the Dataset:
- text: The main column, containing the text of each movie review.
- label: The sentiment label assigned to each review, indicating whether it is positive or negative.

Guidelines for Using the Dataset:

Training Your Model:

Begin by loading and preprocessing the data from train.csv

Treat 'text' as your input feature and 'label' as your target variable

Explore different machine learning or deep learning algorithms suitable for text classification

Train your model using various techniques, such as bag-of-words, word embeddings, or transformers

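Tune your model on a validation split held out from train.csv, and reserve test.csv for the final evaluation so that test performance remains an unbiased estimate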
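As one concrete instance of the bag-of-words option mentioned above, here is a minimal multinomial naive Bayes classifier that uses only the standard library. It is a simple baseline, not a method prescribed by the dataset authors. The tokenizer, the 1 = positive / 0 = negative encoding, and the toy texts are all assumptions. In practice you would fit it on the text and label columns of train.csv.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per label, for the prior
        self.word_counts = defaultdict(Counter)   # label -> token -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            denom = self.total_words[label] + vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Toy data with an assumed encoding of 1 = positive, 0 = negative.
train_texts = [
    "a wonderful, moving film",
    "great acting and a great story",
    "dull plot and terrible acting",
    "a boring, terrible film",
]
train_labels = [1, 1, 0, 0]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict(["great story", "terrible and dull"]))  # → [1, 0]
```

Working in log space avoids numeric underflow when multiplying many small word probabilities over a long review.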

Evaluating Your Model:

Load test.csv and preprocess it in exactly the same way as train.csv

Use this preprocessed test data to evaluate the accuracy, precision, recall, F1 score or other relevant metrics of your trained model on unseen data

Analyze these metrics to understand how well your model is performing in predicting sentiments
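The metrics listed above can be computed directly from the true and predicted labels. The sketch below assumes binary labels where 1 marks the positive class; that encoding is an assumption, so adjust `positive` to match your data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # A ratio is defined as 0.0 when its denominator is zero (no predicted/actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```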

Advancing Your Model (Unsupervised Classification):

Utilize unsupervised.csv for unsupervised sentiment classification tasks

Preprocess the movie reviews in this file and explore techniques like clustering, topic modeling, or self-supervised learning

Extract patterns, themes, or sentiments from the reviews without any guidance from labeled data


Research Ideas

Sentiment Analysis: This dataset can be used to train models for sentiment analysis, where the goal is to predict whether a movie review is positive or negative based on its text.

NLP Research: The dataset can be used for various natural language processing (NLP) tasks such as text classification, information extraction, or named entity recognition. Researchers and practitioners can leverage this dataset to develop and evaluate new algorithms and techniques in the field of NLP.

Recommendation Systems: The sentiment labels in this dataset can be used as a source of feedback or user preferences for recommendation systems. By analyzing the sentiments expressed in reviews,...
oyo-reviews-dataset
kaggle.com
zip
Updated Jun 24, 2023
Deepkumar patel (2023). oyo-reviews-dataset [Dataset]. https://www.kaggle.com/datasets/deeppatel9095/oyo-reviews-dataset
Available download formats: zip (32,300,432 bytes)
Dataset updated
Jun 24, 2023
Authors
Deepkumar patel
Description
The OYO Review Dataset was created to explore the sentiment and opinions expressed in hotel reviews on the OYO Hotels platform. Analyzing the sentiment of customer reviews can reveal guests' overall satisfaction, identify areas for improvement, and support data-driven decisions that enhance the hotel experience. By collecting and curating this dataset, Deep Patel, Nikki Patel, and Nimil Lathiya aimed to contribute to sentiment analysis research in the hospitality industry.

Sentiment analysis classifies the sentiment expressed in textual data, such as reviews, into positive, negative, or neutral categories. This can help hotel management and stakeholders understand customer sentiment, identify common patterns, and address issues that may affect the reputation and customer satisfaction of OYO Hotels.

The dataset is a resource for training and evaluating sentiment analysis models tailored to the hospitality domain. Researchers, data scientists, and practitioners can use it to develop and test machine learning and natural language processing techniques for sentiment analysis, such as classification algorithms, sentiment lexicons, or deep learning models. By understanding the sentiment of hotel reviews, businesses can improve their services, raise customer satisfaction, and make data-driven decisions that improve the overall guest experience.

Deep Patel: https://www.linkedin.com/in/deep-patel-55ab48199/
Nikki Patel: https://www.linkedin.com/in/nikipatel9/
Nimil Lathiya: https://www.linkedin.com/in/nimil-lathiya-059a281b1/
Johns Hopkins Multi-Domain Sentiment Dataset ∑∞
kaggle.com
Updated Jan 14, 2020
Jérøme E. Blanch∑xt (2020). Johns Hopkins Multi-Domain Sentiment Dataset ∑∞ [Dataset]. https://www.kaggle.com/jeromeblanchet/multidomain-sentiment-analysis-dataset/activity
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 14, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jérøme E. Blanch∑xt
Description
Multidomain sentiment analysis dataset

Amazon review from Johns Hopkins University’s Department of Computer Science

Source: https://www.cs.jhu.edu/~mdredze/datasets/sentiment/

Kaggle kernels take care of the tar.gz files for you :-)

This dataset features slightly older product reviews from Amazon and derives from the Johns Hopkins University’s Department of Computer Science.

Dataset included

unprocessed.tar.gz
processed_acl.tar.gz
processed_stars.tar.gz

This sentiment dataset has been used in several papers:

John Blitzer, Mark Dredze, Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Association for Computational Linguistics (ACL), 2007.

John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. Learning Bounds for Domain Adaptation. Neural Information Processing Systems (NIPS), 2008.

Mark Dredze, Koby Crammer, and Fernando Pereira. Confidence-Weighted Linear Classification. International Conference on Machine Learning (ICML), 2008.

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain Adaptation with Multiple Sources. Neural Information Processing Systems (NIPS), 2009.

If you use this data for your research or a publication, please cite the first (ACL 2007) paper as the reference for the data. Also, please drop me a line so I know that you found the data useful.

The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com for many product types (domains). Some domains (books and DVDs) have hundreds of thousands of reviews; others (musical instruments) have only a few hundred. Reviews contain star ratings (1 to 5 stars) that can be converted into binary labels if needed. This page contains some descriptions of the data. If you have questions, please email Mark Dredze or John Blitzer.

A few notes regarding the data sets.

1) unprocessed.tar.gz contains the original data.
2) processed_acl.tar.gz contains the data pre-processed and balanced, in the format of Blitzer et al. (ACL 2007).
3) processed_stars.tar.gz contains the data pre-processed and balanced, but with the number of stars rather than just positive or negative, in the format of Mansour et al. (NIPS 2009).
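The processed files store each review as feature counts rather than raw text. The sketch below assumes that each processed_acl line is a set of whitespace-separated feature:count pairs followed by a trailing #label#:positive or #label#:negative token; check this layout against the files you download. The example line is hypothetical.

```python
def parse_processed_line(line):
    """Split one processed review line into ({feature: count}, label).

    Assumes whitespace-separated "feature:count" tokens plus a "#label#:..." token.
    """
    features = {}
    label = None
    for item in line.split():
        # rpartition splits on the LAST colon, so colons inside a feature survive.
        feature, sep, value = item.rpartition(":")
        if not sep:
            raise ValueError(f"malformed token: {item!r}")
        if feature == "#label#":
            label = value
        else:
            features[feature] = int(value)
    return features, label


# Hypothetical line in the assumed layout (underscores join n-gram features).
example = "great:2 great_sound:1 battery:1 #label#:positive"
print(parse_processed_line(example))
```

Splitting on the last colon keeps feature names that themselves contain a colon (for example emoticons) intact.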
Amazon Customer Review Data
zenodo.org
pdf
Updated Jul 22, 2024
Akash Shashikant Vaykar; Abhishek Kaushik (2024). Amazon Customer Review Data [Dataset]. http://doi.org/10.5281/zenodo.3549704
Available download formats: pdf
Unique identifier
https://doi.org/10.5281/zenodo.3549704
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Akash Shashikant Vaykar; Abhishek Kaushik
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset: Amazon Customer Review Data for sentiment analysis

Size: approx. 60,889 reviews

Format: .CSV

Period: 2013 to 2019

Categories: 5 (Mobiles, Smart TV, Books, Mobile Accessories, Refrigerator)

Unique_ID: Customized (Primary Key)

Review_Header: User's comment in a few words

Review_Text: User's comment in detail (3-4 lines)

Rating: 1 = Very Low, 2 = Low, 3 = Avg, 4 = Good, 5 = Excellent

Posting Period: 2013 to 2019

Own_Rating: 1-2 → Negative, 3 → Neutral, 4-5 → Positive
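The Own_Rating rule translates directly into a mapping from the Rating column. This can be useful for rebuilding the label or for checking that a stored Own_Rating agrees with Rating. The ratings in the usage line are hypothetical and only show a class-distribution check.

```python
from collections import Counter


def own_rating(rating: int) -> str:
    """Map the 1-5 star Rating column to the three-class Own_Rating label."""
    if rating in (1, 2):
        return "Negative"
    if rating == 3:
        return "Neutral"
    if rating in (4, 5):
        return "Positive"
    raise ValueError(f"Rating must be 1-5, got {rating!r}")


# Hypothetical Rating values; Counter shows how balanced the resulting classes are.
ratings = [5, 4, 1, 3, 5, 2]
print(Counter(own_rating(r) for r in ratings))
```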


Datasets for Sentiment Analysis


This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores datasets used in some of the studies that served as research material for this thesis, as well as the datasets used in the experimental part of this work.

Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------

The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

File name: sts_gold_tweet.csv

----------- Amazon Sales Dataset ----------------

This dataset contains ratings and reviews for 1K+ Amazon products, as listed on Amazon's official website. The data was scraped from the website in January 2023.

Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

Features:

product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote the review for the Product
user_name - Name of the user who wrote the review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
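The feature list above does not give value formats for the price and discount columns. If they are stored as display strings, which is an assumption here (for example a currency symbol with thousands separators, or a trailing "%"), they need cleaning before any arithmetic. The sketch below uses hypothetical values. It strips non-numeric characters and recomputes the discount implied by discounted_price and actual_price, so the result can be compared with discount_percentage.

```python
import re


def to_number(value):
    """Parse a display string such as "₹1,099" or "64%" into a float (non-negative values only)."""
    cleaned = re.sub(r"[^0-9.]", "", str(value))  # drop currency symbols, commas, "%"
    if not cleaned:
        raise ValueError(f"no numeric content in {value!r}")
    return float(cleaned)


def implied_discount(discounted_price, actual_price):
    """Whole-number percentage discount implied by the two price columns."""
    actual = to_number(actual_price)
    discounted = to_number(discounted_price)
    if actual == 0:
        raise ValueError("actual_price must be non-zero")
    return round(100 * (actual - discounted) / actual)


# Hypothetical row values in the assumed display-string format.
row = {"discounted_price": "₹399", "actual_price": "₹1,099", "discount_percentage": "64%"}
print(implied_discount(row["discounted_price"], row["actual_price"]), to_number(row["discount_percentage"]))
```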

License: CC BY-NC-SA 4.0

File name: amazon.csv

----------- Rotten Tomatoes Reviews Dataset ----------------

This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
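Because the file is sorted by label, any unshuffled slice of it would contain only one class. Here is a minimal sketch of a seeded shuffle followed by a train/validation split. It assumes the rows have already been read as (review, label) pairs, for example with `csv.DictReader` on data_rt.csv using the reviews and labels columns. The 80/20 ratio and the seed are arbitrary choices.

```python
import random


def shuffled_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then split off a validation set."""
    if not 0 <= val_fraction < 1:
        raise ValueError("val_fraction must be in [0, 1)")
    shuffled = list(rows)  # copy, so the caller's list keeps its original order
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical rows mimicking the file's layout: all negatives first, then all positives.
rows = [(f"negative review {i}", 0) for i in range(5)] + [(f"positive review {i}", 1) for i in range(5)]
train, val = shuffled_split(rows)
print(len(train), len(val))
```

A dedicated `random.Random(seed)` instance makes the split reproducible without changing the global random state.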

Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

File name: data_rt.csv

----------- Preprocessed Dataset Sentiment Analysis ----------------

Preprocessed Amazon product review data for the Echo Dot 3rd Gen (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.

The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
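TextBlob exposes a polarity score in [-1.0, 1.0] through `TextBlob(text).sentiment.polarity`. The thresholds used to build the division column are not stated. The sketch below assumes the common convention (positive above zero, negative below zero, neutral at zero) with an optional neutral band, and its label strings are hypothetical. It takes the polarity value as input, so it runs without TextBlob installed.

```python
def division_from_polarity(polarity, neutral_band=0.0):
    """Map a TextBlob-style polarity in [-1.0, 1.0] to a hypothetical 3-way label.

    Scores above +neutral_band are positive, below -neutral_band negative,
    and anything in between (including exactly 0.0) neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if neutral_band < 0:
        raise ValueError("neutral_band must be non-negative")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# With TextBlob installed, a score would come from TextBlob(review).sentiment.polarity.
for score in (0.45, 0.0, -0.2):
    print(score, division_from_polarity(score))
```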

DOI: 10.34740/kaggle/dsv/3877817

Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

This dataset was used in the experimental phase of my research.

File name: EcoPreprocessed.csv

----------- Amazon Earphones Reviews ----------------

This dataset consists of 9,930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.

This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

License: U.S. Government Works

Source: www.amazon.in

File name (original): AllProductReviews.csv (contains 14337 reviews)

File name (edited, used for my research): AllProductReviews2.csv (contains 9930 reviews)

----------- Amazon Musical Instruments Reviews ----------------

This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw), and division (manually added; categorical label generated using the overall score).
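Two of these columns need decoding before analysis. The sketch below assumes that helpful holds a pair [helpful votes, total votes], which a CSV export stores as text such as "[3, 4]", and that unixReviewTime counts seconds since the Unix epoch. Both encodings are assumptions to verify against the file, and the sample values are hypothetical.

```python
import ast
from datetime import datetime, timezone


def helpful_ratio(helpful_field):
    """Fraction of helpful votes from a "[helpful, total]" string; None when there are no votes."""
    helpful, total = ast.literal_eval(helpful_field)  # safely parses the list literal
    return helpful / total if total else None


def review_datetime(unix_time):
    """Convert unixReviewTime (seconds since the epoch) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(unix_time), tz=timezone.utc)


# Hypothetical values in the assumed formats.
print(helpful_ratio("[3, 4]"), helpful_ratio("[0, 0]"))
print(review_datetime("1393545600").date())
```

`ast.literal_eval` only accepts Python literals, so it is a safe alternative to `eval` for parsing these list strings.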

Source: http://jmcauley.ucsd.edu/data/amazon/

File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

File name (edited, used for my research): Musical_instruments_reviews2.csv (contains 7137 reviews)
