100+ datasets found

Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (repository creator) (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores the datasets used in some of the studies that served as research material for the thesis, together with the datasets used in the experimental part of this work.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as per the details listed on the official Amazon website. The data was scraped in January 2023 from the official Amazon website.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
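As a worked illustration of how discount_percentage relates to actual_price and discounted_price, the sketch below recomputes the percentage from the two price fields. It assumes (the description does not say so) that prices are stored as strings with a currency symbol and thousands separators; the helper names and sample values are hypothetical.

```python
import re


def parse_price(text: str) -> float:
    """Drop everything except digits and the decimal point, e.g. '₹1,099' -> 1099.0.

    Assumption: prices are strings with a currency symbol and ',' separators.
    """
    cleaned = re.sub(r"[^0-9.]", "", text)
    return float(cleaned)


def discount_percentage(actual_price: str, discounted_price: str) -> float:
    """Percentage by which discounted_price is below actual_price."""
    actual = parse_price(actual_price)
    discounted = parse_price(discounted_price)
    if actual <= 0:
        raise ValueError("actual_price must be positive")
    return round(100.0 * (actual - discounted) / actual, 2)


# Hypothetical values, not taken from amazon.csv.
print(discount_percentage("₹1,099", "₹399"))
```

Comparing the recomputed value with the stored discount_percentage column is one way to spot rows where scraping went wrong.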
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
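Because the rows are ordered by class, a split taken without shuffling would put one label in each part. A minimal sketch of shuffling with a fixed seed, using only the standard library; the function names are hypothetical, and the column names reviews and labels are taken from the description above.

```python
import csv
import random
from typing import Dict, List


def shuffle_rows(rows: List[Dict[str, str]], seed: int = 42) -> List[Dict[str, str]]:
    """Return a shuffled copy of the rows; the input list is left unchanged."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def load_shuffled(path: str, seed: int = 42) -> List[Dict[str, str]]:
    """Read a CSV with a header row and return its rows in shuffled order."""
    with open(path, newline="", encoding="utf-8") as handle:
        return shuffle_rows(list(csv.DictReader(handle)), seed)


# Example (assumes data_rt.csv is in the working directory):
# rows = load_shuffled("data_rt.csv")
# train, test = rows[:8000], rows[8000:]
```

Fixing the seed keeps the shuffle reproducible, so the same train/test split can be regenerated later.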
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
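The description says the division label is derived from the TextBlob polarity score but does not give the cut-offs. TextBlob polarity lies in the range [-1.0, 1.0], so one plausible mapping is sketched below; the thresholds, the label names, and the neutral_band parameter are assumptions, not the dataset author's documented rule.

```python
def polarity_to_division(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    Assumption: scores above the band are positive, scores below its
    negative are negative, and everything in between is neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical scores, not taken from EcoPreprocessed.csv.
for score in (0.6, 0.0, -0.25):
    print(score, polarity_to_division(score))
```

Widening neutral_band (for example to 0.1) treats weakly polar reviews as neutral, which may suit short or noisy review text better.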
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
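The added division column is described only as a categorical label generated from ReviewStar. A minimal sketch of how such a column could be produced is shown below; the star cut-offs (4 to 5 positive, 3 neutral, 1 to 2 negative) and the function names are assumptions for illustration.

```python
from typing import Dict, List


def star_to_division(stars: int) -> str:
    """Map a 1-5 star rating to a label (cut-offs are an assumption)."""
    if not 1 <= stars <= 5:
        raise ValueError("star rating must be between 1 and 5")
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"


def add_division(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the rows with a 'division' column added."""
    return [
        {**row, "division": star_to_division(int(row["ReviewStar"]))}
        for row in rows
    ]


# Hypothetical rows, not taken from AllProductReviews2.csv.
sample = [{"ReviewTitle": "Great", "ReviewStar": "5"},
          {"ReviewTitle": "Poor fit", "ReviewStar": "2"}]
print(add_division(sample))
```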
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
amazon-reviews-sentiment-analysis
huggingface.co
Cite
fastai X Hugging Face Group 2022, amazon-reviews-sentiment-analysis [Dataset]. https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
fastai X Hugging Face Group 2022
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for amazon reviews for sentiment analysis

Dataset Summary

One of the most important problems in e-commerce is the correct calculation of the points given to after-sales products. The solution to this problem is to provide greater customer satisfaction for the e-commerce site, product prominence for sellers, and a seamless shopping experience for buyers. Another problem is the correct ordering of the comments given to the products. The prominence of misleading… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis.
Product Review Datasets for User Sentiment Analysis
datarade.ai
Updated Sep 28, 2018
Cite
Oxylabs (2018). Product Review Datasets for User Sentiment Analysis [Dataset]. https://datarade.ai/data-products/product-review-datasets-for-user-sentiment-analysis-oxylabs
Explore at:
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Sep 28, 2018
Dataset authored and provided by
Oxylabs
Area covered
Sudan, Argentina, Egypt, Antigua and Barbuda, Italy, Libya, Barbados, South Africa, Canada, Hong Kong
Description
Product Review Datasets: Uncover user sentiment

Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.

Data sources:

Trustpilot: datasets encompassing general consumer reviews and ratings across various businesses, products, and services.

Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:

Product name;

Product category;

Number of ratings;

Ratings average;

Review title;

Review body;

Choose from multiple data delivery options to suit your needs:

Receive data in easy-to-read formats like spreadsheets or structured JSON files.

Select your preferred data storage solutions, including SFTP, Webhooks, Google Cloud Storage, AWS S3, and Microsoft Azure Storage.

Tailor data delivery frequencies, whether on-demand or per your agreed schedule.

Why choose Oxylabs?

Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.

Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.

Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.

Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.
Amazon Product Reviews Dataset
kaggle.com
Updated May 16, 2025
Cite
Gözde Kızılkaya Atik (2025). Amazon Product Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/gzdekzlkaya/amazon-product-reviews-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 16, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gözde Kızılkaya Atik
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
🛍️ Dataset Overview

This dataset contains over 4,900 customer reviews from Amazon, including text-based feedback, star ratings, and helpfulness votes.

It can be used for:

📊 Sentiment Analysis

🧠 Text Classification (Positive/Negative)

🔍 Review Score Prediction (based on reviewText)

🤖 Building Recommendation Systems

🧮 Helpfulness Scoring Models

📌 Key Columns

reviewText: Full written review

overall: Star rating (1 to 5)

summary: Short summary of the review

helpful_yes: Number of users who found the review helpful

total_vote: Total votes on helpfulness

day_diff: Days since the review was written

This dataset is suitable for natural language processing (NLP) and supervised learning tasks.
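For the helpfulness scoring use case, the helpful_yes and total_vote columns can be combined into a single score. The sketch below uses the lower bound of the Wilson score interval, a technique chosen here for illustration rather than one the dataset description prescribes; it ranks a review with 90 of 100 helpful votes above one with 9 of 10, which a raw ratio cannot do. The function name and the z value (1.96, roughly a 95% interval) are assumptions.

```python
import math


def wilson_lower_bound(helpful_yes: int, total_vote: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the helpful-vote share."""
    if total_vote == 0:
        return 0.0  # no votes yet, so no evidence of helpfulness
    p = helpful_yes / total_vote
    z2 = z * z
    denominator = 1 + z2 / total_vote
    centre = p + z2 / (2 * total_vote)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total_vote)) / total_vote)
    return (centre - margin) / denominator


# Hypothetical vote counts, not taken from the dataset.
for yes, total in [(9, 10), (90, 100), (1, 1), (0, 0)]:
    print(yes, total, round(wilson_lower_bound(yes, total), 4))
```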

📎 Note

This is a publicly available dataset for educational and research use.
Consumer Product Reviews and Sentiment Analysis
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Consumer Product Reviews and Sentiment Analysis [Dataset]. https://www.opendatabay.com/data/consumer/2d257b09-10c2-4d4a-b01e-bc2c00f0b679
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Reviews & Ratings
Description
This dataset contains customer reviews for various products, including details about product categories, brands, user ratings, and sentiment analysis. It is designed for applications such as sentiment classification, product recommendation systems, and the analysis of consumer behaviour. The dataset allows users to identify trends in customer satisfaction and gain insights into consumer preferences based on brand and category.

Columns

item_category: The category identifier of the product under review.

item_id: The unique identifier for a specific product.

brand: The brand identifier associated with the product.

user_id: The unique identifier of the customer who submitted the review.

date: The date when the review was posted, typically in YYYY-MM-DD format.

comment: The textual content of the review as provided by the user.

rating: The numerical rating given by the user, often on a scale (e.g., 1 to 5).

tonality: The sentiment classification of the review, indicating whether it is positive or negative.

Distribution

The data file is typically available in CSV format. The dataset comprises approximately 14,221 records. Analysis of the sentiment distribution within the dataset indicates that 84% of reviews are classified as positive, while 16% are classified as negative.
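An 84%/16% split means a classifier trained naively can score well by favouring the positive class. The sketch below turns the stated figures into approximate class counts and inverse-frequency class weights; the weighting formula (total / (number of classes × class count)) is a common convention adopted here as an assumption, and the counts are estimates derived from the rounded percentages.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights: total / (n_classes * count) for each class."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Approximate counts implied by the description (14,221 records, 84% / 16%).
total_records = 14221
counts = {
    "positive": round(total_records * 0.84),
    "negative": round(total_records * 0.16),
}
weights = balanced_class_weights(counts)
print(counts)
print({label: round(w, 3) for label, w in weights.items()})
```

With these weights, each negative review counts roughly five times as much as a positive one during training.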

Usage

This dataset is ideally suited for several applications, including:
* Performing sentiment analysis on product reviews to gauge public opinion.
* Identifying patterns and trends in customer satisfaction over time.
* Developing and improving product recommendation systems.
* Understanding consumer preferences based on specific brands and product categories.

Coverage

The dataset covers a time range from 30th July 2009 to 25th July 2017. The data has a global regional scope. No specific demographic scope is detailed within the available information.

License

CC0

Who Can Use It

This dataset is valuable for a range of users and their specific applications:
* Data Scientists and Machine Learning Engineers: To train and evaluate sentiment analysis models, develop natural language processing (NLP) applications, and build recommendation engines.
* Marketing Professionals: To understand customer feedback, identify popular products, and assess the impact of marketing campaigns on brand perception.
* Businesses and Product Managers: To inform product development strategies, monitor customer satisfaction, and identify areas for improvement based on consumer feedback.
* Researchers: For academic studies on consumer behaviour, sentiment analysis techniques, and market trends.

Dataset Name Suggestions

Consumer Product Reviews and Sentiment Analysis

Customer Feedback and Ratings

Product Review Tonality Dataset

E-commerce Customer Insights

Global Product Review Data

Attributes

Original Data Source: 🏬🛍️😀 Consumer Sentiments and Ratings
Review Dataset [Cross-Industry] – Public consumer feedback for sentiment and...
datarade.ai
Cite
WiserBrand.com, Review Dataset [Cross-Industry] – Public consumer feedback for sentiment and experience mapping [Dataset]. https://datarade.ai/data-products/review-dataset-cross-industry-public-consumer-feedback-fo-wiserbrand-com
Explore at:
Available download formats: .json, .csv, .xls, .txt
Dataset provided by
WiserBrand.com
Area covered
El Salvador, San Marino, Austria, Finland, Gibraltar, Germany, Malta, Denmark, Portugal, Ireland
Description
"This dataset includes consumer-submitted reviews from over 160 industries, covering both product- and service-based businesses. It’s built to support CX, AI, and analytics teams seeking structured insight into what real customers say, feel, and expect — across sectors like finance, healthcare, travel, telecom, retail, and more.

Each review includes:

Authentic customer reviews (text, rating, pros and cons)

Labeled sentiment and tone (positive, neutral, negative)

Service context across industries: purchase, delivery, support, return, usage

Industry and company filters (fully customizable per buyer request)

Optional metadata: platform, review length, timestamp, geo-location

The list may vary based on the industry and can be customized as per your request.

Use this dataset to:

Track public perception trends across specific brands or verticals

Segment sentiment insights by industry, region, or company

Power NLP pipelines that require diverse tone, emotion, and domain specificity

Build dashboards or LLM prompts grounded in real user language

Train review summarization, classification, or escalation engines

This dataset offers flexibility for custom delivery by industry, domain, or company, making it ideal for teams needing scalable consumer voice data tailored to specific strategic goals."
Beauty Product Review
dataverse.telkomuniversity.ac.id
tsv
Updated Mar 6, 2022
Cite
Root (2022). Beauty Product Review [Dataset]. http://doi.org/10.34820/FK2/NAZYE1
Explore at:
Available download formats: tsv (1611354)
Unique identifier
https://doi.org/10.34820/FK2/NAZYE1
Dataset updated
Mar 6, 2022
Dataset provided by
Root
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains beauty product reviews written in Bahasa Indonesia. Each text in the dataset has been categorized into Price, Packaging, Product, and Aroma, and each category has been classified as Positive, Neutral, or Negative.
Amazon UK shoes products reviews dataset
crawlfeeds.com
csv, zip
Updated Jun 27, 2025
Cite
Crawl Feeds (2025). Amazon UK shoes products reviews dataset [Dataset]. https://crawlfeeds.com/datasets/amazon-uk-shoes-products-reviews-dataset
Explore at:
Available download formats: csv, zip
Dataset updated
Jun 27, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Unlock detailed insights with our Amazon UK Shoes Products Reviews Dataset, an invaluable resource for businesses, researchers, and data analysts. This dataset features comprehensive information, including product names, review texts, star ratings, and customer feedback for a wide range of shoe products available on Amazon UK.

Key Features:

Extensive Coverage: Includes detailed reviews and ratings for various shoe products, helping you analyze customer preferences and trends.

Structured Data: Available in easily accessible formats such as CSV, making it suitable for integration into your analytical workflows.

Actionable Insights: Leverage this dataset for customer sentiment analysis, product optimization, and competitive benchmarking.

Why Choose the Amazon UK Shoes Products Reviews Dataset?

Whether you're delving into customer behavior, conducting market research, or improving product offerings, this dataset empowers you to make informed decisions. By working with a dataset enriched with real-world feedback, you can:

Understand customer preferences: Dive into detailed reviews to uncover patterns in consumer likes and dislikes.

Enhance product offerings: Identify gaps and opportunities in the market to better meet customer demands.

Boost competitive analysis: Compare customer feedback across different brands and products.

Additional Datasets Available

Explore related datasets like the Amazon product review dataset, offering insights across various categories and regions. For specific needs, our curated product reviews dataset is tailored to help you gain a granular understanding of niche markets.
Amazon Product Reviews
kaggle.com
Updated Nov 26, 2023
Cite
The Devastator (2023). Amazon Product Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/amazon-product-reviews/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 26, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Amazon Product Reviews

18 Years of Customer Ratings and Experiences

By Huggingface Hub [source]

About this dataset

The Amazon Reviews Polarity Dataset discloses eighteen years of customers' ratings and reviews from Amazon.com, offering an unparalleled trove of insight and knowledge. Drawing from the immense pool of over 35 million customer reviews, this dataset presents a broad spectrum of customer opinions on products they have bought or used. This invaluable data is a gold mine for improving products and services as it contains comprehensive information regarding customers' experiences with a product including ratings, titles, and plaintext content. At the same time, this dataset contains both customer-specific data along with product information which encourages deep analytics that could lead to great advances in providing tailored solutions for customers. Has your product been favored by the majority? Are there any aspects that need extra care? Use Amazon Reviews Polarity to gain deeper insights into what your customers want - explore now!

How to use the dataset

1. Analyze customer ratings to identify trends: Take a look at how many customers have rated the same product or service with the same score (e.g., 4 stars). You can use this information to identify what customers like or don’t like about it by examining common sentiment throughout the reviews. Identifying these patterns can help you decide which features of your products or services to emphasize in order to boost sales and satisfaction rates.

2. Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product/service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search algorithms can quickly reveal general topics discussed in relation to your product/service across multiple reviews, allowing you to quickly pinpoint areas that may need improvement for particular items within your lines of business.

3. Track associated scores over time: By tracking customer ratings over time, you may be able to better understand when there has been an issue with something specific related to your product/service, such as a negative response toward a feature that was introduced, didn’t prove popular among customers, and was removed shortly after introduction. This can save time and money by identifying issues before they become widespread concerns among larger sets of consumers who spend their money on your company's item(s).

4. Visualize sentiment data over time: Visualizations such as bar graphs can help identify trends across different categories more quickly than raw numbers alone. Combining numeric values with color differences between scores makes anomalies easier to spot, which speeds up working out why certain spikes occurred where others stayed stable (or vice versa) when comparing similar data points in time-series visualizations.
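The simple keyword search mentioned under review content analysis can be sketched with the standard library alone. The example below counts how many reviews mention each keyword at least once; the function name, the tokenisation rule, and the sample reviews are hypothetical, and the approach is far cruder than Word2Vec or LDA.

```python
import re
from collections import Counter
from typing import Iterable


def keyword_review_counts(reviews: Iterable[str], keywords: Iterable[str]) -> Counter:
    """Count the number of reviews that mention each keyword at least once."""
    wanted = {keyword.lower() for keyword in keywords}
    counts: Counter = Counter()
    for text in reviews:
        # Lowercase and split into simple word tokens (letters and apostrophes).
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        counts.update(tokens & wanted)
    return counts


# Hypothetical review texts, not taken from the dataset.
reviews = [
    "Battery life is great",
    "Terrible battery, great screen",
    "Screen cracked after a week",
]
print(keyword_review_counts(reviews, ["battery", "screen", "price"]))
```

Keywords that never occur are simply absent from the printed counter, and looking them up returns 0.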

Research Ideas

Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.

Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.

Training a machine learning model to accurately predict customers’ ratings on new products they have not yet tried and leverage this for further product development optimization initiatives

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description |
|:------------|:------------|
| label | The sentiment of the review, either positive or negative. (String) |
| title | The title of the review. (String) |
...
Emotion Annotated Indonesian Reviews
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Emotion Annotated Indonesian Reviews [Dataset]. https://www.opendatabay.com/data/dataset/20c7c8f5-43c2-455a-9926-d58fab96d9c3
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Reviews & Ratings
Description
This dataset is a collection of Indonesian product review data, meticulously annotated with emotion and sentiment labels. It was gathered from Tokopedia, a prominent e-commerce platform in Indonesia, encompassing product reviews from 29 distinct product categories. Each review is assigned a single emotion label, such as love, happiness, anger, fear, or sadness. The emotion annotation process was conducted by a group of annotators who followed specific criteria established by an expert in clinical psychology. The dataset also includes other valuable attributes related to the product reviews, including location, price, overall rating, number sold, total reviews, and customer rating, designed to facilitate further research. The data is considered clean.

Columns

While a specific original data sample is not available to list all columns in detail, based on the dataset description, the following attributes are included:
* Product Review Text: The original review content.
* Emotion Label: Categorical label indicating the primary emotion (e.g., love, happiness, anger, fear, sadness).
* Sentiment Label: Overall sentiment associated with the review.
* Location: Geographic information related to the review or product.
* Price: The price of the product reviewed.
* Overall Rating: The product's general rating.
* Number Sold: The quantity of the product sold.
* Total Review: The total number of reviews for the product.
* Customer Rating: The rating provided by the customer for the specific product.

Distribution

The dataset is typically provided in a CSV file format. It contains product reviews from 29 different product categories. Specific figures for the total number of rows or records are not detailed in the provided information.

Usage

This dataset is ideally suited for various applications and research endeavours, including:
* Learning: Excellent for educational purposes in data science, natural language processing, and text analytics.
* Research: Supports in-depth studies in natural language processing (NLP), text processing, consumer emotion analysis, text mining, and sentiment analysis.
* Model Training: Can be used for training machine learning models, including large language models (LLMs), for tasks such as emotion classification, sentiment analysis, and text understanding in Indonesian.
* Application Development: Useful for developing applications that require understanding consumer feedback and emotions from product reviews.

Coverage

The dataset's geographic scope is focused on Indonesia, specifically product reviews from an Indonesian e-commerce platform, Tokopedia, written in the Indonesian language. The listed date for the dataset on the platform is 08/06/2025; however, the actual time range during which the data was collected for the reviews themselves is not specified in the sources. There are no specific notes on data availability for certain demographic groups or years beyond general product review consumers in Indonesia.

License

CC0

Who Can Use It

This dataset is beneficial for a wide range of users, including:
* Academics and Researchers: For exploring topics in NLP, sentiment analysis, and consumer behaviour.
* Students: As a practical resource for learning about text data processing, emotion classification, and data analysis.
* Data Scientists and Machine Learning Engineers: For building and fine-tuning models capable of understanding and classifying emotions and sentiments from textual data.
* Businesses: Potentially for market research and understanding customer feedback trends, particularly within the Indonesian e-commerce sector.

Dataset Name Suggestions

Indonesian Product Review Emotions

Tokopedia Emotion & Sentiment Dataset

Indonesian E-commerce Review Sentiment

PRDECT-ID: Indonesian Consumer Emotion Data

Emotion Annotated Indonesian Reviews

Attributes

Original Data Source: PRDECT-ID: Indonesian Emotion Classification
Consumer_goods_reviews
huggingface.co
Updated Jan 22, 2025
Cite
kevin kibebe (2025). Consumer_goods_reviews [Dataset]. https://huggingface.co/datasets/kevykibbz/Consumer_goods_reviews
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 22, 2025
Authors
kevin kibebe
Description
Amazon Product Review Dataset (2023)

Dataset Overview

The Amazon Product Review Dataset (2023) contains product reviews from Amazon customers. The dataset includes product information, review details, and metadata about the customers who left the reviews. This dataset can be used for various natural language processing (NLP) tasks, including sentiment analysis, review prediction, recommendation systems, and more.

Dataset Name: Amazon Product Review Dataset (2023) Dataset… See the full description on the dataset page: https://huggingface.co/datasets/kevykibbz/Consumer_goods_reviews.
E-Commerce Product Reviews - Dataset for ML
kaggle.com
zip
Updated Dec 16, 2021
Cite
Furkan Gözükara (2021). E-Commerce Product Reviews - Dataset for ML [Dataset]. https://www.kaggle.com/furkangozukara/turkish-product-reviews
Explore at:
Available download formats: zip (580369522 bytes)
Dataset updated
Dec 16, 2021
Authors
Furkan Gözükara
Description
-> If you use Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset please cite: https://dergipark.org.tr/en/pub/cukurovaummfd/issue/28708/310341

@research article { cukurovaummfd310341, journal = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi}, issn = {1019-1011}, eissn = {2564-7520}, address = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi Yayın Kurulu Başkanlığı 01330 ADANA}, publisher = {Cukurova University}, year = {2016}, volume = {31}, pages = {464 - 482}, doi = {10.21605/cukurovaummfd.310341}, title = {Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme}, key = {cite}, author = {Gözükara, Furkan and Özel, Selma Ayşe} }

https://doi.org/10.21605/cukurovaummfd.310341

-> The Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset was composed as follows: ->-> The top 50 e-commerce sites in Turkey were crawled and their comments extracted. Then 2000 comments were randomly selected and manually labelled by a field expert. ->-> After manual labelling of the selected comments, 600 negative and 600 positive comments remained. ->-> This dataset contains these comments.

-> English_Movie_Reviews_by_Pang_and_Lee_2004 ->-> Pang, B., Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts, In Proceedings of the 42nd annual meeting on Association for Computational Linguistics (p. 271). ->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | polarity dataset v2.0 - review_polarity.tar.gz

-> English_Movie_Reviews_Sentences_by_Pang_and_Lee_2005 ->-> Pang, B., Lee, L., 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115-124), Association for Computational Linguistics ->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | sentence polarity dataset v1.0 - rt-polaritydata.tar.gz

-> English_Product_Reviews_by_Blitzer_et_al_2007 ->-> Article of the dataset: Blitzer, J., Dredze, M., Pereira, F., 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification, In ACL (Vol. 7, pp. 440-447). ->-> Source: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ | processed_acl.tar.gz

-> Turkish_Movie_Reviews_by_Demirtas_and_Pechenizkiy_2013 ->-> Demirtas, E., Pechenizkiy, M., 2013. Cross-lingual polarity detection with machine translation, In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining (p. 9). ACM. ->-> http://www.win.tue.nl/~mpechen/projects/smm/#Datasets Turkish_Movie_Sentiment.zip

-> The dataset files are provided as used in the article. -> Weka files are generated with the raw frequency of terms rather than the weighting schemes used in the article.

-> The folder Cross_Validation contains the files for each fold of the 10-fold cross-validation.
-> Inside the Cross_Validation folder, each turn of the cross-validation is named test_X, where X is the turn number.
-> Inside each test_X folder:
* Test_Set_Negative_RAW: Contains raw negative class Test data of that cross-validation turn
* Test_Set_Negative_Processed: Contains pre-processed negative class Test data of that cross-validation turn
* Test_Set_Positive_RAW: Contains raw positive class Test data of that cross-validation turn
* Test_Set_Positive_Processed: Contains pre-processed positive class Test data of that cross-validation turn
* Train_Set_Negative_RAW: Contains raw negative class Train data of that cross-validation turn
* Train_Set_Negative_Processed: Contains pre-processed negative class Train data of that cross-validation turn
* Train_Set_Positive_RAW: Contains raw positive class Train data of that cross-validation turn
* Train_Set_Positive_Processed: Contains pre-processed positive class Train data of that cross-validation turn
* Train_Set_For_Weka: Contains processed Train set formatted for Weka
* Test_Set_For_Weka: Contains processed Test set formatted for Weka
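
The folder layout described above can be enumerated programmatically. The following is only a sketch: the root directory `data`, the function name `fold_entries`, and the turn number used in the usage note are assumptions, and the description does not say whether the entries carry file extensions or whether turn numbering starts at 0 or 1.

```python
from pathlib import Path

SPLITS = ("Test", "Train")
CLASSES = ("Negative", "Positive")
VARIANTS = ("RAW", "Processed")


def fold_entries(root, turn):
    """Map each documented entry name to its expected path for one cross-validation turn."""
    fold_dir = Path(root) / "Cross_Validation" / f"test_{turn}"
    # Eight class-specific entries: split x class x variant.
    names = [
        f"{split}_Set_{cls}_{variant}"
        for split in SPLITS
        for cls in CLASSES
        for variant in VARIANTS
    ]
    # Two Weka-formatted entries per turn.
    names += ["Train_Set_For_Weka", "Test_Set_For_Weka"]
    return {name: fold_dir / name for name in names}
```

Calling `fold_entries("data", 1)` returns ten name-to-path pairs for the hypothetical turn `test_1`; checking `path.exists()` on each is one way to confirm how the archive actually names its entries.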

-> The folder Entire_Dataset contains files for the entire dataset:
* Negative_Processed: Contains all negative comments processed data
* Positive_Processed: Contains all positive comments processed data
* Negative_RAW: Contains all negative comments RAW data
* Positive_RAW: Contains all positive comments RAW data
* Entire_Dataset_WEKA: Contains all documents processed data in WEKA format
Grepsr | Sentiment Analysis of Facebook/Twitter/Instagram posts, News,...
datarade.ai
Updated Mar 20, 2023
Cite
Grepsr (2023). Grepsr | Sentiment Analysis of Facebook/Twitter/Instagram posts, News, Product Reviews | Custom and On-demand Sentiment Analysis [Dataset]. https://datarade.ai/data-products/sentiment-analysis-of-facebook-twitter-instagram-posts-news-grepsr
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Mar 20, 2023
Dataset authored and provided by
Grepsr
Area covered
Israel, Bahrain, Gabon, Comoros, Sint Eustatius and Saba, Kenya, Mayotte, Senegal, Saint Vincent and the Grenadines, Colombia
Description
Use cases and applications possible with the data:

Customer feedback analysis: Analyzing customer feedback helps businesses keep customers happy and loyal to the brand, and identify areas to improve.

Social media monitoring: With sentiment analysis, companies can monitor what's being said about them on social media and use that to figure out how people feel about their products and services and track any new trends.

Market research: Sentiment analysis can be used to analyze market trends and consumer preferences, which can help companies make informed business decisions and develop effective marketing strategies.

Financial analysis: You can use sentiment analysis to determine what people say about the stock market through news and social media, which can help you make investing decisions.

For e-commerce (Amazon, Best Buy, Home Depot, and many more), the following data fields can be included: Title, Price, Vendor Name, Ratings, Reviews, Brand, ASIN, URL, sentiment analysis for each review, and other fields on request.
EPRSTMT Dataset
paperswithcode.com
library.toponeai.link
Updated Jan 7, 2025
Cite
Liang Xu; Xiaojing Lu; Chenyang Yuan; Xuanwei Zhang; Huilin Xu; Hu Yuan; Guoao Wei; Xiang Pan; Xin Tian; Libo Qin; Hu Hai (2025). EPRSTMT Dataset [Dataset]. https://paperswithcode.com/dataset/eprstmt
Explore at:
Dataset updated
Jan 7, 2025
Authors
Liang Xu; Xiaojing Lu; Chenyang Yuan; Xuanwei Zhang; Huilin Xu; Hu Yuan; Guoao Wei; Xiang Pan; Xin Tian; Libo Qin; Hu Hai
Description
The EPRSTMT dataset, also known as EPR-sentiment, is a binary sentiment analysis dataset based on product reviews on an e-commerce platform. Each sample in the dataset is labeled as either Positive or Negative. It was collected by the ICIP Lab of Beijing Normal University and has been re-organized to make it suitable for sentiment analysis tasks.
‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-product-reviews-and-ratings-sentiment-analysis-fb82/latest
Explore at:
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Product Reviews and Ratings (Sentiment Analysis)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mafaisal007/product-reviews-and-ratings-sentiment-analysis on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset is from a toy store in Europe and contains customer reviews about a particular product. It is to be used for text mining and sentiment analysis.

--- Original source retains full ownership of the source dataset ---
Amazon Food Product Reviews & Ratings
opendatabay.com
Updated Jun 18, 2025
Cite
Vdt. Data (2025). Amazon Food Product Reviews & Ratings [Dataset]. https://www.opendatabay.com/data/consumer/fd13df3c-b1af-410c-8596-7e11961381ed
Explore at:
Dataset updated
Jun 18, 2025
Dataset authored and provided by
Vdt. Data
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
E-commerce & Online Transactions
Description
The Amazon Food Products Dataset is a large-scale collection of product listings, reviews, and metadata sourced from Amazon. This dataset is valuable for understanding consumer behaviour, analyzing product trends, and training machine learning models for recommendation systems and sentiment analysis. It includes various categories, providing insights into customer preferences, product ratings, and review sentiments.

Dataset Features

Each record in the dataset contains the following key fields:

ProductId: Unique identifier for each product.

UserId: Unique identifier for the reviewer.

ProfileName: Display name of the reviewer.

HelpfulnessNumerator: Number of users who found the review helpful.

HelpfulnessDenominator: Total number of users who rated the review’s helpfulness.

Score: Product rating (1 to 5 stars).

Time: Unix timestamp of the review.

Summary: Short summary of the review.

Text: Full text of the review.
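
As an illustration of how these nine fields might be handled once loaded, the sketch below models one record and derives two convenience values. The class name `FoodReview`, its method names, and the sample record are all made up for this example, and the Unix timestamp is assumed to be in seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FoodReview:
    product_id: str
    user_id: str
    profile_name: str
    helpfulness_numerator: int
    helpfulness_denominator: int
    score: int
    time: int  # Unix timestamp, assumed to be in seconds
    summary: str
    text: str

    def helpfulness_ratio(self) -> Optional[float]:
        """Share of helpfulness votes that were positive, or None if nobody voted."""
        if self.helpfulness_denominator == 0:
            return None
        return self.helpfulness_numerator / self.helpfulness_denominator

    def reviewed_at(self) -> datetime:
        """Convert the Unix timestamp to a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.time, tz=timezone.utc)


# Made-up record for illustration only; it is not taken from the dataset.
example = FoodReview("P0001", "U0001", "example_user", 3, 4, 5,
                     1303862400, "Great", "Tasty and fresh.")
```

Returning `None` for reviews with no helpfulness votes avoids a division by zero and keeps "no votes" distinct from "zero helpful votes".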

Distribution

Data Volume: 568454 rows and 9 columns.

Format: CSV.

Structure: Tabular format with numerical, categorical, and text data.

Usage

This dataset is ideal for a variety of applications:

Sentiment Analysis: Training NLP models to predict sentiment based on reviews.

Product Recommendation Systems: Building collaborative filtering models.

Trend Analysis: Identifying popular products and customer preferences.

Fake Review Detection: Detecting anomalous patterns in review behaviours.

Coverage

Geographic Coverage: Global.

Time Range: Multi-year dataset (over 10 years of reviews).

Demographics: General Amazon shoppers; includes various age groups and customer segments.

License

CC0

Who Can Use It

Data Scientists: For building machine learning models.

Researchers: For academic analysis of customer behaviour.

Businesses: For market insights and customer sentiment analysis.
Data from: Synthetic Product Desirability Datasets for Sentiment Analysis...
data.niaid.nih.gov
paperswithcode.com
+2 more
Updated Nov 21, 2024
Cite
Hastings, John (2024). Synthetic Product Desirability Datasets for Sentiment Analysis Testing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14188455
Explore at:
Dataset updated
Nov 21, 2024
Dataset provided by
Doty, Joseph
Myers, Zachary
Thompson, Warren
Weitl-Harms, Sherri
Hastings, John
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview: This collection contains three synthetic datasets produced by gpt-4o-mini for sentiment analysis and PDT (Product Desirability Toolkit) testing. Each dataset contains 1000 hypothetical software product reviews with the aim to produce a diversity of sentiment and text. The datasets were created as part of the research described in:

Hastings, J.D., Weitl-Harms, S., Doty, J., Myers, Z. L., and Thompson, W., “Utilizing Large Language Models to Synthesize Product Desirability Datasets,” in Proceedings of the 2024 IEEE International Conference on Big Data (BigData-24), Workshop on Large Language and Foundation Models (WLLFM-24), Dec. 2024. https://arxiv.org/abs/2411.13485.

Briefly, each row in the datasets was produced as follows:
1) Word+Review: The LLM selected a word and synthesized a review that would align with a random target sentiment.
2) Review+Word: The LLM produced a review to align with the target sentiment score, and then selected a word appropriate for the review.
3) Supply-Word: A word was supplied to the LLM which was then scored, and a review was produced to align with that score.

For sentiment analysis and PDT testing, the two columns of main interest across the datasets are likely 'Selected Word' and 'Hypothetical Review'.

License: This data is licensed under the CC Attribution 4.0 international license, and may be taken and used freely with credit given. Cite as:

Hastings, J., Weitl-Harms, S., Doty, J., Myers, Z., & Thompson, W. (2024). Synthetic Product Desirability Datasets for Sentiment Analysis Testing (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.14188456
Unlocking User Sentiment: The App Store Reviews Dataset
crawlfeeds.com
json, zip
Updated Jun 20, 2025
Cite
Crawl Feeds (2025). Unlocking User Sentiment: The App Store Reviews Dataset [Dataset]. https://crawlfeeds.com/datasets/app-store-reviews-dataset
Explore at:
Available download formats: json, zip
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.

Dataset Specifications:

Investment: $45.0

Status: Published and immediately available.

Category: Ratings and Reviews Data

Format: Compressed ZIP archive containing JSON files, ensuring easy integration into your analytical tools and platforms.

Volume: Comprises 10,000 unique app reviews, providing a robust sample for qualitative and quantitative analysis of user feedback.

Timeliness: Last crawled: not specified.

Richness of Detail (11 Comprehensive Fields):

Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:

Review Content:

review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.

title: The title given to the review by the user, often summarizing their main point.

isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.

Reviewer & Rating Information:

username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).

rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.

App & Origin Context:

app_name: The name of the application being reviewed.

app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.

country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.

Metadata & Timestamps:

_id: A unique identifier for the specific review record in the dataset.

crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).

date: The original date the review was posted by the user on the App Store.
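
A minimal sketch of working with these 11 fields follows. It assumes each JSON file in the archive holds a list of review objects, which the listing does not confirm. The helper names and every value in the sample records are placeholders, and the formats of `crawled_at` and `date` are left unparsed because they are not documented here.

```python
import json
from collections import defaultdict

# The 11 field names listed above.
EXPECTED_FIELDS = {
    "review", "title", "isEdited", "username", "rating",
    "app_name", "app_id", "country", "_id", "crawled_at", "date",
}


def mean_rating_by_country(records):
    """Average the star rating per storefront country, checking each record's fields."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        missing = EXPECTED_FIELDS.difference(record)
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
        sums[record["country"]] += record["rating"]
        counts[record["country"]] += 1
    return {country: sums[country] / counts[country] for country in sums}


def make_record(record_id, country, rating):
    """Build a made-up record; every value here is a placeholder, not real data."""
    return {
        "_id": record_id, "review": "placeholder text", "title": "placeholder",
        "isEdited": False, "username": "example_user", "rating": rating,
        "app_name": "Example App", "app_id": "0000", "country": country,
        "crawled_at": "placeholder", "date": "placeholder",
    }


# Round-trip through JSON to mimic reading one file from the archive.
raw = json.dumps([make_record("a", "US", 2), make_record("b", "US", 4),
                  make_record("c", "DE", 4)])
means = mean_rating_by_country(json.loads(raw))
```

Validating the field set up front makes a schema mismatch fail loudly instead of surfacing later as a `KeyError` deep in an analysis.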

Expanded Use Cases & Analytical Applications:

This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:

Product Development & Improvement:

Bug Detection & Prioritization: Analyze negative review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.

Feature Requests & Roadmap Prioritization: Extract feature suggestions from positive and neutral review text to inform future product roadmap decisions and develop features users actively desire.

User Experience (UX) Enhancement: Understand pain points related to app design, navigation, and overall usability by analyzing common complaints in the review field.

Version Impact Analysis: If integrated with app version data, track changes in rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.

Market Research & Competitive Intelligence:

Competitor Benchmarking: Analyze reviews of competitor apps (if included or combined with similar datasets) to identify their strengths, weaknesses, and user expectations within a specific app category.

Market Gap Identification: Discover unmet user needs or features that users desire but are not adequately provided by existing apps.

Niche Opportunities: Identify specific use cases or user segments that are underserved based on recurring feedback.

Marketing & App Store Optimization (ASO):

Sentiment Analysis: Perform sentiment analysis on the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.

Keyword Optimization: Identify frequently used keywords and phrases in reviews to optimize app store listings, improving discoverability and search ranking.

Messaging Refinement: Understand how users describe and use the app in their own words, which can inform marketing copy and advertising campaigns.

Reputation Management: Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.

Academic & Data Science Research:

Natural Language Processing (NLP): The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.

User Behavior Analysis: Study patterns in rating distribution, isEdited status, and date to understand user engagement and feedback cycles.

Cross-Country Comparisons: Analyze country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.

This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
Review Dataset [Consumer Sentiment] – Annotated feedback to power...
datarade.ai
Updated Mar 9, 2024
Cite
WiserBrand.com (2024). Review Dataset [Consumer Sentiment] – Annotated feedback to power emotion-aware models and CX strategy [Dataset]. https://datarade.ai/data-products/review-dataset-consumer-sentiment-annotated-feedback-to-p-wiserbrand-com
Explore at:
Available download formats: .json, .csv, .xls, .txt
Dataset updated
Mar 9, 2024
Dataset provided by
WiserBrand.com
Area covered
Luxembourg, United States of America, Andorra, Holy See, Monaco, Ireland, Latvia, Denmark, Croatia, Estonia
Description
"This dataset includes millions of consumer reviews tagged with emotion signals, making it ideal for training AI systems to detect how people feel — not just what they say. Built for sentiment-aware product development, CX strategy, and emotional behavior modeling, it offers deep insight into real consumer experience.

Features include:

- Labeled review sentiment (positive, neutral, negative)
- Retail product and service context (e.g., delivery, pricing, quality)
- Touchpoint mapping (pre-purchase, usage, return, support)
- Optional region, channel, and timestamp data

The list may vary based on the industry and can be customized as per your request.

This dataset enables:

- Training empathetic AI agents and emotion-detecting LLMs
- Mapping customer sentiment across retail segments or journey stages
- Identifying emotional drivers behind repeat purchases and churn
- Benchmarking brand sentiment versus competitors
- Segmenting user feedback for trend and CX impact analysis

Available in clean, structured formats and optimized for large-scale NLP, this dataset is indispensable for data science, product, and CX teams focused on emotional intelligence and experience-driven growth."
Amazon Product Reviews for NLP
kaggle.com
Updated Apr 13, 2022
Cite
Yeshan Santhush (2022). Amazon Product Reviews for NLP [Dataset]. https://www.kaggle.com/datasets/yeshmesh/inconsistent-and-consistent-amazon-reviews
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 13, 2022
Dataset provided by
Kaggle
Authors
Yeshan Santhush
Description
The dataset contains reviews of Amazon products that were web scraped with the Python library BeautifulSoup.

The columns of the dataset:

reviewId

reviewDate

mainDepartment

subDepartment

productName

reviewTitle

reviewStar

reviewText

inconsistentStatus

How did I label my dataset, or rather, how did I label the reviews as inconsistent (1) or consistent (0)?

To begin, the VADER Sentiment tool was utilized to extract the compound sentiment value for each text review. Subsequently, the polarity of the review's text was assigned by labeling it as 'Positive' if the review's compound value exceeded 0.05, 'Negative' if the compound value was below -0.05, and 'Neutral' otherwise. Once the text polarity had been extracted for all reviews, the star polarity for each review was determined based on the number of stars assigned. Specifically, reviews that contained a star rating of 1 or 2 were labeled as 'Negative', reviews with a rating of 3 were labeled as 'Neutral', and those with 4 or 5 stars were labeled as 'Positive'.

In order to identify inconsistencies or mismatches within a review, a comparison was made between the review's text polarity and star polarity. Reviews that had matching polarities were labeled as 'Consistent' (represented by 0 in binary). Conversely, if there was a mismatch between the two polarities, the review was labeled as 'Inconsistent' (represented by 1 in binary). This binary value was then recorded in the 'inconsistentStatus' column.
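
The labelling rule described above can be sketched as follows. This is an illustration rather than the author's code: the function names are hypothetical, and the compound score is passed in as a plain float so the sketch does not depend on the VADER library itself.

```python
def text_polarity(compound: float) -> str:
    """Map a VADER compound score to a polarity label using the thresholds above."""
    if compound > 0.05:
        return "Positive"
    if compound < -0.05:
        return "Negative"
    return "Neutral"


def star_polarity(stars: int) -> str:
    """Map a 1-5 star rating to a polarity label."""
    if stars in (1, 2):
        return "Negative"
    if stars == 3:
        return "Neutral"
    if stars in (4, 5):
        return "Positive"
    raise ValueError(f"star rating must be between 1 and 5, got {stars}")


def inconsistent_status(compound: float, stars: int) -> int:
    """Return 1 when text and star polarities disagree, 0 when they match."""
    return int(text_polarity(compound) != star_polarity(stars))
```

Under this rule, a glowing review text attached to a 1-star rating is flagged as inconsistent (1), while a neutral text with a 3-star rating is consistent (0).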

FYI: You could drop the 'inconsistentStatus' column and use your own logic for labelling the rows as consistent or inconsistent.

Cite

Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504

Datasets for Sentiment Analysis

Explore at:

Available download formats: csv

Unique identifier

https://doi.org/10.5281/zenodo.10157504

Dataset updated

Dec 10, 2023

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------

The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

File name: sts_gold_tweet.csv

----------- Amazon Sales Dataset ----------------

This dataset contains the ratings and reviews of 1K+ Amazon products, as per their details listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

Features:

product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product

License: CC BY-NC-SA 4.0

File name: amazon.csv

----------- Rotten Tomatoes Reviews Dataset ----------------

This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

This data was collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
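
Because the rows are grouped by class, a seeded shuffle is one simple way to mix them before splitting into training and test sets. The sketch below uses placeholder rows in place of the real contents of data_rt.csv; the function name `shuffle_rows` and the seed value are arbitrary choices for the example.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows; a fixed seed keeps the order reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Stand-in for data_rt.csv: 5,331 negative rows followed by 5,331 positive rows.
rows = [("negative sentence", 0)] * 5331 + [("positive sentence", 1)] * 5331
shuffled = shuffle_rows(rows)
```

Shuffling a copy leaves the original order intact, which helps if the class-sorted layout is needed again later.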

Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

File name: data_rt.csv

----------- Preprocessed Dataset Sentiment Analysis ----------------

Preprocessed Amazon product review data of Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.

The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

DOI: 10.34740/kaggle/dsv/3877817

Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

This dataset was used in the experimental phase of my research.

File name: EcoPreprocessed.csv

----------- Amazon Earphones Reviews ----------------

This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine learning model for sentiment analysis.

This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

License: U.S. Government Works

Source: www.amazon.in

File name (original): AllProductReviews.csv (contains 14337 reviews)

File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

----------- Amazon Musical Instruments Reviews ----------------

This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw), and division (manually added - categorical label generated using overall score).

Source: http://jmcauley.ucsd.edu/data/amazon/

File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
