100+ datasets found
  1. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Available download formats
    csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Campos Arias
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores the datasets used in some of the studies that served as research material for the thesis, together with the datasets used in the experimental part of the work.

    The datasets are listed below, along with their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file has 3 columns: id, polarity, and tweet, which hold the unique id, the polarity index of the text, and the tweet text, respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
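
    Because the rows are ordered by class, a shuffle step is needed before any train/test split. The following is a minimal sketch using only the Python standard library. The column names reviews and labels come from the description above; the helper name shuffle_rows, the seed value, and the tiny in-memory sample standing in for data_rt.csv are illustrative assumptions.

```python
import csv
import io
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of the rows, leaving the input list untouched."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical stand-in for data_rt.csv: negatives first, positives last.
sample_csv = (
    "reviews,labels\n"
    "bad film,0\n"
    "dull plot,0\n"
    "great cast,1\n"
    "lovely score,1\n"
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))
shuffled = shuffle_rows(rows)

for row in shuffled:
    print(row["labels"], row["reviews"])
```

    To use the real file, the io.StringIO object could be replaced with open("data_rt.csv", newline="", encoding="utf-8"), assuming the file sits in the working directory.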

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
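
    The mapping from a polarity score to a categorical division can be sketched as follows. TextBlob polarity scores lie in the range [-1, 1]; the threshold, the label names, and the function name polarity_to_division are assumptions for illustration, since the source does not state the rule that was actually used to generate the division column.

```python
def polarity_to_division(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1, 1] to a categorical label.

    The threshold and the three label names are assumptions, not the
    dataset author's documented rule.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Hypothetical polarity scores, as they might appear in the polarity column.
for score in (0.8, 0.0, -0.35):
    print(score, polarity_to_division(score))
```

    A non-zero threshold widens the neutral band, which may be useful when weakly polar reviews should not be forced into either class.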

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, in Unix time), reviewTime (time of the review, raw), and division (manually added; categorical label generated using the overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  2. amazon-reviews-sentiment-analysis

    • huggingface.co
    Cite
    fastai X Hugging Face Group 2022, amazon-reviews-sentiment-analysis [Dataset]. https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    fastai X Hugging Face Group 2022
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for amazon reviews for sentiment analysis

      Dataset Summary
    

    One of the most important problems in e-commerce is correctly calculating the ratings that products receive after sale. Solving this problem brings greater customer satisfaction for the e-commerce site, product prominence for sellers, and a seamless shopping experience for buyers. Another problem is correctly ordering the comments on products. The prominence of misleading… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis.

  3. Product Review Datasets for User Sentiment Analysis

    • datarade.ai
    Updated Sep 28, 2018
    Cite
    Oxylabs (2018). Product Review Datasets for User Sentiment Analysis [Dataset]. https://datarade.ai/data-products/product-review-datasets-for-user-sentiment-analysis-oxylabs
    Available download formats
    .json, .xml, .csv, .xls
    Dataset updated
    Sep 28, 2018
    Dataset authored and provided by
    Oxylabs
    Area covered
    Sudan, Argentina, Egypt, Antigua and Barbuda, Italy, Libya, Barbados, South Africa, Canada, Hong Kong
    Description

    Product Review Datasets: Uncover user sentiment

    Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.

    Data sources:

    • Trustpilot: datasets encompassing general consumer reviews and ratings across various businesses, products, and services.

    Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:

    • Product name;
    • Product category;
    • Number of ratings;
    • Ratings average;
    • Review title;
    • Review body;

    Choose from multiple data delivery options to suit your needs:

    1. Receive data in easy-to-read formats like spreadsheets or structured JSON files.
    2. Select your preferred data storage solutions, including SFTP, Webhooks, Google Cloud Storage, AWS S3, and Microsoft Azure Storage.
    3. Tailor data delivery frequencies, whether on-demand or per your agreed schedule.

    Why choose Oxylabs?

    1. Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.

    2. Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.

    3. Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.

  4. Amazon Product Reviews Dataset

    • kaggle.com
    Updated May 16, 2025
    Cite
    Gözde Kızılkaya Atik (2025). Amazon Product Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/gzdekzlkaya/amazon-product-reviews-dataset
    Dataset updated
    May 16, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gözde Kızılkaya Atik
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🛍️ Dataset Overview

    This dataset contains over 4,900 customer reviews from Amazon, including text-based feedback, star ratings, and helpfulness votes.

    It can be used for:

    • 📊 Sentiment Analysis
    • 🧠 Text Classification (Positive/Negative)
    • 🔍 Review Score Prediction (based on reviewText)
    • 🤖 Building Recommendation Systems
    • 🧮 Helpfulness Scoring Models

    📌 Key Columns

    • reviewText: Full written review
    • overall: Star rating (1 to 5)
    • summary: Short summary of the review
    • helpful_yes: Number of users who found the review helpful
    • total_vote: Total votes on helpfulness
    • day_diff: Days since the review was written

    This dataset is suitable for natural language processing (NLP) and supervised learning tasks.
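
    As one illustration of how the helpfulness columns might be combined, the sketch below computes the share of helpful votes per review. The column names helpful_yes and total_vote come from the list above; the function name helpfulness_ratio, the choice to return None for reviews without votes, and the sample values are assumptions.

```python
from typing import Optional


def helpfulness_ratio(helpful_yes: int, total_vote: int) -> Optional[float]:
    """Fraction of voters who found a review helpful.

    Returns None when there are no votes, so that unvoted reviews are not
    confused with reviews that every voter found unhelpful.
    """
    if total_vote <= 0:
        return None
    return helpful_yes / total_vote


# Hypothetical (helpful_yes, total_vote) pairs.
for yes, total in [(3, 4), (0, 5), (0, 0)]:
    print(yes, total, helpfulness_ratio(yes, total))
```

    A raw ratio treats 1 of 1 votes the same as 100 of 100, so a model built on it may also want total_vote as a separate feature.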

    📎 Note

    This is a publicly available dataset for educational and research use.

  5. Consumer Product Reviews and Sentiment Analysis

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Consumer Product Reviews and Sentiment Analysis [Dataset]. https://www.opendatabay.com/data/consumer/2d257b09-10c2-4d4a-b01e-bc2c00f0b679
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset contains customer reviews for various products, including details about product categories, brands, user ratings, and sentiment analysis. It is designed for applications such as sentiment classification, product recommendation systems, and the analysis of consumer behaviour. The dataset allows users to identify trends in customer satisfaction and gain insights into consumer preferences based on brand and category.

    Columns

    • item_category: The category identifier of the product under review.
    • item_id: The unique identifier for a specific product.
    • brand: The brand identifier associated with the product.
    • user_id: The unique identifier of the customer who submitted the review.
    • date: The date when the review was posted, typically in YYYY-MM-DD format.
    • comment: The textual content of the review as provided by the user.
    • rating: The numerical rating given by the user, often on a scale (e.g., 1 to 5).
    • tonality: The sentiment classification of the review, indicating whether it is positive or negative.

    Distribution

    The data file is typically available in CSV format. The dataset comprises approximately 14,221 records. Analysis of the sentiment distribution within the dataset indicates that 84% of reviews are classified as positive, while 16% are classified as negative.
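
    The stated split implies a strong class imbalance, which usually needs to be accounted for when training a classifier. The sketch below takes the approximate figures quoted above (14,221 records, 84% positive, 16% negative) at face value and derives rounded class counts and inverse-frequency class weights. The function names and the weighting formula, total / (number of classes × class count), are illustrative assumptions rather than anything prescribed by the dataset.

```python
def approximate_class_counts(total, shares):
    """Turn fractional class shares into rounded record counts."""
    return {label: round(total * share) for label, share in shares.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Figures quoted in the description; both are approximate.
counts = approximate_class_counts(14221, {"positive": 0.84, "negative": 0.16})
weights = balanced_class_weights(counts)

print(counts)
print(weights)
```

    With these inputs the minority (negative) class receives the larger weight, so errors on negative reviews count for more during training.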

    Usage

    This dataset is ideally suited for several applications, including:

    • Performing sentiment analysis on product reviews to gauge public opinion.
    • Identifying patterns and trends in customer satisfaction over time.
    • Developing and improving product recommendation systems.
    • Understanding consumer preferences based on specific brands and product categories.

    Coverage

    The dataset covers a time range from 30th July 2009 to 25th July 2017. The data has a global regional scope. No specific demographic scope is detailed within the available information.

    License

    CC0

    Who Can Use It

    This dataset is valuable for a range of users and their specific applications:

    • Data Scientists and Machine Learning Engineers: To train and evaluate sentiment analysis models, develop natural language processing (NLP) applications, and build recommendation engines.
    • Marketing Professionals: To understand customer feedback, identify popular products, and assess the impact of marketing campaigns on brand perception.
    • Businesses and Product Managers: To inform product development strategies, monitor customer satisfaction, and identify areas for improvement based on consumer feedback.
    • Researchers: For academic studies on consumer behaviour, sentiment analysis techniques, and market trends.

    Dataset Name Suggestions

    • Consumer Product Reviews and Sentiment Analysis
    • Customer Feedback and Ratings
    • Product Review Tonality Dataset
    • E-commerce Customer Insights
    • Global Product Review Data

    Attributes

    Original Data Source: 🏬🛍️😀 Consumer Sentiments and Ratings

  6. Review Dataset [Cross-Industry] – Public consumer feedback for sentiment and...

    • datarade.ai
    Cite
    WiserBrand.com, Review Dataset [Cross-Industry] – Public consumer feedback for sentiment and experience mapping [Dataset]. https://datarade.ai/data-products/review-dataset-cross-industry-public-consumer-feedback-fo-wiserbrand-com
    Available download formats
    .json, .csv, .xls, .txt
    Dataset provided by
    WiserBrand.com
    Area covered
    El Salvador, San Marino, Austria, Finland, Gibraltar, Germany, Malta, Denmark, Portugal, Ireland
    Description

    "This dataset includes consumer-submitted reviews from over 160 industries, covering both product- and service-based businesses. It’s built to support CX, AI, and analytics teams seeking structured insight into what real customers say, feel, and expect — across sectors like finance, healthcare, travel, telecom, retail, and more.

    Each review includes:

    • Authentic customer reviews (text, rating, pros and cons)
    • Labeled sentiment and tone (positive, neutral, negative)
    • Service context across industries: purchase, delivery, support, return, usage
    • Industry and company filters (fully customizable per buyer request)
    • Optional metadata: platform, review length, timestamp, geo-location

    The list may vary based on the industry and can be customized as per your request.

    Use this dataset to:

    • Track public perception trends across specific brands or verticals
    • Segment sentiment insights by industry, region, or company
    • Power NLP pipelines that require diverse tone, emotion, and domain specificity
    • Build dashboards or LLM prompts grounded in real user language
    • Train review summarization, classification, or escalation engines

    This dataset offers flexibility for custom delivery by industry, domain, or company, making it ideal for teams needing scalable consumer voice data tailored to specific strategic goals."

  7. Beauty Product Review

    • dataverse.telkomuniversity.ac.id
    tsv
    Updated Mar 6, 2022
    Cite
    Root (2022). Beauty Product Review [Dataset]. http://doi.org/10.34820/FK2/NAZYE1
    Available download formats
    tsv (1611354)
    Dataset updated
    Mar 6, 2022
    Dataset provided by
    Root
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains beauty product reviews written in Bahasa Indonesia. Each text in the dataset has been categorized by aspect (Price, Packaging, Product, and Aroma), and each aspect has been classified as Positive, Neutral, or Negative.

  8. Amazon UK shoes products reviews dataset

    • crawlfeeds.com
    csv, zip
    Updated Jun 27, 2025
    Cite
    Crawl Feeds (2025). Amazon UK shoes products reviews dataset [Dataset]. https://crawlfeeds.com/datasets/amazon-uk-shoes-products-reviews-dataset
    Available download formats
    csv, zip
    Dataset updated
    Jun 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock detailed insights with our Amazon UK Shoes Products Reviews Dataset, an invaluable resource for businesses, researchers, and data analysts. This dataset features comprehensive information, including product names, review texts, star ratings, and customer feedback for a wide range of shoe products available on Amazon UK.

    Key Features:

    • Extensive Coverage: Includes detailed reviews and ratings for various shoe products, helping you analyze customer preferences and trends.

    • Structured Data: Available in easily accessible formats such as CSV, making it well suited for integration into your analytical workflows.

    • Actionable Insights: Leverage this dataset for customer sentiment analysis, product optimization, and competitive benchmarking.

    Why Choose the Amazon UK Shoes Products Reviews Dataset?

    Whether you're delving into customer behavior, conducting market research, or improving product offerings, this dataset empowers you to make informed decisions. By working with a dataset enriched with real-world feedback, you can:

    • Understand customer preferences: Dive into detailed reviews to uncover patterns in consumer likes and dislikes.

    • Enhance product offerings: Identify gaps and opportunities in the market to better meet customer demands.

    • Boost competitive analysis: Compare customer feedback across different brands and products.

    Additional Datasets Available

    Explore related datasets like the Amazon product review dataset, offering insights across various categories and regions. For specific needs, our curated product reviews dataset is tailored to help you gain a granular understanding of niche markets.

  9. Amazon Product Reviews

    • kaggle.com
    Updated Nov 26, 2023
    Cite
    The Devastator (2023). Amazon Product Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/amazon-product-reviews/discussion
    Dataset updated
    Nov 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Amazon Product Reviews

    18 Years of Customer Ratings and Experiences

    By Huggingface Hub [source]

    About this dataset

    The Amazon Reviews Polarity Dataset discloses eighteen years of customers' ratings and reviews from Amazon.com, offering an unparalleled trove of insight and knowledge. Drawing from the immense pool of over 35 million customer reviews, this dataset presents a broad spectrum of customer opinions on products they have bought or used. This invaluable data is a gold mine for improving products and services as it contains comprehensive information regarding customers' experiences with a product including ratings, titles, and plaintext content. At the same time, this dataset contains both customer-specific data along with product information which encourages deep analytics that could lead to great advances in providing tailored solutions for customers. Has your product been favored by the majority? Are there any aspects that need extra care? Use Amazon Reviews Polarity to gain deeper insights into what your customers want - explore now!

    How to use the dataset

    1. Analyze customer ratings to identify trends: Look at how many customers have rated the same product or service with the same score (e.g., 4 stars). You can use this information to identify what customers like or don’t like about it by examining common sentiment throughout the reviews. Identifying these patterns can help you decide which features of your products or services to emphasize in order to boost sales and satisfaction rates.

    2. Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product or service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search can quickly reveal the general topics discussed in relation to your product or service across multiple reviews, allowing you to quickly pinpoint areas that may need improvement for particular items within your lines of business.

    3. Track associated scores over time: By tracking customer ratings over time, you may be able to better understand when there has been an issue with something specific related to your product or service, such as a negative response toward a feature that was introduced, proved unpopular among customers, and was removed shortly afterward. This can save time and money by identifying issues before they become widespread concerns among larger sets of consumers.

    4. Visualize sentiment data over time: Visualizations such as bar graphs can help identify trends across different categories more quickly than raw numbers alone. Combining numeric values with color differences between scores makes anomalies easier to spot, allowing faster resolution when trying to figure out why certain spikes occurred where others stayed stable (or vice versa) when comparing similar data points in time-series visualizations.

    Research Ideas

    • Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.
    • Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.
    • Training a machine learning model to accurately predict customers’ ratings on new products they have not yet tried and leverage this for further product development optimization initiatives

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | label | The sentiment of the review, either positive or negative. (String) |
    | title | The title of the review. (String) |
    ...

  10. Emotion Annotated Indonesian Reviews

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Emotion Annotated Indonesian Reviews [Dataset]. https://www.opendatabay.com/data/dataset/20c7c8f5-43c2-455a-9926-d58fab96d9c3
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset is a collection of Indonesian product review data, meticulously annotated with emotion and sentiment labels. It was gathered from Tokopedia, a prominent e-commerce platform in Indonesia, encompassing product reviews from 29 distinct product categories. Each review is assigned a single emotion label, such as love, happiness, anger, fear, or sadness. The emotion annotation process was conducted by a group of annotators who followed specific criteria established by an expert in clinical psychology. The dataset also includes other valuable attributes related to the product reviews, including location, price, overall rating, number sold, total reviews, and customer rating, designed to facilitate further research. The data is considered clean.

    Columns

    While a specific original data sample is not available to list all columns in detail, based on the dataset description, the following attributes are included:

    • Product Review Text: The original review content.
    • Emotion Label: Categorical label indicating the primary emotion (e.g., love, happiness, anger, fear, sadness).
    • Sentiment Label: Overall sentiment associated with the review.
    • Location: Geographic information related to the review or product.
    • Price: The price of the product reviewed.
    • Overall Rating: The product's general rating.
    • Number Sold: The quantity of the product sold.
    • Total Review: The total number of reviews for the product.
    • Customer Rating: The rating provided by the customer for the specific product.

    Distribution

    The dataset is typically provided in a CSV file format. It contains product reviews from 29 different product categories. Specific figures for the total number of rows or records are not detailed in the provided information.

    Usage

    This dataset is ideally suited for various applications and research endeavours, including:

    • Learning: Excellent for educational purposes in data science, natural language processing, and text analytics.
    • Research: Supports in-depth studies in natural language processing (NLP), text processing, consumer emotion analysis, text mining, and sentiment analysis.
    • Model Training: Can be used for training machine learning models, including large language models (LLMs), for tasks such as emotion classification, sentiment analysis, and text understanding in Indonesian.
    • Application Development: Useful for developing applications that require understanding consumer feedback and emotions from product reviews.

    Coverage

    The dataset's geographic scope is focused on Indonesia, specifically product reviews from an Indonesian e-commerce platform, Tokopedia, written in the Indonesian language. The listed date for the dataset on the platform is 08/06/2025; however, the actual time range during which the data was collected for the reviews themselves is not specified in the sources. There are no specific notes on data availability for certain demographic groups or years beyond general product review consumers in Indonesia.

    License

    CC0

    Who Can Use It

    This dataset is beneficial for a wide range of users, including:

    • Academics and Researchers: For exploring topics in NLP, sentiment analysis, and consumer behaviour.
    • Students: As a practical resource for learning about text data processing, emotion classification, and data analysis.
    • Data Scientists and Machine Learning Engineers: For building and fine-tuning models capable of understanding and classifying emotions and sentiments from textual data.
    • Businesses: Potentially for market research and understanding customer feedback trends, particularly within the Indonesian e-commerce sector.

    Dataset Name Suggestions

    • Indonesian Product Review Emotions
    • Tokopedia Emotion & Sentiment Dataset
    • Indonesian E-commerce Review Sentiment
    • PRDECT-ID: Indonesian Consumer Emotion Data
    • Emotion Annotated Indonesian Reviews

    Attributes

    Original Data Source: PRDECT-ID: Indonesian Emotion Classification

  11. Consumer_goods_reviews

    • huggingface.co
    Updated Jan 22, 2025
    + more versions
    kevin kibebe (2025). Consumer_goods_reviews [Dataset]. https://huggingface.co/datasets/kevykibbz/Consumer_goods_reviews
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 22, 2025
    Authors
    kevin kibebe
    Description

    Amazon Product Review Dataset (2023)

    Dataset Overview

    The Amazon Product Review Dataset (2023) contains product reviews from Amazon customers. The dataset includes product information, review details, and metadata about the customers who left the reviews. This dataset can be used for various natural language processing (NLP) tasks, including sentiment analysis, review prediction, recommendation systems, and more.

    Dataset Name: Amazon Product Review Dataset (2023) Dataset… See the full description on the dataset page: https://huggingface.co/datasets/kevykibbz/Consumer_goods_reviews.

  12. E-Commerce Product Reviews - Dataset for ML

    • kaggle.com
    zip
    Updated Dec 16, 2021
    Furkan Gözükara (2021). E-Commerce Product Reviews - Dataset for ML [Dataset]. https://www.kaggle.com/furkangozukara/turkish-product-reviews
    Explore at:
    Available download formats: zip (580369522 bytes)
    Dataset updated
    Dec 16, 2021
    Authors
    Furkan Gözükara
    Description

    -> If you use Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset please cite: https://dergipark.org.tr/en/pub/cukurovaummfd/issue/28708/310341

    @research article { cukurovaummfd310341, journal = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi}, issn = {1019-1011}, eissn = {2564-7520}, address = {Çukurova Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi Yayın Kurulu Başkanlığı 01330 ADANA}, publisher = {Cukurova University}, year = {2016}, volume = {31}, pages = {464 - 482}, doi = {10.21605/cukurovaummfd.310341}, title = {Türkçe ve İngilizce Yorumların Duygu Analizinde Doküman Vektörü Hesaplama Yöntemleri için Bir Deneysel İnceleme}, key = {cite}, author = {Gözükara, Furkan and Özel, Selma Ayşe} }

    https://doi.org/10.21605/cukurovaummfd.310341

    -> The Turkish_Product_Reviews_by_Gozukara_and_Ozel_2016 dataset was composed as follows:
    ->-> The top 50 e-commerce sites in Turkey were crawled and their comments extracted. Then 2000 comments were randomly selected and manually labelled by a field expert.
    ->-> After manual labelling, 600 negative and 600 positive comments remained.
    ->-> This dataset contains these comments.

    -> English_Movie_Reviews_by_Pang_and_Lee_2004 ->-> Pang, B., Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts, In Proceedings of the 42nd annual meeting on Association for Computational Linguistics (p. 271). ->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | polarity dataset v2.0 - review_polarity.tar.gz

    -> English_Movie_Reviews_Sentences_by_Pang_and_Lee_2005 ->-> Pang, B., Lee, L., 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115-124), Association for Computational Linguistics ->-> Source: https://www.cs.cornell.edu/people/pabo/movie-review-data/ | sentence polarity dataset v1.0 - rt-polaritydata.tar.gz

    -> English_Product_Reviews_by_Blitzer_et_al_2007 ->-> Article of the dataset: Blitzer, J., Dredze, M., Pereira, F., 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification, In ACL (Vol. 7, pp. 440-447). ->-> Source: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ | processed_acl.tar.gz

    -> Turkish_Movie_Reviews_by_Demirtas_and_Pechenizkiy_2013 ->-> Demirtas, E., Pechenizkiy, M., 2013. Cross-lingual polarity detection with machine translation, In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining (p. 9). ACM. ->-> http://www.win.tue.nl/~mpechen/projects/smm/#Datasets Turkish_Movie_Sentiment.zip

    -> The dataset files are provided as used in the article.
    -> Weka files are generated with the raw frequency of terms rather than the weighting schemes used in the article.

    -> The folder Cross_Validation contains the files for each fold of the 10-fold cross-validation.
    -> Inside the Cross_Validation folder, each turn of the cross-validation is named test_X, where X is the turn number.
    -> Inside each test_X folder:
    * Test_Set_Negative_RAW: Contains raw negative class Test data of that cross-validation turn
    * Test_Set_Negative_Processed: Contains pre-processed negative class Test data of that cross-validation turn
    * Test_Set_Positive_RAW: Contains raw positive class Test data of that cross-validation turn
    * Test_Set_Positive_Processed: Contains pre-processed positive class Test data of that cross-validation turn
    * Train_Set_Negative_RAW: Contains raw negative class Train data of that cross-validation turn
    * Train_Set_Negative_Processed: Contains pre-processed negative class Train data of that cross-validation turn
    * Train_Set_Positive_RAW: Contains raw positive class Train data of that cross-validation turn
    * Train_Set_Positive_Processed: Contains pre-processed positive class Train data of that cross-validation turn
    * Train_Set_For_Weka: Contains processed Train set formatted for Weka
    * Test_Set_For_Weka: Contains processed Test set formatted for Weka

    -> The folder Entire_Dataset contains files for Entire Dataset
    * Negative_Processed: Contains all negative comments processed data
    * Positive_Processed: Contains all positive comments processed data
    * Negative_RAW: Contains all negative comments RAW data
    * Positive_RAW: Contains all positive comments RAW data
    * Entire_Dataset_WEKA: Contains all documents processed data in WEKA format

  13. Grepsr | Sentiment Analysis of Facebook/Twitter/Instagram posts, News,...

    • datarade.ai
    Updated Mar 20, 2023
    Grepsr (2023). Grepsr | Sentiment Analysis of Facebook/Twitter/Instagram posts, News, Product Reviews | Custom and On-demand Sentiment Analysis [Dataset]. https://datarade.ai/data-products/sentiment-analysis-of-facebook-twitter-instagram-posts-news-grepsr
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Mar 20, 2023
    Dataset authored and provided by
    Grepsr
    Area covered
    Israel, Bahrain, Gabon, Comoros, Sint Eustatius and Saba, Kenya, Mayotte, Senegal, Saint Vincent and the Grenadines, Colombia
    Description

    Use cases and applications possible with the data:

    Customer feedback analysis: Analyzing customer feedback helps businesses keep customers happy and loyal to the brand, and identify areas to improve.

    Social media monitoring: With sentiment analysis, companies can monitor what's being said about them on social media and use that to figure out how people feel about their products and services and track any new trends.

    Market research: Sentiment analysis can be used to analyze market trends and consumer preferences, which can help companies make informed business decisions and develop effective marketing strategies.

    Financial analysis: You can use sentiment analysis to determine what people say about the stock market through news and social media, which can help you make investing decisions.

    For e-commerce sites (Amazon, Best Buy, Home Depot, and many more), the following data fields can be included:

    • Title
    • Price
    • Vendor Name
    • Ratings
    • Reviews
    • Brand
    • ASIN
    • URL
    • Sentiment analysis for each review
    • Other fields, as per request

  14. EPRSTMT Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Jan 7, 2025
    Liang Xu; Xiaojing Lu; Chenyang Yuan; Xuanwei Zhang; Huilin Xu; Hu Yuan; Guoao Wei; Xiang Pan; Xin Tian; Libo Qin; Hu Hai (2025). EPRSTMT Dataset [Dataset]. https://paperswithcode.com/dataset/eprstmt
    Dataset updated
    Jan 7, 2025
    Authors
    Liang Xu; Xiaojing Lu; Chenyang Yuan; Xuanwei Zhang; Huilin Xu; Hu Yuan; Guoao Wei; Xiang Pan; Xin Tian; Libo Qin; Hu Hai
    Description

    The EPRSTMT dataset, also known as EPR-sentiment, is a binary sentiment analysis dataset based on product reviews on an e-commerce platform. Each sample in the dataset is labeled as either Positive or Negative. It was collected by the ICIP Lab of Beijing Normal University and has been re-organized to make it suitable for sentiment analysis tasks.

  15. ‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-product-reviews-and-ratings-sentiment-analysis-fb82/latest
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Product Reviews and Ratings (Sentiment Analysis)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mafaisal007/product-reviews-and-ratings-sentiment-analysis on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset is from a toy store in Europe and contains customer reviews about a particular product. It is intended for text mining and sentiment analysis.

    --- Original source retains full ownership of the source dataset ---

  16. Amazon Food Product Reviews & Ratings

    • opendatabay.com
    Updated Jun 18, 2025
    Vdt. Data (2025). Amazon Food Product Reviews & Ratings [Dataset]. https://www.opendatabay.com/data/consumer/fd13df3c-b1af-410c-8596-7e11961381ed
    Dataset updated
    Jun 18, 2025
    Dataset authored and provided by
    Vdt. Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    E-commerce & Online Transactions
    Description

    The Amazon Food Products Dataset is a large-scale collection of product listings, reviews, and metadata sourced from Amazon. This dataset is valuable for understanding consumer behaviour, analyzing product trends, and training machine learning models for recommendation systems and sentiment analysis. It includes various categories, providing insights into customer preferences, product ratings, and review sentiments.

    Dataset Features

    Each record in the dataset contains the following key fields:

    • ProductId: Unique identifier for each product.
    • UserId: Unique identifier for the reviewer.
    • ProfileName: Display name of the reviewer.
    • HelpfulnessNumerator: Number of users who found the review helpful.
    • HelpfulnessDenominator: Total number of users who rated the review’s helpfulness.
    • Score: Product rating (1 to 5 stars).
    • Time: Unix timestamp of the review.
    • Summary: Short summary of the review.
    • Text: Full text of the review.
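
    As a rough illustration of how these fields might be used, the sketch below derives a helpfulness ratio and a readable date from one record. It assumes the Time field is expressed in seconds, and the function names and the sample record are hypothetical, not taken from the dataset.

```python
from datetime import datetime, timezone


def helpfulness_ratio(numerator, denominator):
    """Share of voters who found the review helpful, or None if nobody voted."""
    if denominator == 0:
        return None
    return numerator / denominator


def review_datetime(unix_seconds):
    """Convert the Time field (assumed to be Unix seconds) to a UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# Hypothetical record that reuses the field names listed above.
record = {
    "HelpfulnessNumerator": 3,
    "HelpfulnessDenominator": 4,
    "Score": 5,
    "Time": 1303862400,
}

ratio = helpfulness_ratio(
    record["HelpfulnessNumerator"], record["HelpfulnessDenominator"]
)
posted = review_datetime(record["Time"])
```

    Returning None for reviews without votes keeps them distinguishable from reviews that were voted on and found unhelpful.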

    Distribution

    • Data Volume: 568454 rows and 9 columns.
    • Format: CSV.
    • Structure: Tabular format with numerical, categorical, and text data.

    Usage

    This dataset is ideal for a variety of applications:

    • Sentiment Analysis: Training NLP models to predict sentiment based on reviews.
    • Product Recommendation Systems: Building collaborative filtering models.
    • Trend Analysis: Identifying popular products and customer preferences.
    • Fake Review Detection: Detecting anomalous patterns in review behaviours.

    Coverage

    • Geographic Coverage: Global.
    • Time Range: Multi-year dataset (over 10 years of reviews).
    • Demographics: General Amazon shoppers; includes various age groups and customer segments.

    License

    CC0

    Who Can Use It

    • Data Scientists: For building machine learning models.
    • Researchers: For academic analysis of customer behaviour.
    • Businesses: For market insights and customer sentiment analysis.

  17. Data from: Synthetic Product Desirability Datasets for Sentiment Analysis...

    • data.niaid.nih.gov
    • paperswithcode.com
    • +2 more
    Updated Nov 21, 2024
    Hastings, John (2024). Synthetic Product Desirability Datasets for Sentiment Analysis Testing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14188455
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Doty, Joseph
    Myers, Zachary
    Thompson, Warren
    Weitl-Harms, Sherri
    Hastings, John
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview: This collection contains three synthetic datasets produced by gpt-4o-mini for sentiment analysis and PDT (Product Desirability Toolkit) testing. Each dataset contains 1000 hypothetical software product reviews with the aim to produce a diversity of sentiment and text. The datasets were created as part of the research described in:

    Hastings, J.D., Weitl-Harms, S., Doty, J., Myers, Z. L., and Thompson, W., “Utilizing Large Language Models to Synthesize Product Desirability Datasets,” in Proceedings of the 2024 IEEE International Conference on Big Data (BigData-24), Workshop on Large Language and Foundation Models (WLLFM-24), Dec. 2024. https://arxiv.org/abs/2411.13485.

    Briefly, each row in the datasets was produced as follows:

    1. Word+Review: The LLM selected a word and synthesized a review that would align with a random target sentiment.
    2. Review+Word: The LLM produced a review to align with the target sentiment score, and then selected a word appropriate for the review.
    3. Supply-Word: A word was supplied to the LLM which was then scored, and a review was produced to align with that score.

    For sentiment analysis and PDT testing, the two columns of main interest across the datasets are likely 'Selected Word' and 'Hypothetical Review'.

    License: This data is licensed under the CC Attribution 4.0 international license, and may be taken and used freely with credit given. Cite as:

    Hastings, J., Weitl-Harms, S., Doty, J., Myers, Z., & Thompson, W. (2024). Synthetic Product Desirability Datasets for Sentiment Analysis Testing (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.14188456

  18. Unlocking User Sentiment: The App Store Reviews Dataset

    • crawlfeeds.com
    json, zip
    Updated Jun 20, 2025
    Crawl Feeds (2025). Unlocking User Sentiment: The App Store Reviews Dataset [Dataset]. https://crawlfeeds.com/datasets/app-store-reviews-dataset
    Explore at:
    Available download formats: json, zip
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.

    Dataset Specifications:

    • Investment: $45.0
    • Status: Published and immediately available.
    • Category: Ratings and Reviews Data
    • Format: Compressed ZIP archive containing JSON files, ensuring easy integration into your analytical tools and platforms.
    • Volume: Comprises 10,000 unique app reviews, providing a robust sample for qualitative and quantitative analysis of user feedback.
    • Timeliness: The last-crawled date is not specified.

    Richness of Detail (11 Comprehensive Fields):

    Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:

    1. Review Content:

      • review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
      • title: The title given to the review by the user, often summarizing their main point.
      • isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.
    2. Reviewer & Rating Information:

      • username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
      • rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.
    3. App & Origin Context:

      • app_name: The name of the application being reviewed.
      • app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
      • country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.
    4. Metadata & Timestamps:

      • _id: A unique identifier for the specific review record in the dataset.
      • crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
      • date: The original date the review was posted by the user on the App Store.
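
    A minimal sketch of working with these fields follows. It averages the rating field per country storefront. The layout of the JSON files inside the ZIP archive is not described here, so the sketch assumes each file holds a list of review objects; the function name and the sample records are hypothetical.

```python
import json
from collections import defaultdict


def mean_rating_by_country(records):
    """Average the rating field for each country storefront."""
    totals = defaultdict(lambda: [0.0, 0])  # country -> [sum of ratings, count]
    for rec in records:
        rating = rec.get("rating")
        country = rec.get("country")
        if rating is None or country is None:
            continue  # skip records that lack either field
        totals[country][0] += float(rating)
        totals[country][1] += 1
    return {country: total / count for country, (total, count) in totals.items()}


# Hypothetical sample; a real file would be read with json.load(file_handle).
sample = json.loads("""
[
  {"_id": "a1", "app_name": "Example", "country": "US", "rating": 5, "isEdited": false},
  {"_id": "a2", "app_name": "Example", "country": "US", "rating": 1, "isEdited": true},
  {"_id": "a3", "app_name": "Example", "country": "DE", "rating": 4, "isEdited": false}
]
""")

averages = mean_rating_by_country(sample)
```

    The same grouping pattern extends to app_id or to a month derived from date when tracking rating trends over time.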

    Expanded Use Cases & Analytical Applications:

    This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:

    • Product Development & Improvement:

      • Bug Detection & Prioritization: Analyze negative review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
      • Feature Requests & Roadmap Prioritization: Extract feature suggestions from positive and neutral review text to inform future product roadmap decisions and develop features users actively desire.
      • User Experience (UX) Enhancement: Understand pain points related to app design, navigation, and overall usability by analyzing common complaints in the review field.
      • Version Impact Analysis: If integrated with app version data, track changes in rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.
    • Market Research & Competitive Intelligence:

      • Competitor Benchmarking: Analyze reviews of competitor apps (if included or combined with similar datasets) to identify their strengths, weaknesses, and user expectations within a specific app category.
      • Market Gap Identification: Discover unmet user needs or features that users desire but are not adequately provided by existing apps.
      • Niche Opportunities: Identify specific use cases or user segments that are underserved based on recurring feedback.
    • Marketing & App Store Optimization (ASO):

      • Sentiment Analysis: Perform sentiment analysis on the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
      • Keyword Optimization: Identify frequently used keywords and phrases in reviews to optimize app store listings, improving discoverability and search ranking.
      • Messaging Refinement: Understand how users describe and use the app in their own words, which can inform marketing copy and advertising campaigns.
      • Reputation Management: Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.
    • Academic & Data Science Research:

      • Natural Language Processing (NLP): The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
      • User Behavior Analysis: Study patterns in rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
      • Cross-Country Comparisons: Analyze country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.

    This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.

  19. Review Dataset [Consumer Sentiment] – Annotated feedback to power...

    • datarade.ai
    Updated Mar 9, 2024
    WiserBrand.com (2024). Review Dataset [Consumer Sentiment] – Annotated feedback to power emotion-aware models and CX strategy [Dataset]. https://datarade.ai/data-products/review-dataset-consumer-sentiment-annotated-feedback-to-p-wiserbrand-com
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    WiserBrand.com
    Area covered
    Luxembourg, United States of America, Andorra, Holy See, Monaco, Ireland, Latvia, Denmark, Croatia, Estonia
    Description

    "This dataset includes millions of consumer reviews tagged with emotion signals, making it ideal for training AI systems to detect how people feel — not just what they say. Built for sentiment-aware product development, CX strategy, and emotional behavior modeling, it offers deep insight into real consumer experience.

    Features include:

    • Labeled review sentiment (positive, neutral, negative)
    • Retail product and service context (e.g., delivery, pricing, quality)
    • Touchpoint mapping (pre-purchase, usage, return, support)
    • Optional region, channel, and timestamp data

    The list may vary based on the industry and can be customized as per your request.

    This dataset enables:

    • Training empathetic AI agents and emotion-detecting LLMs
    • Mapping customer sentiment across retail segments or journey stages
    • Identifying emotional drivers behind repeat purchases and churn
    • Benchmarking brand sentiment versus competitors
    • Segmenting user feedback for trend and CX impact analysis

    Available in clean, structured formats and optimized for large-scale NLP, this dataset is indispensable for data science, product, and CX teams focused on emotional intelligence and experience-driven growth."

  20. Amazon Product Reviews for NLP

    • kaggle.com
    Updated Apr 13, 2022
    Yeshan Santhush (2022). Amazon Product Reviews for NLP [Dataset]. https://www.kaggle.com/datasets/yeshmesh/inconsistent-and-consistent-amazon-reviews
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 13, 2022
    Dataset provided by
    Kaggle
    Authors
    Yeshan Santhush
    Description

    The dataset contains reviews of Amazon products that were web scraped with the Python library BeautifulSoup.

    The columns of the dataset:

    1. reviewId
    2. reviewDate
    3. mainDepartment
    4. subDepartment
    5. productName
    6. reviewTitle
    7. reviewStar
    8. reviewText
    9. inconsistentStatus

    How did I label the reviews as inconsistent (1) or consistent (0)?

    To begin, the VADER Sentiment tool was utilized to extract the compound sentiment value for each text review. Subsequently, the polarity of the review's text was assigned by labeling it as 'Positive' if the review's compound value exceeded 0.05, 'Negative' if the compound value was below -0.05, and 'Neutral' otherwise. Once the text polarity had been extracted for all reviews, the star polarity for each review was determined based on the number of stars assigned. Specifically, reviews that contained a star rating of 1 or 2 were labeled as 'Negative', reviews with a rating of 3 were labeled as 'Neutral', and those with 4 or 5 stars were labeled as 'Positive'.

    In order to identify inconsistencies or mismatches within a review, a comparison was made between the review's text polarity and star polarity. Reviews that had matching polarities were labeled as 'Consistent' (represented by 0 in binary). Conversely, if there was a mismatch between the two polarities, the review was labeled as 'Inconsistent' (represented by 1 in binary). This binary value was then recorded in the 'inconsistentStatus' column.
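
    The labelling rule described above can be sketched as follows. The sketch assumes the compound score has already been computed, for example with VADER's SentimentIntensityAnalyzer().polarity_scores(text)["compound"]. The function names are hypothetical, and this is not the author's original code.

```python
def text_polarity(compound):
    """Map a VADER compound score to a polarity label."""
    if compound > 0.05:
        return "Positive"
    if compound < -0.05:
        return "Negative"
    return "Neutral"


def star_polarity(stars):
    """Map a 1-5 star rating to a polarity label."""
    if stars in (1, 2):
        return "Negative"
    if stars == 3:
        return "Neutral"
    if stars in (4, 5):
        return "Positive"
    raise ValueError(f"unexpected star rating: {stars!r}")


def inconsistent_status(compound, stars):
    """Return 1 when text polarity and star polarity disagree, else 0."""
    return int(text_polarity(compound) != star_polarity(stars))


# Hypothetical example: strongly positive text attached to a 1-star rating.
flag = inconsistent_status(0.8, 1)
```

    Keeping the two polarity mappings as separate functions makes it easy to swap in different thresholds when relabelling the rows.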

    Note: You can drop the 'inconsistentStatus' column and use your own logic for labelling the rows as consistent or inconsistent.

Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504

Datasets for Sentiment Analysis

Explore at:
Available download formats: csv
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------

The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

File name: sts_gold_tweet.csv

----------- Amazon Sales Dataset ----------------

This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped in January 2023 from the official Amazon website.

Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

Features:

  • product_id - Product ID
  • product_name - Name of the Product
  • category - Category of the Product
  • discounted_price - Discounted Price of the Product
  • actual_price - Actual Price of the Product
  • discount_percentage - Percentage of Discount for the Product
  • rating - Rating of the Product
  • rating_count - Number of people who voted for the Amazon rating
  • about_product - Description about the Product
  • user_id - ID of the user who wrote review for the Product
  • user_name - Name of the user who wrote review for the Product
  • review_id - ID of the user review
  • review_title - Short review
  • review_content - Long review
  • img_link - Image Link of the Product
  • product_link - Official Website Link of the Product

License: CC BY-NC-SA 4.0

File name: amazon.csv

----------- Rotten Tomatoes Reviews Dataset ----------------

This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

File name: data_rt.csv
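
Because the rows are ordered by class, a shuffle is needed before any train/test split. The sketch below uses only the Python standard library and assumes the CSV has a header row with the reviews and labels columns described above; the function names and the stand-in rows are hypothetical.

```python
import csv
import random


def load_rows(path):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of the rows; the input list is left untouched."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical stand-in for data_rt.csv: negatives first, positives last.
ordered = [{"reviews": f"negative {i}", "labels": "0"} for i in range(5)]
ordered += [{"reviews": f"positive {i}", "labels": "1"} for i in range(5)]
mixed = shuffle_rows(ordered, seed=42)
```

With the real file, the call would be shuffle_rows(load_rows("data_rt.csv")). A fixed seed keeps the shuffled order reproducible between runs.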

----------- Preprocessed Dataset Sentiment Analysis ----------------

Preprocessed Amazon product review data for Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.

The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

DOI: 10.34740/kaggle/dsv/3877817

Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

This dataset was used in the experimental phase of my research.

File name: EcoPreprocessed.csv

----------- Amazon Earphones Reviews ----------------

This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.

This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

License: U.S. Government Works

Source: www.amazon.in

File name (original): AllProductReviews.csv (contains 14337 reviews)

File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

----------- Amazon Musical Instruments Reviews ----------------

This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, in Unix time), reviewTime (time of the review, raw) and division (manually added categorical label generated using the overall score).

Source: http://jmcauley.ucsd.edu/data/amazon/

File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
