48 datasets found
  1. amazon-reviews-sentiment-analysis

    • huggingface.co
    Cite
    fastai X Hugging Face Group 2022, amazon-reviews-sentiment-analysis [Dataset]. https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    fastai X Hugging Face Group 2022
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for amazon reviews for sentiment analysis

      Dataset Summary
    

    One of the most important problems in e-commerce is correctly calculating the scores given to products after sale. Solving this problem means greater customer satisfaction for the e-commerce site, product prominence for sellers, and a seamless shopping experience for buyers. Another problem is correctly ordering the comments given to products. The prominence of misleading… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis.

  2. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, together with the product details listed on Amazon's official website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv
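
Because the rows are ordered by class, a shuffle step is worth making explicit. The following is a minimal sketch using only the Python standard library; the header names `reviews` and `labels` are assumed from the column description above, and the four sample rows are hypothetical stand-ins for the real file.

```python
import csv
import io
import random

# Tiny in-memory stand-in for data_rt.csv (hypothetical rows, same layout:
# negative samples first, then positive samples).
SAMPLE_CSV = """reviews,labels
a dull and lifeless film,0
tedious from start to finish,0
a charming and funny ride,1
one of the year's best,1
"""


def load_shuffled(csv_file, seed=0):
    """Read (review, label) rows and return them in shuffled order."""
    reader = csv.DictReader(csv_file)
    rows = [(row["reviews"], int(row["labels"])) for row in reader]
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(rows)
    return rows


rows = load_shuffled(io.StringIO(SAMPLE_CSV))
for review, label in rows:
    print(label, review)
```

To run this against the real file, pass `open("data_rt.csv", newline="", encoding="utf-8")` instead of the in-memory sample, assuming the file is in the working directory.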

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (raw time of the review) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  3. Amazon-Reviews-2023

    • huggingface.co
    Updated Sep 15, 2023
    Cite
    McAuley-Lab (2023). Amazon-Reviews-2023 [Dataset]. https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023
    Dataset updated
    Sep 15, 2023
    Dataset authored and provided by
    McAuley-Lab
    Description

    Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. This dataset mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to the previous versions, the 2023 version features larger size, newer reviews (up to Sep 2023), richer and cleaner metadata, and finer-grained timestamps (from day to millisecond).

  4. Amazon Customer Review Data

    • zenodo.org
    pdf
    Updated Jul 22, 2024
    Cite
    Akash Shashikant Vaykar; Abhishek Kaushik (2024). Amazon Customer Review Data [Dataset]. http://doi.org/10.5281/zenodo.3549704
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Akash Shashikant Vaykar; Abhishek Kaushik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset: Amazon Customer Review Data for sentiment analysis

    Size: approx. 60889

    Format: .CSV

    Period: 2013 to 2019

    Categories: 5 (Mobiles, Smart TV, Books, Mobile Accessories, Refrigerator)

    Unique_ID: Customized (Primary Key)

    Review_Header: User’s comment in a few words

    Review_Text: User’s comment in detail (3-4 lines)

    Rating: 1 = Very Low, 2 = Low, 3 = Avg, 4 = Good, 5 = Excellent

    Posting Period: 2013 to 2019

    Own_Rating: 1-2 = Negative, 3 = Neutral, 4-5 = Positive
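
The Own_Rating rule above can be sketched as a small function. This is an illustrative reading of the mapping as described, not code shipped with the dataset; the function name `own_rating` is hypothetical.

```python
def own_rating(rating: int) -> str:
    """Map a 1-5 star Rating to the sentiment label described for Own_Rating."""
    if rating in (1, 2):
        return "Negative"
    if rating == 3:
        return "Neutral"
    if rating in (4, 5):
        return "Positive"
    # Anything outside 1-5 is not covered by the description.
    raise ValueError(f"rating must be between 1 and 5, got {rating!r}")


labels = [own_rating(r) for r in (1, 2, 3, 4, 5)]
print(labels)
```

Raising on out-of-range values, rather than guessing a label, keeps malformed rows visible during preprocessing.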

  5. Johns Hopkins Multi-Domain Sentiment Dataset ∑∞

    • kaggle.com
    Updated Jan 14, 2020
    Cite
    Jérøme E. Blanch∑xt (2020). Johns Hopkins Multi-Domain Sentiment Dataset ∑∞ [Dataset]. https://www.kaggle.com/jeromeblanchet/multidomain-sentiment-analysis-dataset/activity
    Explore at:
    Croissant
    Dataset updated
    Jan 14, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jérøme E. Blanch∑xt
    Description

    Multidomain sentiment analysis dataset

    Amazon reviews from Johns Hopkins University’s Department of Computer Science

    Source: https://www.cs.jhu.edu/~mdredze/datasets/sentiment/

    Kaggle kernels take care of the tar.gz files for you :-)

    This dataset features slightly older product reviews from Amazon and derives from the Johns Hopkins University’s Department of Computer Science.

    Datasets included

    • unprocessed.tar.gz
    • processed_acl.tar.gz
    • processed_stars.tar.gz

    This sentiment dataset has been used in several papers:

    John Blitzer, Mark Dredze, Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Association for Computational Linguistics (ACL), 2007.

    John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. Learning Bounds for Domain Adaptation. Neural Information Processing Systems (NIPS), 2008.

    Mark Dredze, Koby Crammer, and Fernando Pereira. Confidence-Weighted Linear Classification. International Conference on Machine Learning (ICML), 2008.

    Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain Adaptation with Multiple Sources. Neural Information Processing Systems (NIPS), 2009.

    If you use this data for your research or a publication, please cite the first (ACL 2007) paper as the reference for the data. Also, please drop me a line so I know that you found the data useful.

    The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com from many product types (domains). Some domains (books and dvds) have hundreds of thousands of reviews. Others (musical instruments) have only a few hundred. Reviews contain star ratings (1 to 5 stars) that can be converted into binary labels if needed. This page contains some descriptions about the data. If you have questions, please email Mark Dredze or John Blitzer.

    A few notes regarding the data sets.

    1) unprocessed.tar.gz contains the original data.
    2) processed.acl.tar.gz contains the data pre-processed and balanced, that is, in the format of Blitzer et al. (ACL 2007).
    3) processed.realvalued.tar.gz contains the data pre-processed and balanced, but with the number of stars rather than just positive or negative, that is, in the format of Mansour et al. (NIPS 2009).

  6. Amazon Product Reviews Dataset

    • kaggle.com
    Updated May 16, 2025
    Cite
    Gözde Kızılkaya Atik (2025). Amazon Product Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/gzdekzlkaya/amazon-product-reviews-dataset
    Explore at:
    Croissant
    Dataset updated
    May 16, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gözde Kızılkaya Atik
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🛍️ Dataset Overview

    This dataset contains over 4,900 customer reviews from Amazon, including text-based feedback, star ratings, and helpfulness votes.

    It can be used for:

    • 📊 Sentiment Analysis
    • 🧠 Text Classification (Positive/Negative)
    • 🔍 Review Score Prediction (based on reviewText)
    • 🤖 Building Recommendation Systems
    • 🧮 Helpfulness Scoring Models

    📌 Key Columns

    • reviewText: Full written review
    • overall: Star rating (1 to 5)
    • summary: Short summary of the review
    • helpful_yes: Number of users who found the review helpful
    • total_vote: Total votes on helpfulness
    • day_diff: Days since the review was written

    This dataset is suitable for natural language processing (NLP) and supervised learning tasks.

    📎 Note

    This is a publicly available dataset for educational and research use.

  7. Consumer_goods_reviews

    • huggingface.co
    Updated Jan 22, 2025
    + more versions
    Cite
    kevin kibebe (2025). Consumer_goods_reviews [Dataset]. https://huggingface.co/datasets/kevykibbz/Consumer_goods_reviews
    Explore at:
    Croissant
    Dataset updated
    Jan 22, 2025
    Authors
    kevin kibebe
    Description

    Amazon Product Review Dataset (2023)

      Dataset Overview
    

    The Amazon Product Review Dataset (2023) contains product reviews from Amazon customers. The dataset includes product information, review details, and metadata about the customers who left the reviews. This dataset can be used for various natural language processing (NLP) tasks, including sentiment analysis, review prediction, recommendation systems, and more.

    Dataset Name: Amazon Product Review Dataset (2023) Dataset… See the full description on the dataset page: https://huggingface.co/datasets/kevykibbz/Consumer_goods_reviews.

  8. Amazon UK shoes products reviews dataset

    • crawlfeeds.com
    csv, zip
    Updated Jun 27, 2025
    Cite
    Crawl Feeds (2025). Amazon UK shoes products reviews dataset [Dataset]. https://crawlfeeds.com/datasets/amazon-uk-shoes-products-reviews-dataset
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Jun 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock detailed insights with our Amazon UK Shoes Products Reviews Dataset, an invaluable resource for businesses, researchers, and data analysts. This dataset features comprehensive information, including product names, review texts, star ratings, and customer feedback for a wide range of shoe products available on Amazon UK.

    Key Features:

    • Extensive Coverage: Includes detailed reviews and ratings for various shoe products, helping you analyze customer preferences and trends.

    • Structured Data: Available in easily accessible formats such as CSV, making it well suited for integration into your analytical workflows.

    • Actionable Insights: Leverage this dataset for customer sentiment analysis, product optimization, and competitive benchmarking.

    Why Choose the Amazon UK Shoes Products Reviews Dataset?

    Whether you're delving into customer behavior, conducting market research, or improving product offerings, this dataset empowers you to make informed decisions. By working with a dataset enriched with real-world feedback, you can:

    • Understand customer preferences: Dive into detailed reviews to uncover patterns in consumer likes and dislikes.

    • Enhance product offerings: Identify gaps and opportunities in the market to better meet customer demands.

    • Boost competitive analysis: Compare customer feedback across different brands and products.

    Additional Datasets Available

    Explore related datasets like the Amazon product review dataset, offering insights across various categories and regions. For specific needs, our curated product reviews dataset is tailored to help you gain a granular understanding of niche markets.

  9. Feature description of the Amazon dataset.

    • plos.figshare.com
    xls
    Updated Feb 14, 2024
    + more versions
    Cite
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad (2024). Feature description of the Amazon dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0294968.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A crucial part of sentiment classification is feature extraction, because it involves extracting valuable information from text data, which affects the model’s performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. In order to provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. There are several methods under consideration, including Bag-of-words (BOW), Word2Vector, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global vector for word representation (GloVe). To prove the ability of each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. Finally, we trained a random forest classifier using 70% of the data for training and 30% for testing, enabling us to evaluate and compare the performance using different metrics. Based on our results, we find that the TF-IDF technique demonstrates superior performance, with an accuracy of 99% in the Amazon reviews dataset and 96% in the Twitter US airlines dataset. This study underscores the paramount significance of feature extraction in sentiment analysis, endowing pragmatic insights to elevate model performance and steer future research pursuits.
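
For readers unfamiliar with TF-IDF, the idea can be sketched in a few lines of standard-library Python. This uses one common textbook variant (term frequency times log of inverse document frequency, no smoothing) and is not the paper's implementation; libraries such as scikit-learn apply smoothing and normalisation, so their numbers will differ. The example documents and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


docs = ["great guitar great sound", "poor sound quality", "great value"]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, "->", max(w, key=w.get))
```

Terms that occur in every document receive a weight of zero under this variant, which is why rarer, more distinctive words dominate each document's vector.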

  10. Amazon Product Reviews

    • kaggle.com
    Updated Nov 26, 2023
    Cite
    The Devastator (2023). Amazon Product Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/amazon-product-reviews/discussion
    Explore at:
    Croissant
    Dataset updated
    Nov 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Amazon Product Reviews

    18 Years of Customer Ratings and Experiences

    By Huggingface Hub

    About this dataset

    The Amazon Reviews Polarity Dataset discloses eighteen years of customers' ratings and reviews from Amazon.com, offering an unparalleled trove of insight and knowledge. Drawing from the immense pool of over 35 million customer reviews, this dataset presents a broad spectrum of customer opinions on products they have bought or used. This invaluable data is a gold mine for improving products and services as it contains comprehensive information regarding customers' experiences with a product including ratings, titles, and plaintext content. At the same time, this dataset contains both customer-specific data along with product information which encourages deep analytics that could lead to great advances in providing tailored solutions for customers. Has your product been favored by the majority? Are there any aspects that need extra care? Use Amazon Reviews Polarity to gain deeper insights into what your customers want - explore now!

    How to use the dataset

    • Analyze customer ratings to identify trends: Take a look at how many customers have rated the same product or service with the same score (e.g., 4 stars). You can use this information to identify what customers like or don’t like about it by examining common sentiment throughout the reviews. Identifying these patterns can help you decide which features of your products or services to emphasize in order to boost sales and satisfaction rates.

    • Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product/service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search algorithms can quickly reveal general topics that are discussed in relation to your product/service across multiple reviews, allowing you to quickly pinpoint areas that may need improvement for particular items within your lines of business.

    • Track associated scores over time: By tracking customer ratings over time, you may be able to better understand when there has been an issue with something specific related to your product/service, such as a negative response toward a feature that was introduced but didn’t seem popular among customers and was removed shortly after introduction. This can save time and money by identifying issues before they become widespread concerns among larger sets of consumers who invest their money in your company's item(s).

    • Visualize sentiment data over time: Visualizations such as bar graphs can help identify trends across different categories more quickly than raw numbers alone. Combining numeric values with color differences between scores makes anomalies easier to spot, allowing faster resolution when trying to figure out why certain spikes occurred where others stayed stable (or vice versa) when comparing similar data points in time-series visualizations.

    Research Ideas

    • Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.
    • Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.
    • Training a machine learning model to accurately predict customers’ ratings on new products they have not yet tried and leverage this for further product development optimization initiatives

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:--------------|:-------------------------------------------------------------------|
    | label | The sentiment of the review, either positive or negative. (String) |
    | title | The title of the review. (String) |
    ...

  11. amazon_us_reviews

    • tensorflow.org
    • huggingface.co
    Updated Dec 6, 2022
    Cite
    (2022). amazon_us_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/amazon_us_reviews
    Dataset updated
    Dec 6, 2022
    Description

    Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.

    Over 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).

    Each dataset contains the following columns:

    • marketplace - 2 letter country code of the marketplace where the review was written.
    • customer_id - Random identifier that can be used to aggregate reviews written by a single author.
    • review_id - The unique ID of the review.
    • product_id - The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.
    • product_parent - Random identifier that can be used to aggregate reviews for the same product.
    • product_title - Title of the product.
    • product_category - Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
    • star_rating - The 1-5 star rating of the review.
    • helpful_votes - Number of helpful votes.
    • total_votes - Number of total votes the review received.
    • vine - Review was written as part of the Vine program.
    • verified_purchase - The review is on a verified purchase.
    • review_headline - The title of the review.
    • review_body - The review text.
    • review_date - The date the review was written.
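
Since the files are described as tab delimited with no quote or escape characters, a reader should switch off quote handling. The sketch below does this with Python's `csv` module. It assumes each file begins with a header row naming the columns listed above, which the description does not state, and the sample review is entirely hypothetical.

```python
import csv
import io

COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id", "product_parent",
    "product_title", "product_category", "star_rating", "helpful_votes",
    "total_votes", "vine", "verified_purchase", "review_headline",
    "review_body", "review_date",
]

# Hypothetical one-review sample; the header row is an assumption.
SAMPLE_TSV = "\t".join(COLUMNS) + "\n" + "\t".join([
    "US", "12345", "R1EXAMPLE", "B00EXAMPLE", "67890", "Example Widget",
    "Wireless", "5", "3", "4", "N", "Y", '"Great" value',
    "Works as described.", "2015-08-31",
]) + "\n"


def read_reviews(tsv_file):
    """Yield one dict per review, keeping quote characters as literal text."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        row["star_rating"] = int(row["star_rating"])
        yield row


reviews = list(read_reviews(io.StringIO(SAMPLE_TSV)))
print(reviews[0]["review_headline"], reviews[0]["star_rating"])
```

With the default quoting mode, a field that starts with a double quote would be parsed as a quoted string; `csv.QUOTE_NONE` keeps it verbatim, which matches the "no quote and escape characters" description.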

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('amazon_us_reviews', split='train')
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

  12. amazon-food-reviews-dataset

    • huggingface.co
    Updated Dec 12, 2023
    + more versions
    Cite
    misschestnut (2023). amazon-food-reviews-dataset [Dataset]. https://huggingface.co/datasets/jhan21/amazon-food-reviews-dataset
    Explore at:
    Croissant
    Dataset updated
    Dec 12, 2023
    Authors
    misschestnut
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for "Amazon Food Reviews"

      Dataset Summary
    

    This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain text review. It also includes reviews from all other Amazon categories.

      Supported Tasks and Leaderboards
    

    This dataset can be used for numerous tasks like sentiment analysis, text… See the full description on the dataset page: https://huggingface.co/datasets/jhan21/amazon-food-reviews-dataset.

  13. Amazon Fine Food Reviews

    • live.european-language-grid.eu
    csv
    Updated Dec 30, 2013
    Cite
    (2013). Amazon Fine Food Reviews [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/4949
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 30, 2013
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The dataset consists of reviews of fine foods from Amazon.

  14. Review-Subjectivity-Dataset

    • data.mendeley.com
    Updated Feb 12, 2025
    Cite
    Tian Xu (2025). Review-Subjectivity-Dataset [Dataset]. http://doi.org/10.17632/r8s6ztpkkb.1
    Dataset updated
    Feb 12, 2025
    Authors
    Tian Xu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Review Subjectivity Dataset is a collection of Amazon product reviews, spanning various product types. The dataset aims to analyze the subjectivity of customer reviews, which is crucial for understanding customer sentiment and improving product recommendations.

  15. Amazon_Reviews_for_Sentiment_Analysis_fine_grained_5_classes

    • huggingface.co
    Updated Jun 3, 2025
    Cite
    yassir acharki (2025). Amazon_Reviews_for_Sentiment_Analysis_fine_grained_5_classes [Dataset]. https://huggingface.co/datasets/yassiracharki/Amazon_Reviews_for_Sentiment_Analysis_fine_grained_5_classes
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 3, 2025
    Authors
    yassir acharki
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

    The Amazon reviews full score dataset is constructed by randomly taking 600,000 training samples and 130,000 testing samples for each review score from 1 to 5. In total there are 3,000,000 training samples and 650,000 testing samples.
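
    The totals quoted above follow directly from the per-score sample counts. A quick check, written as a minimal sketch (the variable names are illustrative and not part of the dataset):

```python
# Per-score sample counts as stated in the dataset description.
TRAIN_PER_SCORE = 600_000
TEST_PER_SCORE = 130_000
NUM_SCORES = 5  # review scores 1 to 5

# Every score contributes the same number of samples to each split.
total_train = TRAIN_PER_SCORE * NUM_SCORES
total_test = TEST_PER_SCORE * NUM_SCORES

print(f"training samples: {total_train:,}")
print(f"testing samples:  {total_test:,}")
```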

      Dataset Details

      Dataset Description

    The files train.csv and test.csv contain all the training samples as comma-separated values. There are 3 columns in them, corresponding to class index (1 to 5)… See the full description on the dataset page: https://huggingface.co/datasets/yassiracharki/Amazon_Reviews_for_Sentiment_Analysis_fine_grained_5_classes.

  16. Amazon reviews - Polarity

    • academictorrents.com
    bittorrent
    Updated Oct 16, 2018
    Xiang Zhang et al., 2015 (2018). Amazon reviews - Polarity [Dataset]. https://academictorrents.com/details/db0cd5603a0d154ec3dcfc6ff7862d47d3884b83
    Explore at:
    bittorrent (688339454) (available download formats)
    Dataset updated
    Oct 16, 2018
    Dataset authored and provided by
    Xiang Zhang et al., 2015
    License

    https://academictorrents.com/nolicensespecified

    Description

    34,686,770 Amazon reviews from 6,643,669 users on 2,441,053 products, from the Stanford Network Analysis Project (SNAP). This subset contains 1,800,000 training samples and 200,000 testing samples in each polarity sentiment.

  17. Amazon Reviews for Sentiment Analysis

    • kaggle.com
    zip
    Updated Nov 18, 2019
    Adam Bittlingmayer (2019). Amazon Reviews for Sentiment Analysis [Dataset]. https://www.kaggle.com/bittlingmayer/amazonreviews
    Explore at:
    zip (517080965 bytes) (available download formats)
    Dataset updated
    Nov 18, 2019
    Authors
    Adam Bittlingmayer
    Description

    This dataset consists of a few million Amazon customer reviews (input text) and star ratings (output labels) for learning how to train fastText for sentiment analysis.

    The idea here is that the dataset is more than a toy (real business data on a reasonable scale) but can still be trained on in minutes on a modest laptop.

    Content

    The fastText supervised learning tutorial requires data in the following format:

    _label_
    
  18. Amazon_Reviews_Binary_for_Sentiment_Analysis

    • huggingface.co
    Updated Jul 26, 2024
    yassir acharki (2024). Amazon_Reviews_Binary_for_Sentiment_Analysis [Dataset]. https://huggingface.co/datasets/yassiracharki/Amazon_Reviews_Binary_for_Sentiment_Analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 26, 2024
    Authors
    yassir acharki
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

    The Amazon reviews polarity dataset is constructed by taking review scores 1 and 2 as negative, and 4 and 5 as positive. Samples with score 3 are ignored. In the dataset, class 1 is negative and class 2 is positive. Each class has 1,800,000 training samples and 200,000 testing samples.
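
    The labeling rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dataset authors' code: the function name `polarity_class` is hypothetical, and returning `None` for score 3 is just one way to represent an ignored sample.

```python
from typing import Optional


def polarity_class(score: int) -> Optional[int]:
    """Map a 1-5 star score to a polarity class (hypothetical helper).

    Scores 1 and 2 -> class 1 (negative)
    Scores 4 and 5 -> class 2 (positive)
    Score 3        -> None (the sample is ignored)
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    if score <= 2:
        return 1  # negative
    if score >= 4:
        return 2  # positive
    return None  # score 3 is dropped from the polarity dataset


# Label a handful of example scores and drop the ignored ones.
example_scores = [1, 2, 3, 4, 5]
labeled = [(s, polarity_class(s)) for s in example_scores]
kept = [(s, c) for s, c in labeled if c is not None]
print(kept)
```

    Filtering out the `None` entries mirrors the description: score-3 reviews never reach the training or testing files.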

      Dataset Details

      Dataset Description

    The files train.csv and test.csv contain all the training samples as comma-separated values. There are 3… See the full description on the dataset page: https://huggingface.co/datasets/yassiracharki/Amazon_Reviews_Binary_for_Sentiment_Analysis.

  19. Multi-Domain-Sentiment-Dataset

    • huggingface.co
    Updated Apr 10, 2023
    Jia (2023). Multi-Domain-Sentiment-Dataset [Dataset]. https://huggingface.co/datasets/JSSICE/Multi-Domain-Sentiment-Dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 10, 2023
    Authors
    Jia
    Description

    Using it for assessment.

      Dataset for Multi Domain (Including Kitchen, Books, DVDs, and Electronics)
    

    Multi-Domain Sentiment Dataset by John Blitzer, Mark Dredze, Fernando Pereira.

      Description:
    

    The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com from 4 product types (domains): Kitchen, Books, DVDs, and Electronics. Each domain has several thousand reviews, but the exact number varies by domain. Reviews contain star ratings (1 to 5 stars) that… See the full description on the dataset page: https://huggingface.co/datasets/JSSICE/Multi-Domain-Sentiment-Dataset.

  20. Word Embedding of Amazon Product Review Corpus

    • zenodo.org
    bin, txt
    Updated Jul 9, 2020
    Marc Schulder; Michael Wiegand (2020). Word Embedding of Amazon Product Review Corpus [Dataset]. http://doi.org/10.5281/zenodo.3370051
    Explore at:
    txt, bin (available download formats)
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marc Schulder; Michael Wiegand
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A word embedding of the Amazon Product Review Corpus (Jindal and Liu, 2008).

    Created using Word2Vec in CBOW mode, 500 dimensions and window size 5.

    Words have been lemmatised and particle verbs have been merged into a single token (e.g. calm_down).

    Attribution

    This dataset was created as part of the following publication:

    Marc Schulder, Michael Wiegand, Josef Ruppenhofer and Benjamin Roth (2017). "Towards Bootstrapping a Polarity Shifter Lexicon using Linguistic Features". Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP). Taipei, Taiwan, November 27 - December 3, 2017. DOI: 10.5281/zenodo.3365609.

    If you use the data in your research or work, please cite the publication.

fastai X Hugging Face Group 2022, amazon-reviews-sentiment-analysis [Dataset]. https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis
amazon-reviews-sentiment-analysis

hugginglearners/amazon-reviews-sentiment-analysis

Explore at:
34 scholarly articles cite this dataset (View in Google Scholar)
