15 datasets found
  1. Amazon-Reviews-2023

    • huggingface.co
    Updated Sep 15, 2023
    Cite
    McAuley-Lab (2023). Amazon-Reviews-2023 [Dataset]. https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023
    Explore at:
    Dataset updated
    Sep 15, 2023
    Dataset authored and provided by
    McAuley-Lab
    Description

    Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. This dataset mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to the previous versions, the 2023 version features a larger size, newer reviews (up to Sep 2023), richer and cleaner metadata, and finer-grained timestamps (from day to millisecond).

  2. U.S. consumers confident in having seen fake reviews on Amazon 2024

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). U.S. consumers confident in having seen fake reviews on Amazon 2024 [Dataset]. https://www.statista.com/statistics/997026/amazon-shopping-categories-largest-share-fake-product-reviews/
    Explore at:
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2024, ** percent of U.S. consumers answering a survey were confident in having seen fake product reviews on Amazon. Although the number might seem very high, the figure has decreased compared to 2023, when ** percent of respondents stated the same.

  3. Amazon-Fraud Dataset

    • paperswithcode.com
    Updated Dec 23, 2024
    + more versions
    Cite
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu (2024). Amazon-Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/amazon-fraud
    Explore at:
    Dataset updated
    Dec 23, 2024
    Authors
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu
    Description

    Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

    Dataset Statistics

    # Nodes: 11,944
    % Fraud Nodes (Class=1): 9.5

    Relation | # Edges
    U-P-U |
    U-S-U |
    U-V-U | 1,036,737
    All |

    Graph Construction

    The Amazon dataset includes product reviews under the Musical Instruments category. Similar to this paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from this paper as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users reviewing at least one same product; 2) U-S-U: it connects users having at least one same star rating within one week; 3) U-V-U: it connects users with top 5% mutual review text similarities (measured by TF-IDF) among all users.
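
The labelling rule and the U-P-U relation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy `reviews` pairs and the helper names `label_user` and `upu_edges` are hypothetical, and the helpful-vote percentage is assumed to mean helpful votes divided by total votes.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_votes, total_votes):
    """Return 0 (benign), 1 (fraudulent), or None (left unlabelled)."""
    if total_votes == 0:
        return None  # assumption: users without votes cannot be labelled
    ratio = helpful_votes / total_votes
    if ratio > 0.8:
        return 0
    if ratio < 0.2:
        return 1
    return None


def upu_edges(reviews):
    """Connect every pair of users who reviewed at least one common product."""
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)
    edges = set()
    for users in users_by_product.values():
        # Sorting gives each undirected edge a single canonical form.
        edges.update(combinations(sorted(users), 2))
    return edges


# Hypothetical (user, product) review pairs.
reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2")]
edges = upu_edges(reviews)
labels = {
    "u1": label_user(9, 10),
    "u2": label_user(1, 10),
    "u3": label_user(5, 10),
}
```

The U-S-U and U-V-U relations would need review dates and TF-IDF similarity scores respectively, which this sketch leaves out.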

    To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.

  4. Amazon review data 2018

    • cseweb.ucsd.edu
    • nijianmo.github.io
    • +1 more
    Cite
    UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
    Explore at:
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    Context

    This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

    • More reviews:

      • The total number of reviews is 233.1 million (142.8 million in 2014).
    • New reviews:

      • Current data includes reviews in the range May 1996 - Oct 2018.
    • Metadata:

      • We have added transaction metadata for each review shown on the review page.
      • Added more detailed metadata of the product landing page.

    Acknowledgements

    If you publish articles based on this dataset, please cite the following paper:

    • Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.

  5. Amazon fake reviews + scrapped

    • kaggle.com
    Updated Dec 19, 2022
    Cite
    Sofia “Zow” Ormazabal (2022). Amazon fake reviews + scrapped [Dataset]. https://www.kaggle.com/datasets/sofiazowormazabal/amazon-fake-reviews-scrapped/discussion?sort=undefined
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 19, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sofia “Zow” Ormazabal
    Description

    Dataset

    This dataset was created by Sofia “Zow” Ormazabal

    Contents

  6. amazon_us_reviews

    • tensorflow.org
    • huggingface.co
    Updated Dec 6, 2022
    Cite
    (2022). amazon_us_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/amazon_us_reviews
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.

    More than 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).

    Each dataset contains the following columns:

    • marketplace: 2-letter country code of the marketplace where the review was written.
    • customer_id: Random identifier that can be used to aggregate reviews written by a single author.
    • review_id: The unique ID of the review.
    • product_id: The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.
    • product_parent: Random identifier that can be used to aggregate reviews for the same product.
    • product_title: Title of the product.
    • product_category: Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
    • star_rating: The 1-5 star rating of the review.
    • helpful_votes: Number of helpful votes.
    • total_votes: Number of total votes the review received.
    • vine: Review was written as part of the Vine program.
    • verified_purchase: The review is on a verified purchase.
    • review_headline: The title of the review.
    • review_body: The review text.
    • review_date: The date the review was written.
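
The tab-delimited layout described above (one review per line, no quote or escape characters) can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader: the sample row and its values are made up, the helper name `read_reviews` is hypothetical, and the file is assumed to start with a header row naming the columns.

```python
import csv
import io

COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id",
    "product_parent", "product_title", "product_category", "star_rating",
    "helpful_votes", "total_votes", "vine", "verified_purchase",
    "review_headline", "review_body", "review_date",
]

# Made-up sample data: a header row followed by one review.
SAMPLE_ROW = [
    "US", "12345", "R1EXAMPLE", "B00EXAMPLE", "67890", "Example Widget",
    "Wireless", "5", "3", "4", "N", "Y", "Works well",
    'Says "great" on the box', "2015-08-31",
]
sample = "\t".join(COLUMNS) + "\n" + "\t".join(SAMPLE_ROW) + "\n"


def read_reviews(handle):
    """Yield one dict per review, converting the numeric columns to int."""
    # QUOTE_NONE treats quote characters as ordinary text, matching the
    # "no quote and escape characters" description.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for name in ("star_rating", "helpful_votes", "total_votes"):
            row[name] = int(row[name])
        yield row


rows = list(read_reviews(io.StringIO(sample)))
```

Disabling quote handling matters here: with the default settings, a stray double quote inside `review_body` could cause several fields or lines to be merged into one.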

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split, then print the first four examples.
    ds = tfds.load('amazon_us_reviews', split='train')
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

  7. Amazon Food Product Reviews & Ratings

    • opendatabay.com
    Updated Jun 18, 2025
    Cite
    Vdt. Data (2025). Amazon Food Product Reviews & Ratings [Dataset]. https://www.opendatabay.com/data/consumer/fd13df3c-b1af-410c-8596-7e11961381ed
    Explore at:
    Dataset updated
    Jun 18, 2025
    Dataset authored and provided by
    Vdt. Data
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    E-commerce & Online Transactions
    Description

    The Amazon Food Products Dataset is a large-scale collection of product listings, reviews, and metadata sourced from Amazon. This dataset is valuable for understanding consumer behaviour, analyzing product trends, and training machine learning models for recommendation systems and sentiment analysis. It includes various categories, providing insights into customer preferences, product ratings, and review sentiments.

    Dataset Features

    Each record in the dataset contains the following key fields:

    • ProductId: Unique identifier for each product.
    • UserId: Unique identifier for the reviewer.
    • ProfileName: Display name of the reviewer.
    • HelpfulnessNumerator: Number of users who found the review helpful.
    • HelpfulnessDenominator: Total number of users who rated the review’s helpfulness.
    • Score: Product rating (1 to 5 stars).
    • Time: Unix timestamp of the review.
    • Summary: Short summary of the review.
    • Text: Full text of the review.

    Distribution

    • Data Volume: 568,454 rows and 9 columns.
    • Format: CSV.
    • Structure: Tabular format with numerical, categorical, and text data.
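
As a rough illustration of how the fields listed above fit together, the sketch below derives a helpfulness ratio and a readable date from each record. The sample rows are made up, the helper name `helpfulness_ratio` is hypothetical, and `Time` is assumed to be a Unix timestamp in seconds.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up sample in the nine-column CSV layout described above.
sample = (
    "ProductId,UserId,ProfileName,HelpfulnessNumerator,"
    "HelpfulnessDenominator,Score,Time,Summary,Text\n"
    "P001,U001,Alice,3,4,5,1300000000,Tasty,Would buy again\n"
    "P002,U002,Bob,0,0,2,1350000000,Stale,Arrived past its date\n"
)


def helpfulness_ratio(numerator, denominator):
    """Share of voters who found the review helpful, or None without votes."""
    return numerator / denominator if denominator else None


records = []
for row in csv.DictReader(io.StringIO(sample)):
    records.append({
        "product": row["ProductId"],
        "score": int(row["Score"]),
        "helpfulness": helpfulness_ratio(
            int(row["HelpfulnessNumerator"]),
            int(row["HelpfulnessDenominator"]),
        ),
        # Assumption: Time counts seconds since the Unix epoch (UTC).
        "reviewed_at": datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc),
    })
```

Returning `None` for reviews without votes keeps them distinguishable from reviews that every voter found unhelpful.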

    Usage

    This dataset is ideal for a variety of applications:

    • Sentiment Analysis: Training NLP models to predict sentiment based on reviews.
    • Product Recommendation Systems: Building collaborative filtering models.
    • Trend Analysis: Identifying popular products and customer preferences.
    • Fake Review Detection: Detecting anomalous patterns in review behaviours.

    Coverage

    • Geographic Coverage: Global.
    • Time Range: Multi-year dataset (over 10 years of reviews).
    • Demographics: General Amazon shoppers; includes various age groups and customer segments.

    License

    CC0

    Who Can Use It

    • Data Scientists: For building machine learning models.
    • Researchers: For academic analysis of customer behaviour.
    • Businesses: For market insights and customer sentiment analysis.
  8. Product Exchange/Bartering Data

    • cseweb.ucsd.edu
    json
    + more versions
    Cite
    UCSD CSE Research Project, Product Exchange/Bartering Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain peer-to-peer trades from various recommendation platforms.

    Metadata includes

    • peer-to-peer trades

    • have and want lists

    • image data (tradesy)

  9. Yelp-Fraud Dataset

    • paperswithcode.com
    Updated Apr 21, 2025
    Cite
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu (2025). Yelp-Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/yelpchi
    Explore at:
    Dataset updated
    Apr 21, 2025
    Authors
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu
    Description

    Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

    Dataset Statistics

    # Nodes: 45,954
    % Fraud Nodes (Class=1): 14.5

    Relation | # Edges
    R-U-R |
    R-T-R |
    R-S-R | 3,402,743
    All |

    Graph Construction

    The Yelp spam review dataset includes hotel and restaurant reviews filtered (spam) and recommended (legitimate) by Yelp. We conduct a spam review detection task on the Yelp-Fraud dataset, which is a binary classification task. We take 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Based on previous studies which show that opinion fraudsters have connections in user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects reviews posted by the same user; 2) R-S-R: it connects reviews under the same product with the same star rating (1-5 stars); 3) R-T-R: it connects two reviews under the same product posted in the same month.
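
All three relations above follow one pattern: group reviews by a shared key, then connect every pair inside a group. A minimal sketch of that idea follows; the toy `reviews` records, their field names, and the helper `edges_by_key` are hypothetical and are not taken from the dataset's released code.

```python
from collections import defaultdict
from itertools import combinations


def edges_by_key(reviews, key):
    """Connect every pair of reviews that share the same key value."""
    groups = defaultdict(list)
    for review in reviews:
        groups[key(review)].append(review["id"])
    edges = set()
    for ids in groups.values():
        # Sorting gives each undirected edge a single canonical form.
        edges.update(combinations(sorted(ids), 2))
    return edges


# Hypothetical review records.
reviews = [
    {"id": 0, "user": "a", "product": "x", "stars": 5, "month": "2012-01"},
    {"id": 1, "user": "a", "product": "y", "stars": 1, "month": "2012-03"},
    {"id": 2, "user": "b", "product": "x", "stars": 5, "month": "2012-02"},
    {"id": 3, "user": "c", "product": "x", "stars": 2, "month": "2012-01"},
]

r_u_r = edges_by_key(reviews, lambda r: r["user"])                   # same user
r_s_r = edges_by_key(reviews, lambda r: (r["product"], r["stars"]))  # same product and rating
r_t_r = edges_by_key(reviews, lambda r: (r["product"], r["month"]))  # same product and month
```

Pairwise expansion grows quadratically with group size, so a popular product can contribute a very large number of edges.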

    To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.

  10. Customer Review Marketing Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 7, 2025
    Cite
    Market Research Forecast (2025). Customer Review Marketing Report [Dataset]. https://www.marketresearchforecast.com/reports/customer-review-marketing-29081
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Mar 7, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The customer review marketing market, valued at $1,340.9 million in 2025, is poised for significant growth. This expansion is driven by several key factors. The increasing reliance on online reviews for purchase decisions by consumers fuels demand for effective review marketing strategies. Businesses across all sectors, especially e-commerce giants like Amazon and Alibaba, recognize the crucial role of positive online reviews in brand building, customer acquisition, and sales conversion. The market’s segmentation, encompassing online and offline review marketing for both physical and virtual products, presents diverse opportunities for specialized service providers. Furthermore, technological advancements enabling automated review generation and analysis, along with improved sentiment analysis tools, are enhancing market efficiency and fueling adoption. Growth is also observed across diverse geographical regions, with North America and Asia-Pacific expected to be major contributors due to high internet penetration and e-commerce maturity.

    However, the market faces certain challenges. The proliferation of fake reviews poses a significant threat, eroding consumer trust and necessitating robust verification mechanisms. Moreover, managing and responding to negative reviews effectively requires significant resources and expertise, posing a barrier for smaller businesses. Maintaining data privacy and complying with evolving regulations around review collection and usage is another crucial consideration for companies operating in this space.

    Despite these hurdles, the overall market trajectory indicates robust growth, propelled by the increasing importance of online reputation management and the continued expansion of e-commerce globally. The competitive landscape, featuring both established players and emerging service providers, suggests a dynamic environment with opportunities for both large corporations and specialized niche players.

  11. Pinterest Fashion Compatibility

    • cseweb.ucsd.edu
    • beta.data.urbandatacentre.ca
    json
    + more versions
    Cite
    UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

    Metadata includes

    • product IDs

    • bounding boxes

    Basic Statistics:

    • Scenes: 47,739

    • Products: 38,111

    • Scene-Product Pairs: 93,274

  12. E-commerce Headphone Sentiment Dataset

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). E-commerce Headphone Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/eed974c6-d221-4eb3-85f6-51e99839a040
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset contains a collection of Amazon headphone reviews, processed for sentiment analysis. It is a small subset intended to assist in understanding customer opinions and evaluating product perceptions. The data supports analysis of review usefulness, factors influencing helpfulness, and the detection of atypical or potentially misleading reviews.

    Columns

    • Customer_Name: The name of the customer who provided the review.
    • REVIEW_TITLE: A short summary or title of the customer's review.
    • Color: The colour of the headphone product being reviewed.
    • REVIEW_DATE: The specific date when the customer submitted their review.
    • COMMENTS: Detailed comments from the customer expressing their feelings or observations about the product.
    • RATINGS: The customer's rating for the product, given on a scale of 1 to 5 stars.

    Distribution

    This dataset is typically provided in a CSV file format. It comprises approximately 1,500 individual reviews. The structure includes 6 distinct columns, making it readily available for analytical tasks.

    Usage

    This dataset is ideally suited for:

    • Conducting sentiment analysis on product reviews.
    • Exploring factors that influence the perceived helpfulness of a review.
    • Identifying unusual review patterns or potential outliers.
    • Applications in Natural Language Processing (NLP), text mining, and exploratory data analysis.

    Coverage

    The data spans a time range from 28 May 2021 to 13 June 2022. It covers various customer names, including "Amazon Customer" and "Rahul", alongside a large proportion of "Other" customers. Product colours predominantly include "White" and "Black". The ratings are distributed across several ranges, from 1.00-1.40 up to 4.60-5.00. The geographical scope of the data is global.

    License

    CC0

    Who Can Use It

    This dataset is beneficial for data scientists, machine learning engineers, business analysts, and researchers interested in:

    • Developing sentiment analysis models.
    • Understanding consumer feedback and product performance.
    • Performing text-based data analysis.
    • Exploring e-commerce review patterns.

    Dataset Name Suggestions

    • Amazon Headphone Reviews for Sentiment Analysis
    • Headphone Customer Review Data
    • E-commerce Headphone Sentiment Dataset
    • Product Review Analysis Data (Headphones)

    Attributes

    Original Data Source: HEADPHONE DATASET REVIEW ANALYSIS

  13. Customer Review Marketing Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 9, 2025
    Cite
    Market Research Forecast (2025). Customer Review Marketing Report [Dataset]. https://www.marketresearchforecast.com/reports/customer-review-marketing-31669
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Mar 9, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Customer Review Marketing market, valued at $698.7 million in 2025, is projected to experience robust growth, fueled by a Compound Annual Growth Rate (CAGR) of 9.8% from 2025 to 2033. This expansion is driven by several key factors. E-commerce's continued dominance necessitates transparent and trustworthy customer feedback mechanisms, making review marketing integral to brand building and sales conversion. Increasing consumer reliance on online reviews for purchasing decisions reinforces the importance of strategic review management. Furthermore, the proliferation of social media platforms and review aggregator sites provides businesses with expanded opportunities to leverage positive reviews and address negative ones proactively. The market is segmented by application into Physical and Virtual Products, with the former currently dominating but both sectors witnessing significant growth as online and offline purchasing converge. Companies like Amazon, Alibaba, and eBay are leveraging sophisticated review systems, while smaller businesses utilize platforms like Shopify and AWIN to optimize their review marketing strategies. Geographical analysis reveals strong market penetration in North America and Europe, with significant growth potential in rapidly developing Asia-Pacific markets like India and China. The ongoing refinement of AI-driven sentiment analysis tools and the increasing focus on combating fake reviews will further shape market dynamics in the coming years.

    The forecast period reveals a continuously expanding market, with substantial opportunities for businesses of all sizes. The increasing sophistication of marketing analytics allows companies to directly track ROI on their review marketing efforts, leading to increased investment. Competitive pressures also drive adoption, with businesses recognizing the competitive advantage of superior customer review management. While potential restraints such as concerns regarding review authenticity and the need for robust data privacy measures exist, the overall trend points towards sustained and healthy market growth. The geographical distribution is expected to evolve, with emerging markets contributing increasingly to the global market size over the next decade. This expansion presents significant opportunities for both established players and innovative startups in the customer review marketing space.
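
For readers who want to see what the quoted growth rate implies, the compound-growth arithmetic can be sketched as below. This is only an illustration: the report summary does not state a 2033 figure, the helper name `project_value` is hypothetical, and the calculation assumes the 9.8% rate compounds annually on the 2025 base for eight years.

```python
def project_value(base_value, cagr, years):
    """Compound a base value at a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


base_2025 = 698.7    # USD million, as quoted in the description
cagr = 0.098         # 9.8% per year, as quoted in the description
years = 2033 - 2025  # assumption: eight compounding periods

projected_2033 = project_value(base_2025, cagr, years)
print(f"Implied 2033 market size: about ${projected_2033:,.1f} million")
```

The result is an implied figure rather than a number taken from the report, so it should be read as a sanity check on the quoted rate.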

  14. PDMX

    • cseweb.ucsd.edu
    json
    + more versions
    Cite
    UCSD CSE Research Project, PDMX [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.

  15. Amazon India products dataset in CSV format

    • crawlfeeds.com
    csv, zip
    Updated Mar 27, 2025
    Cite
    Crawl Feeds (2025). Amazon India products dataset in CSV format [Dataset]. https://crawlfeeds.com/datasets/amazon-india-products-dataset-in-csv-format
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Area covered
    India
    Description

    Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.

    Dataset Features:

    • Product Details: Name, Brand, Category, and Unique ID

    • Pricing Information: Current Price, Discounted Price, and Currency

    • Availability & Ratings: Stock Status, Customer Ratings, and Reviews

    • Seller Information: Seller Name and Fulfillment Details

    • Additional Attributes: Product Description, Specifications, and Images

    Dataset Specifications:

    • Format: CSV

    • Number of Records: 50,000+

    • Delivery Time: 3 Days

    • Price: $149.00

    • Availability: Immediate

    This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.


Amazon-Reviews-2023

McAuley-Lab/Amazon-Reviews-2023

Explore at:
68 scholarly articles cite this dataset (View in Google Scholar)