Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. It mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to previous versions, the 2023 version is larger, contains newer reviews (up to September 2023), has richer and cleaner metadata, and uses finer-grained timestamps (millisecond rather than day resolution).
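Millisecond timestamps need a different conversion from day-level dates. The sketch below assumes each timestamp is an integer count of milliseconds since the Unix epoch (UTC). The helper name `ms_to_datetime` and the example value are hypothetical.

```python
from datetime import datetime, timezone

def ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert an (assumed) Unix epoch timestamp in milliseconds to a UTC datetime."""
    # Split into whole seconds and leftover milliseconds with integer arithmetic,
    # so no precision is lost to floating-point division.
    seconds, millis = divmod(timestamp_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)

# Hypothetical example value, not taken from the dataset.
print(ms_to_datetime(1_600_000_000_123).isoformat())
# → 2020-09-13T12:26:40.123000+00:00
```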
In 2024, ** percent of surveyed U.S. consumers were confident they had seen fake product reviews on Amazon. Although this figure may seem very high, it has decreased since 2023, when ** percent of respondents said the same.
Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.
Dataset Statistics
| # Nodes | % Fraud Nodes (Class=1) |
|---|---|
| 11,944 | 9.5 |

| Relation | # Edges |
|---|---|
| U-P-U | |
| U-S-U | |
| U-V-U | 1,036,737 |
| All | |
Graph Construction
The Amazon dataset includes product reviews under the Musical Instruments category. Similar to this paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from this paper as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: 1) U-P-U connects users who reviewed at least one same product; 2) U-S-U connects users who gave at least one same star rating within one week; 3) U-V-U connects users whose mutual review-text similarity (measured by TF-IDF) is in the top 5% among all users.
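The labeling rule and the U-P-U relation can be sketched as follows. This is a minimal illustration, not the authors' code. It makes three assumptions. First, helpful and total vote counts are already aggregated per user. Second, users with no votes, or a ratio between 20% and 80%, stay unlabeled. Third, reviews arrive as hypothetical `(user_id, product_id)` pairs. U-S-U and U-V-U are not shown.

```python
from collections import defaultdict
from itertools import combinations

def label_user(helpful: int, total: int):
    """Return 0 (benign), 1 (fraudulent), or None (unlabeled) from a user's helpful-vote ratio."""
    if total == 0:
        return None  # assumption: users without any votes are left unlabeled
    ratio = helpful / total
    if ratio > 0.8:
        return 0  # more than 80% helpful votes -> benign
    if ratio < 0.2:
        return 1  # less than 20% helpful votes -> fraudulent
    return None  # between the two thresholds: unlabeled

def build_upu_edges(reviews):
    """U-P-U: connect every pair of users who reviewed at least one same product.

    `reviews` is an iterable of (user_id, product_id) pairs (hypothetical input format).
    """
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)
    edges = set()
    for users in users_by_product.values():
        # Undirected edges are stored once, ordered as (smaller_id, larger_id).
        edges.update(combinations(sorted(users), 2))
    return edges

# Hypothetical toy input.
reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2")]
print(sorted(build_upu_edges(reviews)))
# → [('u1', 'u2'), ('u1', 'u3')]
```

Grouping by product first avoids comparing every pair of users globally. The pairwise expansion inside each group is still quadratic in the number of reviewers of a popular product.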
To download the dataset, please visit this GitHub repo. For other questions, please email ytongdou(AT)gmail.com.
This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, it includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
This dataset was created by Sofia “Zow” Ormazabal
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In the more than two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews expressing opinions and describing their experiences with products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in Natural Language Processing (NLP), Information Retrieval (IR), Machine Learning (ML), and other fields. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.
More than 130 million customer reviews are available to researchers as part of this release. The data is available as TSV files in the amazon-reviews-pds S3 bucket in the AWS US East Region. Each line in the data files corresponds to an individual review (tab-delimited, with no quote or escape characters).
Each dataset contains the following columns:
* marketplace - 2-letter country code of the marketplace where the review was written.
* customer_id - Random identifier that can be used to aggregate reviews written by a single author.
* review_id - The unique ID of the review.
* product_id - The unique Product ID the review pertains to. In the multilingual dataset, reviews for the same product in different countries can be grouped by the same product_id.
* product_parent - Random identifier that can be used to aggregate reviews for the same product.
* product_title - Title of the product.
* product_category - Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
* star_rating - The 1-5 star rating of the review.
* helpful_votes - Number of helpful votes.
* total_votes - Number of total votes the review received.
* vine - Review was written as part of the Vine program.
* verified_purchase - The review is on a verified purchase.
* review_headline - The title of the review.
* review_body - The review text.
* review_date - The date the review was written.
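The files can be read with Python's standard `csv` module because they are tab-delimited with no quoting or escaping. The sketch below assumes each file starts with a header row containing the column names listed above. The helper `read_reviews` and the one-row sample are hypothetical.

```python
import csv
import io

# Column names from the list above; files are assumed to start with a header row.
COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id", "product_parent",
    "product_title", "product_category", "star_rating", "helpful_votes",
    "total_votes", "vine", "verified_purchase", "review_headline",
    "review_body", "review_date",
]
INT_COLUMNS = ("star_rating", "helpful_votes", "total_votes")

def read_reviews(text_stream):
    """Yield one dict per review from a tab-delimited stream (hypothetical helper)."""
    # QUOTE_NONE: the files use no quote or escape characters, so '"' is ordinary text.
    reader = csv.DictReader(text_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        for name in INT_COLUMNS:
            row[name] = int(row[name])  # numeric fields arrive as strings
        yield row

# Hypothetical one-row sample built in memory, not real data.
sample = "\t".join(COLUMNS) + "\n" + "\t".join([
    "US", "123", "R1", "B000", "456", 'Guitar "Pro" Strings', "Musical Instruments",
    "5", "2", "3", "N", "Y", "Great", "Sounds good", "2015-08-31",
]) + "\n"
for review in read_reviews(io.StringIO(sample)):
    print(review["star_rating"], review["product_title"])
```

For a local file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. Disabling quote handling matters here: with the default `QUOTE_MINIMAL`, a field beginning with `"` would be parsed as a quoted field.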
To use this dataset:
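```python
import tensorflow_datasets as tfds

# Load the training split of the amazon_us_reviews dataset.
ds = tfds.load('amazon_us_reviews', split='train')

# Print the first four examples.
for ex in ds.take(4):
    print(ex)
```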
See the guide for more information on tensorflow_datasets.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The Amazon Food Products Dataset is a large-scale collection of product listings, reviews, and metadata sourced from Amazon. This dataset is valuable for understanding consumer behaviour, analyzing product trends, and training machine learning models for recommendation systems and sentiment analysis. It includes various categories, providing insights into customer preferences, product ratings, and review sentiments.
Each record in the dataset contains the following key fields:
This dataset is ideal for a variety of applications:
CC0
These datasets contain peer-to-peer trades from various recommendation platforms.
Metadata includes:
peer-to-peer trades
have and want lists
image data (tradesy)
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.
Dataset Statistics
| # Nodes | % Fraud Nodes (Class=1) |
|---|---|
| 45,954 | 14.5 |

| Relation | # Edges |
|---|---|
| R-U-R | |
| R-T-R | |
| R-S-R | 3,402,743 |
| All | |
Graph Construction
The Yelp spam review dataset includes hotel and restaurant reviews filtered (spam) and recommended (legitimate) by Yelp. We conduct a spam review detection task on the Yelp-Fraud dataset, which is a binary classification task. We take 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Previous studies show that opinion fraudsters are connected through user, product, review text, and time. Based on this, we take reviews as nodes in the graph and design three relations: 1) R-U-R connects reviews posted by the same user; 2) R-S-R connects reviews under the same product with the same star rating (1-5 stars); 3) R-T-R connects two reviews under the same product posted in the same month.
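All three relations share one pattern: group reviews by a key, then connect every pair within a group. The sketch below is a hypothetical illustration, not the released preprocessing code. It assumes each review is a dict with `user`, `product`, `stars`, and a `date` string in `YYYY-MM-DD` form. It also assumes that "All" is the union of the three edge sets.

```python
from collections import defaultdict
from itertools import combinations

def group_edges(reviews, key):
    """Connect every pair of reviews whose key(review) values are equal.

    Edges are undirected and stored once as (smaller_id, larger_id).
    """
    groups = defaultdict(list)
    for review_id, review in reviews.items():
        groups[key(review)].append(review_id)
    edges = set()
    for members in groups.values():
        edges.update(combinations(sorted(members), 2))
    return edges

def build_relations(reviews):
    """Build the three review-review relations described above."""
    relations = {
        # R-U-R: reviews posted by the same user
        "R-U-R": group_edges(reviews, lambda r: r["user"]),
        # R-S-R: same product and same star rating
        "R-S-R": group_edges(reviews, lambda r: (r["product"], r["stars"])),
        # R-T-R: same product, same month (dates assumed to be "YYYY-MM-DD")
        "R-T-R": group_edges(reviews, lambda r: (r["product"], r["date"][:7])),
    }
    # Assumption: "All" is taken to be the union of the three edge sets.
    relations["All"] = set().union(*relations.values())
    return relations

# Hypothetical toy input, not taken from the Yelp data.
reviews = {
    "r1": {"user": "a", "product": "p", "stars": 5, "date": "2012-03-04"},
    "r2": {"user": "b", "product": "p", "stars": 5, "date": "2012-03-20"},
    "r3": {"user": "a", "product": "p", "stars": 1, "date": "2012-04-01"},
}
for name, edges in build_relations(reviews).items():
    print(name, sorted(edges))
```

Keying on `(product, stars)` and `(product, month)` restricts R-S-R and R-T-R to reviews of the same product, matching the definitions above.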
To download the dataset, please visit this GitHub repo. For other questions, please email ytongdou(AT)gmail.com.
https://www.marketresearchforecast.com/privacy-policy
The customer review marketing market, valued at $1340.9 million in 2025, is poised for significant growth. This expansion is driven by several key factors. The increasing reliance on online reviews for purchase decisions by consumers fuels demand for effective review marketing strategies. Businesses across all sectors, especially e-commerce giants like Amazon and Alibaba, recognize the crucial role of positive online reviews in brand building, customer acquisition, and sales conversion. The market’s segmentation, encompassing online and offline review marketing for both physical and virtual products, presents diverse opportunities for specialized service providers. Furthermore, technological advancements enabling automated review generation and analysis, along with improved sentiment analysis tools, are enhancing market efficiency and fueling adoption. Growth is also observed across diverse geographical regions, with North America and Asia-Pacific expected to be major contributors due to high internet penetration and e-commerce maturity. However, the market faces certain challenges. The proliferation of fake reviews poses a significant threat, eroding consumer trust and necessitating robust verification mechanisms. Moreover, managing and responding to negative reviews effectively requires significant resources and expertise, posing a barrier for smaller businesses. Maintaining data privacy and complying with evolving regulations around review collection and usage is another crucial consideration for companies operating in this space. Despite these hurdles, the overall market trajectory indicates robust growth, propelled by the increasing importance of online reputation management and the continued expansion of e-commerce globally. The competitive landscape, featuring both established players and emerging service providers, suggests a dynamic environment with opportunities for both large corporations and specialized niche players.
This dataset contains images (scenes) of fashion products, labeled with bounding boxes and links to the corresponding products.
Metadata includes:
product IDs
bounding boxes
Basic Statistics:
Scenes: 47,739
Products: 38,111
Scene-Product Pairs: 93,274
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a collection of Amazon headphone reviews, processed for sentiment analysis. It is a small subset intended to assist in understanding customer opinions and evaluating product perceptions. The data supports analysis of review usefulness, factors influencing helpfulness, and the detection of atypical or potentially misleading reviews.
The dataset is provided as a CSV file containing approximately 1,500 reviews across 6 columns, ready for analytical tasks.
This dataset is ideally suited for:
* Conducting sentiment analysis on product reviews.
* Exploring factors that influence the perceived helpfulness of a review.
* Identifying unusual review patterns or potential outliers.
* Applications in Natural Language Processing (NLP), text mining, and exploratory data analysis.
The data spans 28 May 2021 to 13 June 2022. It covers various customer names, including "Amazon Customer" and "Rahul", alongside a large proportion of "Other" customers. Product colours are predominantly "White" and "Black". Ratings are distributed across bins from 1.00-1.40 up to 4.60-5.00. The geographical scope of the data is global.
CC0
This dataset is beneficial for data scientists, machine learning engineers, business analysts, and researchers interested in:
* Developing sentiment analysis models.
* Understanding consumer feedback and product performance.
* Performing text-based data analysis.
* Exploring e-commerce review patterns.
Original Data Source: HEADPHONE DATASET REVIEW ANALYSIS
https://www.marketresearchforecast.com/privacy-policy
The Customer Review Marketing market, valued at $698.7 million in 2025, is projected to experience robust growth, fueled by a Compound Annual Growth Rate (CAGR) of 9.8% from 2025 to 2033. This expansion is driven by several key factors. E-commerce's continued dominance necessitates transparent and trustworthy customer feedback mechanisms, making review marketing integral to brand building and sales conversion. Increasing consumer reliance on online reviews for purchasing decisions reinforces the importance of strategic review management. Furthermore, the proliferation of social media platforms and review aggregator sites provides businesses with expanded opportunities to leverage positive reviews and address negative ones proactively. The market is segmented by application into Physical and Virtual Products, with the former currently dominating but both sectors witnessing significant growth as online and offline purchasing converge. Companies like Amazon, Alibaba, and eBay are leveraging sophisticated review systems, while smaller businesses utilize platforms like Shopify and AWIN to optimize their review marketing strategies. Geographical analysis reveals strong market penetration in North America and Europe, with significant growth potential in rapidly developing Asia-Pacific markets like India and China. The ongoing refinement of AI-driven sentiment analysis tools and the increasing focus on combating fake reviews will further shape market dynamics in the coming years. The forecast period reveals a continuously expanding market, with substantial opportunities for businesses of all sizes. The increasing sophistication of marketing analytics allows companies to directly track ROI on their review marketing efforts, leading to increased investment. Competitive pressures also drive adoption, with businesses recognizing the competitive advantage of superior customer review management. 
While potential restraints such as concerns regarding review authenticity and the need for robust data privacy measures exist, the overall trend points towards sustained and healthy market growth. The geographical distribution is expected to evolve, with emerging markets contributing increasingly to the global market size over the next decade. This expansion presents significant opportunities for both established players and innovative startups in the customer review marketing space.
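A constant CAGR implies a specific end-of-period value, computed as value × (1 + CAGR)^years. The sketch below applies this to the figures quoted above ($698.7 million in 2025, 9.8% CAGR, 8 compounding years to 2033). The report does not state a 2033 figure, so the result is only the value implied by assuming steady compounding. The helper name `project` is hypothetical.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate for `years` years."""
    return value * (1 + cagr) ** years

# Figures from the report: $698.7M in 2025 and a 9.8% CAGR over 2025 -> 2033.
implied_2033 = project(698.7, 0.098, 2033 - 2025)
print(f"${implied_2033:,.1f} million")
```

Under these assumptions the implied 2033 market size is roughly $1,476 million, about 2.1 times the 2025 value.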
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
https://crawlfeeds.com/privacy_policy
Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.
Product Details: Name, Brand, Category, and Unique ID
Pricing Information: Current Price, Discounted Price, and Currency
Availability & Ratings: Stock Status, Customer Ratings, and Reviews
Seller Information: Seller Name and Fulfillment Details
Additional Attributes: Product Description, Specifications, and Images
Format: CSV
Number of Records: 50,000+
Delivery Time: 3 Days
Price: $149.00
Availability: Immediate
This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.