15 datasets found

Amazon-Reviews-2023
huggingface.co
Updated Sep 15, 2023
Cite
McAuley-Lab (2023). Amazon-Reviews-2023 [Dataset]. https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023
Dataset updated
Sep 15, 2023
Dataset authored and provided by
McAuley-Lab
Description
Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. This dataset mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to the previous versions, the 2023 version features larger size, newer reviews (up to Sep 2023), richer and cleaner metadata, and finer-grained timestamps (from day to millisecond).
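The millisecond-resolution timestamps mentioned in this description can be converted to calendar datetimes with the Python standard library. The sketch below assumes the timestamp is an integer count of milliseconds since the Unix epoch; the helper name `ms_to_datetime` and the sample value are illustrative and not taken from the dataset.

```python
from datetime import datetime, timedelta, timezone


def ms_to_datetime(ms: int) -> datetime:
    """Convert integer milliseconds since the Unix epoch to an aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# Illustrative value only (assumption, not a record from the dataset).
sample_ms = 1694736000123
print(ms_to_datetime(sample_ms).isoformat(timespec="milliseconds"))
```

Integer arithmetic is used on purpose: dividing by 1000 as a float can lose the last digit of precision for large timestamps.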
U.S. consumers confident in having seen fake reviews on Amazon 2024
statista.com
Updated Jun 24, 2025
Cite
Statista (2025). U.S. consumers confident in having seen fake reviews on Amazon 2024 [Dataset]. https://www.statista.com/statistics/997026/amazon-shopping-categories-largest-share-fake-product-reviews/
Dataset updated
Jun 24, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description
In 2024, ** percent of U.S. consumers answering a survey were confident in having seen fake product reviews on Amazon. Although the number might seem very high, the figure has decreased compared to 2023, when ** percent of respondents stated the same.
Amazon-Fraud Dataset
paperswithcode.com
Updated Dec 23, 2024
+ more versions
Cite
Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu (2024). Amazon-Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/amazon-fraud
Dataset updated
Dec 23, 2024
Authors
Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu
Description
Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

Dataset Statistics

# Nodes: 11,944
% Fraud Nodes (Class=1): 9.5

Relation   # Edges
U-P-U
U-S-U
U-V-U      1,036,737
All

Graph Construction

The Amazon dataset includes product reviews under the Musical Instruments category. Similar to this paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from this paper as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users reviewing at least one same product; 2) U-S-U: it connects users having at least one same star rating within one week; 3) U-V-U: it connects users with top 5% mutual review text similarities (measured by TF-IDF) among all users.
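The labeling rule and the U-P-U relation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' construction code: the function names, the toy (user, product) pairs, and the choice to leave users without any votes unlabeled are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_votes: int, total_votes: int):
    """Return 0 (benign), 1 (fraudulent), or None (unlabeled) from the helpful-vote share."""
    if total_votes == 0:
        return None  # assumption: users without votes stay unlabeled
    share = helpful_votes / total_votes
    if share > 0.8:
        return 0
    if share < 0.2:
        return 1
    return None


def upu_edges(reviews):
    """Connect every pair of users who reviewed at least one common product."""
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)
    edges = set()
    for users in users_by_product.values():
        # sorted() gives each undirected edge one canonical orientation.
        edges.update(combinations(sorted(users), 2))
    return edges


# Toy (user, product) pairs, invented for illustration.
toy_reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2")]
print(sorted(upu_edges(toy_reviews)))
print(label_user(9, 10), label_user(1, 10), label_user(5, 10))
```

Users whose helpful-vote share falls between 20% and 80% receive no label in this sketch, which mirrors the two thresholds given in the description.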

To download the dataset, please visit this Github repo. For any other questions, please email ytongdou(AT)gmail.com for inquiry.
Amazon review data 2018
cseweb.ucsd.edu
nijianmo.github.io
+1 more
Cite
UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
Dataset authored and provided by
UCSD CSE Research Project
Description
Context

This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

More reviews:

The total number of reviews is 233.1 million (142.8 million in 2014).

New reviews:

Current data includes reviews in the range May 1996 - Oct 2018.

Metadata:

We have added transaction metadata for each review shown on the review page.

Added more detailed metadata of the product landing page.

Acknowledgements

If you publish articles based on this dataset, please cite the following paper:

Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.
Amazon fake reviews + scrapped
kaggle.com
Updated Dec 19, 2022
Cite
Sofia “Zow” Ormazabal (2022). Amazon fake reviews + scrapped [Dataset]. https://www.kaggle.com/datasets/sofiazowormazabal/amazon-fake-reviews-scrapped/discussion?sort=undefined
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Dec 19, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sofia “Zow” Ormazabal
Description
Dataset

This dataset was created by Sofia “Zow” Ormazabal

Contents
amazon_us_reviews
tensorflow.org
huggingface.co
Updated Dec 6, 2022
Cite
(2022). amazon_us_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/amazon_us_reviews
Dataset updated
Dec 6, 2022
Description
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.

More than 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in the AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).

Each dataset contains the following columns:

marketplace - 2 letter country code of the marketplace where the review was written.

customer_id - Random identifier that can be used to aggregate reviews written by a single author.

review_id - The unique ID of the review.

product_id - The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.

product_parent - Random identifier that can be used to aggregate reviews for the same product.

product_title - Title of the product.

product_category - Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).

star_rating - The 1-5 star rating of the review.

helpful_votes - Number of helpful votes.

total_votes - Number of total votes the review received.

vine - Review was written as part of the Vine program.

verified_purchase - The review is on a verified purchase.

review_headline - The title of the review.

review_body - The review text.

review_date - The date the review was written.
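Because the files are tab delimited with no quote or escape characters, a reader should be told not to interpret quotation marks. The following is a minimal sketch using Python's standard `csv` module; the two sample rows are invented and carry only a subset of the columns listed above.

```python
import csv
import io

# Two invented rows using a subset of the documented columns (assumption: real files carry all of them).
sample_tsv = (
    "marketplace\tcustomer_id\treview_id\tstar_rating\thelpful_votes\ttotal_votes\treview_body\n"
    'US\t111\tR1\t5\t3\t4\tSaid "great" and meant it\n'
    "US\t111\tR2\t2\t0\t1\tStopped working\n"
)

# QUOTE_NONE treats quote characters as ordinary text, matching the
# "no quote and escape characters" note in the description.
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

# customer_id can be used to aggregate reviews written by a single author.
reviews_per_customer = {}
for row in rows:
    customer = row["customer_id"]
    reviews_per_customer[customer] = reviews_per_customer.get(customer, 0) + 1

print(rows[0]["review_body"])
print(reviews_per_customer)
```

With the default quoting mode, a review body that begins with a quotation mark could swallow the following tabs and lines, so `csv.QUOTE_NONE` is the safer choice for this layout.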

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('amazon_us_reviews', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
Amazon Food Product Reviews & Ratings
opendatabay.com
Updated Jun 18, 2025
Cite
Vdt. Data (2025). Amazon Food Product Reviews & Ratings [Dataset]. https://www.opendatabay.com/data/consumer/fd13df3c-b1af-410c-8596-7e11961381ed
Dataset updated
Jun 18, 2025
Dataset authored and provided by
Vdt. Data
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
E-commerce & Online Transactions
Description
The Amazon Food Products Dataset is a large-scale collection of product listings, reviews, and metadata sourced from Amazon. This dataset is valuable for understanding consumer behaviour, analyzing product trends, and training machine learning models for recommendation systems and sentiment analysis. It includes various categories, providing insights into customer preferences, product ratings, and review sentiments.

Dataset Features

Each record in the dataset contains the following key fields:

ProductId: Unique identifier for each product.

UserId: Unique identifier for the reviewer.

ProfileName: Display name of the reviewer.

HelpfulnessNumerator: Number of users who found the review helpful.

HelpfulnessDenominator: Total number of users who rated the review’s helpfulness.

Score: Product rating (1 to 5 stars).

Time: Unix timestamp of the review.

Summary: Short summary of the review.

Text: Full text of the review.
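The fields above can be combined into derived values such as a helpfulness ratio and a readable review date. The sketch below uses Python's standard library on two invented rows; the helper name `helpfulness_ratio` is hypothetical, and the Time field is assumed to be a Unix timestamp in seconds.

```python
import csv
import io
from datetime import datetime, timezone

# Invented sample rows with the nine documented fields.
sample_csv = """ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
P001,U001,alice,3,4,5,1303862400,Tasty,Would buy again
P002,U002,bob,0,0,2,1346976000,Stale,Arrived past its date
"""


def helpfulness_ratio(numerator: int, denominator: int):
    """Share of voters who found the review helpful, or None when nobody voted."""
    return numerator / denominator if denominator else None


records = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    records.append({
        "product": row["ProductId"],
        "score": int(row["Score"]),
        "helpfulness": helpfulness_ratio(int(row["HelpfulnessNumerator"]),
                                         int(row["HelpfulnessDenominator"])),
        # Time is a Unix timestamp; assumed here to be in seconds, UTC.
        "reviewed_at": datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc),
    })

for record in records:
    print(record)
```

Returning None for reviews without votes keeps them distinguishable from reviews that were voted on and found unhelpful.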

Distribution

Data Volume: 568,454 rows and 9 columns.

Format: CSV.

Structure: Tabular format with numerical, categorical, and text data.

Usage

This dataset is ideal for a variety of applications:

Sentiment Analysis: Training NLP models to predict sentiment based on reviews.

Product Recommendation Systems: Building collaborative filtering models.

Trend Analysis: Identifying popular products and customer preferences.

Fake Review Detection: Detecting anomalous patterns in review behaviours.

Coverage

Geographic Coverage: Global.

Time Range: Multi-year dataset (over 10 years of reviews).

Demographics: General Amazon shoppers; includes various age groups and customer segments.

License

CC0

Who Can Use It

Data Scientists: For building machine learning models.

Researchers: For academic analysis of customer behaviour.

Businesses: For market insights and customer sentiment analysis.
Product Exchange/Bartering Data
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Product Exchange/Bartering Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain peer-to-peer trades from various recommendation platforms.

Metadata includes

peer-to-peer trades

have and want lists

image data (tradesy)
Yelp-Fraud Dataset
paperswithcode.com
Updated Apr 21, 2025
Cite
Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu (2025). Yelp-Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/yelpchi
Dataset updated
Apr 21, 2025
Authors
Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu
Description
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

Dataset Statistics

# Nodes: 45,954
% Fraud Nodes (Class=1): 14.5

Relation   # Edges
R-U-R
R-T-R
R-S-R      3,402,743
All

Graph Construction

The Yelp spam review dataset includes hotel and restaurant reviews filtered (spam) and recommended (legitimate) by Yelp. We conduct a spam review detection task on the Yelp-Fraud dataset, which is a binary classification task. We take 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Based on previous studies which show that opinion fraudsters have connections in user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects reviews posted by the same user; 2) R-S-R: it connects reviews under the same product with the same star rating (1-5 stars); 3) R-T-R: it connects two reviews under the same product posted in the same month.
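All three relations follow the same pattern: group reviews by a shared key and connect every pair within a group. The Python sketch below illustrates that pattern on invented reviews; the function name `relation_edges` and the record layout are assumptions for illustration, not the authors' code.

```python
from collections import defaultdict
from itertools import combinations


def relation_edges(reviews, key):
    """Connect all pairs of reviews that share the same value of key(review)."""
    groups = defaultdict(list)
    for review in reviews:
        groups[key(review)].append(review["id"])
    edges = set()
    for ids in groups.values():
        # sorted() gives each undirected edge one canonical orientation.
        edges.update(combinations(sorted(ids), 2))
    return edges


# Toy reviews, invented for illustration; "month" is a YYYY-MM string.
toy_reviews = [
    {"id": 0, "user": "a", "product": "x", "stars": 5, "month": "2012-01"},
    {"id": 1, "user": "a", "product": "y", "stars": 1, "month": "2012-01"},
    {"id": 2, "user": "b", "product": "x", "stars": 5, "month": "2012-03"},
    {"id": 3, "user": "c", "product": "x", "stars": 2, "month": "2012-03"},
]

r_u_r = relation_edges(toy_reviews, key=lambda r: r["user"])
r_s_r = relation_edges(toy_reviews, key=lambda r: (r["product"], r["stars"]))
r_t_r = relation_edges(toy_reviews, key=lambda r: (r["product"], r["month"]))
print(r_u_r, r_s_r, r_t_r)
```

Pairwise expansion grows quadratically with group size, so a popular product can contribute a very large number of edges.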

To download the dataset, please visit this Github repo. For any other questions, please email ytongdou(AT)gmail.com for inquiry.
Customer Review Marketing Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 7, 2025
Cite
Market Research Forecast (2025). Customer Review Marketing Report [Dataset]. https://www.marketresearchforecast.com/reports/customer-review-marketing-29081
Available download formats: doc, pdf, ppt
Dataset updated
Mar 7, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The customer review marketing market, valued at $1340.9 million in 2025, is poised for significant growth. This expansion is driven by several key factors. The increasing reliance on online reviews for purchase decisions by consumers fuels demand for effective review marketing strategies. Businesses across all sectors, especially e-commerce giants like Amazon and Alibaba, recognize the crucial role of positive online reviews in brand building, customer acquisition, and sales conversion. The market’s segmentation, encompassing online and offline review marketing for both physical and virtual products, presents diverse opportunities for specialized service providers. Furthermore, technological advancements enabling automated review generation and analysis, along with improved sentiment analysis tools, are enhancing market efficiency and fueling adoption. Growth is also observed across diverse geographical regions, with North America and Asia-Pacific expected to be major contributors due to high internet penetration and e-commerce maturity. However, the market faces certain challenges. The proliferation of fake reviews poses a significant threat, eroding consumer trust and necessitating robust verification mechanisms. Moreover, managing and responding to negative reviews effectively requires significant resources and expertise, posing a barrier for smaller businesses. Maintaining data privacy and complying with evolving regulations around review collection and usage is another crucial consideration for companies operating in this space. Despite these hurdles, the overall market trajectory indicates robust growth, propelled by the increasing importance of online reputation management and the continued expansion of e-commerce globally. The competitive landscape, featuring both established players and emerging service providers, suggests a dynamic environment with opportunities for both large corporations and specialized niche players.
Pinterest Fashion Compatibility
cseweb.ucsd.edu
beta.data.urbandatacentre.ca
json
+ more versions
Cite
UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

Metadata includes

product IDs

bounding boxes

Basic Statistics:

Scenes: 47,739

Products: 38,111

Scene-Product Pairs: 93,274
E-commerce Headphone Sentiment Dataset
opendatabay.com
Updated Jul 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Datasimple (2025). E-commerce Headphone Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/eed974c6-d221-4eb3-85f6-51e99839a040
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Reviews & Ratings
Description
This dataset contains a collection of Amazon headphone reviews, processed for sentiment analysis. It is a small subset intended to assist in understanding customer opinions and evaluating product perceptions. The data supports analysis of review usefulness, factors influencing helpfulness, and the detection of atypical or potentially misleading reviews.

Columns

Customer_Name: The name of the customer who provided the review.

REVIEW_TITLE: A short summary or title of the customer's review.

Color: The colour of the headphone product being reviewed.

REVIEW_DATE: The specific date when the customer submitted their review.

COMMENTS: Detailed comments from the customer expressing their feelings or observations about the product.

RATINGS: The customer's rating for the product, given on a scale of 1 to 5 stars.

Distribution

This dataset is typically provided in a CSV file format. It comprises approximately 1,500 individual reviews. The structure includes 6 distinct columns, making it readily available for analytical tasks.

Usage

This dataset is ideally suited for:

Conducting sentiment analysis on product reviews.

Exploring factors that influence the perceived helpfulness of a review.

Identifying unusual review patterns or potential outliers.

Applications in Natural Language Processing (NLP), text mining, and exploratory data analysis.

Coverage

The data spans a time range from 28 May 2021 to 13 June 2022. It covers various customer names, including "Amazon Customer" and "Rahul", alongside a large proportion of "Other" customers. Product colours predominantly include "White" and "Black". The ratings are distributed across several ranges, from 1.00-1.40 up to 4.60-5.00. The geographical scope of the data is global.

License

CC0

Who Can Use It

This dataset is beneficial for data scientists, machine learning engineers, business analysts, and researchers interested in:

Developing sentiment analysis models.

Understanding consumer feedback and product performance.

Performing text-based data analysis.

Exploring e-commerce review patterns.

Dataset Name Suggestions

Amazon Headphone Reviews for Sentiment Analysis

Headphone Customer Review Data

E-commerce Headphone Sentiment Dataset

Product Review Analysis Data (Headphones)

Attributes

Original Data Source: HEADPHONE DATASET REVIEW ANALYSIS
Customer Review Marketing Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 9, 2025
Cite
Market Research Forecast (2025). Customer Review Marketing Report [Dataset]. https://www.marketresearchforecast.com/reports/customer-review-marketing-31669
Available download formats: ppt, pdf, doc
Dataset updated
Mar 9, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Customer Review Marketing market, valued at $698.7 million in 2025, is projected to experience robust growth, fueled by a Compound Annual Growth Rate (CAGR) of 9.8% from 2025 to 2033. This expansion is driven by several key factors. E-commerce's continued dominance necessitates transparent and trustworthy customer feedback mechanisms, making review marketing integral to brand building and sales conversion. Increasing consumer reliance on online reviews for purchasing decisions reinforces the importance of strategic review management. Furthermore, the proliferation of social media platforms and review aggregator sites provides businesses with expanded opportunities to leverage positive reviews and address negative ones proactively. The market is segmented by application into Physical and Virtual Products, with the former currently dominating but both sectors witnessing significant growth as online and offline purchasing converge. Companies like Amazon, Alibaba, and eBay are leveraging sophisticated review systems, while smaller businesses utilize platforms like Shopify and AWIN to optimize their review marketing strategies. Geographical analysis reveals strong market penetration in North America and Europe, with significant growth potential in rapidly developing Asia-Pacific markets like India and China. The ongoing refinement of AI-driven sentiment analysis tools and the increasing focus on combating fake reviews will further shape market dynamics in the coming years. The forecast period reveals a continuously expanding market, with substantial opportunities for businesses of all sizes. The increasing sophistication of marketing analytics allows companies to directly track ROI on their review marketing efforts, leading to increased investment. Competitive pressures also drive adoption, with businesses recognizing the competitive advantage of superior customer review management. 
While potential restraints such as concerns regarding review authenticity and the need for robust data privacy measures exist, the overall trend points towards sustained and healthy market growth. The geographical distribution is expected to evolve, with emerging markets contributing increasingly to the global market size over the next decade. This expansion presents significant opportunities for both established players and innovative startups in the customer review marketing space.
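For readers who want to see what the stated growth rate implies, the compound-growth formula is base * (1 + CAGR) ** years. The Python sketch below applies it to the two figures quoted in this description. Treating 2025 to 2033 as eight annual compounding steps is an assumption, and the resulting number is an arithmetic illustration, not a figure from the report.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base * (1.0 + cagr) ** years


base_2025 = 698.7    # USD millions, as quoted in the description
cagr = 0.098         # 9.8% per year, as quoted in the description
years = 2033 - 2025  # assumption: eight annual compounding steps

projected_2033 = project_value(base_2025, cagr, years)
print(f"Implied 2033 market size: ${projected_2033:,.1f} million")
```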
PDMX
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, PDMX [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
Amazon India products dataset in CSV format
crawlfeeds.com
csv, zip
Updated Mar 27, 2025
Cite
Crawl Feeds (2025). Amazon India products dataset in CSV format [Dataset]. https://crawlfeeds.com/datasets/amazon-india-products-dataset-in-csv-format
Available download formats: csv, zip
Dataset updated
Mar 27, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Area covered
India
Description
Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.

Dataset Features:

Product Details: Name, Brand, Category, and Unique ID

Pricing Information: Current Price, Discounted Price, and Currency

Availability & Ratings: Stock Status, Customer Ratings, and Reviews

Seller Information: Seller Name and Fulfillment Details

Additional Attributes: Product Description, Specifications, and Images

Dataset Specifications:

Format: CSV

Number of Records: 50,000+

Delivery Time: 3 Days

Price: $149.00

Availability: Immediate

This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.
Amazon-Reviews-2023 (McAuley-Lab): 68 scholarly articles cite this dataset (View in Google Scholar)
