This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
https://brightdata.com/license
Gain extensive insights with our Amazon datasets, encompassing detailed product information including pricing, reviews, ratings, brand names, product categories, sellers, ASINs, images, and much more. Ideal for market researchers, data analysts, and eCommerce professionals looking to excel in the competitive online marketplace.
- Over 425M records available
- Price starts at $250/100K records
- Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet
- 100% ethical and compliant data collection
Included datapoints:
Title, Asin, Main Image, Brand Name, Description, Availability, Subcategory, Categories, Parent Asin, Type, Product Type Name, Model Number, Manufacturer, Color, Size, Date First Available, Released, Model Year, Item Model Number, Part Number, Price, Total Reviews, Total Ratings, Average Rating, Features, Best Sellers Rank, Subcategory, Buybox, Buybox Seller Id, Buybox Is Amazon, Images, Product URL, and more.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our extensive Amazon Product Dataset, featuring detailed information on prices, ratings, sales volume, and more.
These datasets contain 1.48 million question and answer pairs about products from Amazon.
Metadata includes
question and answer text
is the question binary (yes/no), and if so does it have a yes/no answer?
timestamps
product ID (to reference the review dataset)
Basic Statistics:
Questions: 1.48 million
Answers: 4,019,744
Labeled yes/no questions: 309,419
Number of unique products with questions: 191,185
https://brightdata.com/license
Buy Amazon datasets and get access to over 300 million records from any Amazon domain. Get insights on Amazon products, sellers, and reviews.
https://brightdata.com/license
Utilize our Amazon reviews dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset can aid in understanding customer behavior, product performance, and market trends, empowering organizations to refine their product and marketing strategies. Access the entire dataset or tailor a subset to fit your requirements. Popular use cases include:
- Product Performance Analysis: Analyze Amazon reviews to assess product performance, uncovering customer satisfaction levels, common issues, and highly praised features to inform product improvements and marketing messages.
- Customer Behavior Insights: Gain insights into customer behavior, purchasing patterns, and preferences, enabling more personalized marketing and product recommendations.
- Demand Forecasting: Leverage Amazon reviews to predict future product demand by analyzing historical review data and identifying trends, helping to optimize inventory management and sales strategies.
Accessing and analyzing the Amazon reviews dataset supports market strategy optimization by leveraging insights to analyze key market trends and customer preferences, enhancing overall business decision-making.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Amazon Berkeley Objects (ABO) is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalog images. 8,222 listings come with turntable photography (also referred to as "spin" or "360º-View" images), as sequences of 24 or 72 images, for a total of 586,584 images in 8,209 unique sequences. For 7,953 products, the collection also provides high-quality 3D models, as glTF 2.0 files.
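The totals above are enough to work out how many sequences have 24 images and how many have 72. The sketch below is a derived estimate, not a figure quoted by the dataset authors, and it assumes every sequence contains exactly 24 or exactly 72 images.

```python
# Derive the split between 24-image and 72-image sequences from the totals.
TOTAL_SEQUENCES = 8_209
TOTAL_IMAGES = 586_584


def split_sequences(total_sequences, total_images, short=24, long=72):
    """Solve a + b = total_sequences and short*a + long*b = total_images."""
    extra = total_images - short * total_sequences
    n_long, remainder = divmod(extra, long - short)
    if remainder or not 0 <= n_long <= total_sequences:
        raise ValueError("totals are not consistent with two sequence lengths")
    return total_sequences - n_long, n_long


n_24, n_72 = split_sequences(TOTAL_SEQUENCES, TOTAL_IMAGES)
print(n_24, n_72)  # 93 8116
```

Under that assumption, almost all sequences are the 72-image kind.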
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Amazon Bin Image Dataset contains 50,000 images and metadata from bins in an Amazon Fulfillment Center. Each image is paired with metadata in JSON format, stored in metadata.sqlite and retrievable via SQL queries. The dataset captures diverse product assortments in randomly organized bins, supporting research in object detection, inventory management, and weakly-tagged learning.
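As a rough sketch of what "retrievable via SQL queries" could look like, the snippet below uses Python's built-in sqlite3 module. The table name `bin_metadata`, its columns `image_id` and `payload`, and the `quantity` field are hypothetical; the description does not give the real schema of metadata.sqlite, so the first helper lists the tables so the actual names can be discovered.

```python
import json
import sqlite3


def list_tables(conn):
    """Return the table names in the database, to discover the real schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def load_metadata(conn, table, id_column, json_column):
    """Yield (id, parsed JSON) pairs; table and column names are caller-supplied."""
    query = f'SELECT "{id_column}", "{json_column}" FROM "{table}"'
    for row_id, payload in conn.execute(query):
        yield row_id, json.loads(payload)


# In-memory stand-in for metadata.sqlite with a HYPOTHETICAL schema and row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bin_metadata (image_id TEXT, payload TEXT)")
conn.execute(
    "INSERT INTO bin_metadata VALUES (?, ?)",
    ("00001", json.dumps({"quantity": 3})),
)

tables = list_tables(conn)
records = dict(load_metadata(conn, "bin_metadata", "image_id", "payload"))
```

With the real file, replace `":memory:"` with the path to metadata.sqlite and pass the table and column names reported by `list_tables`.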
From website:
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. An initial list of data sets is already available, and more will be added soon.
Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, users can produce or use prebuilt server images with tools and applications to analyze the data sets. By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.
http://www.gnu.org/licenses/fdl-1.3.html
This dataset contains transaction data from a fictitious SaaS company selling sales and marketing software to other companies (B2B). In the dataset, each row represents a single transaction/order (9,994 transactions), and the columns include:
Here is the Original Dataset: https://ee-assets-prod-us-east-1.s3.amazonaws.com/modules/337d5d05acc64a6fa37bcba6b921071c/v1/SaaS-Sales.csv
| # | Name of the attribute | Description |
| -- | --------------------- | -------------------------------------------------------- |
| 1 | Row ID | A unique identifier for each transaction. |
| 2 | Order ID | A unique identifier for each order. |
| 3 | Order Date | The date when the order was placed. |
| 4 | Date Key | A numerical representation of the order date (YYYYMMDD). |
| 5 | Contact Name | The name of the person who placed the order. |
| 6 | Country | The country where the order was placed. |
| 7 | City | The city where the order was placed. |
| 8 | Region | The region where the order was placed. |
| 9 | Subregion | The subregion where the order was placed. |
| 10 | Customer | The name of the company that placed the order. |
| 11 | Customer ID | A unique identifier for each customer. |
| 13 | Industry | The industry the customer belongs to. |
| 14 | Segment | The customer segment (SMB, Strategic, Enterprise, etc.). |
| 15 | Product | The product that was ordered. |
| 16 | License | The license key for the product. |
| 17 | Sales | The total sales amount for the transaction. |
| 18 | Quantity | The total number of items in the transaction. |
| 19 | Discount | The discount applied to the transaction. |
| 20 | Profit | The profit from the transaction. |
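To illustrate how these columns fit together, here is a minimal sketch that aggregates profit margin per customer segment using only the standard library. The column names come from the attribute table; the three sample rows are made up for illustration and are not taken from SaaS-Sales.csv.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows using column names from the attribute table.
SAMPLE = """Order ID,Segment,Sales,Quantity,Discount,Profit
A-1,SMB,100.0,2,0.0,25.0
A-2,SMB,50.0,1,0.2,-5.0
A-3,Enterprise,400.0,4,0.1,80.0
"""


def profit_margin_by_segment(csv_file):
    """Return total Profit divided by total Sales for each Segment."""
    sales = defaultdict(float)
    profit = defaultdict(float)
    for row in csv.DictReader(csv_file):
        sales[row["Segment"]] += float(row["Sales"])
        profit[row["Segment"]] += float(row["Profit"])
    # Skip segments with zero sales to avoid dividing by zero.
    return {seg: profit[seg] / sales[seg] for seg in sales if sales[seg]}


margins = profit_margin_by_segment(io.StringIO(SAMPLE))
```

To run it on the real file, pass an open file handle for the downloaded CSV instead of the `io.StringIO` sample.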
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To read any dataset you can use the following code
>>> import numpy as np
>>> embed_image = np.load('embed_image.npy')
>>> embed_image.shape
(33962, 768)
>>> embed_text = np.load('embed_text.npy')
>>> embed_text.shape
(33962, 768)
>>> import pandas as pd
>>> items = pd.read_csv('items.txt')
>>> m = len(items)
>>> print(f'{m} items in dataset')
33962 items in dataset
>>> users = pd.read_csv('users.txt')
>>> n = len(users)
>>> print(f'{n} users in dataset')
14790 users in dataset
>>> train = pd.read_csv('train.txt')
>>> train
user item
0 13444 23557
1 13444 33739
... ... ...
317109 13506 29993
317110 13506 13931
>>> from scipy.sparse import csr_matrix
>>> train_matrix = csr_matrix((np.ones(len(train)), (train.user, train.item)), shape=(n,m))
This collection contains six datasets. Each dataset is duplicated with seven combinations of different image and text encoders, so you should see 42 folders.
Each folder name combines the name of the dataset and the encoders used for the visual and textual parts. For example: bookcrossing-vit_bert.
The datasets are:
- Clothing, Shoes and Jewelry (Amazon)
- Home and Kitchen (Amazon)
- Musical Instruments (Amazon)
- Movies and TV (Amazon)
- Book-Crossing
- Movielens 25M
And the encoders are:
- CLIP (image and text): *-clip_clip. This is the main one used in the experiments.
- ViT and BERT: *-vit_bert
- CLIP (visual only): *-clip_none
- ViT only: *-vit_none
- BERT only: *-none_bert
- CLIP (text only): *-none_clip
- No textual or visual information: *-none_none
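The folder naming scheme can be sketched as follows. Only the prefix `bookcrossing` is confirmed by the example above; the other dataset prefixes are hypothetical placeholders, and the text-only CLIP suffix is assumed to be `none_clip`, following the `<image encoder>_<text encoder>` pattern of the other names.

```python
from itertools import product

# Only "bookcrossing" is confirmed by the text; the rest are HYPOTHETICAL prefixes.
DATASETS = ["clothing", "home", "musical", "movies", "bookcrossing", "movielens"]

# (image encoder, text encoder) pairs; "none" means that modality is not used.
ENCODERS = [
    ("clip", "clip"),
    ("vit", "bert"),
    ("clip", "none"),
    ("vit", "none"),
    ("none", "bert"),
    ("none", "clip"),  # assumed suffix for text-only CLIP
    ("none", "none"),
]


def folder_name(dataset, image_encoder, text_encoder):
    """Build a folder name such as 'bookcrossing-vit_bert'."""
    return f"{dataset}-{image_encoder}_{text_encoder}"


folders = [folder_name(d, img, txt) for d, (img, txt) in product(DATASETS, ENCODERS)]
print(len(folders))  # 42
```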
For each dataset, we have the following files, where M is the number of items, N the number of users, D the dimension of the textual embeddings (like 1024), and E the dimension of the visual embeddings (like 768):
- embed_image.npy: a NumPy array of M×E elements.
- embed_text.npy: a NumPy array of M×D elements.
- items.csv: a CSV with the item ID in the original dataset (like the Amazon ASIN, the movie ID, etc.) and the item number, an integer from 0 to M-1.
- users.csv: a CSV with the user ID in the original dataset (like the Amazon reviewer ID) and the user number, an integer from 0 to N-1.
- train.txt, validation.txt and test.txt: CSV files with the train, validation and test portions of the reviews. Each row is a (user, item) pair for an item the user liked or reviewed positively.
We consider a review "positive" if the rating is four or more (or 8 or more for Book-crossing).
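The thresholding rule can be written as a small helper. This is a sketch of the rule as stated, not the authors' extraction code; the dataset key `"bookcrossing"` is an illustrative assumption.

```python
def is_positive(rating, dataset):
    """Return True if a rating counts as positive under the stated rule.

    Book-Crossing uses a threshold of 8; every other dataset uses 4.
    The key "bookcrossing" is an assumed identifier for that dataset.
    """
    threshold = 8 if dataset == "bookcrossing" else 4
    return rating >= threshold


print(is_positive(4, "movielens"))     # True
print(is_positive(7, "bookcrossing"))  # False
```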
The vector is zeroed out if an Item does not have an image or text.
| Dataset | Users | Items | Ratings | Density |
|---|---|---|---|---|
| Clothing & Shoes & Jewelry | 23318 | 38493 | 178944 | 0.020% |
| Home & Kitchen | 5968 | 57645 | 135839 | 0.040% |
| Movies & TV | 21974 | 23958 | 216110 | 0.041% |
| Musical Instruments | 14429 | 29040 | 93923 | 0.022% |
| Book-crossing | 14790 | 33962 | 519613 | 0.103% |
| Movielens 25M | 162541 | 59047 | 25000095 | 0.260% |
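The density column appears to be the share of possible user-item pairs that have a rating. The text does not define it, so the formula ratings / (users × items) below is an assumption; it reproduces the listed values for the three rows checked here.

```python
def density_percent(users, items, ratings):
    """Percentage of the users x items matrix that holds a rating."""
    return 100.0 * ratings / (users * items)


# (users, items, ratings) copied from the table above.
STATS = {
    "Clothing & Shoes & Jewelry": (23318, 38493, 178944),
    "Book-crossing": (14790, 33962, 519613),
    "Movielens 25M": (162541, 59047, 25000095),
}

densities = {name: round(density_percent(*row), 3) for name, row in STATS.items()}
for name, value in densities.items():
    print(f"{name}: {value:.3f}%")
```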
For the Amazon datasets, only a tiny fraction of the original data was kept, by considering reviews in a specific date range.
For the Book-Crossing dataset, only items with images were considered.
There are various other minor tweaks in how images and texts were obtained. The repo https://github.com/igui/MultimodalRecomAnalysis has the notebook and scripts to reproduce the dataset extraction from scratch.
These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation.
Metadata includes
ratings
product images
user identities
item sizes, user genders
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets consist of product reviews that we collected ourselves from Amazon.com, covering the years 2008 to 2020 and spanning seven domains: book (Becoming by Michelle Obama), pharmaceutical (Turmeric Curcumin Supplement by Natures Nutrition), electronics (Echo Dot 3rd Gen by Amazon), grocery (Sparkling Ice Blue Variety Pack), healthcare (EnerPlex 3-Ply Re-usable Face Mask), entertainment (Harry Potter: The Complete 8-Film Collection), and personal care (Nautica Voyage By Nautica). Each dataset consists of 5,000 reviews.
vessl/amazon-beauty-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets have multiple levels of user interaction, ranging from adding to a shelf to rating and reading.
Metadata includes
reviews
add-to-shelf, read, review actions
book attributes: title, isbn
graph of similar books
Basic Statistics:
Items: 1,561,465
Users: 808,749
Interactions: 225,394,930
https://crawlfeeds.com/privacy_policy
Access a comprehensive dataset of over 240,000 shoe product listings directly from Amazon UK. This dataset is ideal for researchers, e-commerce analysts, and AI developers looking to explore pricing trends, brand performance, product features, or build training data for retail-focused models.
All data is neatly packaged in a downloadable ZIP archive containing files in JSON format, making it easy to integrate with your preferred analytics or database tools.
Price and discount trend analysis
Competitor benchmarking
Product attribute extraction and modeling
AI/ML training datasets (e.g., shoe recommendation systems)
Retail assortment planning
This dataset is available as a static snapshot, but you can request weekly or monthly updates through the Crawl Feeds dashboard. Upon purchase, the data will be bundled and delivered via a direct download link.
These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.
Metadata includes
reviews
purchases, plays, recommends (likes)
product bundles
pricing information
Basic Statistics:
Reviews: 7,793,069
Users: 2,567,538
Items: 15,474
Bundles: 615
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Amazon is a dataset for object detection tasks - it contains Esya annotations for 389 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
These datasets contain peer-to-peer trades from various recommendation platforms.
Metadata includes
peer-to-peer trades
have and want lists
image data (tradesy)
By ANil [source]
This dataset provides an in-depth look at the profitability of e-commerce sales. It contains data on a variety of sales channels, including Shiprocket and INCREFF, as well as financial information on related expenses and profits. The columns contain data such as SKU codes, design numbers, stock levels, product categories, sizes and colors. It also includes the MRPs across multiple stores (Ajio MRP, Amazon MRP, Amazon FBA MRP, Flipkart MRP, Limeroad MRP, Myntra MRP and Paytm MRP), along with other key parameters such as the amount paid by the customer for the purchase and the rate per piece for each transaction. Transactional parameters are included as well: date of sale, months, category, fulfilledby, B2b, Status, Qty, Currency and Gross amt. This is a must-have dataset for anyone trying to uncover the profitability of e-commerce sales in today's marketplace.
This dataset provides a comprehensive overview of e-commerce sales data from different channels covering a variety of products. Using this dataset, retailers and digital marketers can measure the performance of their campaigns more accurately and efficiently.
The following steps help users make the most out of this dataset:
- Analyze the general sales trends by examining fields such as month, category, currency, stock level, and customer for each sale. This gives an idea of how the e-commerce business is performing in each channel.
- Review the Shiprocket and INCREFF data to compare profitability across fulfilment methods. This comparison supports decisions that maximize profit while minimizing the referral fees and fulfilment rates associated with each method.
- Compare prices between channels such as Amazon FBA MRP, Myntra MRP and Ajio MRP, using the corresponding column for each store. Combined with the cost per piece (TP 1/TP 2) and other sales information, these price points indicate which stores offer the most profitable margins.
- Look at customer-specific data, such as the gross amount or the rate (price per piece) for each TP 1/TP 2 combination, or the total gross amount generated by an SKU across multiple customers, together with the associated dates. This tracks how an individual item performs relative to others in its category over a filtered time period. Keep an eye on items that are commonly sold under offers or promotional discounts, to shape inventory optimization and up-selling strategies.
- Finally, use the overall stock details together with the P&L data, including the yearly Expenses_IIGF records, to identify cost-cutting measures, such as choosing carefully between the Shiprocket and INCREFF delivery options or reducing manual inspections and outsourced support personnel. Understanding how each channel performs may help lower operational costs, cut unseen losses, and inform new, tailored marketing campaigns.
- Analysing the difference in profitability between sales made through Shiprocket and INCREFF. This data can be used to see where the biggest profit margins lie, and strategize accordingly.
- Examining the Complete Cost structure of a product with all its components and their contribution towards revenue or profitability, i.e., TP 1 & 2, MRP Old & Final MRP Old together with Platform based MRP - Amazon, Myntra and Paytm etc., Currency based Profit Margin etc.
- Building a predictive model using Machine Learning by leveraging historical data to predict future sales volume and profits for e-commerce products across multiple categories/devices/platforms such as Amazon, Flipkart, Myntra etc as well providing m...
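As a minimal sketch of the channel price comparison described above, the snippet below picks the channel with the largest gap between its MRP and the cost per piece for each SKU. The column names `Amazon MRP`, `Myntra MRP`, `Ajio MRP` and `TP 1` are taken from the description; the `Sku` column name, the sample rows, and the definition of margin as MRP minus TP 1 are assumptions for illustration.

```python
import csv
import io

# Made-up rows; "Sku" is an assumed column name for the SKU code.
SAMPLE = """Sku,TP 1,Amazon MRP,Myntra MRP,Ajio MRP
SKU-A,500,1200,1300,1250
SKU-B,800,1500,1400,1450
"""

CHANNELS = ["Amazon MRP", "Myntra MRP", "Ajio MRP"]


def best_channel_per_sku(csv_file, cost_column="TP 1"):
    """Return {sku: (channel, margin)} for the channel with the largest margin.

    Margin is ASSUMED to be the channel MRP minus the cost per piece.
    """
    best = {}
    for row in csv.DictReader(csv_file):
        cost = float(row[cost_column])
        margins = {ch: float(row[ch]) - cost for ch in CHANNELS}
        channel = max(margins, key=margins.get)
        best[row["Sku"]] = (channel, margins[channel])
    return best


best = best_channel_per_sku(io.StringIO(SAMPLE))
```

A fuller analysis would also subtract channel fees and fulfilment costs, which this sketch leaves out.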