https://brightdata.com/license
Gain extensive insights with our Amazon datasets, encompassing detailed product information including pricing, reviews, ratings, brand names, product categories, sellers, ASINs, images, and much more. Ideal for market researchers, data analysts, and eCommerce professionals looking to excel in the competitive online marketplace.
- Over 425M records available
- Price starts at $250/100K records
- Data formats available: JSON, NDJSON, CSV, XLSX, and Parquet
- 100% ethical and compliant data collection
Included datapoints:
Title Asin Main Image Brand Name Description Availability Subcategory Categories Parent Asin Type Product Type Name Model Number Manufacturer Color Size Date First Available Released Model Year Item Model Number Part Number Price Total Reviews Total Ratings Average Rating Features Best Sellers Rank Subcategory Buybox Buybox Seller Id Buybox Is Amazon Images Product URL And more
https://brightdata.com/license
Buy Amazon datasets and get access to over 300 million records from any Amazon domain. Get insights on Amazon products, sellers, and reviews.
https://crawlfeeds.com/privacy_policy
Access a comprehensive dataset of over 240,000 shoe product listings directly from Amazon UK. This dataset is ideal for researchers, e-commerce analysts, and AI developers looking to explore pricing trends, brand performance, product features, or build training data for retail-focused models.
All data is neatly packaged in a downloadable ZIP archive containing files in JSON format, making it easy to integrate with your preferred analytics or database tools.
Price and discount trend analysis
Competitor benchmarking
Product attribute extraction and modeling
AI/ML training datasets (e.g., shoe recommendation systems)
Retail assortment planning
This dataset is available as a static snapshot, but you can request weekly or monthly updates through the Crawl Feeds dashboard. Upon purchase, the data will be bundled and delivered via a direct download link.
These datasets contain 1.48 million question and answer pairs about products from Amazon.
Metadata includes
question and answer text
is the question binary (yes/no), and if so does it have a yes/no answer?
timestamps
product ID (to reference the review dataset)
Basic Statistics:
Questions: 1.48 million
Answers: 4,019,744
Labeled yes/no questions: 309,419
Number of unique products with questions: 191,185
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets consist of product reviews that we collected from Amazon.com, covering 2008 to 2020 and spanning seven domains: book (Becoming by Michelle Obama), pharmaceutical (Turmeric Curcumin Supplement by Natures Nutrition), electronics (Echo Dot 3rd Gen by Amazon), grocery (Sparkling Ice Blue Variety Pack), healthcare (EnerPlex 3-Ply Re-usable Face Mask), entertainment (Harry Potter: The Complete 8-Film Collection), and personal care (Nautica Voyage By Nautica). Each dataset contains 5000 reviews.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To read any dataset you can use the following code
>>> import numpy as np
>>> embed_image = np.load('embed_image.npy')
>>> embed_image.shape
(33962, 768)
>>> embed_text = np.load('embed_text.npy')
>>> embed_text.shape
(33962, 768)
>>> import pandas as pd
>>> items = pd.read_csv('items.txt')
>>> m = len(items)
>>> print(f'{m} items in dataset')
33962 items in dataset
>>> users = pd.read_csv('users.txt')
>>> n = len(users)
>>> print(f'{n} users in dataset')
14790 users in dataset
>>> train = pd.read_csv('train.txt')
>>> train
user item
0 13444 23557
1 13444 33739
... ... ...
317109 13506 29993
317110 13506 13931
>>> from scipy.sparse import csr_matrix
>>> train_matrix = csr_matrix((np.ones(len(train)), (train.user, train.item)), shape=(n,m))
This dataset contains six datasets. Each dataset is duplicated with seven combinations of different Image and Text encoders, so you should see 42 folders.
Each folder is named after the dataset and the encoders used for the visual and textual parts, for example bookcrossing-vit_bert.
The datasets are: - Clothing, Shoes and Jewelry (Amazon) - Home and Kitchen (Amazon) - Musical Instruments (Amazon) - Movies and TV (Amazon) - Book-Crossing - Movielens 25M
And the encoders are:
- CLIP (image and text): *-clip_clip. This is the main one used in the experiments.
- ViT and BERT: *-vit_bert
- CLIP (visual data only): *-clip_none
- ViT only: *-vit_none
- BERT only: *-none_bert
- CLIP (text only): *-none_clip
- No textual or visual information: *-none_none
For each dataset we provide the following files, where M is the number of items, N the number of users, D the dimension of the textual embeddings (e.g., 1024), and E the dimension of the visual embeddings (e.g., 768):
- embed_image.npy: a NumPy array of M x E elements.
- embed_text.npy: a NumPy array of M x D elements.
- items.csv: a CSV with the item ID in the original dataset (such as the Amazon ASIN or the movie ID) and the item number, an integer from 0 to M-1.
- users.csv: a CSV with the user ID in the original dataset (such as the Amazon reviewer ID) and the user number, an integer from 0 to N-1.
- train.txt, validation.txt and test.txt: CSV files with the train, validation, and test portions of the reviews. Each row is a (user, item) pair in which the user liked or positively reviewed the item.
We consider a review "positive" if the rating is four or more (or 8 or more for Book-crossing).
The vector is zeroed out if an Item does not have an image or text.
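The positivity rule above can be sketched as a small filter. This is only an illustration: the function name, the dataset label strings, and the sample triples are hypothetical and are not taken from the dataset's extraction scripts.

```python
# Hypothetical helper illustrating the "positive review" rule described above.
# Thresholds come from the text: 4 or more in general, 8 or more for Book-Crossing.
BOOK_CROSSING_THRESHOLD = 8
DEFAULT_THRESHOLD = 4


def is_positive(rating: float, dataset: str) -> bool:
    """Return True if a rating counts as positive for the given dataset."""
    # "bookcrossing" is an assumed label; adapt it to the folder naming you use.
    threshold = BOOK_CROSSING_THRESHOLD if dataset == "bookcrossing" else DEFAULT_THRESHOLD
    return rating >= threshold


# Made-up (user, item, rating) triples, for illustration only.
reviews = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4)]
positives = [(u, i) for u, i, r in reviews if is_positive(r, "movielens")]
print(positives)
```

Only the (user, item) pairs survive the filter, which matches the two-column layout of train.txt shown earlier.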
Dataset | Users | Items | Ratings | Density |
---|---|---|---|---|
Clothing & Shoes & Jewelry | 23318 | 38493 | 178944 | 0.020% |
Home & Kitchen | 5968 | 57645 | 135839 | 0.040% |
Movies & TV | 21974 | 23958 | 216110 | 0.041% |
Musical Instruments | 14429 | 29040 | 93923 | 0.022% |
Book-crossing | 14790 | 33962 | 519613 | 0.103% |
Movielens 25M | 162541 | 59047 | 25000095 | 0.260% |
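The Density column appears to be the number of ratings divided by the number of possible user-item pairs. The sketch below, with a hypothetical function name, reproduces the Book-crossing row under that assumption.

```python
def density(users: int, items: int, ratings: int) -> float:
    """Fraction of the users x items matrix that holds a rating."""
    return ratings / (users * items)


# Book-crossing row from the table above.
d = density(users=14790, items=33962, ratings=519613)
print(f"{d:.3%}")  # 0.103%
```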
For the Amazon datasets, only a small fraction of the original data was used, restricted to reviews within a specific date range.
For the Book-Crossing dataset, only items with images were considered.
There are various other minor tweaks on how to obtain images and texts. The repo https://github.com/igui/MultimodalRecomAnalysis has the Notebook and scripts to reproduce the dataset extraction from scratch.
https://brightdata.com/license
Unlock powerful insights with the Amazon Prime dataset, offering access to millions of records from any Amazon domain. This dataset provides comprehensive data points such as product titles, descriptions, exclusive Prime discounts, brand details, pricing (initial and discounted), availability, customer ratings, reviews, and product categories. Additionally, it includes unique identifiers like ASINs, images, and seller information, allowing you to analyze Prime offerings, trends, and customer preferences with precision. Use this dataset to optimize your eCommerce strategies by analyzing Prime-exclusive pricing strategies, identifying top-performing brands and products, and tracking customer sentiment through reviews and ratings. Gain valuable insights into consumer demand, seasonal trends, and the impact of Prime discounts to make data-driven decisions that enhance your inventory management, marketing campaigns, and pricing strategies. Whether you’re a retailer, marketer, data analyst, or researcher, the Amazon Prime dataset empowers you with the data needed to stay competitive in the dynamic eCommerce landscape. Available in various formats such as JSON, CSV, and Parquet, and delivered via flexible options like API, S3, or email, this dataset ensures seamless integration into your workflows.
By ANil [source]
This dataset provides an in-depth look at the profitability of e-commerce sales. It contains data on a variety of sales channels, including Shiprocket and INCREFF, as well as financial information on related expenses and profits. The columns include SKU codes, design numbers, stock levels, product categories, sizes, and colors. The dataset also includes the MRPs across multiple stores (Ajio MRP, Amazon MRP, Amazon FBA MRP, Flipkart MRP, Limeroad MRP, Myntra MRP, and Paytm MRP), along with other key parameters such as the amount paid by the customer for the purchase and the rate per piece for each transaction. Transactional parameters are included as well: date of sale, months, category, fulfilled by, B2B, status, qty, currency, and gross amt. This dataset is useful for anyone trying to understand the profitability of e-commerce sales in today's marketplace.
This dataset provides a comprehensive overview of e-commerce sales data from different channels covering a variety of products. Using this dataset, retailers and digital marketers can measure the performance of their campaigns more accurately and efficiently.
The following steps help users make the most of this dataset:
- Analyze general sales trends by examining fields such as month, category, currency, stock level, and customer for each sale. This gives an idea of how the e-commerce business is performing in each channel.
- Review the Shiprocket and INCREFF data to compare profitability across fulfilment methods. This comparison can inform decisions aimed at maximizing profit while minimizing the referral fees and fulfilment costs associated with each method.
- Compare prices between channels such as Amazon FBA MRP, Myntra MRP, and Ajio MRP using the corresponding column for each store. Analyzing these price points together with the cost-per-piece columns (TP1/TP2) indicates which stores offer more profitable margins without compromising on quality.
- Look at customer-specific data, such as the gross amount or rate per piece for each TP 1/TP 2 combination, or the total gross amount generated by an SKU across multiple customers, together with the associated dates, to track how an item performs relative to others in its category over a chosen period. Watch for items that are commonly sold under offers or promotional discounts, and use this to shape inventory optimization and up-selling strategies.
- Finally, use the overall stock details together with the P&L data, including the yearly Expenses_IIGF information, to identify cost-cutting measures, such as choosing carefully between the Shiprocket and INCREFF delivery options. A clear view of these operational costs and of current supply-chain efficiency can help reduce unseen losses, increase profits, and inform new marketing campaigns tailored to different product ranges.
- Analysing the difference in profitability between sales made through Shiprocket and INCREFF. This data can be used to see where the biggest profit margins lie, and strategize accordingly.
- Examining the complete cost structure of a product, with all its components and their contribution to revenue or profitability: TP 1 and TP 2, MRP Old and Final MRP Old, platform-based MRPs (Amazon, Myntra, Paytm, etc.), currency-based profit margin, and so on.
- Building a predictive model using Machine Learning by leveraging historical data to predict future sales volume and profits for e-commerce products across multiple categories/devices/platforms such as Amazon, Flipkart, Myntra etc as well providing m...
http://www.gnu.org/licenses/fdl-1.3.html
This dataset contains transaction data from a fictitious SaaS company selling sales and marketing software to other companies (B2B). In the dataset, each row represents a single transaction/order (9,994 transactions), and the columns include:
Here is the Original Dataset: https://ee-assets-prod-us-east-1.s3.amazonaws.com/modules/337d5d05acc64a6fa37bcba6b921071c/v1/SaaS-Sales.csv
| # | Name of the attribute | Description |
| -- | --------------------- | -------------------------------------------------------- |
| 1 | Row ID | A unique identifier for each transaction. |
| 2 | Order ID | A unique identifier for each order. |
| 3 | Order Date | The date when the order was placed. |
| 4 | Date Key | A numerical representation of the order date (YYYYMMDD). |
| 5 | Contact Name | The name of the person who placed the order. |
| 6 | Country | The country where the order was placed. |
| 7 | City | The city where the order was placed. |
| 8 | Region | The region where the order was placed. |
| 9 | Subregion | The subregion where the order was placed. |
| 10 | Customer | The name of the company that placed the order. |
| 11 | Customer ID | A unique identifier for each customer. |
| 13 | Industry | The industry the customer belongs to. |
| 14 | Segment | The customer segment (SMB, Strategic, Enterprise, etc.). |
| 15 | Product | The product that was ordered. |
| 16 | License | The license key for the product. |
| 17 | Sales | The total sales amount for the transaction. |
| 18 | Quantity | The total number of items in the transaction. |
| 19 | Discount | The discount applied to the transaction. |
| 20 | Profit | The profit from the transaction. |
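As a small illustration of the Date Key attribute, the sketch below derives a YYYYMMDD integer from a date. The function name and the sample date are hypothetical; the dataset itself already ships this column.

```python
from datetime import date


def date_key(order_date: date) -> int:
    """Encode a date as a YYYYMMDD integer, as described for the Date Key column."""
    return int(order_date.strftime("%Y%m%d"))


# Made-up order date, for illustration only.
print(date_key(date(2022, 11, 9)))  # 20221109
```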
These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation.
Metadata includes
ratings
product images
user identities
item sizes, user genders
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Amazon Review Full Score Dataset
Version 3, Updated 09/09/2015
ORIGIN
The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18 years, including ~35 million reviews up to March 2013. Reviews include product and user information, ratings, and a plaintext review. For more information, please refer to the following paper: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
The Amazon reviews full score dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
DESCRIPTION
The Amazon reviews full score dataset is constructed by randomly taking 600,000 training samples and 130,000 testing samples for each review score from 1 to 5. In total there are 3,000,000 training samples and 650,000 testing samples.
The files train.csv and test.csv contain all the training samples as comma-separated values. There are 3 columns in them, corresponding to class index (1 to 5), review title and review text. The review title and text are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed with an "n" character, that is "\n".
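A minimal sketch of reading this format with Python's standard csv module follows. The doubled-quote escaping is handled by the reader's default settings, while the literal backslash-n sequences are converted back to newlines by hand. The function name and the sample row are made up for illustration.

```python
import csv
import io


def read_reviews(f):
    """Yield (class_index, title, text) tuples from a file in the format above."""
    # csv.reader treats "" inside a quoted field as one double quote by default.
    for label, title, text in csv.reader(f):
        # New lines are stored as a backslash followed by "n"; restore them.
        yield int(label), title.replace("\\n", "\n"), text.replace("\\n", "\n")


# Made-up sample row standing in for a line of train.csv.
sample = '"3","Good ""value""","First line\\nSecond line"\n'
rows = list(read_reviews(io.StringIO(sample)))
print(rows)
```

For the real files, the same function can be given a handle such as `open("train.csv", newline="", encoding="utf-8")`; the encoding is an assumption, since the description does not state it.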
https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain text review. It also includes reviews from all other Amazon categories.
Data includes:
- Reviews from Oct 1999 - Oct 2012
- 568,454 reviews
- 256,059 users
- 74,258 products
- 260 users with > 50 reviews
See this SQLite query for a quick sample of the dataset.
If you publish articles based on this dataset, please cite the following paper:
https://brightdata.com/license
Unlock powerful insights with the Amazon Electronics dataset, offering access to millions of records from any Amazon domain. This dataset provides comprehensive data points such as product titles, descriptions, brand details, pricing (initial and discounted), availability, customer ratings, reviews, and product categories. Additionally, it includes unique identifiers like ASINs, images, and seller information, allowing you to analyze product listings, trends, and customer preferences with precision. Use this dataset to optimize your eCommerce strategies by benchmarking competitor pricing, identifying top-performing brands, and tracking customer sentiment through reviews and ratings. Gain valuable insights into consumer demand, seasonal trends, and market gaps to make data-driven decisions that enhance your inventory management, marketing campaigns, and pricing strategies. Whether you’re a retailer, marketer, data analyst, or researcher, the Amazon Electronics dataset empowers you with the data needed to stay competitive in the dynamic eCommerce landscape. Available in various formats such as JSON, CSV, and Parquet, and delivered via flexible options like API, S3, or email, this dataset ensures seamless integration into your workflows.
https://crawlfeeds.com/privacy_policy
The Amazon Cat Supplies dataset provides a rich and structured collection of products specifically tailored for feline needs.
It includes detailed information on cat food, toys, grooming products, litter, and accessories.
Perfect for ecommerce analysis, market research, building recommendation systems, and understanding pet product trends.
Each record contains product titles, descriptions, categories, prices, and brand details — making it a ready-to-use resource for data-driven projects.
These datasets contain peer-to-peer trades from various recommendation platforms.
Metadata includes
peer-to-peer trades
have and want lists
image data (tradesy)
https://brightdata.com/license
Unlock powerful insights with the Amazon Fine Food dataset, offering access to millions of records from any Amazon domain. This dataset provides comprehensive data points such as product titles, descriptions, brand details, pricing (initial and discounted), availability, customer ratings, reviews, and product categories. Additionally, it includes unique identifiers like ASINs, ingredients, and seller information, allowing you to analyze food listings, trends, and customer preferences with precision. Use this dataset to optimize your eCommerce strategies by benchmarking competitor pricing, identifying top-performing brands, and tracking customer sentiment through reviews and ratings. Gain valuable insights into consumer demand, dietary preferences, and market gaps to make data-driven decisions that enhance your inventory management, marketing campaigns, and product strategies. Whether you’re a retailer, marketer, data analyst, or researcher, the Amazon Fine Food dataset empowers you with the data needed to stay competitive in the dynamic eCommerce landscape. Available in various formats such as JSON, CSV, and Parquet, and delivered via flexible options like API, S3, or email, this dataset ensures seamless integration into your workflows.
Introducing E-Commerce Product Datasets!
Unlock the full potential of your product strategy with E-Commerce Product Datasets. Gain invaluable insights to optimize your product offerings and pricing, analyze top-selling strategies, and assess customer sentiment.
Our E-Commerce Datasets Source:
Amazon: Access accurate product data from Amazon, including categories, pricing, reviews, and more.
Walmart: Receive comprehensive product information from Walmart, covering pricing, sellers, ratings, availability, and more.
E-Commerce Product Datasets provide structured and actionable data, empowering you to understand customer needs and enhance product strategies. We deliver fresh and precise public e-commerce data, including product names, brands, prices, number of sellers, review counts, ratings, and availability.
You have the flexibility to tailor data delivery to your specific needs:
Why Choose Oxylabs E-Commerce Datasets:
Fresh and accurate data: Access clean and structured public e-commerce data collected by our leading web scraping professionals.
Time and resource savings: Let our experts handle data extraction at an affordable cost, allowing you to focus on your core business objectives.
Customizable solutions: Share your unique business needs, and our team will craft customized dataset solutions tailored to your requirements.
Legal compliance: Partner with a trusted leader in ethical data collection, endorsed by Fortune 500 companies and fully compliant with GDPR and CCPA regulations.
Pricing Options:
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the potential of your e-commerce strategy with E-Commerce Product Datasets!
This is a multi-modal dataset for restaurants from Google Local (Google Maps). Data includes images and reviews posted by users, as well as metadata for each restaurant.
This dataset contains longitudinal purchases data from 5027 Amazon.com users in the US, spanning 2018 through 2022: amazon-purchases.csv
It also includes demographic data and other consumer-level variables for each user with data in the dataset. These consumer-level variables were collected through an online survey and are included in survey.csv.
fields.csv describes the columns in the survey.csv file, where fields/survey columns correspond to survey questions.
The dataset also contains the survey instrument used to collect the data. More details about the survey questions and possible responses, and the format in which they were presented, can be found by viewing the survey instrument.
A 'Survey ResponseID' column is present in both the amazon-purchases.csv and survey.csv files. It links a user's survey responses to their Amazon.com purchases. The 'Survey ResponseID' was randomly generated at the time of data collection.
amazon-purchases.csv
Each row in this file corresponds to an Amazon order. Each such row has the following columns:
- Survey ResponseID
- Order date
- Shipping address state
- Purchase price per unit
- Quantity
- ASIN/ISBN (Product Code)
- Title
- Category
The data were exported by the Amazon users from Amazon.com and shared by users with their informed consent. PII and other information not listed above were stripped from the data. This processing occurred on users' machines before sharing with researchers.
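The link between the two files can be sketched with the standard csv module: total each user's spend from the purchases file, then attach it to that user's survey row through 'Survey ResponseID'. The in-memory sample rows, the ID values, the date format, and the survey column name "Q-example" are all made up for illustration; only the purchase column names come from the description above.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for amazon-purchases.csv and survey.csv.
purchases_csv = io.StringIO(
    "Survey ResponseID,Order date,Purchase price per unit,Quantity\n"
    "R_1,2019-05-01,10.00,2\n"
    "R_1,2020-01-15,5.50,1\n"
    "R_2,2021-07-04,20.00,1\n"
)
survey_csv = io.StringIO(
    "Survey ResponseID,Q-example\n"
    "R_1,A\n"
    "R_2,B\n"
)

# Sum price x quantity per respondent.
spend = defaultdict(float)
for row in csv.DictReader(purchases_csv):
    spend[row["Survey ResponseID"]] += (
        float(row["Purchase price per unit"]) * float(row["Quantity"])
    )

# Attach each respondent's total spend to their survey answers.
linked = [
    {**row, "total_spend": spend.get(row["Survey ResponseID"], 0.0)}
    for row in csv.DictReader(survey_csv)
]
print(linked)
```

Survey respondents with no matching purchases would get a total of 0.0 here; whether that is the right default depends on the analysis.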
Likes and image data from the community art website Behance. This is a small, anonymized, version of a larger proprietary dataset.
Metadata includes
appreciates (likes)
timestamps
extracted image features
Basic Statistics:
Users: 63,497
Items: 178,788
Appreciates (likes): 1,000,000