https://crawlfeeds.com/privacy_policy
This dataset is a sample extraction of product listings from Zoro.com, a leading industrial supply e-commerce platform. It provides structured product-level data that can be used for market research, price comparison engines, product matching models, and e-commerce analytics.
The sample includes a variety of products across tools, hardware, safety equipment, and industrial supplies — with clean, structured fields suitable for both analysis and model training.
Also available: Grainger Product Datasets – structured data from a top industrial supplier.
Submit your custom data requests via the Zoro products page or contact us directly at contact@crawlfeeds.com.
Ideal for previewing before requesting larger or full Zoro datasets
Building product comparison or search engines
Price intelligence and competitor monitoring
Product classification and attribute extraction
Training data for e-commerce AI models
This is a sample of a much larger dataset extracted from Zoro.com.
👉 Contact us to access full datasets or request custom category extractions.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for "amazon-product-data-filter"
Dataset Summary
The Amazon Product Dataset contains product listing data from the Amazon US website. It can be used for various NLP and classification tasks, such as text generation, product type classification, attribute extraction, image recognition and more. NOTICE: This is a sample of the full Amazon Product Dataset, which contains 1K examples. Follow the link to gain access to the full dataset.
Languages… See the full description on the dataset page: https://huggingface.co/datasets/iarbel/amazon-product-data-sample.
Note: Only publicly available data can be processed.
In today's ever-evolving Ecommerce landscape, success hinges on the ability to harness the power of data. APISCRAPY is your strategic ally, dedicated to providing a comprehensive solution for extracting critical Ecommerce data, including Ecommerce market data, Ecommerce product data, and Ecommerce datasets. With the Ecommerce arena being more competitive than ever, having a data-driven approach is no longer a luxury but a necessity.
APISCRAPY's forte lies in its ability to unearth valuable Ecommerce market data. We recognize that understanding the market dynamics, trends, and fluctuations is essential for making informed decisions.
APISCRAPY's AI-driven ecommerce data scraping service presents several advantages for individuals and businesses seeking comprehensive insights into the ecommerce market. Here are key benefits associated with their advanced data extraction technology:
Ecommerce Product Data: APISCRAPY's AI-driven approach ensures the extraction of detailed Ecommerce Product Data, including product specifications, images, and pricing information. This comprehensive data is valuable for market analysis and strategic decision-making.
Data Customization: APISCRAPY enables users to customize the data extraction process, ensuring that the extracted ecommerce data aligns precisely with their informational needs. This customization option adds versatility to the service.
Efficient Data Extraction: APISCRAPY's technology streamlines the data extraction process, saving users time and effort. The efficiency of the extraction workflow ensures that users can obtain relevant ecommerce data swiftly and consistently.
Realtime Insights: Businesses can gain real-time insights into the dynamic Ecommerce Market by accessing rapidly extracted data. This real-time information is crucial for staying ahead of market trends and making timely adjustments to business strategies.
Scalability: The technology behind APISCRAPY allows scalable extraction of ecommerce data from various sources, accommodating evolving data needs and handling increased volumes effortlessly.
Beyond the broader market, a deeper dive into specific products can provide invaluable insights. APISCRAPY excels in collecting Ecommerce product data, enabling businesses to analyze product performance, pricing strategies, and customer reviews.
To navigate the complexities of the Ecommerce world, you need access to robust datasets. APISCRAPY's commitment to providing comprehensive Ecommerce datasets ensures businesses have the raw materials required for effective decision-making.
Our primary focus is on Amazon data, offering businesses a wealth of information to optimize their Amazon presence. By doing so, we empower our clients to refine their strategies, enhance their products, and make data-backed decisions.
[Tags: Ecommerce data, Ecommerce Data Sample, Ecommerce Product Data, Ecommerce Datasets, Ecommerce market data, Ecommerce Market Datasets, Ecommerce Sales data, Ecommerce Data API, Amazon Ecommerce API, Ecommerce scraper, Ecommerce Web Scraping, Ecommerce Data Extraction, Ecommerce Crawler, Ecommerce data scraping, Amazon Data, Ecommerce web data]
https://brightdata.com/license
The Product Catalog Data provides a comprehensive overview of products across various categories. This dataset includes detailed product titles, descriptions, barcodes, category-specific attributes, weight, measurements, and imagery. It's tailored for marketplaces, eCommerce sites, and data analysts who require in-depth product information to enhance user experience, SEO, and product categorization.
Popular Attributes:
✔ Detailed product information
✔ High-quality imagery
✔ Extensive attribute coverage
✔ Ideal for UX and SEO optimization
✔ Comprehensive product categorization
Key Information:
Rich dataset with 30+ attributes per product
Pricing: Flexible subscription models
Update Frequency: Daily updates
Coverage: Global and specific markets
Historical Data: 12 Months +
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains images of televisions, sofas, jeans and T-shirts. It is actual raw, unstructured image data extracted from online sites.
The images come from different sites. You may also find some junk images in the data; for example, the television set includes images of television remotes.
The dataset is intentionally left unrefined so that practitioners get a taste of the kind of data ML and data science engineers receive when they start working on a project in industry.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns that includes a variety of sales-related data, including order numbers, product information, quantity, unit price, sales, order date, order status, customer and delivery information.
2) Data Utilization
(1) Sample Sales Data has the following characteristics:
• The dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).
(2) Sample Sales Data can be used for:
• Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region.
• Segmentation and marketing strategies: Customer groups can be segmented based on customer information, transaction size, and regional data, and the segments used to design targeted marketing and customized promotion strategies.
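The monthly sales trend analysis mentioned above can be sketched in a few lines. This is a minimal illustration, not the dataset's own tooling: the keys `order_date` and `sales`, the ISO date format, and the sample rows are all assumptions made for the example, and the real column names and date format should be checked against the file.

```python
from collections import defaultdict
from datetime import datetime


def monthly_sales(rows, date_key="order_date", sales_key="sales"):
    """Sum sales per (year, month) from an iterable of order dicts."""
    totals = defaultdict(float)
    for row in rows:
        # Assumed date format: YYYY-MM-DD (adjust to the real file).
        order_date = datetime.strptime(row[date_key], "%Y-%m-%d")
        totals[(order_date.year, order_date.month)] += float(row[sales_key])
    # Sort by (year, month) so the result reads as a time series.
    return dict(sorted(totals.items()))


# Hypothetical rows, shaped like csv.DictReader output.
orders = [
    {"order_date": "2003-02-24", "sales": "100.00"},
    {"order_date": "2003-02-27", "sales": "50.50"},
    {"order_date": "2003-03-05", "sales": "20.25"},
]
trend = monthly_sales(orders)
```

Grouping by another key, such as product line or country, follows the same pattern with a different dictionary key.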
First of all, Amazon product datasets are indispensable for reverse engineering your rivals. For example, you can collect a list of keywords you already rank for or want to, and go through DataForSEO Amazon Products Database to find other sellers appearing as the top results for these terms.
Next, you can narrow down the scope of your contenders to those performing the best. To do so, you can filter out sellers who won the “Amazon’s Choice” and those whose products got listed multiple times on the first page.
Once you’ve compiled the final list of your challengers, Amazon Products Database will help you to quickly examine product titles, descriptions, prices, images, and other details that will let you grasp the main contributors to your competitors’ success. Once you’ve figured that out, you can start optimizing your product listings and pricing strategies to increase conversions.
However, the number of use cases for Amazon product data isn’t limited to competitor analysis. It can be applied to monitoring product rankings, running price comparisons, and more.
https://crawlfeeds.com/privacy_policy
This comprehensive IKEA USA products dataset contains detailed information about thousands of authentic IKEA furniture items, home decor, and household products available in the United States market. The dataset provides complete product specifications, pricing, availability, and detailed descriptions for ecommerce analysis, price comparison, and furniture retail research.
Key Features:
Get Free Sample: Download your free sample dataset now to explore the data quality and structure before purchasing the complete IKEA USA products database. The free sample includes representative product entries with all key fields populated.
Applications: Perfect for furniture market analysis, home improvement research, interior design planning, competitive pricing analysis, and retail intelligence. This dataset enables businesses to understand IKEA pricing strategies, product positioning, and market trends in the home furnishing industry.
Product Categories Included: Office furniture, bedroom furniture, storage solutions, outdoor dining sets, kitchen systems, home organization products, decorative accessories, plant containers, and sustainable furniture options. All products include comprehensive details for business intelligence and market research applications.
Sample purchasing data containing information on suppliers, the products they provide, and the projects those products are used for. Data created or adapted from publicly available sources.
These datasets contain peer-to-peer trades from various recommendation platforms.
Metadata includes
peer-to-peer trades
have and want lists
image data (tradesy)
https://choosealicense.com/licenses/agpl-3.0/
Open Food Facts Database
What is 🍊 Open Food Facts?
A food products database
Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.
Made by everyone
Open Food Facts is a non-profit association of volunteers. 25.000+ contributors like you have added 1.7 million + products from 150 countries using our Android or iPhone app or their camera to scan… See the full description on the dataset page: https://huggingface.co/datasets/openfoodfacts/product-database.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
nfinite-product-masks-sample
Version of the release: 1.0.0-alpha
Release date: 2025/08/30
Dataset Summary
The nfinite-product-masks-sample dataset is a dataset of images from 3D models for objects usually found in the home & living room space. Each image has been rendered photo-realistically from 3D models. Those 3D models are generic models, from any IP (as explained in the Personal and Sensitive Information part, any resemblance to an object from real life is purely… See the full description on the dataset page: https://huggingface.co/datasets/Nfiniteai/product-masks-sample.
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.
One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.
This dataset includes information about 7 of MEVGAL’s products [1]. The published data can help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry, and can eventually help the industry make informed decisions regarding product development, pricing and market strategies in the IoT playground. The dataset could also be used to understand the impact of external factors on the dairy market, such as economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.
2. Citation
Please cite the following papers when using this dataset:
3. Dataset Modalities
The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each file contains the daily logistics for one product over one year, and together the files cover three years, from 2020 to 2022.
3.1 Data Collection
The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.
The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.
Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.
It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.
The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).
File |
Period |
Number of Samples (days) |
product 1 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 1 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 1 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 2 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 2 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 2 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 3 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 3 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 3 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 4 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 4 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 4 2022.xlsx |
01/01/2022–31/12/2022 |
364 |
product 5 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 5 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 5 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 6 2020.xlsx |
01/01/2020–31/12/2020 |
362 |
product 6 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 6 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 7 2020.xlsx |
01/01/2020–31/12/2020 |
362 |
product 7 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 7 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
3.2 Dataset Overview
The following table enumerates and explains the features included across all of the included files.
Feature | Description | Unit
Day | Day of the month | -
Month | Month | -
Year | Year | -
daily_unit_sales | Daily sales - the amount of products, measured in units, that during that specific day were sold | units
previous_year_daily_unit_sales | Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year | units
percentage_difference_daily_unit_sales | The percentage difference between the two above values | %
daily_unit_sales_kg | The amount of products, measured in kilograms, that during that specific day were sold | kg
previous_year_daily_unit_sales_kg | Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold the previous year | kg
percentage_difference_daily_unit_sales_kg | The percentage difference between the two above values | %
daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | %
previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to…
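The percentage_difference_* features compare a day's value with the value for the same day in the previous year. The description does not give the exact formula, so the following is only a minimal sketch, assuming the conventional relative change with the previous year's value as the baseline. The function name and the zero-baseline handling are illustrative choices, not part of the dataset.

```python
def percentage_difference(current, previous):
    """Relative change of `current` versus `previous`, in percent.

    Assumed formula: (current - previous) / previous * 100.
    Returns None when the previous-year value is zero, because the
    relative change is undefined in that case.
    """
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


# Hypothetical values: 120 units sold today, 100 on the same day last year.
change = percentage_difference(120, 100)
```

Recomputing the column this way and comparing it with the published values is a quick way to check whether the assumed formula matches the one used by the data owner.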
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Amazon Product Description Dataset
This dataset is a cleaned version of Amazon Product Data. Cleaned by team at https://exnrt.com
421K unique examples. Cleaning steps:
Empty description rows are removed.
Descriptions shorter than 200 characters are removed.
Text is converted to a proper format.
Non-ASCII characters are removed from both columns.
A few more techniques are applied.
Original Dataset
This original dataset has 10 Million Examples. Original, Un-cleaned DataSet:… See the full description on the dataset page: https://huggingface.co/datasets/Ateeqq/Amazon-Product-Description.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition
This repository contains the data synthesis pipeline and synthetic product recognition datasets proposed in [1].
Data Synthesis Pipeline:
We provide the Blender 3.1 project files and Python source code of our data synthesis pipeline (pipeline.zip), accompanied by the FastCUT models used for synthetic-to-real domain translation (models.zip). For the synthesis of new shelf images, a product assortment list and product images must be provided in the corresponding directories products/assortment/ and products/img/. The pipeline expects product images to follow the naming convention c.png, with c corresponding to a GTIN or generic class label (e.g., 9120050882171.png). The assortment list, assortment.csv, is expected to use the sample format [c, w, d, h], with c being the class label and w, d, and h being the packaging dimensions of the given product in mm (e.g., [4004218143128, 140, 70, 160]). The assortment list to use and the number of images to generate can be specified in generateImages.py (see comments). The rendering process is initiated by executing load.py either from within Blender or from a command-line terminal as a background process.
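As an illustration of the inputs described above, the sketch below builds assortment rows in the [c, w, d, h] layout and derives the matching image file name. It is not part of the published pipeline. The helper names are hypothetical, and the assumption that assortment.csv is plain comma-separated text without a header row or brackets should be checked against the files in pipeline.zip.

```python
import csv
import io


def assortment_rows(products):
    """Validate (label, width, depth, height) tuples, dimensions in mm."""
    rows = []
    for label, width, depth, height in products:
        label = str(label)
        if not label:
            raise ValueError("class label must not be empty")
        if min(width, depth, height) <= 0:
            raise ValueError(f"non-positive dimension for {label}")
        rows.append([label, int(width), int(depth), int(height)])
    return rows


def image_name(label):
    """File name expected in products/img/ under the c.png convention."""
    return f"{label}.png"


def to_csv(rows):
    """Serialise rows as comma-separated text (assumed layout, no header)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# The example product from the description above.
rows = assortment_rows([(4004218143128, 140, 70, 160)])
csv_text = to_csv(rows)
```

Writing `csv_text` to products/assortment/assortment.csv and checking that `image_name(label)` exists in products/img/ for every row would catch most input mistakes before a long render is started.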
Datasets:
SG3k - Synthetic GroZi-3.2k (SG3k) dataset, consisting of 10,000 synthetic shelf images with 851,801 instances of 3,234 GroZi-3.2k products. Instance-level bounding boxes and generic class labels are provided for all product instances.
SG3kt - Domain-translated version of SG3k, utilizing GroZi-3.2k as the target domain. Instance-level bounding boxes and generic class labels are provided for all product instances.
SGI3k - Synthetic GroZi-3.2k (SGI3k) dataset, consisting of 10,000 synthetic shelf images with 838,696 instances of 1,063 GroZi-3.2k products. Instance-level bounding boxes and generic class labels are provided for all product instances.
SGI3kt - Domain-translated version of SGI3k, utilizing GroZi-3.2k as the target domain. Instance-level bounding boxes and generic class labels are provided for all product instances.
SPS8k - Synthetic Product Shelves 8k (SPS8k) dataset, comprised of 16,224 synthetic shelf images with 1,981,967 instances of 8,112 supermarket products. Instance-level bounding boxes and GTIN class labels are provided for all product instances.
SPS8kt - Domain-translated version of SPS8k, utilizing SKU110k as the target domain. Instance-level bounding boxes and GTIN class labels are provided for all product instances.
Table 1: Dataset characteristics.
Dataset | images | classes | instances | labels | translation
SG3k | 10,000 | 3,234 | 851,801 | bounding box & generic class¹ | none
SG3kt | 10,000 | 3,234 | 851,801 | bounding box & generic class¹ | GroZi-3.2k
SGI3k | 10,000 | 1,063 | 838,696 | bounding box & generic class² | none
SGI3kt | 10,000 | 1,063 | 838,696 | bounding box & generic class² | GroZi-3.2k
SPS8k | 16,224 | 8,112 | 1,981,967 | bounding box & GTIN | none
SPS8kt | 16,224 | 8,112 | 1,981,967 | bounding box & GTIN | SKU110k
Sample Format
A sample consists of an RGB image (i.png) and an accompanying label file (i.txt), which contains the labels for all product instances present in the image. Labels use the YOLO format [c, x, y, w, h].
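A label line in this format can be read with a few lines of code. The sketch below assumes the usual YOLO convention, in which x and y are the box centre and w and h the box size, all normalised to the range 0 to 1 by the image dimensions, and in which fields are separated by whitespace. Whether the label files here follow those details exactly is an assumption, and the function names are illustrative.

```python
def parse_yolo_line(line):
    """Split one label line into (class_label, x, y, w, h)."""
    parts = line.split()
    # The class label is kept as a string because it may be a GTIN.
    class_label = parts[0]
    x_center, y_center, width, height = (float(value) for value in parts[1:5])
    return class_label, x_center, y_center, width, height


def to_pixel_box(x_center, y_center, width, height, image_width, image_height):
    """Convert a normalised centre box to (left, top, right, bottom) pixels."""
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return left, top, right, bottom


# Hypothetical label line and image size.
label, x, y, w, h = parse_yolo_line("13000097 0.5 0.5 0.25 0.5")
box = to_pixel_box(x, y, w, h, image_width=640, image_height=480)
```

Here the box covers the pixel rectangle from (240, 120) to (400, 360) in a 640 by 480 image.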
¹SG3k and SG3kt use generic pseudo-GTIN class labels, created by combining the GroZi-3.2k food product category number i (1-27) with the product image index j (j.jpg), following the convention i0000j (e.g., 13000097).
²SGI3k and SGI3kt use the generic GroZi-3.2k class labels from https://arxiv.org/abs/2003.06800.
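The pseudo-GTIN convention in footnote 1 can be reproduced as simple string concatenation. This is a minimal sketch that reads i0000j literally, as the category number followed by four zeros and the image index; whether j is ever zero-padded is not stated, so that reading is an assumption. The function name is hypothetical.

```python
def pseudo_gtin(category, image_index):
    """Build a pseudo-GTIN label following the i0000j convention."""
    if not 1 <= category <= 27:
        raise ValueError("GroZi-3.2k food categories are numbered 1-27")
    if image_index < 0:
        raise ValueError("image index must not be negative")
    # Literal reading: category, four zeros, then the image index.
    return f"{category}0000{image_index}"


# Category 13, product image 97.jpg, as in the footnote's example.
example_label = pseudo_gtin(13, 97)
```

The result matches the example label 13000097 given in the footnote.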
Download and Use
This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].
[1] Strohmayer, Julian, and Martin Kampel. "Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition." International Conference on Computer Analysis of Images and Patterns. Cham: Springer Nature Switzerland, 2023.
BibTeX citation:
@inproceedings{strohmayer2023domain,
  title={Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition},
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={International Conference on Computer Analysis of Images and Patterns},
  pages={239--250},
  year={2023},
  organization={Springer}
}
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
🔍 TSMPD-US-Public v1.0 – Small Merchant Product Dataset (Public Sample)
This dataset provides a public sample of structured product listings from 355,722 verified small U.S.-based merchants, containing:
~3.2 million product records
Text fields only (vendor, title, description, tags, category, last_updated)
No images or variant (SKU) data
It is designed for LLM research, product grounding, semantic commerce, and agent training.
🔐 Looking for the full dataset?
The Partner/Reserve version includes:
All products per merchant (11.9M+ total)
Product variants (67M SKUs)
Product images (54M URLs)
Store domains and product URLs
Dataset watermark for traceability
📬 To request access: email jim@tokuhn.com
This extended version is offered under a commercial or research license to ensure fair and traceable use in LLM applications.
Unlock the potential of Ecommerce data scraping and extraction with APISCRAPY. Dive into Amazon data and tap into the vast Ecommerce market's secrets. Stay ahead of the competition by leveraging our powerful tool for comprehensive Ecommerce data insights.
https://crawlfeeds.com/privacy_policy
Access the Home Depot products dataset, a comprehensive collection of web-scraped data featuring home improvement products. Discover trending tools, hardware, appliances, décor, and gardening essentials to enhance your projects. From power tools and building materials to lighting, furniture, and outdoor living items, this dataset provides insights into top-rated products, best-selling brands, and emerging trends.
Download now to explore detailed product data for smarter decision-making in home improvement, DIY, and construction projects.
For a closer look at the product-level data we’ve extracted from Home Depot, including pricing, stock status, and detailed specifications, visit the Home Depot dataset page. You can explore sample records and submit a request for tailored extracts directly from there.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.