https://crawlfeeds.com/privacy_policy
Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.
Product Details: Name, Brand, Category, and Unique ID
Pricing Information: Current Price, Discounted Price, and Currency
Availability & Ratings: Stock Status, Customer Ratings, and Reviews
Seller Information: Seller Name and Fulfillment Details
Additional Attributes: Product Description, Specifications, and Images
Format: CSV
Number of Records: 50,000+
Delivery Time: 3 Days
Price: $149.00
Availability: Immediate
This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.
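As a rough sketch of how a transaction table like this might be sanity-checked, the snippet below verifies that the total price equals quantity times unit price on each row. The column names and the two sample rows are assumptions made for illustration; the actual file may label its columns differently.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample rows; the column names are assumed, not taken from the real file.
SAMPLE = """order_id,date,category,product_name,quantity,unit_price,total_price,region,payment_method
1,2024-01-01,Electronics,Example Phone,2,499.99,999.98,North America,Credit Card
2,2024-01-02,Books,Example Novel,3,15.00,44.00,Europe,PayPal
"""

def inconsistent_rows(text):
    """Return order IDs whose total_price differs from quantity * unit_price."""
    bad = []
    for row in csv.DictReader(io.StringIO(text)):
        # Decimal avoids floating-point rounding surprises with currency values.
        expected = Decimal(row["quantity"]) * Decimal(row["unit_price"])
        if expected != Decimal(row["total_price"]):
            bad.append(row["order_id"])
    return bad

print(inconsistent_rows(SAMPLE))
```

Using Decimal rather than float is a design choice here, so that prices such as 499.99 compare exactly.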
https://crawlfeeds.com/privacy_policy
The Waitrose Product Dataset offers a comprehensive and structured collection of grocery items listed on the Waitrose online platform. This dataset includes 25,000+ product records across multiple categories, curated specifically for use in retail analytics, pricing comparison, AI training, and eCommerce integration.
Each record contains detailed attributes such as:
Product title, brand, MPN, and product ID
Price and currency
Availability status
Description, ingredients, and raw nutrition data
Review count and average rating
Breadcrumbs, image links, and more
Delivered in CSV format (ZIP archive), this dataset is ideal for professionals in the FMCG, retail, and grocery tech industries who need structured, crawl-ready data for their projects.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. Its purpose is to store the datasets used in some of the studies that served as research material for the thesis, together with the datasets used in the experimental part of the work.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
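Because the rows are grouped by label, a shuffle step is needed before any train/test split. The following is a minimal sketch of that step, assuming the CSV has already been read into a list of (review, label) pairs; the toy rows and the function name shuffle_rows are made up for illustration.

```python
import random

def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of rows; a fixed seed keeps the order reproducible."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled

# Toy stand-in for the file layout: negative rows (0) first, then positive rows (1).
rows = [("bad film", 0)] * 5 + [("good film", 1)] * 5
mixed = shuffle_rows(rows)
print([label for _, label in mixed])
```

A seeded random.Random instance is used instead of the module-level shuffle so that repeated runs give the same ordering.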
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research): AllProductReviews2.csv (contains 9930 reviews)
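The description does not state how ReviewStar was mapped to the division label. As one possible sketch, the function below assumes a three-way split (4-5 stars positive, 3 neutral, 1-2 negative); both the thresholds and the label names are hypothetical and may differ from what was used in the research.

```python
def star_to_division(review_star):
    """Map a 1-5 star rating to a sentiment label.

    The thresholds and label names are assumptions for illustration only.
    """
    if review_star >= 4:
        return "positive"
    if review_star == 3:
        return "neutral"
    return "negative"

print([star_to_division(s) for s in (1, 2, 3, 4, 5)])
```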
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research): Musical_instruments_reviews2.csv (contains 7137 reviews)
Attribution 1.0 (CC BY 1.0) https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
About
This is a mock dataset with Amazon product reviews. Classes are structured: 6 "level 1" classes, 64 "level 2" classes, and 510 "level 3" classes.
3 files are shared:
Level 1 classes are: health personal care, toys games, beauty, pet supplies, baby products, and grocery gourmet food.
Dataset originally from https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Sales data for all Islanders Board Of Industry & Service (IBIS) stores.
https://brightdata.com/license
Buy Amazon datasets and get access to over 300 million records from any Amazon domain. Get insights on Amazon products, sellers, and reviews.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Amazon Product Description Dataset
This dataset is a cleaned version of Amazon Product Data, cleaned by the team at https://exnrt.com
421K unique examples. Cleaning steps:
Rows with empty descriptions were removed
Descriptions shorter than 200 characters were removed
Converted to proper format
Non-ASCII characters were removed from both columns
And a few more techniques
Original Dataset
This original dataset has 10 Million Examples. Original, Un-cleaned DataSet:… See the full description on the dataset page: https://huggingface.co/datasets/Ateeqq/Amazon-Product-Description.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context:
Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
Inspiration:
The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information:
The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
https://crawlfeeds.com/privacy_policy
This curated dataset contains only products from CultBeauty.com that include detailed ingredient information, ideal for brands, formulators, analysts, and researchers seeking transparency in cosmetics and skincare data.
It focuses on ingredient-rich listings — allowing deep analysis of formulation trends, compliance mapping, and clean beauty initiatives. Whether you're building an internal database or powering an AI model, this dataset offers a clean, structured foundation for insight.
Product Name
Brand
Full Ingredient List
Category
Product URL
Price (if available)
Description
Image links
Timestamps
Ingredient analysis for clean beauty scoring
Competitor formulation comparison
Cosmetic safety mapping (e.g., for allergen research)
Building training sets for AI/ML models in skincare
Trend monitoring across skincare and cosmetic products
Monthly or on demand
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This data set contains 322 Apple products, from 1976-2025. The dataset has the following header:
Release Date,Model,Family,Discontinued
* Release Date: the date the product was released
* Model: product name
* Family: product type
* Discontinued: date of product discontinuation
Better version with more products/info coming soon
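A file with this header can be read with a standard CSV reader. The sketch below counts models per family; the three sample rows are placeholders invented for illustration (the real file's date format and values are not shown in the description), and models_per_family is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Placeholder rows under the documented header; values are made up, not real data.
SAMPLE = """Release Date,Model,Family,Discontinued
2000-01-01,Example Model A,Example Family X,2001-01-01
2002-01-01,Example Model B,Example Family X,2003-01-01
2004-01-01,Example Model C,Example Family Y,2005-01-01
"""

def models_per_family(text):
    """Count how many rows (models) fall under each Family value."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row["Family"] for row in reader)

counts = models_per_family(SAMPLE)
print(counts.most_common())
```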
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by mohamed mahmoud 55
Released under MIT
https://crawlfeeds.com/privacy_policy
This dataset features only products from Ulta.com that include detailed ingredient lists, ideal for product transparency tools, clean label research, and beauty data modeling.
Designed for professionals and researchers working in beauty tech, compliance, formulation, and product analysis, it focuses on ingredient-rich listings for advanced use cases.
Product Name
Brand
Full Ingredient List
Category (e.g., Hair, Skin, Makeup)
Product URL
Price (if available)
Description
Images
Date Extracted
Clean beauty app builders
Ingredient risk assessment and allergen tracking
Comparative cosmetic formulation
Beauty AI and ML dataset training
Ingredient transparency dashboards for e-commerce
Available weekly, monthly, or on request
https://brightdata.com/license
Gain extensive insights with our Amazon datasets, encompassing detailed product information including pricing, reviews, ratings, brand names, product categories, sellers, ASINs, images, and much more. Ideal for market researchers, data analysts, and eCommerce professionals looking to excel in the competitive online marketplace.
Over 425M records available
Price starts at $250/100K records
Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet.
100% ethical and compliant data collection
Included datapoints:
Title Asin Main Image Brand Name Description Availability Subcategory Categories Parent Asin Type Product Type Name Model Number Manufacturer Color Size Date First Available Released Model Year Item Model Number Part Number Price Total Reviews Total Ratings Average Rating Features Best Sellers Rank Subcategory Buybox Buybox Seller Id Buybox Is Amazon Images Product URL And more
https://brightdata.com/license
Access our extensive eBay datasets that provide detailed information on product listings and seller performance. Gain insights into product details, pricing, item condition, seller ratings, shipping policies, and customer reviews. Free samples are available for evaluation.
400K+ records available
Price starts at $250/100K records
Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet.
100% ethical and compliant data collection
Included datapoints:
Product ID & URL
Product Title & Images
Seller Name, Rating & Reviews
Price & Currency
Item Condition
Available & Sold Count
Item Location & Shipping Details
Return Policy
Product Specifications
Product Ratings & Customer Reviews
Related & Sponsored Items
And more
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains over 4,900 customer reviews from Amazon, including text-based feedback, star ratings, and helpfulness votes.
It can be used for:
reviewText: Full written review
overall: Star rating (1 to 5)
summary: Short summary of the review
helpful_yes: Number of users who found the review helpful
total_vote: Total votes on helpfulness
day_diff: Days since the review was written
This dataset is suitable for natural language processing (NLP) and supervised learning tasks.
This is a publicly available dataset for educational and research use.
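One feature that could be derived from the helpful_yes and total_vote fields is a helpfulness ratio. The snippet below is a minimal sketch of that idea, not something the dataset itself provides; the function name and the choice to return None for reviews without votes are assumptions.

```python
def helpfulness_ratio(helpful_yes, total_vote):
    """Share of votes that marked the review helpful; None when there are no votes."""
    if total_vote == 0:
        return None  # avoid dividing by zero for reviews nobody voted on
    return helpful_yes / total_vote

print(helpfulness_ratio(3, 4))
print(helpfulness_ratio(0, 0))
```

Returning None rather than 0.0 keeps unvoted reviews distinguishable from reviews that were voted unhelpful.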
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Zara UK fashion products dataset, extracted by the Crawl Feeds team for research and analysis purposes. Last extracted on Jul 12, 2022.
Product Lists
fashion data,fashion Ecommerce data,Zara products data,Zara fashion dataset
12594
$120.00
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
https://brightdata.com/license
Unlock powerful insights with the Amazon Prime dataset, offering access to millions of records from any Amazon domain. This dataset provides comprehensive data points such as product titles, descriptions, exclusive Prime discounts, brand details, pricing (initial and discounted), availability, customer ratings, reviews, and product categories. Additionally, it includes unique identifiers like ASINs, images, and seller information, allowing you to analyze Prime offerings, trends, and customer preferences with precision. Use this dataset to optimize your eCommerce strategies by analyzing Prime-exclusive pricing strategies, identifying top-performing brands and products, and tracking customer sentiment through reviews and ratings. Gain valuable insights into consumer demand, seasonal trends, and the impact of Prime discounts to make data-driven decisions that enhance your inventory management, marketing campaigns, and pricing strategies. Whether you’re a retailer, marketer, data analyst, or researcher, the Amazon Prime dataset empowers you with the data needed to stay competitive in the dynamic eCommerce landscape. Available in various formats such as JSON, CSV, and Parquet, and delivered via flexible options like API, S3, or email, this dataset ensures seamless integration into your workflows.
HitHorizons Newly Established Companies Dataset gives access to aggregated firmographic data on 80M+ companies from the whole of Europe and beyond.
Company registration data: company name; national identifier and its type; registered address (street, postal code, city, state / province, country); business activity (SIC code, local activity code with classification system); year of establishment; company type; location type
Sales and number of employees data: sales in EUR, USD and local currency (with local currency code); total number of employees; sales and number of employees accuracy; local number of employees (in case of multiple branches); companies’ sales and number of employees market position compared to other companies in a country / industry / region
Industry data: size of the whole industry; size of all companies operating within a particular SIC code; benchmarking within a particular country or industry; regional benchmarking (EU 27, state / province)
Contact details: company website; company email domain (without person’s name)
Invoicing details available for selected countries: company name; company address; company VAT number