100+ datasets found
  1. Product Sales Dataset (2023-2024)

    • kaggle.com
    zip
    Updated Sep 30, 2025
    Cite
    Yash Yennewar (2025). Product Sales Dataset (2023-2024) [Dataset]. https://www.kaggle.com/datasets/yashyennewar/product-sales-dataset-2023-2024
    Explore at:
    Available download formats: zip (6012656 bytes)
    Dataset updated
    Sep 30, 2025
    Authors
    Yash Yennewar
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🛍️ Product Sales Dataset (2023–2024)

    📌 Overview

    This dataset contains 200,000 synthetic sales records simulating real-world product transactions across different U.S. regions. It is designed for data analysis, business intelligence, and machine learning projects, especially in the areas of sales forecasting, customer segmentation, profitability analysis, and regional trend evaluation.

    The dataset provides detailed transactional data including customer names, product categories, pricing, and revenue details, making it highly versatile for both beginners and advanced analysts.

    📂 Dataset Structure

    • Rows: 200,000
    • Columns: 14

    Features

    1. Order_ID – Unique identifier for each order
    2. Order_Date – Date of transaction
    3. Customer_Name – Name of the customer
    4. City – City of the customer
    5. State – State of the customer
    6. Region – Region (East, West, South, Centre)
    7. Country – Country (United States)
    8. Category – Broad product category (e.g., Accessories, Clothing & Apparel)
    9. Sub_Category – Subdivision of category (e.g., Sportswear, Bags)
    10. Product_Name – Product description
    11. Quantity – Units purchased
    12. Unit_Price – Price per unit (USD)
    13. Revenue – Total sales amount (Quantity × Unit Price)
    14. Profit – Net profit earned from the transaction
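
    The Revenue definition in the feature list (Quantity × Unit Price) can be checked row by row. The following is a minimal Python sketch, assuming each record is a dict keyed by the column names above. The sample values and the profit_margin helper are illustrative assumptions, not taken from the dataset.

```python
from decimal import Decimal

# Hypothetical record using the column names from the feature list.
# The values are made up for illustration.
record = {
    "Order_ID": "ORD-0001",
    "Quantity": 3,
    "Unit_Price": Decimal("19.99"),
    "Revenue": Decimal("59.97"),
    "Profit": Decimal("12.00"),
}


def expected_revenue(row):
    """Revenue as defined in the feature list: Quantity x Unit_Price."""
    return row["Quantity"] * row["Unit_Price"]


def profit_margin(row):
    """Profit as a fraction of revenue (an assumed, commonly used definition)."""
    return row["Profit"] / row["Revenue"]


# Decimal avoids binary floating-point rounding when comparing currency values.
revenue_matches = expected_revenue(record) == record["Revenue"]
margin = profit_margin(record)
```

    Decimal is used here so that the equality check on currency amounts is exact; with floats a tolerance would be needed instead.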

    🎯 Potential Use Cases

    • Sales Analysis: Track revenue, profit, and performance by product, category, or region.
    • Customer Analytics: Identify top customers, purchasing frequency, and loyalty patterns.
    • Profitability Insights: Compare profit margins across categories and sub-categories.
    • Time-Series Analysis: Study seasonal demand and forecast future sales.
    • Visualization Projects: Build dashboards in Power BI, Tableau, or Excel.
    • Machine Learning: Train models for demand prediction, price optimization, or segmentation.

    📊 Example Insights

    • Which region generates the highest revenue?
    • What are the top 10 most profitable products?
    • Are some product categories more popular in certain regions?
    • Which customers contribute the most to total revenue?
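
    Questions like these reduce to grouping rows and summing a numeric column. The sketch below uses only the Python standard library on a few hypothetical rows; the product names and amounts are invented for illustration, and the helper names total_by and top_n are assumptions rather than part of any published tooling for this dataset.

```python
from collections import defaultdict

# Hypothetical sample rows; the real file is described as having 200,000 records.
rows = [
    {"Region": "East", "Product_Name": "Trail Backpack", "Revenue": 120.0, "Profit": 30.0},
    {"Region": "West", "Product_Name": "Running Shorts", "Revenue": 80.0, "Profit": 10.0},
    {"Region": "East", "Product_Name": "Running Shorts", "Revenue": 40.0, "Profit": 5.0},
    {"Region": "South", "Product_Name": "Trail Backpack", "Revenue": 60.0, "Profit": 15.0},
]


def total_by(records, key, value):
    """Sum the `value` column for each distinct entry in the `key` column."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row[value]
    return dict(totals)


def top_n(totals, n):
    """Return the n (name, total) pairs with the largest totals, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Which region generates the highest revenue?
revenue_by_region = total_by(rows, "Region", "Revenue")
top_region = top_n(revenue_by_region, 1)[0][0]

# What are the top 10 most profitable products?
top_products = top_n(total_by(rows, "Product_Name", "Profit"), 10)
```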

    🏷️ Tags

    business · sales · profitability · forecasting · customer analysis · retail

    📜 License

    This dataset is synthetic and created for educational and analytical purposes. You are free to use, modify, and share it under the CC BY 4.0 License.

    🙌 Acknowledgments

    This dataset was generated to provide a realistic foundation for learning and practicing Data Analytics, Power BI, Tableau, Python, and Excel projects.

  2. product-database

    • huggingface.co
    Updated Mar 7, 2025
    Cite
    Open Food Facts (2025). product-database [Dataset]. https://huggingface.co/datasets/openfoodfacts/product-database
    Explore at:
    Dataset updated
    Mar 7, 2025
    Dataset authored and provided by
    Open Food Facts (https://openfoodfacts.org/)
    License

    https://choosealicense.com/licenses/agpl-3.0/

    Description

    Open Food Facts Database

    What is 🍊 Open Food Facts?

    A food products database

    Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.

    Made by everyone

    Open Food Facts is a non-profit association of volunteers. 25,000+ contributors like you have added 1.7 million+ products from 150 countries using our Android or iPhone app or their camera to scan… See the full description on the dataset page: https://huggingface.co/datasets/openfoodfacts/product-database.

  3. Global Product Inventory Dataset 2025

    • kaggle.com
    zip
    Updated Feb 28, 2025
    Cite
    Keyush nisar (2025). Global Product Inventory Dataset 2025 [Dataset]. https://www.kaggle.com/datasets/keyushnisar/global-product-inventory-dataset-2025
    Explore at:
    Available download formats: zip (372323 bytes)
    Dataset updated
    Feb 28, 2025
    Authors
    Keyush nisar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a detailed snapshot of product inventory, perfect for logistics optimization, e-commerce analysis, or supply chain research. It includes key details like product names, categories, prices, stock quantities, and more—sourced from a hypothetical global supplier database. I compiled this while working on a shipment logistics optimization project, and I hope it’s useful for others exploring similar challenges!

    Key Features:
    • 14 columns covering product specs, pricing, stock, and tags.
    • Sample data includes diverse categories like Home Appliances.
    • Ideal for data cleaning practice, visualizations, or predictive modeling (e.g., stock depletion).

    Potential Use Cases:
    • Optimize shipment logistics based on stock and expiration dates.
    • Analyze pricing trends across product categories.
    • Build recommendation systems using tags and ratings.

    Notes:
    • Dates range from manufacturing to expiration (e.g., 2023-2026).
    • Some fields (e.g., Product Description) may need refinement; feel free to enhance it!
    • Suggestions for additional data or improvements are welcome.

    Let me know how you use it. I’d love to hear your feedback!

  4. amazon-product-data-sample

    • huggingface.co
    + more versions
    Cite
    Iftach Arbel, amazon-product-data-sample [Dataset]. https://huggingface.co/datasets/iarbel/amazon-product-data-sample
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
    Authors
    Iftach Arbel
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for "amazon-product-data-filter"

      Dataset Summary
    

    The Amazon Product Dataset contains product listing data from the Amazon US website. It can be used for various NLP and classification tasks, such as text generation, product type classification, attribute extraction, image recognition and more. NOTICE: This is a sample of the full Amazon Product Dataset, which contains 1K examples. Follow the link to gain access to the full dataset.

      Languages… See the full description on the dataset page: https://huggingface.co/datasets/iarbel/amazon-product-data-sample.
    
  5. Open Data Portal (ODP) Bulk Datasets API (Search, Product Data, and...

    • catalog.data.gov
    Updated Sep 30, 2025
    + more versions
    Cite
    Open Data Portal Team (2025). Open Data Portal (ODP) Bulk Datasets API (Search, Product Data, and Download) [Dataset]. https://catalog.data.gov/dataset/open-data-portal-odp-bulk-datasets-api-search-product-data-and-download
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Open Data Portal Team
    Description

    • Search: Conduct a search of the repository of raw public bulk data. It contains research datasets from the Office of the Chief Economist. The files are updated on a regular or ongoing basis. Use this endpoint if you are interested in searching across multiple patents or applications. For example, to return all Patent or Trademark products, use productTitle and specify the product you are looking for (Patent File Wrapper, for example).
    • Product Data: Contains published, publicly available patent and trademark data in bulk form. Use this endpoint when you want data from a specific Bulk Dataset. You can test APIs right away in SwaggerUI.
    • Download: Contains large bulk files of the Bulk Data Directory (BDD) available for download. Use this endpoint when you want to download bulk data sets.

  6. Product Catalog Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 22, 2024
    Cite
    Bright Data (2024). Product Catalog Dataset [Dataset]. https://brightdata.com/products/datasets/product-catalog
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Apr 22, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    The Product Catalog Data provides a comprehensive overview of products across various categories. This dataset includes detailed product titles, descriptions, barcodes, category-specific attributes, weight, measurements, and imagery. It's tailored for marketplaces, eCommerce sites, and data analysts who require in-depth product information to enhance user experience, SEO, and product categorization.

    Popular Attributes:

    ✔ Detailed product information

    ✔ High-quality imagery

    ✔ Extensive attribute coverage

    ✔ Ideal for UX and SEO optimization

    ✔ Comprehensive product categorization

    Key Information:

    Rich dataset with 30+ attributes per product

    Pricing: Flexible subscription models

    Update Frequency: Daily updates

    Coverage: Global and specific markets

    Historical Data: 12 Months +

  7. Dairy Supply Chain Sales Dataset

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jul 12, 2024
    + more versions
    Cite
    Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis (2024). Dairy Supply Chain Sales Dataset [Dataset]. http://doi.org/10.21227/smv6-z405
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Introduction

    Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

    One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.

    This dataset includes information about 7 of MEVGAL’s products [1]. According to the above information the data published will help researchers to understand the dynamics of the dairy market and its consumption patterns, which is creating the fertile ground for synergies between academia and industry and eventually help the industry in making informed decisions regarding product development, pricing and market strategies in the IoT playground. The use of this dataset could also aim to understand the impact of various external factors on the dairy market such as the economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.

    2. Citation

    Please cite the following papers when using this dataset:

    1. I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

    3. Dataset Modalities

    The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each of the different files includes the logistics for that product on a daily basis for three years, from 2020 to 2022.

    3.1 Data Collection

    The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

    The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.

    Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.

    It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

    The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).

    File                  | Period                 | Number of Samples (days)
    ----------------------|------------------------|-------------------------
    product 1 2020.xlsx   | 01/01/2020–31/12/2020  | 363
    product 1 2021.xlsx   | 01/01/2021–31/12/2021  | 364
    product 1 2022.xlsx   | 01/01/2022–31/12/2022  | 365
    product 2 2020.xlsx   | 01/01/2020–31/12/2020  | 363
    product 2 2021.xlsx   | 01/01/2021–31/12/2021  | 364
    product 2 2022.xlsx   | 01/01/2022–31/12/2022  | 365
    product 3 2020.xlsx   | 01/01/2020–31/12/2020  | 363
    product 3 2021.xlsx   | 01/01/2021–31/12/2021  | 364
    product 3 2022.xlsx   | 01/01/2022–31/12/2022  | 365
    product 4 2020.xlsx   | 01/01/2020–31/12/2020  | 363
    product 4 2021.xlsx   | 01/01/2021–31/12/2021  | 364
    product 4 2022.xlsx   | 01/01/2022–31/12/2022  | 364
    product 5 2020.xlsx   | 01/01/2020–31/12/2020  | 363
    product 5 2021.xlsx   | 01/01/2021–31/12/2021  | 364
    product 5 2022.xlsx   | 01/01/2022–31/12/2022  | 365
    product 6 2020.xlsx   | 01/01/2020–31/12/2020  | 362
    product 6 2021.xlsx   | 01/01/2021–31/12/2021  | 364
    product 6 2022.xlsx   | 01/01/2022–31/12/2022  | 365
    product 7 2020.xlsx   | 01/01/2020–31/12/2020  | 362
    product 7 2021.xlsx   | 01/01/2021–31/12/2021  | 364
    product 7 2022.xlsx   | 01/01/2022–31/12/2022  | 365
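
    The file listing above follows a regular naming pattern, and the sample counts are slightly below the number of calendar days in each period. The Python sketch below rebuilds the 21 file names and computes how many days each file appears to be missing. It assumes that each period spans the full calendar year (as the listed dates suggest) and that one sample corresponds to one day; the constant and function names are illustrative only.

```python
import calendar

PRODUCTS = range(1, 8)      # products 1 to 7
YEARS = (2020, 2021, 2022)

# Number of samples (days) per product, in YEARS order, copied from the listing above.
SAMPLES = {
    1: (363, 364, 365),
    2: (363, 364, 365),
    3: (363, 364, 365),
    4: (363, 364, 364),
    5: (363, 364, 365),
    6: (362, 364, 365),
    7: (362, 364, 365),
}


def file_name(product, year):
    """Build a file name following the pattern used in the listing."""
    return f"product {product} {year}.xlsx"


def missing_days(product, year):
    """Calendar days in the year minus the listed sample count."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return days_in_year - SAMPLES[product][YEARS.index(year)]


all_files = [file_name(p, y) for p in PRODUCTS for y in YEARS]
gaps = {(p, y): missing_days(p, y) for p in PRODUCTS for y in YEARS}
```

    Because 2020 is a leap year, a count of 363 corresponds to three missing days under these assumptions, which is worth knowing before treating a file as an evenly spaced daily time series.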

    3.2 Dataset Overview

    The following table enumerates and explains the features included across all of the included files.

    Feature | Description | Unit
    --------|-------------|-----
    Day | day of the month | -
    Month | Month | -
    Year | Year | -
    daily_unit_sales | Daily sales - the amount of products, measured in units, that during that specific day were sold | units
    previous_year_daily_unit_sales | Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year | units
    percentage_difference_daily_unit_sales | The percentage difference between the two above values | %
    daily_unit_sales_kg | The amount of products, measured in kilograms, that during that specific day were sold | kg
    previous_year_daily_unit_sales_kg | Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold, the previous year | kg
    percentage_difference_daily_unit_sales_kg | The percentage difference between the two above values | %
    daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | %
    previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to
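
    The percentage_difference features listed above are described only as "the percentage difference between the two above values". The source does not state the exact formula, so the Python sketch below assumes the usual year-over-year change relative to the previous year's value; the function name is illustrative.

```python
def percentage_difference(current, previous):
    """Percent change of `current` relative to `previous`.

    Assumed formula (not confirmed by the source):
        (current - previous) / previous * 100
    """
    if previous == 0:
        return None  # undefined when the previous-year value is zero
    return (current - previous) / previous * 100.0


# 120 units sold today against 100 units on the same day of the previous year.
example = percentage_difference(120, 100)
```

    Comparing this value against the published column for a few rows is a quick way to confirm or reject the assumed formula.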

  8. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Canada, Taiwan, British Indian Ocean Territory, Moldova (Republic of), Tunisia, Isle of Man, Northern Mariana Islands, Andorra, Nepal, Bangladesh
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  9. Marketing Bias data

    • cseweb.ucsd.edu
    json
    + more versions
    Cite
    UCSD CSE Research Project, Marketing Bias data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation.

    Metadata includes

    • ratings

    • product images

    • user identities

    • item sizes, user genders

  10. Data cleaning using unstructured data

    • zenodo.org
    zip
    Updated Jul 30, 2024
    Cite
    Rihem Nasfi; Antoon Bronselaer (2024). Data cleaning using unstructured data [Dataset]. http://doi.org/10.5281/zenodo.13135983
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rihem Nasfi; Antoon Bronselaer
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this project, we work on repairing three datasets:

    • Trials design: This dataset was obtained from the European Union Drug Regulating Authorities Clinical Trials Database (EudraCT) register and the ground truth was created from external registries. In the dataset, multiple countries, identified by the attribute country_protocol_code, conduct the same clinical trial, which is identified by eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial.
    • Trials population: This dataset delineates the demographic origins of participants in clinical trials primarily conducted across European countries. This dataset includes structured attributes indicating whether the trial pertains to a specific gender, age group or healthy volunteers. Each of these categories is labeled as ('1') or ('0'), respectively denoting whether it is included in the trials or not. It is important to note that the population category should remain consistent across all countries conducting the same clinical trial identified by an eudract_number. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants such as inclusion.
    • Allergens: This dataset contains information about products and their allergens. The data was collected from the German version of 'Alnatura' (Access date: 24 November, 2020), a free database of food products from around the world, 'Open Food Facts', and the websites 'Migipedia', 'Piccantino', and 'Das Ist Drin'. There may be overlapping products across these websites. Each product in the dataset is identified by a unique code. Samples with the same code represent the same product but are extracted from a different source. The allergens are indicated by ('2') if present, ('1') if there are traces of it, and ('0') if it is absent in a product. The dataset also includes information on ingredients in the products. Overall, the dataset comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.

    N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets:

    • "{dataset_name}_train.csv": samples used for the ML-model training. (e.g. "allergens_train.csv")
    • "{dataset_name}_test.csv": samples used to test the ML-model performance. (e.g. "allergens_test.csv")
    • "{dataset_name}_golden_standard.csv": samples that represent the ground truth of the test samples. (e.g. "allergens_golden_standard.csv")
    • "{dataset_name}_parker_train.csv": samples repaired using Parker Engine, used for the ML-model training. (e.g. "allergens_parker_train.csv")
    • "{dataset_name}_parker_test.csv": samples repaired using Parker Engine, used to test the ML-model performance. (e.g. "allergens_parker_test.csv")

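
    The file naming scheme and the allergen encoding described above can be captured in a few lines of Python. This is a sketch under stated assumptions: the fifth suffix is taken from the example file name "allergens_parker_test.csv", only the "allergens" prefix is confirmed by the description (prefixes for the two trials datasets are not given), and the names SUFFIXES, expected_files and ALLERGEN_CODES are illustrative.

```python
# Suffixes of the five CSV files described for each '.zip' archive.
SUFFIXES = ("train", "test", "golden_standard", "parker_train", "parker_test")

# Allergen encoding as described for the Allergens dataset.
ALLERGEN_CODES = {2: "present", 1: "traces", 0: "absent"}


def expected_files(dataset_name):
    """List the CSV file names expected inside one archive."""
    return [f"{dataset_name}_{suffix}.csv" for suffix in SUFFIXES]


def decode_allergen(code):
    """Translate a numeric allergen code into its meaning."""
    return ALLERGEN_CODES[int(code)]


allergen_files = expected_files("allergens")
```

    Checking an extracted archive against expected_files is a cheap guard before training, since a missing golden standard file would otherwise only surface at evaluation time.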
  11. E-commerce Products Image Dataset

    • kaggle.com
    zip
    Updated Jun 14, 2022
    Cite
    Sunny Kusawa (2022). E-commerce Products Image Dataset [Dataset]. https://www.kaggle.com/datasets/sunnykusawa/ecommerce-products-image-dataset
    Explore at:
    Available download formats: zip (42381801 bytes)
    Dataset updated
    Jun 14, 2022
    Authors
    Sunny Kusawa
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains images of televisions, sofas, jeans and T-shirts. It is actual raw, unstructured image data extracted from online sites.

    The images come from different sites. You may also find some junk images in the data; for example, the television set includes images of television remotes.

    This dataset is intentionally left unrefined so that practitioners get a taste of the kind of data ML/Data Science engineers encounter when they start working on a project in industry.

  12. Sample Purchasing / Supply Chain Data

    • catalog.data.gov
    • gimi9.com
    • +2more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). Sample Purchasing / Supply Chain Data [Dataset]. https://catalog.data.gov/dataset/sample-purchasing-supply-chain-data
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Sample purchasing data containing information on suppliers, the products they provide, and the projects those products are used for. Data created or adapted from publicly available sources.

  13. Steam Video Game and Bundle Data

    • cseweb.ucsd.edu
    json
    + more versions
    Cite
    UCSD CSE Research Project, Steam Video Game and Bundle Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.

    Metadata includes

    • reviews

    • purchases, plays, recommends (likes)

    • product bundles

    • pricing information

    Basic Statistics:

    • Reviews: 7,793,069

    • Users: 2,567,538

    • Items: 15,474

    • Bundles: 615

  14. Data from: Example Groundwater-Level Datasets and Benchmarking Results for...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Example Groundwater-Level Datasets and Benchmarking Results for the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) Software Package [Dataset]. https://catalog.data.gov/dataset/example-groundwater-level-datasets-and-benchmarking-results-for-the-automated-regional-cor
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release provides two example groundwater-level datasets used to benchmark the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) software package (Levy and others, 2024). The first dataset contains groundwater-level records and site metadata for wells located on Long Island, New York (NY) and some surrounding mainland sites in New York and Connecticut. The second dataset contains groundwater-level records and site metadata for wells located in the southeastern San Joaquin Valley of the Central Valley, California (CA). For ease of exposition these are referred to as NY and CA datasets, respectively. Both datasets are formatted with column headers that can be read by the ARCHI software package within the R computing environment.

    These datasets were used to benchmark the imputation accuracy of three ARCHI model settings (OLS, ridge, and MOVE.1) against the widely used imputation program missForest (Stekhoven and Bühlmann, 2012). The ARCHI program was used to process the NY and CA datasets on monthly and annual timesteps, respectively, filter out sites with insufficient data for imputation, and create 200 test datasets from each of the example datasets with 5 percent of observations removed at random (herein, referred to as "holdouts"). Imputation accuracy for test datasets was assessed using normalized root mean square error (NRMSE), which is the root mean square error divided by the standard deviation of the observed holdout values. ARCHI produces prediction intervals (PIs) using a non-parametric bootstrapping routine, which were assessed by computing a coverage rate (CR) defined as the proportion of holdout observations falling within the estimated PI. The multiple regression models included with the ARCHI package (OLS and ridge) were further tested on all test datasets at eleven different levels of the p_per_n input parameter, which limits the maximum ratio of regression model predictors (p) per observations (n) as a decimal fraction greater than zero and less than or equal to one.

    This data release contains ten tables formatted as tab-delimited text files. The “CA_data.txt” and “NY_data.txt” tables contain 243,094 and 89,997 depth-to-groundwater measurement values (value, in feet below land surface) indexed by site identifier (site_no) and measurement date (date) for CA and NY datasets, respectively. The “CA_sites.txt” and “NY_sites.txt” tables contain site metadata for the 4,380 and 476 unique sites included in the CA and NY datasets, respectively. The “CA_NRMSE.txt” and “NY_NRMSE.txt” tables contain NRMSE values computed by imputing 200 test datasets with 5 percent random holdouts to assess imputation accuracy for three different ARCHI model settings and missForest using CA and NY datasets, respectively. The “CA_CR.txt” and “NY_CR.txt” tables contain CR values used to evaluate non-parametric PIs generated by bootstrapping regressions with three different ARCHI model settings using the CA and NY test datasets, respectively. The “CA_p_per_n.txt” and “NY_p_per_n.txt” tables contain mean NRMSE values computed for 200 test datasets with 5 percent random holdouts at 11 different levels of p_per_n for OLS and ridge models compared to training error for the same models on the entire CA and NY datasets, respectively.

    References Cited

    Levy, Z.F., Stagnitta, T.J., and Glas, R.L., 2024, ARCHI: Automated Regional Correlation Analysis for Hydrologic Record Imputation, v1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/P1VVHWKE.

    Stekhoven, D.J., and Bühlmann, P., 2012, MissForest—non-parametric missing value imputation for mixed-type data: Bioinformatics 28(1), 112-118, https://doi.org/10.1093/bioinformatics/btr597.
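The two evaluation metrics defined in this description, NRMSE and coverage rate, can be illustrated with a minimal Python sketch. This is not the ARCHI implementation (ARCHI is an R package); the function names are hypothetical, the sample values are made up, and two details are assumptions because the description does not state them: the standard deviation uses the sample (n - 1) form, and interval bounds are treated as inclusive.

```python
import math
import statistics


def nrmse(observed, imputed):
    """RMSE of the imputed values divided by the standard deviation of the observed holdouts."""
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mse = sum((o - p) ** 2 for o, p in zip(observed, imputed)) / len(observed)
    # Assumption: sample standard deviation (n - 1 denominator).
    return math.sqrt(mse) / statistics.stdev(observed)


def coverage_rate(observed, lower, upper):
    """Proportion of holdout observations that fall inside [lower, upper]."""
    # Assumption: the interval bounds are inclusive.
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Made-up holdout values (feet below land surface), for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
imputed = [11.0, 12.0, 13.0, 16.0]
lower = [9.0, 11.0, 14.5, 15.0]
upper = [12.0, 13.0, 15.5, 17.0]

print(nrmse(observed, imputed))
print(coverage_rate(observed, lower, upper))
```

An NRMSE below 1 means the imputation error is smaller than the spread of the held-out observations themselves, which is why the metric is comparable across sites with different water-level ranges.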

  15. Data from: Shopee Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 16, 2024
    Cite
    Bright Data (2024). Shopee Dataset [Dataset]. https://brightdata.com/products/datasets/shopee
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Apr 16, 2024
    Dataset authored and provided by
    Bright Data, https://brightdata.com/
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    The Shopee Products Dataset is a comprehensive resource that empowers businesses, researchers, and analysts to gain a holistic view of the Shopee e-commerce ecosystem. Whether your goal is to conduct market analysis, optimize pricing strategies, understand customer behavior, or evaluate competitors, this dataset offers the essential information you need to make informed decisions and succeed in the dynamic world of Shopee. At its core, this dataset provides key attributes such as product ID, title, ratings, reviews, pricing details, and seller information, among others. These fundamental data elements offer insights into product performance, customer sentiment, and seller credibility.

  16. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Julie R. Campos Arias (repository creator)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from that website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv
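Because data_rt.csv stores all negative rows before all positive rows, any split taken by position would contain a single class. The sketch below shows one way to shuffle rows reproducibly with the Python standard library. The helper name `shuffled_rows` and the toy rows are hypothetical stand-ins for the real file contents.

```python
import random


def shuffled_rows(rows, seed=0):
    """Return a shuffled copy of the rows; the fixed seed keeps the order reproducible."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows


# Tiny stand-in for data_rt.csv: negatives (label 0) first, positives (label 1) last.
rows = [("bad %d" % i, 0) for i in range(5)] + [("good %d" % i, 1) for i in range(5)]
mixed = shuffled_rows(rows, seed=42)
print([label for _, label in mixed])
```

The real rows could be loaded with `csv.DictReader` before shuffling. Using a dedicated `random.Random` instance avoids changing the global random state used elsewhere in an experiment.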

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, unix time), reviewTime (time of the review, raw) and division (manually added, categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  17. Sample Sales Dataset

    • cubig.ai
    zip
    Updated Jun 15, 2025
    Cite
    CUBIG (2025). Sample Sales Dataset [Dataset]. https://cubig.ai/store/products/477/sample-sales-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 15, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction

    • The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns that includes a variety of sales-related data, including order numbers, product information, quantity, unit price, sales, order date, order status, and customer and delivery information.

    2) Data Utilization

    (1) Characteristics of the Sample Sales Data:

    • The dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).

    (2) The Sample Sales Data can be used for:

    • Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region.
    • Segmentation and marketing strategies: Customer groups can be segmented based on customer information, transaction size, and regional data, and the segments used to design targeted marketing and customized promotion strategies.

  18. Facebook Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Cite
    Bright Data, Facebook Datasets [Dataset]. https://brightdata.com/products/datasets/facebook
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset authored and provided by
    Bright Data, https://brightdata.com/
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Access our extensive Facebook datasets that provide detailed information on public posts, pages, and user engagement. Gain insights into post performance, audience interactions, page details, and content trends with our ethically sourced data. Free samples are available for evaluation.

    • Over 940M records available
    • Price starts at $250/100K records
    • Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet
    • 100% ethical and compliant data collection

    Included datapoints:

    • Post ID
    • Post Content & URL
    • Date Posted
    • Hashtags
    • Number of Comments
    • Number of Shares
    • Likes & Reaction Counts (by type)
    • Video View Count
    • Page Name & Category
    • Page Followers & Likes
    • Page Verification Status
    • Page Website & Contact Info
    • Is Sponsored Post
    • Attachments (Images/Videos)
    • External Link Data
    • And much more

  19. Data from: Product Demand Forecasting Dataset

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Chavindu Dulaj (2024). Product Demand Forecasting Dataset [Dataset]. https://www.kaggle.com/datasets/chavindudulaj/product-demand-forecasting-dataset
    Explore at:
    Available download formats: zip (1698931 bytes)
    Dataset updated
    Mar 30, 2024
    Authors
    Chavindu Dulaj
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Products Demand Forecasting Dataset

    Overview

    This dataset provides synthetic data related to demand forecasting to help predict product demand based on various factors including historical sales data, marketing campaigns, seasonal trends, pricing strategies, competitor pricing, stock availability, and public holidays.

    Features

    1. Date

      • Description: Date of the sales data
      • Data Type: Date
      • Format: YYYY-MM-DD
      • Example: "2021-05-25"
      • Source: Synthetic generated data
    2. Product_ID

      • Description: Unique identifier for each product
      • Data Type: String
      • Example: "P001"
      • Source: Synthetic generated data
    3. Base_Sales

      • Description: Base sales data without any marketing or seasonal effects
      • Data Type: Numeric (Integer)
      • Format: Integer
      • Example: 120
      • Source: Synthetic generated data
    4. Marketing_Campaign

      • Description: Type of marketing campaign
      • Data Type: Categorical
      • Categories: 'None', 'Email', 'Social Media', 'TV', 'Radio'
      • Example: "Social Media"
      • Source: Synthetic generated data
    5. Marketing_Effect

      • Description: Impact of the marketing campaign on sales
      • Data Type: Numeric (Float)
      • Format: Float
      • Example: 1.5
      • Source: Calculated based on the chosen marketing campaign
    6. Seasonal_Trend

      • Description: Seasonal trend affecting the sales
      • Data Type: Categorical
      • Categories: 'Winter', 'Spring', 'Summer', 'Fall'
      • Example: "Winter"
      • Source: Synthetic generated data
    7. Seasonal_Effect

      • Description: Seasonal effect on sales
      • Data Type: Numeric (Float)
      • Format: Float
      • Example: 0.8
      • Source: Calculated based on the chosen seasonal trend
    8. Price

      • Description: Price of the product
      • Data Type: Numeric (Float)
      • Format: Float
      • Example: 50.0
      • Source: Synthetic generated data
    9. Discount

      • Description: Discount offered on the product
      • Data Type: Numeric (Float)
      • Format: Float
      • Example: 0.1 (10% discount)
      • Source: Synthetic generated data
    10. Competitor_Price

      • Description: Price of the same product offered by competitors
      • Data Type: Numeric (Float)
      • Format: Float
      • Example: 48.0
      • Source: Synthetic generated data
    11. Stock_Availability

      • Description: Number of units available in stock
      • Data Type: Numeric (Integer)
      • Format: Integer
      • Example: 100
      • Source: Synthetic generated data
    12. Public_Holiday

      • Description: Indicates whether the date is a public holiday or not
      • Data Type: Boolean
      • Format: True/False
      • Example: True
      • Source: Synthetic generated data
    13. Demand

      • Description: Final demand calculated considering marketing, seasonal effects, and other factors
      • Data Type: Numeric (Integer)
      • Format: Integer
      • Example: 180
      • Source: Calculated based on Base_Sales, Marketing_Effect, Seasonal_Effect, Price, Discount, Competitor_Price, Stock_Availability, and Public_Holiday
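The thirteen features and their stated types can be captured in a small typed record, which is a convenient way to catch parsing problems early. The sketch below is only an illustration under assumptions: the class name `DemandRecord`, the helper `parse_row`, and the sample row (assembled from the per-column examples above) are hypothetical, and it assumes the CSV header names match the feature names exactly and that integer columns are stored without a decimal point.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class DemandRecord:
    record_date: date
    product_id: str
    base_sales: int
    marketing_campaign: str
    marketing_effect: float
    seasonal_trend: str
    seasonal_effect: float
    price: float
    discount: float
    competitor_price: float
    stock_availability: int
    public_holiday: bool
    demand: int


def parse_row(row):
    """Convert one CSV row (a dict of strings) into a typed DemandRecord."""
    return DemandRecord(
        record_date=datetime.strptime(row["Date"], "%Y-%m-%d").date(),
        product_id=row["Product_ID"],
        base_sales=int(row["Base_Sales"]),
        marketing_campaign=row["Marketing_Campaign"],
        marketing_effect=float(row["Marketing_Effect"]),
        seasonal_trend=row["Seasonal_Trend"],
        seasonal_effect=float(row["Seasonal_Effect"]),
        price=float(row["Price"]),
        discount=float(row["Discount"]),
        competitor_price=float(row["Competitor_Price"]),
        stock_availability=int(row["Stock_Availability"]),
        # The documented format is True/False; compare case-insensitively.
        public_holiday=row["Public_Holiday"].strip().lower() == "true",
        demand=int(row["Demand"]),
    )


# Illustrative row built from the per-column examples; not taken from the real file.
SAMPLE_CSV = (
    "Date,Product_ID,Base_Sales,Marketing_Campaign,Marketing_Effect,Seasonal_Trend,"
    "Seasonal_Effect,Price,Discount,Competitor_Price,Stock_Availability,Public_Holiday,Demand\n"
    "2021-05-25,P001,120,Social Media,1.5,Winter,0.8,50.0,0.1,48.0,100,True,180\n"
)

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(records[0])
```

The description does not give the formula that combines the inputs into Demand, so the sketch deliberately stops at parsing and does not attempt to recompute it.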

    Target Variable

    • Demand: Indicates the final demand after considering marketing campaigns, seasonal trends, pricing strategies, competitor pricing, stock availability, and public holidays.

    Data Range

    • Total number of records: 35000
    • Date range: January 2019 to December 2021

    Source

    This dataset is synthetic and was generated using Python. It is intended for educational and research purposes.

    Acknowledgements

    • The dataset was generated using Python and the data is synthetic.
  20. Fashion products dataset from gap.com

    • crawlfeeds.com
    json, zip
    Updated Feb 18, 2025
    Cite
    Crawl Feeds (2025). Fashion products dataset from gap.com [Dataset]. https://crawlfeeds.com/datasets/fashion-products-dataset-from-gap-com
    Explore at:
    Available download formats: json, zip
    Dataset updated
    Feb 18, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Fashion Products Dataset from GAP.com offers a curated collection of over 4,500 fashion items, meticulously extracted by Crawl Feeds' in-house web scraping team for research and analysis purposes. This dataset, last updated on October 11, 2021, encompasses a diverse range of products, including clothing, accessories, and more, providing a comprehensive view of GAP's offerings.

    Key Features:

    • Comprehensive Data Points: Each entry in the dataset includes 16 essential attributes such as product URL, name, product ID (PID), brand, price, currency, condition, availability, color, SKU, product details, average rating, review count, images, breadcrumbs, and the date of data extraction.

    • Sample Dataset Access: Prospective users can view a sample of the dataset by signing in, allowing them to assess its structure and relevance to their specific needs.

    • Immediate Availability: The dataset is readily available for purchase at $14.00 and is delivered in JSON format, ensuring seamless integration into various applications and systems.

    For businesses and researchers seeking more extensive data, the Powerful Fashion Dataset offers a broader spectrum of fashion-related information. This comprehensive dataset is designed to transform your fashion business by providing insights into trend forecasting, customer behavior analysis, and market dynamics. Leveraging such data can enhance decision-making processes, optimize supply chains, and identify emerging markets, ensuring your brand stays ahead in the competitive fashion industry.

Cite
Yash Yennewar (2025). Product Sales Dataset (2023-2024) [Dataset]. https://www.kaggle.com/datasets/yashyennewar/product-sales-dataset-2023-2024

Product Sales Dataset (2023-2024)

US Product Sales Dataset: Orders, Category, Revenue, Region and more (200,000 rows)

Explore at:
Available download formats: zip (6012656 bytes)
Dataset updated
Sep 30, 2025
Authors
Yash Yennewar
License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

🛍️ Product Sales Dataset (2023–2024)

📌 Overview

This dataset contains 200,000 synthetic sales records simulating real-world product transactions across different U.S. regions. It is designed for data analysis, business intelligence, and machine learning projects, especially in the areas of sales forecasting, customer segmentation, profitability analysis, and regional trend evaluation.

The dataset provides detailed transactional data including customer names, product categories, pricing, and revenue details, making it highly versatile for both beginners and advanced analysts.

📂 Dataset Structure

  • Rows: 200,000
  • Columns: 14

Features

  1. Order_ID – Unique identifier for each order
  2. Order_Date – Date of transaction
  3. Customer_Name – Name of the customer
  4. City – City of the customer
  5. State – State of the customer
  6. Region – Region (East, West, South, Centre)
  7. Country – Country (United States)
  8. Category – Broad product category (e.g., Accessories, Clothing & Apparel)
  9. Sub_Category – Subdivision of category (e.g., Sportswear, Bags)
  10. Product_Name – Product description
  11. Quantity – Units purchased
  12. Unit_Price – Price per unit (USD)
  13. Revenue – Total sales amount (Quantity × Unit Price)
  14. Profit – Net profit earned from the transaction

🎯 Potential Use Cases

  • Sales Analysis: Track revenue, profit, and performance by product, category, or region.
  • Customer Analytics: Identify top customers, purchasing frequency, and loyalty patterns.
  • Profitability Insights: Compare profit margins across categories and sub-categories.
  • Time-Series Analysis: Study seasonal demand and forecast future sales.
  • Visualization Projects: Build dashboards in Power BI, Tableau, or Excel.
  • Machine Learning: Train models for demand prediction, price optimization, or segmentation.

📊 Example Insights

  • Which region generates the highest revenue?
  • What are the top 10 most profitable products?
  • Are some product categories more popular in certain regions?
  • Which customers contribute the most to total revenue?
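The first two questions above (highest-revenue region, most profitable products) reduce to a group-and-sum over the documented columns. The following is a minimal standard-library sketch, assuming the CSV header names match the feature names listed earlier. The helper `totals_by` is hypothetical and the four sample rows are invented for illustration, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using a subset of the documented column names.
SAMPLE = """Order_ID,Region,Product_Name,Quantity,Unit_Price,Revenue,Profit
1,East,Backpack,2,30.0,60.0,12.0
2,West,Running Shoes,1,80.0,80.0,20.0
3,East,Running Shoes,1,80.0,80.0,18.0
4,South,Backpack,3,30.0,90.0,15.0
"""


def totals_by(rows, key, value):
    """Sum the numeric column `value` for each distinct entry in column `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

revenue_by_region = totals_by(rows, "Region", "Revenue")
profit_by_product = totals_by(rows, "Product_Name", "Profit")

# Region with the largest total revenue.
top_region = max(revenue_by_region, key=revenue_by_region.get)

# Up to ten products, ordered by total profit (highest first).
top_products = sorted(profit_by_product.items(), key=lambda kv: kv[1], reverse=True)[:10]

print(top_region, revenue_by_region)
print(top_products)
```

For the full 200,000-row file, the same grouping could be done with a dataframe library; the plain-dictionary version is shown only because it has no dependencies.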

🏷️ Tags

business · sales · profitability · forecasting · customer analysis · retail

📜 License

This dataset is synthetic and created for educational and analytical purposes. You are free to use, modify, and share it under the CC BY 4.0 License.

🙌 Acknowledgments

This dataset was generated to provide a realistic foundation for learning and practicing Data Analytics, Power BI, Tableau, Python, and Excel projects.
