100+ datasets found

Product Sales Dataset (2023-2024)
kaggle.com
zip
Updated Sep 30, 2025
Cite
Yash Yennewar (2025). Product Sales Dataset (2023-2024) [Dataset]. https://www.kaggle.com/datasets/yashyennewar/product-sales-dataset-2023-2024
Explore at:
Available download formats: zip (6012656 bytes)
Dataset updated
Sep 30, 2025
Authors
Yash Yennewar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
🛍️ Product Sales Dataset (2023–2024)

📌 Overview

This dataset contains 200,000 synthetic sales records simulating real-world product transactions across different U.S. regions. It is designed for data analysis, business intelligence, and machine learning projects, especially in the areas of sales forecasting, customer segmentation, profitability analysis, and regional trend evaluation.

The dataset provides detailed transactional data including customer names, product categories, pricing, and revenue details, making it highly versatile for both beginners and advanced analysts.

📂 Dataset Structure

Rows: 200,000

Columns: 14

Features

Order_ID – Unique identifier for each order

Order_Date – Date of transaction

Customer_Name – Name of the customer

City – City of the customer

State – State of the customer

Region – Region (East, West, South, Centre)

Country – Country (United States)

Category – Broad product category (e.g., Accessories, Clothing & Apparel)

Sub_Category – Subdivision of category (e.g., Sportswear, Bags)

Product_Name – Product description

Quantity – Units purchased

Unit_Price – Price per unit (USD)

Revenue – Total sales amount (Quantity × Unit Price)

Profit – Net profit earned from the transaction
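The Revenue definition in the feature list can be checked row by row. The following is a minimal sketch, assuming the column names listed above; the two sample rows are made up for illustration (they are not taken from the dataset), and the rounding tolerance is an assumption.

```python
import csv
import io

# Made-up sample rows using the column names from the feature list above.
SAMPLE_CSV = """Order_ID,Quantity,Unit_Price,Revenue,Profit
A-1,3,19.99,59.97,12.50
A-2,2,10.00,25.00,4.00
"""

def revenue_mismatches(rows, tolerance=0.01):
    """Return the Order_IDs whose Revenue differs from Quantity * Unit_Price."""
    bad = []
    for row in rows:
        expected = int(row["Quantity"]) * float(row["Unit_Price"])
        # The tolerance absorbs floating-point and rounding noise (assumed value).
        if abs(expected - float(row["Revenue"])) > tolerance:
            bad.append(row["Order_ID"])
    return bad

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(revenue_mismatches(rows))  # the second made-up row is deliberately inconsistent
```

To run the same check on the real file, the in-memory string can be swapped for an opened CSV file handle.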

🎯 Potential Use Cases

Sales Analysis: Track revenue, profit, and performance by product, category, or region.

Customer Analytics: Identify top customers, purchasing frequency, and loyalty patterns.

Profitability Insights: Compare profit margins across categories and sub-categories.

Time-Series Analysis: Study seasonal demand and forecast future sales.

Visualization Projects: Build dashboards in Power BI, Tableau, or Excel.

Machine Learning: Train models for demand prediction, price optimization, or segmentation.

📊 Example Insights

Which region generates the highest revenue?

What are the top 10 most profitable products?

Are some product categories more popular in certain regions?

Which customers contribute the most to total revenue?
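Questions like these reduce to a group-and-sum followed by a sort. A minimal sketch using only the standard library, assuming the Region, Customer_Name, and Revenue columns described earlier; the records and customer names are invented for illustration.

```python
from collections import defaultdict

# Made-up records; field names follow the dataset's column list.
orders = [
    {"Region": "East", "Customer_Name": "Ann Lee", "Revenue": 120.0},
    {"Region": "West", "Customer_Name": "Bo Chan", "Revenue": 80.0},
    {"Region": "East", "Customer_Name": "Cy Diaz", "Revenue": 50.0},
    {"Region": "West", "Customer_Name": "Bo Chan", "Revenue": 70.0},
]

def total_by(records, key, value="Revenue"):
    """Sum `value` over records grouped by `key`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec[value]
    return dict(totals)

def top_n(totals, n):
    """Return the n (name, total) pairs with the largest totals."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

region_totals = total_by(orders, "Region")
best_region = max(region_totals, key=region_totals.get)
top_customers = top_n(total_by(orders, "Customer_Name"), 2)
print(best_region, top_customers)
```

The same two helpers would answer the product question by grouping on Product_Name and summing Profit instead.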

🏷️ Tags

business · sales · profitability · forecasting · customer analysis · retail

📜 License

This dataset is synthetic and created for educational and analytical purposes. You are free to use, modify, and share it under the CC BY 4.0 License.

🙌 Acknowledgments

This dataset was generated to provide a realistic foundation for learning and practicing Data Analytics, Power BI, Tableau, Python, and Excel projects.
product-database
huggingface.co
Updated Mar 7, 2025
Cite
Open Food Facts (2025). product-database [Dataset]. https://huggingface.co/datasets/openfoodfacts/product-database
Explore at:
Dataset updated
Mar 7, 2025
Dataset authored and provided by
Open Food Facts (https://openfoodfacts.org/)
License
https://choosealicense.com/licenses/agpl-3.0/
Description
Open Food Facts Database

What is 🍊 Open Food Facts? A food products database

Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.

Made by everyone

Open Food Facts is a non-profit association of volunteers. 25,000+ contributors like you have added 1.7 million+ products from 150 countries using our Android or iPhone app or their camera to scan… See the full description on the dataset page: https://huggingface.co/datasets/openfoodfacts/product-database.
Global Product Inventory Dataset 2025
kaggle.com
zip
Updated Feb 28, 2025
Cite
Keyush nisar (2025). Global Product Inventory Dataset 2025 [Dataset]. https://www.kaggle.com/datasets/keyushnisar/global-product-inventory-dataset-2025
Explore at:
Available download formats: zip (372323 bytes)
Dataset updated
Feb 28, 2025
Authors
Keyush nisar
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a detailed snapshot of product inventory, perfect for logistics optimization, e-commerce analysis, or supply chain research. It includes key details like product names, categories, prices, stock quantities, and more—sourced from a hypothetical global supplier database. I compiled this while working on a shipment logistics optimization project, and I hope it’s useful for others exploring similar challenges!

Key Features:
- 14 columns covering product specs, pricing, stock, and tags.
- Sample data includes diverse categories like Home Appliances.
- Ideal for data cleaning practice, visualizations, or predictive modeling (e.g., stock depletion).

Potential Use Cases:
- Optimize shipment logistics based on stock and expiration dates.
- Analyze pricing trends across product categories.
- Build recommendation systems using tags and ratings.

Notes:
- Dates range from manufacturing to expiration (e.g., 2023-2026).
- Some fields (e.g., Product Description) may need refinement; feel free to enhance them!
- Suggestions for additional data or improvements are welcome.

Let me know how you use it. I’d love to hear your feedback!
amazon-product-data-sample
huggingface.co
+ more versions
Cite
Iftach Arbel, amazon-product-data-sample [Dataset]. https://huggingface.co/datasets/iarbel/amazon-product-data-sample
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Authors
Iftach Arbel
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for "amazon-product-data-filter"

Dataset Summary

The Amazon Product Dataset contains product listing data from the Amazon US website. It can be used for various NLP and classification tasks, such as text generation, product type classification, attribute extraction, image recognition and more. NOTICE: This is a sample of the full Amazon Product Dataset, which contains 1K examples. Follow the link to gain access to the full dataset.

Languages… See the full description on the dataset page: https://huggingface.co/datasets/iarbel/amazon-product-data-sample.
Open Data Portal (ODP) Bulk Datasets API (Search, Product Data, and...
catalog.data.gov
Updated Sep 30, 2025
+ more versions
Cite
Open Data Portal Team (2025). Open Data Portal (ODP) Bulk Datasets API (Search, Product Data, and Download) [Dataset]. https://catalog.data.gov/dataset/open-data-portal-odp-bulk-datasets-api-search-product-data-and-download
Explore at:
Dataset updated
Sep 30, 2025
Dataset provided by
Open Data Portal Team
Description
Search - Conduct a search of the repository of raw public bulk data. It contains research datasets from the Office of the Chief Economist. The files are updated on a regular or ongoing basis. Use this endpoint if you are interested in searching across multiple patents or applications. For example, to return all Patent or Trademark products, use productTitle and specify the product you are looking for, such as Patent File Wrapper.

Product Data - Contains published, publicly available patent and trademark data in bulk form. Use this endpoint when you want data from a specific Bulk Dataset. You can test APIs right away in SwaggerUI.

Download - Contains large bulk files of the Bulk Data Directory (BDD) available for download. Use this endpoint when you want to download bulk data sets.
Product Catalog Dataset
brightdata.com
.json, .csv, .xlsx
Updated Apr 22, 2024
Cite
Bright Data (2024). Product Catalog Dataset [Dataset]. https://brightdata.com/products/datasets/product-catalog
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Apr 22, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
The Product Catalog Data provides a comprehensive overview of products across various categories. This dataset includes detailed product titles, descriptions, barcodes, category-specific attributes, weight, measurements, and imagery. It's tailored for marketplaces, eCommerce sites, and data analysts who require in-depth product information to enhance user experience, SEO, and product categorization.

Popular Attributes:

✔ Detailed product information

✔ High-quality imagery

✔ Extensive attribute coverage

✔ Ideal for UX and SEO optimization

✔ Comprehensive product categorization

Key Information:

Rich dataset with 30+ attributes per product

Pricing: Flexible subscription models

Update Frequency: Daily updates

Coverage: Global and specific markets

Historical Data: 12 Months +

Dairy Supply Chain Sales Dataset

zenodo.org
data.niaid.nih.gov

pdf, zip

Updated Jul 12, 2024

+ more versions

Cite

Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis (2024). Dairy Supply Chain Sales Dataset [Dataset]. http://doi.org/10.21227/smv6-z405

Explore at:

Available download formats: zip, pdf

Unique identifier

https://doi.org/10.21227/smv6-z405

Dataset updated

Jul 12, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

1. Introduction

Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information could be used to develop new products, improve existing ones, or optimise the supply chain to meet the changing needs of customers.

This dataset includes information about 7 of MEVGAL’s products [1]. The published data can help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry and ultimately helping the industry make informed decisions about product development, pricing and market strategies in the IoT playground. The dataset could also be used to study the impact of external factors on the dairy market, such as economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.

2. Citation

Please cite the following papers when using this dataset:

I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

3. Dataset Modalities

The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each file includes the logistics for that product on a daily basis for three years, from 2020 to 2022.

3.1 Data Collection

The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.

Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.

It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).

File	Period	Number of Samples (days)
product 1 2020.xlsx	01/01/2020–31/12/2020	363
product 1 2021.xlsx	01/01/2021–31/12/2021	364
product 1 2022.xlsx	01/01/2022–31/12/2022	365
product 2 2020.xlsx	01/01/2020–31/12/2020	363
product 2 2021.xlsx	01/01/2021–31/12/2021	364
product 2 2022.xlsx	01/01/2022–31/12/2022	365
product 3 2020.xlsx	01/01/2020–31/12/2020	363
product 3 2021.xlsx	01/01/2021–31/12/2021	364
product 3 2022.xlsx	01/01/2022–31/12/2022	365
product 4 2020.xlsx	01/01/2020–31/12/2020	363
product 4 2021.xlsx	01/01/2021–31/12/2021	364
product 4 2022.xlsx	01/01/2022–31/12/2022	364
product 5 2020.xlsx	01/01/2020–31/12/2020	363
product 5 2021.xlsx	01/01/2021–31/12/2021	364
product 5 2022.xlsx	01/01/2022–31/12/2022	365
product 6 2020.xlsx	01/01/2020–31/12/2020	362
product 6 2021.xlsx	01/01/2021–31/12/2021	364
product 6 2022.xlsx	01/01/2022–31/12/2022	365
product 7 2020.xlsx	01/01/2020–31/12/2020	362
product 7 2021.xlsx	01/01/2021–31/12/2021	364
product 7 2022.xlsx	01/01/2022–31/12/2022	365
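The sample counts in the table fall short of a full calendar year for some files. A small sketch of that arithmetic, assuming each sample corresponds to one distinct calendar day inside the stated period (the source does not say which days are absent); the counts are copied from the product 1 rows above.

```python
import calendar

# Sample counts copied from the table above for product 1.
samples = {2020: 363, 2021: 364, 2022: 365}

def missing_days(year, n_samples):
    """Days in the calendar year that have no sample (assumes one sample per day)."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return days_in_year - n_samples

gaps = {year: missing_days(year, n) for year, n in samples.items()}
print(gaps)  # {2020: 3, 2021: 1, 2022: 0}
```

Because 2020 is a leap year, 363 samples leave three days unaccounted for, which matters when aligning the files for time-series modelling.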

3.2 Dataset Overview

The following table enumerates and explains the features included across all of the included files.

Feature	Description	Unit
Day	day of the month	-
Month	Month	-
Year	Year	-
daily_unit_sales	Daily sales - the amount of products, measured in units, that during that specific day were sold	units
previous_year_daily_unit_sales	Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year	units
percentage_difference_daily_unit_sales	The percentage difference between the two above values	%
daily_unit_sales_kg	The amount of products, measured in kilograms, that during that specific day were sold	kg
previous_year_daily_unit_sales_kg	Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold, the previous year	kg
percentage_difference_daily_unit_sales_kg	The percentage difference between the two above values	%
daily_unit_returns_kg	The percentage of the products that were shipped to selling points and were returned	%
previous_year_daily_unit_returns_kg	The percentage of the products that were shipped to

Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Explore at:
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
Canada, Taiwan, British Indian Ocean Territory, Moldova (Republic of), Tunisia, Isle of Man, Northern Mariana Islands, Andorra, Nepal, Bangladesh
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

Owler: Gain valuable business insights and competitive intelligence.

AngelList: Receive fresh startup data transformed into actionable insights.

CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.

Craft.co: Make data-informed business decisions with Craft.co's company datasets.

Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Marketing Bias data
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Marketing Bias data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation.

Metadata includes

ratings

product images

user identities

item sizes, user genders
Data cleaning using unstructured data
zenodo.org
zip
Updated Jul 30, 2024
Cite
Rihem Nasfi; Antoon Bronselaer (2024). Data cleaning using unstructured data [Dataset]. http://doi.org/10.5281/zenodo.13135983
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13135983
Dataset updated
Jul 30, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rihem Nasfi; Antoon Bronselaer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this project, we work on repairing three datasets:

Trials design: This dataset was obtained from the European Union Drug Regulating Authorities Clinical Trials Database (EudraCT) register, and the ground truth was created from external registries. In the dataset, multiple countries, identified by the attribute country_protocol_code, conduct the same clinical trial, which is identified by eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial.

Trials population: This dataset delineates the demographic origins of participants in clinical trials primarily conducted across European countries. It includes structured attributes indicating whether the trial pertains to a specific gender, age group or healthy volunteers. Each of these categories is labeled as ('1') or ('0'), denoting whether it is included in the trial or not, respectively. It is important to note that the population category should remain consistent across all countries conducting the same clinical trial identified by a eudract_number. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants such as inclusion.

Allergens: This dataset contains information about products and their allergens. The data was collected from the German version of 'Alnatura' (access date: 24 November 2020), from 'Open Food Facts', a free database of food products from around the world, and from the websites 'Migipedia', 'Piccantino', and 'Das Ist Drin'. There may be overlapping products across these websites. Each product in the dataset is identified by a unique code. Samples with the same code represent the same product but are extracted from a different source. The allergens are indicated by ('2') if present, ('1') if there are traces, and ('0') if absent in a product. The dataset also includes information on ingredients in the products. Overall, the dataset comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.

N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets:

"{dataset_name}_train.csv": samples used for the ML-model training. (e.g "allergens_train.csv")

"{dataset_name}_test.csv": samples used to test the ML-model performance. (e.g "allergens_test.csv")

"{dataset_name}_golden_standard.csv": samples represent the ground truth of the test samples. (e.g "allergens_golden_standard.csv")

"{dataset_name}_parker_train.csv": samples repaired using Parker Engine used for the ML-model training. (e.g "allergens_parker_train.csv")

"{dataset_name}_parker_test.csv": samples repaired using Parker Engine used to test the ML-model performance. (e.g "allergens_parker_test.csv")
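The naming pattern above can be expanded programmatically, for instance to check that an extracted archive is complete. A minimal sketch, assuming the five suffixes follow the example file names given in the list (train, test, golden_standard, parker_train, parker_test); the helper name expected_files is hypothetical.

```python
# Suffixes taken from the example file names in the list above.
SUFFIXES = ["train", "test", "golden_standard", "parker_train", "parker_test"]

def expected_files(dataset_name):
    """Return the five CSV file names expected for one dataset."""
    return [f"{dataset_name}_{suffix}.csv" for suffix in SUFFIXES]

def missing_files(dataset_name, present):
    """Return the expected file names that are not in `present`."""
    return [name for name in expected_files(dataset_name) if name not in set(present)]

print(expected_files("allergens"))
```

Passing the result of os.listdir on the extracted folder as `present` would report any file that is absent.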
E-commerce Products Image Dataset
kaggle.com
zip
Updated Jun 14, 2022
Cite
Sunny Kusawa (2022). E-commerce Products Image Dataset [Dataset]. https://www.kaggle.com/datasets/sunnykusawa/ecommerce-products-image-dataset
Explore at:
Available download formats: zip (42381801 bytes)
Dataset updated
Jun 14, 2022
Authors
Sunny Kusawa
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains images of televisions, sofas, jeans and T-shirts. It is actual raw, unstructured image data extracted from online sites.

All images come from different sites. You may also find some junk images in the data; for example, in the television dataset you will find images of television remotes.

This dataset is intentionally left unrefined so that practitioners get a taste of the kind of data ML/Data Science engineers receive when they start working on a project in industry.
Sample Purchasing / Supply Chain Data
catalog.data.gov
gimi9.com
+2 more
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). Sample Purchasing / Supply Chain Data [Dataset]. https://catalog.data.gov/dataset/sample-purchasing-supply-chain-data
Explore at:
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
Sample purchasing data containing information on suppliers, the products they provide, and the projects those products are used for. Data created or adapted from publicly available sources.
Steam Video Game and Bundle Data
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Steam Video Game and Bundle Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.

Metadata includes

reviews

purchases, plays, recommends (likes)

product bundles

pricing information

Basic Statistics:

Reviews: 7,793,069

Users: 2,567,538

Items: 15,474

Bundles: 615
Data from: Example Groundwater-Level Datasets and Benchmarking Results for...
catalog.data.gov
data.usgs.gov
+1 more
Updated Nov 27, 2025
+ more versions
Cite
U.S. Geological Survey (2025). Example Groundwater-Level Datasets and Benchmarking Results for the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) Software Package [Dataset]. https://catalog.data.gov/dataset/example-groundwater-level-datasets-and-benchmarking-results-for-the-automated-regional-cor
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This data release provides two example groundwater-level datasets used to benchmark the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) software package (Levy and others, 2024). The first dataset contains groundwater-level records and site metadata for wells located on Long Island, New York (NY) and some surrounding mainland sites in New York and Connecticut. The second dataset contains groundwater-level records and site metadata for wells located in the southeastern San Joaquin Valley of the Central Valley, California (CA). For ease of exposition these are referred to as NY and CA datasets, respectively. Both datasets are formatted with column headers that can be read by the ARCHI software package within the R computing environment. These datasets were used to benchmark the imputation accuracy of three ARCHI model settings (OLS, ridge, and MOVE.1) against the widely used imputation program missForest (Stekhoven and Bühlmann, 2012). The ARCHI program was used to process the NY and CA datasets on monthly and annual timesteps, respectively, filter out sites with insufficient data for imputation, and create 200 test datasets from each of the example datasets with 5 percent of observations removed at random (herein, referred to as "holdouts"). Imputation accuracy for test datasets was assessed using normalized root mean square error (NRMSE), which is the root mean square error divided by the standard deviation of the observed holdout values. ARCHI produces prediction intervals (PIs) using a non-parametric bootstrapping routine, which were assessed by computing a coverage rate (CR) defined as the proportion of holdout observations falling within the estimated PI. 
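The two accuracy measures defined above can be written out directly. A minimal sketch in Python (not the ARCHI package's R code), assuming the sample standard deviation for the normalisation, since the description does not say which form is used; all numbers are made up for illustration.

```python
import math
import statistics

def nrmse(observed, imputed):
    """RMSE of imputed vs. observed holdouts, divided by the std. dev. of observed."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, imputed)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    # Sample standard deviation is an assumption; the source does not specify.
    return rmse / statistics.stdev(observed)

def coverage_rate(observed, lower, upper):
    """Proportion of holdout observations inside their prediction interval."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)

# Made-up holdout values (feet below land surface) and interval bounds.
observed = [10.0, 12.0, 14.0, 16.0]
imputed = [11.0, 12.0, 13.0, 16.0]
lower = [9.0, 11.5, 14.5, 15.0]
upper = [11.0, 12.5, 15.5, 17.0]

print(round(nrmse(observed, imputed), 3))
print(coverage_rate(observed, lower, upper))
```

An NRMSE below 1 means the imputation error is smaller than the spread of the held-out values themselves; here three of the four made-up observations fall inside their intervals.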
The multiple regression models included with the ARCHI package (OLS and ridge) were further tested on all test datasets at eleven different levels of the p_per_n input parameter, which limits the maximum ratio of regression model predictors (p) per observations (n) as a decimal fraction greater than zero and less than or equal to one.

This data release contains ten tables formatted as tab-delimited text files.

The “CA_data.txt” and “NY_data.txt” tables contain 243,094 and 89,997 depth-to-groundwater measurement values (value, in feet below land surface) indexed by site identifier (site_no) and measurement date (date) for CA and NY datasets, respectively.

The “CA_sites.txt” and “NY_sites.txt” tables contain site metadata for the 4,380 and 476 unique sites included in the CA and NY datasets, respectively.

The “CA_NRMSE.txt” and “NY_NRMSE.txt” tables contain NRMSE values computed by imputing 200 test datasets with 5 percent random holdouts to assess imputation accuracy for three different ARCHI model settings and missForest using CA and NY datasets, respectively.

The “CA_CR.txt” and “NY_CR.txt” tables contain CR values used to evaluate non-parametric PIs generated by bootstrapping regressions with three different ARCHI model settings using the CA and NY test datasets, respectively.

The “CA_p_per_n.txt” and “NY_p_per_n.txt” tables contain mean NRMSE values computed for 200 test datasets with 5 percent random holdouts at 11 different levels of p_per_n for OLS and ridge models compared to training error for the same models on the entire CA and NY datasets, respectively.

References Cited

Levy, Z.F., Stagnitta, T.J., and Glas, R.L., 2024, ARCHI: Automated Regional Correlation Analysis for Hydrologic Record Imputation, v1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/P1VVHWKE.

Stekhoven, D.J., and Bühlmann, P., 2012, MissForest—non-parametric missing value imputation for mixed-type data: Bioinformatics 28(1), 112-118, https://doi.org/10.1093/bioinformatics/btr597.
Data from: Shopee Dataset
brightdata.com
.json, .csv, .xlsx
Updated Apr 16, 2024
Cite
Bright Data (2024). Shopee Dataset [Dataset]. https://brightdata.com/products/datasets/shopee
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Apr 16, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
The Shopee Products Dataset is a comprehensive resource that empowers businesses, researchers, and analysts to gain a holistic view of the Shopee e-commerce ecosystem. Whether your goal is to conduct market analysis, optimize pricing strategies, understand customer behavior, or evaluate competitors, this dataset offers the essential information you need to make informed decisions and succeed in the dynamic world of Shopee. At its core, this dataset provides key attributes such as product ID, title, ratings, reviews, pricing details, and seller information, among others. These fundamental data elements offer insights into product performance, customer sentiment, and seller credibility.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for more than 1,000 Amazon products, as listed on Amazon's official website. The data was scraped from that website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
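The relationship between the three price columns can be sketched as follows. The source describes discount_percentage only as the "Percentage of Discount", so the formula below is the usual definition and an assumption about how the column was derived. It also assumes the prices have already been parsed into plain numbers.

```python
def discount_percentage(actual_price: float, discounted_price: float) -> float:
    """Discount as a percentage of the actual (list) price. Assumed definition."""
    if actual_price <= 0:
        raise ValueError("actual_price must be positive")
    return (actual_price - discounted_price) / actual_price * 100


# A product listed at 1000 and sold at 750 carries a 25 percent discount.
example = discount_percentage(1000.0, 750.0)
```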
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
The data was collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
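Because the rows are ordered by label, a train/test split taken straight from the file would be badly unbalanced. A minimal sketch of a reproducible shuffle is shown below, using a few toy rows in place of the real file. The helper name and the seed are illustrative choices, not part of the dataset.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of the rows, keeping each review with its label."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


# Toy stand-in for data_rt.csv: negatives (label 0) first, then positives (label 1).
rows = [
    ("dull and lifeless", 0),
    ("a tedious mess", 0),
    ("flat and forgettable", 0),
    ("a warm, funny film", 1),
    ("sharp and moving", 1),
    ("a real delight", 1),
]
mixed = shuffle_rows(rows)
```

Shuffling whole rows, rather than the two columns separately, is what keeps every label attached to its own sentence.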
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
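The step from a polarity score to the categorical division column can be sketched as below. The source does not state which categories or cut-offs were used, so the three labels and the zero thresholds are assumptions chosen for illustration. The sketch also takes the score as given rather than calling TextBlob.

```python
def polarity_to_division(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a label (hypothetical thresholds)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


labels = [polarity_to_division(score) for score in (0.6, -0.25, 0.0)]
```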
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9,930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
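The unixReviewTime column can be turned into a calendar date with the standard library, as in the sketch below. It assumes the values are seconds since 1970-01-01 UTC, which the description does not state explicitly.

```python
from datetime import datetime, timezone


def unix_to_date(unix_seconds: int) -> str:
    """Convert Unix time in seconds (assumed) to an ISO date string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).date().isoformat()


epoch_day = unix_to_date(0)
```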
Sample Sales Dataset
cubig.ai
zip
Updated Jun 15, 2025
Cite
CUBIG (2025). Sample Sales Dataset [Dataset]. https://cubig.ai/store/products/477/sample-sales-dataset
Explore at:
Available download formats: zip
Dataset updated
Jun 15, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns covering order numbers, product information, quantity, unit price, sales, order date, order status, and customer and delivery information.

2) Data Utilization
(1) Sample Sales Data has the following characteristics:
• The dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).
(2) Sample Sales Data can be used for:
• Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region.
• Segmentation and marketing strategies: Customer groups can be segmented based on customer information, transaction size, and regional data, and the segments used to design targeted marketing and customized promotion strategies.
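The monthly sales trend analysis mentioned above amounts to grouping order totals by calendar month. The sketch below does this with the standard library on a few made-up orders. The tuple layout (order date, sales amount) is an assumption for illustration and does not reflect the dataset's actual column names or date format.

```python
from collections import defaultdict
from datetime import date


def monthly_sales(orders):
    """Sum sales per (year, month) from (order_date, sales) pairs."""
    totals = defaultdict(float)
    for order_date, sales in orders:
        totals[(order_date.year, order_date.month)] += sales
    return dict(sorted(totals.items()))


# Made-up orders, not rows from the dataset.
orders = [
    (date(2003, 1, 5), 100.0),
    (date(2003, 1, 20), 50.0),
    (date(2003, 2, 1), 25.0),
]
trend = monthly_sales(orders)
```

The same grouping applies to product line or country by swapping the key.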
Facebook Datasets
brightdata.com
.json, .csv, .xlsx
Cite
Bright Data, Facebook Datasets [Dataset]. https://brightdata.com/products/datasets/facebook
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Access our extensive Facebook datasets that provide detailed information on public posts, pages, and user engagement. Gain insights into post performance, audience interactions, page details, and content trends with our ethically sourced data. Free samples are available for evaluation.

Over 940M records available
Price starts at $250/100K records
Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet
100% ethical and compliant data collection

Included datapoints:
Post ID
Post Content & URL
Date Posted
Hashtags
Number of Comments
Number of Shares
Likes & Reaction Counts (by type)
Video View Count
Page Name & Category
Page Followers & Likes
Page Verification Status
Page Website & Contact Info
Is Sponsored
Post Attachments (Images/Videos)
External Link Data
And much more
Data from: Product Demand Forecasting Dataset
kaggle.com
zip
Updated Mar 30, 2024
Cite
Chavindu Dulaj (2024). Product Demand Forecasting Dataset [Dataset]. https://www.kaggle.com/datasets/chavindudulaj/product-demand-forecasting-dataset
Explore at:
Available download formats: zip (1698931 bytes)
Dataset updated
Mar 30, 2024
Authors
Chavindu Dulaj
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Products Demand Forecasting Dataset

Overview

This dataset provides synthetic data related to demand forecasting to help predict product demand based on various factors including historical sales data, marketing campaigns, seasonal trends, pricing strategies, competitor pricing, stock availability, and public holidays.

Features

Date

Description: Date of the sales data

Data Type: Date

Format: YYYY-MM-DD

Example: "2021-05-25"

Source: Synthetic generated data

Product_ID

Description: Unique identifier for each product

Data Type: String

Example: "P001"

Source: Synthetic generated data

Base_Sales

Description: Base sales data without any marketing or seasonal effects

Data Type: Numeric (Integer)

Format: Integer

Example: 120

Source: Synthetic generated data

Marketing_Campaign

Description: Type of marketing campaign

Data Type: Categorical

Categories: 'None', 'Email', 'Social Media', 'TV', 'Radio'

Example: "Social Media"

Source: Synthetic generated data

Marketing_Effect

Description: Impact of the marketing campaign on sales

Data Type: Numeric (Float)

Format: Float

Example: 1.5

Source: Calculated based on the chosen marketing campaign

Seasonal_Trend

Description: Seasonal trend affecting the sales

Data Type: Categorical

Categories: 'Winter', 'Spring', 'Summer', 'Fall'

Example: "Winter"

Source: Synthetic generated data

Seasonal_Effect

Description: Seasonal effect on sales

Data Type: Numeric (Float)

Format: Float

Example: 0.8

Source: Calculated based on the chosen seasonal trend

Price

Description: Price of the product

Data Type: Numeric (Float)

Format: Float

Example: 50.0

Source: Synthetic generated data

Discount

Description: Discount offered on the product

Data Type: Numeric (Float)

Format: Float

Example: 0.1 (10% discount)

Source: Synthetic generated data

Competitor_Price

Description: Price of the same product offered by competitors

Data Type: Numeric (Float)

Format: Float

Example: 48.0

Source: Synthetic generated data

Stock_Availability

Description: Number of units available in stock

Data Type: Numeric (Integer)

Format: Integer

Example: 100

Source: Synthetic generated data

Public_Holiday

Description: Indicates whether the date is a public holiday or not

Data Type: Boolean

Format: True/False

Example: True

Source: Synthetic generated data

Demand

Description: Final demand calculated considering marketing, seasonal effects, and other factors

Data Type: Numeric (Integer)

Format: Integer

Example: 180

Source: Calculated based on Base_Sales, Marketing_Effect, Seasonal_Effect, Price, Discount, Competitor_Price, Stock_Availability, and Public_Holiday

Target Variable

Demand: Indicates the final demand after considering marketing campaigns, seasonal trends, pricing strategies, competitor pricing, stock availability, and public holidays.
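The description lists the inputs to Demand but not the formula that combines them. Purely as an illustration of how such a synthetic target might be built, the sketch below multiplies base sales by the marketing and seasonal effects and caps the result at the available stock. This formula is an assumption and not the dataset author's method. Price, Discount, Competitor_Price, and Public_Holiday are left out because the source gives no hint of how they enter.

```python
def sketch_demand(base_sales: int, marketing_effect: float,
                  seasonal_effect: float, stock_availability: int) -> int:
    """Hypothetical demand: scaled base sales, limited by stock on hand."""
    scaled = base_sales * marketing_effect * seasonal_effect
    return min(round(scaled), stock_availability)


# Using the example values from the feature list: 120 * 1.5 * 0.8 = 144,
# which is then capped at the 100 units in stock.
capped = sketch_demand(120, 1.5, 0.8, 100)
uncapped = sketch_demand(120, 1.5, 0.8, 500)
```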

Data Range

Total number of records: 35000

Date range: January 2019 to December 2021

Source

This dataset is synthetic and was generated using Python. It is intended for educational and research purposes.

Acknowledgements

The dataset was generated using Python and the data is synthetic.
Fashion products dataset from gap.com
crawlfeeds.com
json, zip
Updated Feb 18, 2025
Cite
Crawl Feeds (2025). Fashion products dataset from gap.com [Dataset]. https://crawlfeeds.com/datasets/fashion-products-dataset-from-gap-com
Explore at:
Available download formats: json, zip
Dataset updated
Feb 18, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Fashion Products Dataset from GAP.com offers a curated collection of over 4,500 fashion items, meticulously extracted by Crawl Feeds' in-house web scraping team for research and analysis purposes. This dataset, last updated on October 11, 2021, encompasses a diverse range of products, including clothing, accessories, and more, providing a comprehensive view of GAP's offerings.

Key Features:

Comprehensive Data Points: Each entry in the dataset includes 16 essential attributes such as product URL, name, product ID (PID), brand, price, currency, condition, availability, color, SKU, product details, average rating, review count, images, breadcrumbs, and the date of data extraction.

Sample Dataset Access: Prospective users can view a sample of the dataset by signing in, allowing them to assess its structure and relevance to their specific needs.

Immediate Availability: The dataset is readily available for purchase at $14.00 and is delivered in JSON format, ensuring seamless integration into various applications and systems.

For businesses and researchers seeking more extensive data, the Powerful Fashion Dataset offers a broader spectrum of fashion-related information. This comprehensive dataset is designed to transform your fashion business by providing insights into trend forecasting, customer behavior analysis, and market dynamics. Leveraging such data can enhance decision-making processes, optimize supply chains, and identify emerging markets, ensuring your brand stays ahead in the competitive fashion industry.

Cite

Yash Yennewar (2025). Product Sales Dataset (2023-2024) [Dataset]. https://www.kaggle.com/datasets/yashyennewar/product-sales-dataset-2023-2024

Product Sales Dataset (2023-2024)

US Product Sales Dataset: Orders, Category, Revenue, Region and more (200,000 rows)

Explore at:

Available download formats: zip (6012656 bytes)

Dataset updated

Sep 30, 2025

Authors

Yash Yennewar

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

🛍️ Product Sales Dataset (2023–2024)

📌 Overview

This dataset contains 200,000 synthetic sales records simulating real-world product transactions across different U.S. regions. It is designed for data analysis, business intelligence, and machine learning projects, especially in the areas of sales forecasting, customer segmentation, profitability analysis, and regional trend evaluation.

The dataset provides detailed transactional data including customer names, product categories, pricing, and revenue details, making it highly versatile for both beginners and advanced analysts.

📂 Dataset Structure

Rows: 200,000
Columns: 14

Features

Order_ID – Unique identifier for each order
Order_Date – Date of transaction
Customer_Name – Name of the customer
City – City of the customer
State – State of the customer
Region – Region (East, West, South, Centre)
Country – Country (United States)
Category – Broad product category (e.g., Accessories, Clothing & Apparel)
Sub_Category – Subdivision of category (e.g., Sportswear, Bags)
Product_Name – Product description
Quantity – Units purchased
Unit_Price – Price per unit (USD)
Revenue – Total sales amount (Quantity × Unit Price)
Profit – Net profit earned from the transaction
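The Revenue definition above (Quantity × Unit Price) gives a simple consistency check for each record, and Profit divided by Revenue gives a profit margin. The sketch below applies both to one made-up row. The helper names and the rounding tolerance are illustrative assumptions.

```python
import math


def revenue_is_consistent(row, tolerance=0.01):
    """Check that Revenue matches Quantity x Unit_Price within a small tolerance."""
    expected = row["Quantity"] * row["Unit_Price"]
    return math.isclose(row["Revenue"], expected, abs_tol=tolerance)


def profit_margin(row):
    """Profit as a fraction of Revenue."""
    return row["Profit"] / row["Revenue"]


# Made-up record using the column names listed above.
row = {"Quantity": 3, "Unit_Price": 20.0, "Revenue": 60.0, "Profit": 15.0}
ok = revenue_is_consistent(row)
margin = profit_margin(row)
```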

🎯 Potential Use Cases

Sales Analysis: Track revenue, profit, and performance by product, category, or region.
Customer Analytics: Identify top customers, purchasing frequency, and loyalty patterns.
Profitability Insights: Compare profit margins across categories and sub-categories.
Time-Series Analysis: Study seasonal demand and forecast future sales.
Visualization Projects: Build dashboards in Power BI, Tableau, or Excel.
Machine Learning: Train models for demand prediction, price optimization, or segmentation.

📊 Example Insights

Which region generates the highest revenue?
What are the top 10 most profitable products?
Are some product categories more popular in certain regions?
Which customers contribute the most to total revenue?
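Questions like these reduce to a group-and-sum followed by a sort. The sketch below answers the first two on three made-up rows using only the standard library. The helper names are hypothetical, and a real analysis would read the 200,000 rows from the downloaded file instead.

```python
from collections import defaultdict


def total_by(rows, key, value):
    """Sum the `value` column grouped by the `key` column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def top_n(totals, n):
    """Return the n largest (name, total) pairs, highest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up rows using the column names from the feature list.
sample_rows = [
    {"Region": "East", "Product_Name": "Bag", "Revenue": 120.0, "Profit": 35.0},
    {"Region": "West", "Product_Name": "Jacket", "Revenue": 200.0, "Profit": 20.0},
    {"Region": "East", "Product_Name": "Jacket", "Revenue": 100.0, "Profit": 10.0},
]
revenue_by_region = total_by(sample_rows, "Region", "Revenue")
best_region = top_n(revenue_by_region, 1)[0][0]
top_products = top_n(total_by(sample_rows, "Product_Name", "Profit"), 10)
```

Swapping the key to Customer_Name answers the customer question in the same way.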

🏷️ Tags

business · sales · profitability · forecasting · customer analysis · retail

📜 License

This dataset is synthetic and created for educational and analytical purposes. You are free to use, modify, and share it under the CC BY 4.0 License.

🙌 Acknowledgments

This dataset was generated to provide a realistic foundation for learning and practicing Data Analytics, Power BI, Tableau, Python, and Excel projects.
