100+ datasets found

New 1000 Sales Records Data 2
kaggle.com
zip
Updated Jan 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
Explore at:
zip(49305 bytes)Available download formats
Dataset updated
Jan 12, 2023
Authors
Calvin Oko Mensah
Description
This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
c
Walmart Products Dataset – Free Product Data CSV
crawlfeeds.com
csv, zip
Updated Dec 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). Walmart Products Dataset – Free Product Data CSV [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
Explore at:
zip, csvAvailable download formats
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.

Key Features

Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.

CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.

Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.

Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.

Who Benefits?

Data analysts & researchers exploring e-commerce trends or product catalog data.

Developers & data scientists building price-comparison tools, recommendation engines or ML models.

E-commerce strategists/marketers need product metadata for competitive analysis or market research.

Students/hobbyists needing a free dataset for learning or demo projects.

Why Use This Dataset Instead of Manual Scraping?

Time-saving: No need to write scrapers or deal with rate limits.

Clean, structured data: All records are verified and already formatted in CSV, saving hours of cleaning.

Risk-free: Avoid Terms-of-Service issues or IP blocks that come with manual scraping.
Instant access: Free and immediately downloadable.
Comprehensive Supply Chain Analysis
kaggle.com
Updated Sep 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dorothy Joel (2023). Comprehensive Supply Chain Analysis [Dataset]. https://www.kaggle.com/datasets/dorothyjoel/us-regional-sales
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 15, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Dorothy Joel
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
This supply chain analysis provides a comprehensive view of the company's order and distribution processes, allowing for in-depth analysis and optimization of various aspects of the supply chain, from procurement and inventory management to sales and customer satisfaction. It empowers the company to make data-driven decisions to improve efficiency, reduce costs, and enhance customer experiences. The provided supply chain analysis dataset contains various columns that capture important information related to the company's order and distribution processes:

• OrderNumber • Sales Channel • WarehouseCode • ProcuredDate • CurrencyCode • OrderDate • ShipDate • DeliveryDate • SalesTeamID • CustomerID • StoreID • ProductID • Order Quantity • Discount Applied • Unit Cost • Unit Price
Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
Global Retail Sales Data: Orders, Reviews & Trends
kaggle.com
zip
Updated Dec 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Adarsh Anil Kumar (2024). Global Retail Sales Data: Orders, Reviews & Trends [Dataset]. https://www.kaggle.com/datasets/adarsh0806/influencer-merchandise-sales
Explore at:
zip(125403 bytes)Available download formats
Dataset updated
Dec 10, 2024
Authors
Adarsh Anil Kumar
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The Global Retail Sales Data provided here is a self-generated synthetic dataset created using Random Sampling techniques provided by the Numpy Package. The dataset emulates information regarding merchandise sales through a retail website set up by a popular fictional influencer based in the US between the '23-'24 period. The influencer would sell clothing, ornaments and other products at variable rates through the retail website to all of their followers across the world. Imagine that the influencer executes high levels of promotions for the materials they sell, prompting more ratings and reviews from their followers, pushing more user engagement.

This dataset is placed to help with practicing Sentiment Analysis or/and Time Series Analysis of sales, etc. as they are very important topics for Data Analyst prospects. The column description is given as follows:

Order ID: Serves as an identifier for each order made.

Order Date: The date when the order was made.

Product ID: Serves as an identifier for the product that was ordered.

Product Category: Category of Product sold(Clothing, Ornaments, Other).

Buyer Gender: Genders of people that have ordered from the website (Male, Female).

Buyer Age: Ages of the buyers.

Order Location: The city where the order was made from.

International Shipping: Whether the product was shipped internationally or not. (Yes/No)

Sales Price: Price tag for the product.

Shipping Charges: Extra charges for international shipments.

Sales per Unit: Sales cost while including international shipping charges.

Quantity: Quantity of the product bought.

Total Sales: Total sales made through the purchase.

Rating: User rating given for the order.

Review: User review given for the order.
Fake Dataset for Practice
kaggle.com
zip
Updated Aug 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shuvo Kumar Basak-4004 (2023). Fake Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/fake-dataset-for-practice
Explore at:
zip(1515599 bytes)Available download formats
Dataset updated
Aug 21, 2023
Authors
Shuvo Kumar Basak-4004
Description
Description: This dataset is created solely for the purpose of practice and learning. It contains entirely fake and fabricated information, including names, phone numbers, emails, cities, ages, and other attributes. None of the information in this dataset corresponds to real individuals or entities. It serves as a resource for those who are learning data manipulation, analysis, and machine learning techniques. Please note that the data is completely fictional and should not be treated as representing any real-world scenarios or individuals.

Attributes: - phone_number: Fake phone numbers in various formats. - name: Fictitious names generated for practice purposes. - email: Imaginary email addresses created for the dataset. - city: Made-up city names to simulate geographical diversity. - age: Randomly generated ages for practice analysis. - sex: Simulated gender values (Male, Female). - married_status: Synthetic marital status information. - job: Fictional job titles for practicing data analysis. - income: Fake income values for learning data manipulation. - religion: Pretend religious affiliations for practice. - nationality: Simulated nationalities for practice purposes.

Please be aware that this dataset is not based on real data and should be used exclusively for educational purposes.
g
Insurance Dataset
gts.ai
json
Updated Oct 16, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GTS (2022). Insurance Dataset [Dataset]. https://gts.ai/case-study/insurance-dataset-annotation-services-for-precision-data-analysis/
Explore at:
jsonAvailable download formats
Dataset updated
Oct 16, 2022
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Insurance Dataset project is an extensive initiative focused on collecting and analyzing insurance-related data from various sources.
Data Science Stack Exchange Dataset
kaggle.com
zip
Updated Jul 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aneesh Tickoo (2022). Data Science Stack Exchange Dataset [Dataset]. https://www.kaggle.com/datasets/aneeshtickoo/data-science-stack-exchange
Explore at:
zip(91829637 bytes)Available download formats
Dataset updated
Jul 11, 2022
Authors
Aneesh Tickoo
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Stack Exchange is a network of question-and-answer websites on topics in diverse fields, each site covering a specific topic, where questions, answers, and users are subject to a reputation award process. The reputation system allows the sites to be self-moderating.

The dataset here is specific to one such network site of Stack Exchange named Data Science Stack Exchange. The dataset is distributed over multiple files. It contains information on various Posts on data science that can be used for language processing, it has data on which posts are being liked by users more, etc. A lot of analysis can be done on this dataset.
h
Data-Science-Instruct-Dataset
huggingface.co
Updated May 3, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohammed Habib Ahmed (2025). Data-Science-Instruct-Dataset [Dataset]. https://huggingface.co/datasets/HabibAhmed/Data-Science-Instruct-Dataset
Explore at:
Dataset updated
May 3, 2025
Authors
Mohammed Habib Ahmed
Description
HabibAhmed/Data-Science-Instruct-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: Social Media Data Analysis
kaggle.com
zip
Updated Apr 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nafe Muhtasim (2021). Social Media Data Analysis [Dataset]. https://www.kaggle.com/datasets/nafemuhtasim/social-media-data-analysis
Explore at:
zip(29081 bytes)Available download formats
Dataset updated
Apr 16, 2021
Authors
Nafe Muhtasim
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Nafe Muhtasim

Released under CC0: Public Domain

Contents
Z
Dataset: Shell Commands Used by Participants of Hands-on Cybersecurity...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jul 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Valdemar Švábenský; Jan Vykopal; Pavel Seda; Pavel Čeleda (2023). Dataset: Shell Commands Used by Participants of Hands-on Cybersecurity Training [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5137354
Explore at:
Dataset updated
Jul 18, 2023
Dataset provided by
Masaryk University
Authors
Valdemar Švábenský; Jan Vykopal; Pavel Seda; Pavel Čeleda
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains supplementary materials for the following journal paper:

Valdemar Švábenský, Jan Vykopal, Pavel Seda, Pavel Čeleda. Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training. In Elsevier Data in Brief. 2021. https://doi.org/10.1016/j.dib.2021.107398

How to cite

If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).

@article{Svabensky2021dataset, author = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and Seda, Pavel and \v{C}eleda, Pavel}, title = {{Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training}}, journal = {Data in Brief}, publisher = {Elsevier}, volume = {38}, year = {2021}, issn = {2352-3409}, url = {https://doi.org/10.1016/j.dib.2021.107398}, doi = {10.1016/j.dib.2021.107398}, }

The data were collected using a logging toolset referenced here.

Attached content

Dataset (data.zip). The collected data are attached here on Zenodo. A copy is also available in this repository.

Analytical tools (toolset.zip). To analyze the data, you can instantiate the toolset or this project for ELK.

Version history

Version 1 (https://zenodo.org/record/5137355) contains 13446 log records from 175 trainees. These data are precisely those that are described in the associated journal paper. Version 1 provides a snapshot of the state when the article was published.

Version 2 (https://zenodo.org/record/5517479) contains 13446 log records from 175 trainees. The data are unchanged from Version 1, but the analytical toolset includes a minor fix.

Version 3 (https://zenodo.org/record/6670113) contains 21762 log records from 275 trainees. It is a superset of Version 2, with newly collected data added to the dataset.

The current Version 4 (https://zenodo.org/record/8136017) contains 21459 log records from 275 trainees. Compared to Version 3, we cleaned 303 invalid/duplicate command records.
R
Data Analytics 2 Dataset
universe.roboflow.com
zip
Updated Jan 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRAFFIC SIGNS DATA ANALYTICS (2025). Data Analytics 2 Dataset [Dataset]. https://universe.roboflow.com/traffic-signs-data-analytics/data-analytics-2
Explore at:
zipAvailable download formats
Dataset updated
Jan 1, 2025
Dataset authored and provided by
TRAFFIC SIGNS DATA ANALYTICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
TRAFFIC LIGHTS Gztl Bounding Boxes
Description
DATA ANALYTICS 2

## Overview DATA ANALYTICS 2 is a dataset for object detection tasks - it contains TRAFFIC LIGHTS Gztl annotations for 8,579 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Excel dataset
kaggle.com
zip
Updated Jun 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pinky Verma (2023). Excel dataset [Dataset]. https://www.kaggle.com/datasets/pinkyverma0256/excel-dataset
Explore at:
zip(13123 bytes)Available download formats
Dataset updated
Jun 29, 2023
Authors
Pinky Verma
Description
Dataset

This dataset was created by Pinky Verma

Contents
D
UnrealGaussianStat: Synthetic dataset for statistical analysis on Novel View...
dataverse.no
dataverse.azure.uit.no
+1more
txt, zip
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anurag Dalal; Anurag Dalal (2025). UnrealGaussianStat: Synthetic dataset for statistical analysis on Novel View Synthesis [Dataset]. http://doi.org/10.18710/WSU7I6
Explore at:
txt(7447), zip(960339536)Available download formats
Unique identifier
https://doi.org/10.18710/WSU7I6
Dataset updated
Apr 10, 2025
Dataset provided by
DataverseNO
Authors
Anurag Dalal; Anurag Dalal
License
https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/WSU7I6https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/WSU7I6
Description
The dataset comprises three dynamic scenes characterized by both simple and complex lighting conditions. The quantity of cameras ranges from 4 to 512, including 4, 6, 8, 10, 12, 14, 16, 32, 64, 128, 256, and 512. The point clouds are randomly generated.
r
QoG OECD Dataset
researchdata.se
gimi9.com
Updated Aug 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jan Teorell; Staffan Kumlin; Sören Holmberg; Bo Rothstein; Aksel Sundström (2024). QoG OECD Dataset [Dataset]. http://doi.org/10.18157/qogoecdjan22
Explore at:
(88537037)Available download formats
Unique identifier
https://doi.org/10.18157/qogoecdjan22
Dataset updated
Aug 6, 2024
Dataset provided by
University of Gothenburg
Authors
Jan Teorell; Staffan Kumlin; Sören Holmberg; Bo Rothstein; Aksel Sundström
Time period covered
2015 - 2021
Description
The QoG Institute is an independent research institute within the Department of Political Science at the University of Gothenburg. The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained.

To achieve said goal, the QoG Institute makes comparative data on QoG and its correlates publicly available. To accomplish this, we have compiled several datasets that draw on a number of freely available data sources, including aggregated individual-level data.

The QoG OECD Datasets focus exclusively on OECD member countries. They have a high data coverage in terms of geography and time. In the QoG OECD TS dataset, data from 1946 to 2021 is included and the unit of analysis is country-year (e.g., Sweden-1946, Sweden-1947, etc.).

In the QoG OECD Cross-Section dataset, data from and around 2018 is included. Data from 2018 is prioritized, however, if no data are available for a country for 2018, data for 2019 is included. If no data for 2019 exists, data for 2017 is included, and so on up to a maximum of +/- 3 years. In the QoG OECD Time-Series dataset, data from 1946 to 2021 are included and the unit of analysis is country-year (e.g. Sweden-1946, Sweden-1947 and so on).

The QoG OECD Datasets focus exclusively on OECD member countries. They have a high data coverage in terms of geography and time.

In the QoG OECD Cross-Section dataset, data from and around 2018 is included. Data from 2018 is prioritized, however, if no data are available for a country for 2018, data for 2019 is included. If no data for 2019 exists, data for 2017 is included, and so on up to a maximum of +/- 3 years. In the QoG OECD Time-Series dataset, data from 1946 to 2021 are included and the unit of analysis is country-year (e.g. Sweden-1946, Sweden-1947 and so on).
Social Media and Mental Health
kaggle.com
zip
Updated Jul 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
Explore at:
zip(10944 bytes)Available download formats
Dataset updated
Jul 18, 2023
Authors
SouvikAhmed071
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

The following is the Google Colab link to the project, done on Jupyter Notebook -

https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

The following is the GitHub Repository of the project -

https://github.com/daerkns/social-media-and-mental-health

Libraries used for the Project -

Pandas Numpy Matplotlib Seaborn Sci-kit Learn
HPC-ODA Dataset Collection
data.europa.eu
unknown
Updated Jul 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2025). HPC-ODA Dataset Collection [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3701440?locale=pt
Explore at:
unknown(1483441742)Available download formats
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
HPC-ODA is a collection of datasets acquired on production HPC systems, which are representative of several real-world use cases in the field of Operational Data Analytics (ODA) for the improvement of reliability and energy efficiency. The datasets are composed of monitoring sensor data, acquired from the components of different HPC systems depending on the specific use case. Two tools, whose overhead is proven to be very light, were used to acquire data in HPC-ODA: these are the DCDB and LDMS monitoring frameworks. The aim of HPC-ODA is to provide several vertical slices (here named segments) of the monitoring data available in a large-scale HPC installation. The segments all have different granularities, in terms of data sources and time scale, and provide several use cases on which models and approaches to data processing can be evaluated. While having a production dataset from a whole HPC system - from the infrastructure down to the CPU core level - at a fine time granularity would be ideal, this is often not feasible due to the confidentiality of the data, as well as the sheer amount of storage space required. HPC-ODA includes 5 different segments: Power Consumption Prediction: a fine-granularity dataset that was collected from a single compute node in a HPC system. It contains both node-level data as well as per-CPU core metrics, and can be used to perform regression tasks such as power consumption prediction. Fault Detection: a medium-granularity dataset that was collected from a single compute node while it was subjected to fault injection. It contains only node-level data, as well as the labels for both the applications and faults being executed on the HPC node in time. This dataset can be used to perform fault classification. Application Classification: a medium-granularity dataset that was collected from 16 compute nodes in a HPC system while running different parallel MPI applications. Data is at the compute node level, separated for each of them, and is paired with the labels of the applications being executed. This dataset can be used for tasks such as application classification. Infrastructure Management: a coarse-granularity dataset containing cluster-wide data from a HPC system, about its warm water cooling system as well as power consumption. The data is at the rack level, and can be used for regression tasks such as outlet water temperature or removed heat prediction. Cross-architecture: a medium-granularity dataset that is a variant of the Application Classification one, and shares the same ODA use case. Here, however, single-node configurations of the applications were executed on three different compute node types with different CPU architectures. This dataset can be used to perform cross-architecture application classification, or performance comparison studies. The HPC-ODA dataset collection includes a readme document containing all necessary usage information, as well as a lightweight Python framework to carry out the ODA tasks described for each dataset.
m
Speech Dataset of Human and AI-Generated Voices
data.mendeley.com
kaggle.com
Updated Sep 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Huzain Azis (2025). Speech Dataset of Human and AI-Generated Voices [Dataset]. http://doi.org/10.17632/5czyx2vppv.2
Explore at:
Unique identifier
https://doi.org/10.17632/5czyx2vppv.2
Dataset updated
Sep 15, 2025
Authors
Huzain Azis
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset consists of audio recordings in Indonesian language, categorized into two distinct classes: human voices (real) and synthetic voices generated using artificial intelligence (AI). Each class comprises 21 audio files, resulting in a total of 42 audio files. Each recording has a duration ranging from approximately 4 to 9 minutes, with an average length of around 6 minutes per file. All recordings are provided in WAV format and accompanied by a CSV file containing detailed duration metadata for each audio file.

This dataset is suitable for research and applications in speech recognition, voice authenticity detection, audio analysis, and related fields. It enables comparative analysis between natural Indonesian speech and AI-generated synthetic speech.
g
Dataset for 'Signature Analysis of High-Throughput Transcriptomics Screening...
gimi9.com
catalog.data.gov
Updated Jan 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Dataset for 'Signature Analysis of High-Throughput Transcriptomics Screening Data for Mechanistic Inference and Chemical Grouping' [Dataset]. https://gimi9.com/dataset/data-gov_dataset-for-signature-analysis-of-high-throughput-transcriptomics-screening-data-for-mecha/
Explore at:
Dataset updated
Jan 9, 2025
Description
Dataset for Harrill, J.A. et al., 'Signature Analysis of High-Throughput Transcriptomics Screening Data for Mechanistic Inference and Chemical Grouping' published in Toxicological Sciences, https://doi.org/10.1093/toxsci/kfae108 This dataset contains gene expression profiles and gene signature concentration-response modeling results for 1751 unique chemicals. The chemicals were tested in MCF7 cells using an exposure duration of six hours. The datasets also contains the results of molecular target enrichment and chemotype enrichment analyses performed downstream of the gene signature concentration-response modeling. Descriptions of each data file can be found in the supplementary material of the published article that is hosted by the journal. This dataset is associated with the following publication: Harrill, J., L. Everett, D. Haggard, L. Word, J. Bundy, B. Chambers, D. Harris, C. Willis, R. Thomas, I. Shah, and R. Judson. Signature Analysis of High-Throughput Transcriptomics Screening Data for Mechanistic Inference and Chemical Grouping. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 202(1): 103-122, (2024).
Apple_Retail_Sales_Dataset
kaggle.com
zip
Updated Sep 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aman Garg (2025). Apple_Retail_Sales_Dataset [Dataset]. https://www.kaggle.com/datasets/amangarg08/apple-retail-sales-dataset
Explore at:
zip(13514783 bytes)Available download formats
Dataset updated
Sep 28, 2025
Authors
Aman Garg
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains over 1 million rows of Apple Retail Sales data. It includes information on products, stores, sales transactions, and warranty claims across various Apple retail locations worldwide.

The dataset is designed to reflect real-world business scenarios — including multiple product categories, regional sales variations, and customer service data — making it suitable for end-to-end data analytics and machine learning projects.

Important Note

This dataset is not based on real Apple Inc. data. It was created using Python and LLM-generated insights to simulate realistic sales patterns and business metrics.

Like most company-related datasets on Kaggle (e.g., Amazon, Tesla, or Samsung), this one is synthetic, as companies do not share their actual sales or confidential data publicly due to privacy and legal restrictions.

Purpose

This dataset is intended for: Practicing data analysis, visualization, and forecasting Building and testing machine learning models Learning ETL and data-cleaning workflows on large datasets

Usage You may freely use, modify, and share this dataset for learning, research, or portfolio projects.

Facebook

Twitter

Click to copy link

Link copied

Cite

Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2

New 1000 Sales Records Data 2

Explore at:

zip(49305 bytes)Available download formats

Dataset updated

Jan 12, 2023

Authors

Calvin Oko Mensah

Description

This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.

Clear search

Close search

Google apps

Main menu

New 1000 Sales Records Data 2

Walmart Products Dataset – Free Product Data CSV

Key Features

Who Benefits?

Why Use This Dataset Instead of Manual Scraping?

Comprehensive Supply Chain Analysis

Orange dataset table

Global Retail Sales Data: Orders, Reviews & Trends

Fake Dataset for Practice

Insurance Dataset

Data Science Stack Exchange Dataset

Data-Science-Instruct-Dataset

Data from: Social Media Data Analysis

Dataset

Contents

Dataset: Shell Commands Used by Participants of Hands-on Cybersecurity...

Data Analytics 2 Dataset

DATA ANALYTICS 2

Excel dataset

Dataset

Contents

UnrealGaussianStat: Synthetic dataset for statistical analysis on Novel View...

QoG OECD Dataset

Social Media and Mental Health

HPC-ODA Dataset Collection

Speech Dataset of Human and AI-Generated Voices

Dataset for 'Signature Analysis of High-Throughput Transcriptomics Screening...

Apple_Retail_Sales_Dataset

New 1000 Sales Records Data 2