100+ datasets found
  1. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Explore at:
    zip(49305 bytes)Available download formats
    Dataset updated
    Jan 12, 2023
    Authors
    Calvin Oko Mensah
    Description

    This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.

  2. c

    Walmart Products Dataset – Free Product Data CSV

    • crawlfeeds.com
    csv, zip
    Updated Dec 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Walmart Products Dataset – Free Product Data CSV [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Dec 2, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.

    Key Features

    Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.

    CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.

    Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.

    Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.

    Who Benefits?

    • Data analysts & researchers exploring e-commerce trends or product catalog data.
    • Developers & data scientists building price-comparison tools, recommendation engines or ML models.
    • E-commerce strategists/marketers need product metadata for competitive analysis or market research.
    • Students/hobbyists needing a free dataset for learning or demo projects.

    Why Use This Dataset Instead of Manual Scraping?

    • Time-saving: No need to write scrapers or deal with rate limits.
    • Clean, structured data: All records are verified and already formatted in CSV, saving hours of cleaning.
    • Risk-free: Avoid Terms-of-Service issues or IP blocks that come with manual scraping.
      Instant access: Free and immediately downloadable.
  3. Comprehensive Supply Chain Analysis

    • kaggle.com
    Updated Sep 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dorothy Joel (2023). Comprehensive Supply Chain Analysis [Dataset]. https://www.kaggle.com/datasets/dorothyjoel/us-regional-sales
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Dorothy Joel
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This supply chain analysis provides a comprehensive view of the company's order and distribution processes, allowing for in-depth analysis and optimization of various aspects of the supply chain, from procurement and inventory management to sales and customer satisfaction. It empowers the company to make data-driven decisions to improve efficiency, reduce costs, and enhance customer experiences. The provided supply chain analysis dataset contains various columns that capture important information related to the company's order and distribution processes:

    • OrderNumber • Sales Channel • WarehouseCode • ProcuredDate • CurrencyCode • OrderDate • ShipDate • DeliveryDate • SalesTeamID • CustomerID • StoreID • ProductID • Order Quantity • Discount Applied • Unit Cost • Unit Price

  4. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.

  5. Global Retail Sales Data: Orders, Reviews & Trends

    • kaggle.com
    zip
    Updated Dec 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adarsh Anil Kumar (2024). Global Retail Sales Data: Orders, Reviews & Trends [Dataset]. https://www.kaggle.com/datasets/adarsh0806/influencer-merchandise-sales
    Explore at:
    zip(125403 bytes)Available download formats
    Dataset updated
    Dec 10, 2024
    Authors
    Adarsh Anil Kumar
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Global Retail Sales Data provided here is a self-generated synthetic dataset created using Random Sampling techniques provided by the Numpy Package. The dataset emulates information regarding merchandise sales through a retail website set up by a popular fictional influencer based in the US between the '23-'24 period. The influencer would sell clothing, ornaments and other products at variable rates through the retail website to all of their followers across the world. Imagine that the influencer executes high levels of promotions for the materials they sell, prompting more ratings and reviews from their followers, pushing more user engagement.

    This dataset is placed to help with practicing Sentiment Analysis or/and Time Series Analysis of sales, etc. as they are very important topics for Data Analyst prospects. The column description is given as follows:

    Order ID: Serves as an identifier for each order made.

    Order Date: The date when the order was made.

    Product ID: Serves as an identifier for the product that was ordered.

    Product Category: Category of Product sold(Clothing, Ornaments, Other).

    Buyer Gender: Genders of people that have ordered from the website (Male, Female).

    Buyer Age: Ages of the buyers.

    Order Location: The city where the order was made from.

    International Shipping: Whether the product was shipped internationally or not. (Yes/No)

    Sales Price: Price tag for the product.

    Shipping Charges: Extra charges for international shipments.

    Sales per Unit: Sales cost while including international shipping charges.

    Quantity: Quantity of the product bought.

    Total Sales: Total sales made through the purchase.

    Rating: User rating given for the order.

    Review: User review given for the order.

  6. Fake Dataset for Practice

    • kaggle.com
    zip
    Updated Aug 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shuvo Kumar Basak-4004 (2023). Fake Dataset for Practice [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/fake-dataset-for-practice
    Explore at:
    zip(1515599 bytes)Available download formats
    Dataset updated
    Aug 21, 2023
    Authors
    Shuvo Kumar Basak-4004
    Description

    Description: This dataset is created solely for the purpose of practice and learning. It contains entirely fake and fabricated information, including names, phone numbers, emails, cities, ages, and other attributes. None of the information in this dataset corresponds to real individuals or entities. It serves as a resource for those who are learning data manipulation, analysis, and machine learning techniques. Please note that the data is completely fictional and should not be treated as representing any real-world scenarios or individuals.

    Attributes: - phone_number: Fake phone numbers in various formats. - name: Fictitious names generated for practice purposes. - email: Imaginary email addresses created for the dataset. - city: Made-up city names to simulate geographical diversity. - age: Randomly generated ages for practice analysis. - sex: Simulated gender values (Male, Female). - married_status: Synthetic marital status information. - job: Fictional job titles for practicing data analysis. - income: Fake income values for learning data manipulation. - religion: Pretend religious affiliations for practice. - nationality: Simulated nationalities for practice purposes.

    Please be aware that this dataset is not based on real data and should be used exclusively for educational purposes.

  7. g

    Insurance Dataset

    • gts.ai
    json
    Updated Oct 16, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GTS (2022). Insurance Dataset [Dataset]. https://gts.ai/case-study/insurance-dataset-annotation-services-for-precision-data-analysis/
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Oct 16, 2022
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Insurance Dataset project is an extensive initiative focused on collecting and analyzing insurance-related data from various sources.

  8. Data Science Stack Exchange Dataset

    • kaggle.com
    zip
    Updated Jul 11, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aneesh Tickoo (2022). Data Science Stack Exchange Dataset [Dataset]. https://www.kaggle.com/datasets/aneeshtickoo/data-science-stack-exchange
    Explore at:
    zip(91829637 bytes)Available download formats
    Dataset updated
    Jul 11, 2022
    Authors
    Aneesh Tickoo
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Stack Exchange is a network of question-and-answer websites on topics in diverse fields, each site covering a specific topic, where questions, answers, and users are subject to a reputation award process. The reputation system allows the sites to be self-moderating.

    The dataset here is specific to one such network site of Stack Exchange named Data Science Stack Exchange. The dataset is distributed over multiple files. It contains information on various Posts on data science that can be used for language processing, it has data on which posts are being liked by users more, etc. A lot of analysis can be done on this dataset.

  9. h

    Data-Science-Instruct-Dataset

    • huggingface.co
    Updated May 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohammed Habib Ahmed (2025). Data-Science-Instruct-Dataset [Dataset]. https://huggingface.co/datasets/HabibAhmed/Data-Science-Instruct-Dataset
    Explore at:
    Dataset updated
    May 3, 2025
    Authors
    Mohammed Habib Ahmed
    Description

    HabibAhmed/Data-Science-Instruct-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. Data from: Social Media Data Analysis

    • kaggle.com
    zip
    Updated Apr 16, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nafe Muhtasim (2021). Social Media Data Analysis [Dataset]. https://www.kaggle.com/datasets/nafemuhtasim/social-media-data-analysis
    Explore at:
    zip(29081 bytes)Available download formats
    Dataset updated
    Apr 16, 2021
    Authors
    Nafe Muhtasim
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Nafe Muhtasim

    Released under CC0: Public Domain

    Contents

  11. Z

    Dataset: Shell Commands Used by Participants of Hands-on Cybersecurity...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Valdemar Švábenský; Jan Vykopal; Pavel Seda; Pavel Čeleda (2023). Dataset: Shell Commands Used by Participants of Hands-on Cybersecurity Training [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5137354
    Explore at:
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    Masaryk University
    Authors
    Valdemar Švábenský; Jan Vykopal; Pavel Seda; Pavel Čeleda
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary materials for the following journal paper:

    Valdemar Švábenský, Jan Vykopal, Pavel Seda, Pavel Čeleda. Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training. In Elsevier Data in Brief. 2021. https://doi.org/10.1016/j.dib.2021.107398

    How to cite

    If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).

    @article{Svabensky2021dataset, author = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and Seda, Pavel and \v{C}eleda, Pavel}, title = {{Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training}}, journal = {Data in Brief}, publisher = {Elsevier}, volume = {38}, year = {2021}, issn = {2352-3409}, url = {https://doi.org/10.1016/j.dib.2021.107398}, doi = {10.1016/j.dib.2021.107398}, }

    The data were collected using a logging toolset referenced here.

    Attached content

    Dataset (data.zip). The collected data are attached here on Zenodo. A copy is also available in this repository.

    Analytical tools (toolset.zip). To analyze the data, you can instantiate the toolset or this project for ELK.

    Version history

    Version 1 (https://zenodo.org/record/5137355) contains 13446 log records from 175 trainees. These data are precisely those that are described in the associated journal paper. Version 1 provides a snapshot of the state when the article was published.

    Version 2 (https://zenodo.org/record/5517479) contains 13446 log records from 175 trainees. The data are unchanged from Version 1, but the analytical toolset includes a minor fix.

    Version 3 (https://zenodo.org/record/6670113) contains 21762 log records from 275 trainees. It is a superset of Version 2, with newly collected data added to the dataset.

    The current Version 4 (https://zenodo.org/record/8136017) contains 21459 log records from 275 trainees. Compared to Version 3, we cleaned 303 invalid/duplicate command records.

  12. R

    Data Analytics 2 Dataset

    • universe.roboflow.com
    zip
    Updated Jan 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TRAFFIC SIGNS DATA ANALYTICS (2025). Data Analytics 2 Dataset [Dataset]. https://universe.roboflow.com/traffic-signs-data-analytics/data-analytics-2
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 1, 2025
    Dataset authored and provided by
    TRAFFIC SIGNS DATA ANALYTICS
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    TRAFFIC LIGHTS Gztl Bounding Boxes
    Description

    DATA ANALYTICS 2

    ## Overview
    
    DATA ANALYTICS 2 is a dataset for object detection tasks - it contains TRAFFIC LIGHTS Gztl annotations for 8,579 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  13. Excel dataset

    • kaggle.com
    zip
    Updated Jun 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pinky Verma (2023). Excel dataset [Dataset]. https://www.kaggle.com/datasets/pinkyverma0256/excel-dataset
    Explore at:
    zip(13123 bytes)Available download formats
    Dataset updated
    Jun 29, 2023
    Authors
    Pinky Verma
    Description

    Dataset

    This dataset was created by Pinky Verma

    Contents

  14. D

    UnrealGaussianStat: Synthetic dataset for statistical analysis on Novel View...

    • dataverse.no
    • dataverse.azure.uit.no
    • +1more
    txt, zip
    Updated Apr 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anurag Dalal; Anurag Dalal (2025). UnrealGaussianStat: Synthetic dataset for statistical analysis on Novel View Synthesis [Dataset]. http://doi.org/10.18710/WSU7I6
    Explore at:
    txt(7447), zip(960339536)Available download formats
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    DataverseNO
    Authors
    Anurag Dalal; Anurag Dalal
    License

    https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/WSU7I6https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/WSU7I6

    Description

    The dataset comprises three dynamic scenes characterized by both simple and complex lighting conditions. The quantity of cameras ranges from 4 to 512, including 4, 6, 8, 10, 12, 14, 16, 32, 64, 128, 256, and 512. The point clouds are randomly generated.

  15. r

    QoG OECD Dataset

    • researchdata.se
    • gimi9.com
    Updated Aug 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jan Teorell; Staffan Kumlin; Sören Holmberg; Bo Rothstein; Aksel Sundström (2024). QoG OECD Dataset [Dataset]. http://doi.org/10.18157/qogoecdjan22
    Explore at:
    (88537037)Available download formats
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    University of Gothenburg
    Authors
    Jan Teorell; Staffan Kumlin; Sören Holmberg; Bo Rothstein; Aksel Sundström
    Time period covered
    2015 - 2021
    Description

    The QoG Institute is an independent research institute within the Department of Political Science at the University of Gothenburg. The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained.

    To achieve said goal, the QoG Institute makes comparative data on QoG and its correlates publicly available. To accomplish this, we have compiled several datasets that draw on a number of freely available data sources, including aggregated individual-level data.

    The QoG OECD Datasets focus exclusively on OECD member countries. They have a high data coverage in terms of geography and time. In the QoG OECD TS dataset, data from 1946 to 2021 is included and the unit of analysis is country-year (e.g., Sweden-1946, Sweden-1947, etc.).

    In the QoG OECD Cross-Section dataset, data from and around 2018 is included. Data from 2018 is prioritized, however, if no data are available for a country for 2018, data for 2019 is included. If no data for 2019 exists, data for 2017 is included, and so on up to a maximum of +/- 3 years. In the QoG OECD Time-Series dataset, data from 1946 to 2021 are included and the unit of analysis is country-year (e.g. Sweden-1946, Sweden-1947 and so on).

    The QoG OECD Datasets focus exclusively on OECD member countries. They have a high data coverage in terms of geography and time.

    In the QoG OECD Cross-Section dataset, data from and around 2018 is included. Data from 2018 is prioritized, however, if no data are available for a country for 2018, data for 2019 is included. If no data for 2019 exists, data for 2017 is included, and so on up to a maximum of +/- 3 years. In the QoG OECD Time-Series dataset, data from 1946 to 2021 are included and the unit of analysis is country-year (e.g. Sweden-1946, Sweden-1947 and so on).

  16. Social Media and Mental Health

    • kaggle.com
    zip
    Updated Jul 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
    Explore at:
    zip(10944 bytes)Available download formats
    Dataset updated
    Jul 18, 2023
    Authors
    SouvikAhmed071
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

    The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

    This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

    The following is the Google Colab link to the project, done on Jupyter Notebook -

    https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

    The following is the GitHub Repository of the project -

    https://github.com/daerkns/social-media-and-mental-health

    Libraries used for the Project -

    Pandas
    Numpy
    Matplotlib
    Seaborn
    Sci-kit Learn
    
  17. HPC-ODA Dataset Collection

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zenodo (2025). HPC-ODA Dataset Collection [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3701440?locale=pt
    Explore at:
    unknown(1483441742)Available download formats
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HPC-ODA is a collection of datasets acquired on production HPC systems, which are representative of several real-world use cases in the field of Operational Data Analytics (ODA) for the improvement of reliability and energy efficiency. The datasets are composed of monitoring sensor data, acquired from the components of different HPC systems depending on the specific use case. Two tools, whose overhead is proven to be very light, were used to acquire data in HPC-ODA: these are the DCDB and LDMS monitoring frameworks. The aim of HPC-ODA is to provide several vertical slices (here named segments) of the monitoring data available in a large-scale HPC installation. The segments all have different granularities, in terms of data sources and time scale, and provide several use cases on which models and approaches to data processing can be evaluated. While having a production dataset from a whole HPC system - from the infrastructure down to the CPU core level - at a fine time granularity would be ideal, this is often not feasible due to the confidentiality of the data, as well as the sheer amount of storage space required. HPC-ODA includes 5 different segments: Power Consumption Prediction: a fine-granularity dataset that was collected from a single compute node in a HPC system. It contains both node-level data as well as per-CPU core metrics, and can be used to perform regression tasks such as power consumption prediction. Fault Detection: a medium-granularity dataset that was collected from a single compute node while it was subjected to fault injection. It contains only node-level data, as well as the labels for both the applications and faults being executed on the HPC node in time. This dataset can be used to perform fault classification. Application Classification: a medium-granularity dataset that was collected from 16 compute nodes in a HPC system while running different parallel MPI applications. Data is at the compute node level, separated for each of them, and is paired with the labels of the applications being executed. This dataset can be used for tasks such as application classification. Infrastructure Management: a coarse-granularity dataset containing cluster-wide data from a HPC system, about its warm water cooling system as well as power consumption. The data is at the rack level, and can be used for regression tasks such as outlet water temperature or removed heat prediction. Cross-architecture: a medium-granularity dataset that is a variant of the Application Classification one, and shares the same ODA use case. Here, however, single-node configurations of the applications were executed on three different compute node types with different CPU architectures. This dataset can be used to perform cross-architecture application classification, or performance comparison studies. The HPC-ODA dataset collection includes a readme document containing all necessary usage information, as well as a lightweight Python framework to carry out the ODA tasks described for each dataset.

  18. m

    Speech Dataset of Human and AI-Generated Voices

    • data.mendeley.com
    • kaggle.com
    Updated Sep 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Huzain Azis (2025). Speech Dataset of Human and AI-Generated Voices [Dataset]. http://doi.org/10.17632/5czyx2vppv.2
    Explore at:
    Dataset updated
    Sep 15, 2025
    Authors
    Huzain Azis
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset consists of audio recordings in Indonesian language, categorized into two distinct classes: human voices (real) and synthetic voices generated using artificial intelligence (AI). Each class comprises 21 audio files, resulting in a total of 42 audio files. Each recording has a duration ranging from approximately 4 to 9 minutes, with an average length of around 6 minutes per file. All recordings are provided in WAV format and accompanied by a CSV file containing detailed duration metadata for each audio file.

    This dataset is suitable for research and applications in speech recognition, voice authenticity detection, audio analysis, and related fields. It enables comparative analysis between natural Indonesian speech and AI-generated synthetic speech.

  19. g

    Dataset for 'Signature Analysis of High-Throughput Transcriptomics Screening...

    • gimi9.com
    • catalog.data.gov
    Updated Jan 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Dataset for 'Signature Analysis of High-Throughput Transcriptomics Screening Data for Mechanistic Inference and Chemical Grouping' [Dataset]. https://gimi9.com/dataset/data-gov_dataset-for-signature-analysis-of-high-throughput-transcriptomics-screening-data-for-mecha/
    Explore at:
    Dataset updated
    Jan 9, 2025
    Description

    Dataset for Harrill, J.A. et al., 'Signature Analysis of High-Throughput Transcriptomics Screening Data for Mechanistic Inference and Chemical Grouping' published in Toxicological Sciences, https://doi.org/10.1093/toxsci/kfae108 This dataset contains gene expression profiles and gene signature concentration-response modeling results for 1751 unique chemicals. The chemicals were tested in MCF7 cells using an exposure duration of six hours. The datasets also contains the results of molecular target enrichment and chemotype enrichment analyses performed downstream of the gene signature concentration-response modeling. Descriptions of each data file can be found in the supplementary material of the published article that is hosted by the journal. This dataset is associated with the following publication: Harrill, J., L. Everett, D. Haggard, L. Word, J. Bundy, B. Chambers, D. Harris, C. Willis, R. Thomas, I. Shah, and R. Judson. Signature Analysis of High-Throughput Transcriptomics Screening Data for Mechanistic Inference and Chemical Grouping. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 202(1): 103-122, (2024).

  20. Apple_Retail_Sales_Dataset

    • kaggle.com
    zip
    Updated Sep 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aman Garg (2025). Apple_Retail_Sales_Dataset [Dataset]. https://www.kaggle.com/datasets/amangarg08/apple-retail-sales-dataset
    Explore at:
    zip(13514783 bytes)Available download formats
    Dataset updated
    Sep 28, 2025
    Authors
    Aman Garg
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains over 1 million rows of Apple Retail Sales data. It includes information on products, stores, sales transactions, and warranty claims across various Apple retail locations worldwide.

    The dataset is designed to reflect real-world business scenarios — including multiple product categories, regional sales variations, and customer service data — making it suitable for end-to-end data analytics and machine learning projects.

    Important Note

    This dataset is not based on real Apple Inc. data. It was created using Python and LLM-generated insights to simulate realistic sales patterns and business metrics.

    Like most company-related datasets on Kaggle (e.g., Amazon, Tesla, or Samsung), this one is synthetic, as companies do not share their actual sales or confidential data publicly due to privacy and legal restrictions.

    Purpose

    This dataset is intended for: Practicing data analysis, visualization, and forecasting Building and testing machine learning models Learning ETL and data-cleaning workflows on large datasets

    Usage You may freely use, modify, and share this dataset for learning, research, or portfolio projects.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
Organization logo

New 1000 Sales Records Data 2

Explore at:
zip(49305 bytes)Available download formats
Dataset updated
Jan 12, 2023
Authors
Calvin Oko Mensah
Description

This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.

Search
Clear search
Close search
Google apps
Main menu