6 datasets found
  1. Explainable AI (XAI) Drilling Dataset

    • kaggle.com
    Updated Aug 24, 2023
    Raphael Wallsberger (2023). Explainable AI (XAI) Drilling Dataset [Dataset]. https://www.kaggle.com/datasets/raphaelwallsberger/xai-drilling-dataset
    Dataset updated
    Aug 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Raphael Wallsberger
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032

    This is the original XAI Drilling dataset. It is optimized for XAI purposes and can be used to evaluate the explanations produced by XAI algorithms. The dataset comprises 20,000 data points, i.e., drilling operations, stored as rows, with 10 features, one binary main failure label, and 4 binary subgroup failure modes stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:

    • ID: Every data point in the dataset is uniquely identifiable, thanks to the ID feature. This ensures traceability and easy referencing, especially when analyzing specific drilling scenarios or anomalies.
    • Cutting speed vc (m/min): The cutting speed is a pivotal parameter in drilling, influencing the efficiency and quality of the drilling process. It represents the speed at which the drill bit's cutting edge moves through the material.
    • Spindle speed n (1/min): This feature captures the rotational speed of the spindle and, with it, of the drill bit.
    • Feed f (mm/rev): Feed denotes the depth the drill bit penetrates into the material with each revolution. There is a balance between speed and precision, with higher feeds leading to faster drilling but potentially compromising hole quality.
    • Feed rate vf (mm/min): The feed rate is a measure of how quickly the material is fed to the drill bit. It is a determinant of the overall drilling time and influences the heat generated during the process.
    • Power Pc (kW): The power consumption during drilling can be indicative of the efficiency of the process and the wear state of the drill bit.
    • Cooling (%): Effective cooling is paramount in drilling, preventing overheating and reducing wear. This ordinal feature captures the cooling level applied, with four distinct states representing no cooling (0%), partial cooling (25% and 50%), and high to full cooling (75% and 100%).
    • Material: The type of material being drilled can significantly influence the drilling parameters and outcomes. This dataset encompasses three primary materials: C45K hot-rolled heat-treatable steel (EN 1.0503), cast iron GJL (EN GJL-250), and aluminum-silicon (AlSi) alloy (EN AC-42000), each presenting its unique challenges and considerations. The three materials are represented as “P (Steel)” for C45K, “K (Cast Iron)” for cast iron GJL and “N (Non-ferrous metal)” for AlSi alloy.
    • Drill Bit Type: Different materials often require specialized drill bits. This feature categorizes the type of drill bit used, ensuring compatibility with the material and optimizing the drilling process. It consists of three categories, which are based on the DIN 1836: “N” for C45K, “H” for cast iron and “W” for AlSi alloy [5].
    • Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.

    • Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.

    Subgroup failures:

    • Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence.
    • Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from three factors: high feed rate, inadequate cooling, and use of an incompatible drill bit. A value of 1 indicates the occurrence of at least two of these three factors, while 0 suggests a smooth drilling operation without compression chips.
    • Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation.
    • Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.
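The relationship between the main failure label and the four subgroup failure modes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' generation code: the column names (`BEF`, `CCF`, `FWF`, `WDF`) and the three sample rows are hypothetical, and only the counts and the 5.0 % rate are taken from the description above.

```python
# Hypothetical short names for the four subgroup failure columns;
# the actual CSV headers may differ.
SUBGROUP_COLUMNS = ["BEF", "CCF", "FWF", "WDF"]


def main_failure(row):
    """Return 1 if any subgroup failure mode is set in the row, else 0."""
    return int(any(row[name] == 1 for name in SUBGROUP_COLUMNS))


# Three made-up drilling operations, for illustration only.
rows = [
    {"BEF": 0, "CCF": 0, "FWF": 0, "WDF": 0},
    {"BEF": 0, "CCF": 1, "FWF": 0, "WDF": 1},
    {"BEF": 1, "CCF": 0, "FWF": 0, "WDF": 0},
]
labels = [main_failure(r) for r in rows]

# Subgroup counts quoted in the description.
subgroup_counts = {
    "build-up edge": 215,
    "compression chips": 344,
    "flank wear": 278,
    "wrong drill bit": 300,
}
total_flags = sum(subgroup_counts.values())

# About 5.0 % of 20,000 operations carry the main failure label.
approx_main_failures = 20_000 * 5 // 100
extra_flags = total_flags - approx_main_failures
```

Under these numbers the subgroup flags sum to more than the approximate number of failed operations, which suggests that some operations carry more than one subgroup flag. An explanation method evaluated on this dataset would therefore need to handle co-occurring failure modes.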

  2. Related Data for: Explainable AI for Property Prediction in Scrap-Based...

    • researchdata.ntu.edu.sg
    Updated Jul 8, 2025
    DR-NTU (Data) (2025). Related Data for: Explainable AI for Property Prediction in Scrap-Based Steel Production: A Public Data Implementation [Dataset]. http://doi.org/10.21979/N9/NYRGAA
    Available download formats: txt (31), application/x-ipynb+json (1236302), png (162406), pdf (1903754), bin (294769117), tsv (12782)
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    DR-NTU (Data)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This folder contains the supplementary information for the conference paper "Explainable AI for Property Prediction in Scrap-Based Steel Production: A Public Data Implementation". The datasets used are publicly available and referenced in the following:

    1. Guo, S., Yu, J., Liu, X., Wang, C., & Jiang, Q. (2019). A predicting model for properties of steel using the industrial big data based on machine learning. Computational Materials Science, 160, 95-104. https://doi.org/10.17632/f6zsvbf28y.1
    2. Dunn, A., Wang, Q., Ganose, A., Dopp, D., & Jain, A. (2020). Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm. npj Computational Materials, 6(1), 138. https://www.nature.com/articles/s41524-020-00406-3
    3. MatNavi: Kong H. MatNavi: Mechanical properties of low-alloy steels; 2022. Accessed: 2025-05-09. https://www.kaggle.com/datasets/konghuanqing/matnavi-mechanical-properties-of-lowalloy-steels

  3. Predictive Maintenance Dataset (AI4I 2020)

    • kaggle.com
    Updated Nov 6, 2022
    Stephan Matzka (2022). Predictive Maintenance Dataset (AI4I 2020) [Dataset]. https://www.kaggle.com/datasets/stephanmatzka/predictive-maintenance-dataset-ai4i-2020
    Dataset updated
    Nov 6, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Stephan Matzka
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Please note that this is the original dataset with additional information and proper attribution. There is at least one other version of this dataset on Kaggle that was uploaded without permission. Please be fair and attribute the original author.

    This synthetic dataset is modeled after an existing milling machine and consists of 10,000 data points stored as rows with 14 features in columns:

    1. UID: unique identifier ranging from 1 to 10000
    2. product ID: consisting of a letter L, M, or H for low (50% of all products), medium (30%) and high (20%) as product quality variants and a variant-specific serial number
    3. type: just the product type L, M or H from column 2
    4. air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K
    5. process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.
    6. rotational speed [rpm]: calculated from a power of 2860 W, overlaid with a normally distributed noise
    7. torque [Nm]: torque values are normally distributed around 40 Nm with a SD = 10 Nm and no negative values.
    8. tool wear [min]: The quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process.
    9. machine failure: a label that indicates whether the machine has failed at this particular data point because any of the following failure modes is true.

    The machine failure consists of five independent failure modes:

    10. tool wear failure (TWF): the tool will be replaced or fail at a randomly selected tool wear time between 200 and 240 min (120 times in our dataset). At this point in time, the tool is replaced 69 times and fails 51 times (randomly assigned).
    11. heat dissipation failure (HDF): heat dissipation causes a process failure if the difference between air and process temperature is below 8.6 K and the tool's rotational speed is below 1380 rpm. This is the case for 115 data points.
    12. power failure (PWF): the product of torque and rotational speed (in rad/s) equals the power required for the process. If this power is below 3500 W or above 9000 W, the process fails, which is the case 95 times in our dataset.
    13. overstrain failure (OSF): if the product of tool wear and torque exceeds 11,000 minNm for the L product variant (12,000 for M, 13,000 for H), the process fails due to overstrain. This is true for 98 data points.
    14. random failures (RNF): each process has a chance of 0.1 % to fail regardless of its process parameters. This is the case for only 5 data points, fewer than could be expected for 10,000 data points in our dataset.

    If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method which of the failure modes has caused the process to fail.
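The three rule-based failure modes (HDF, PWF, OSF) can be written down directly from the thresholds quoted above. The following is a minimal sketch under stated assumptions, not the author's generation code: the function and argument names are hypothetical, "below" and "exceeds" are read as strict comparisons, and the randomly assigned modes (TWF, RNF) are left out because they cannot be reproduced from the process parameters alone.

```python
import math

# Overstrain limits in min*Nm per product variant, as quoted in the description.
OSF_LIMITS = {"L": 11_000, "M": 12_000, "H": 13_000}


def deterministic_failures(air_k, process_k, speed_rpm, torque_nm, wear_min, variant):
    """Evaluate the HDF, PWF and OSF rules for one data point.

    Argument names are illustrative and do not match the CSV headers.
    """
    # Power in watts: torque times angular speed, with rpm converted to rad/s.
    power_w = torque_nm * speed_rpm * 2 * math.pi / 60
    # Heat dissipation: small temperature difference combined with low speed.
    hdf = (process_k - air_k) < 8.6 and speed_rpm < 1380
    # Power failure: power outside the 3500 W to 9000 W band.
    pwf = power_w < 3500 or power_w > 9000
    # Overstrain: tool wear times torque above the variant-specific limit.
    osf = wear_min * torque_nm > OSF_LIMITS[variant]
    return {"HDF": hdf, "PWF": pwf, "OSF": osf, "any": hdf or pwf or osf}


# Two made-up data points, for illustration only.
healthy = deterministic_failures(300.0, 310.0, 1500, 40.0, 100, "L")
stressed = deterministic_failures(300.0, 308.0, 1300, 60.0, 200, "L")

# Expected number of random failures at 0.1 % over 10,000 data points.
expected_rnf = 10_000 // 1000
```

The last line makes the remark about random failures concrete: at a 0.1 % chance per process, about 10 random failures would be expected in 10,000 data points, compared with the 5 reported.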

    This dataset is part of the following publication, please cite when using this dataset: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.

    The image of the milling process is the work of Daniel Smyth @ Pexels: https://www.pexels.com/de-de/foto/industrie-herstellung-maschine-werkzeug-10406128/

  4. AMDP Dataset

    • kaggle.com
    Updated Mar 26, 2025
    DatasetEngineer (2025). AMDP Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11172716
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DatasetEngineer
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AMDP Dataset – Age-Related Macular Degeneration Progression Dataset

    The AMDP dataset is a longitudinal ophthalmic dataset curated from anonymized electronic health records (EHR), diagnostic imaging, and genetic screening reports obtained from patients monitored at multiple ophthalmology centers affiliated with the Moorfields Eye Hospital Network (UK). It spans from January 2021 to February 2025, featuring 30-minute interval measurements to simulate high-frequency patient monitoring across real-world federated deployments.

    📊 Dataset Statistics

    • Total Records: 72,528
    • Number of Patients: 50
    • Federated Clients: 5 (multi-site data collection)
    • Features: 57 (multi-modal clinical, imaging, genetic, and longitudinal)
    • Targets: AMD_Stage (0 = No AMD, 1 = Early, 2 = Intermediate, 3 = Late); AMD_Risk_Score (0.0 to 1.0); Time_To_Onset (months)
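As a quick consistency check, the record count can be compared with the stated time span and sampling interval. This is only a sketch under an assumption: the start date of 1 January 2021 is hypothetical, since the description gives only "January 2021". The description also does not say how records are distributed across the 50 patients, so the check makes no claim about that.

```python
from datetime import datetime, timedelta

TOTAL_RECORDS = 72_528             # from the dataset statistics
INTERVAL = timedelta(minutes=30)   # stated acquisition interval

# Assumed start date; the description only says "January 2021".
start = datetime(2021, 1, 1)

# Time covered if all records formed one gap-free 30-minute timeline.
span = TOTAL_RECORDS * INTERVAL
span_days = span / timedelta(days=1)
end = start + span
```

Under that assumption the span works out to a whole number of days and ends in February 2025, in line with the stated period. The total record count therefore appears to correspond to a single 30-minute timeline over the whole period.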

    🧩 Feature Categories

    🧑‍⚕️ Demographics & Lifestyle

    • Age (at each visit)
    • Sex
    • Ethnicity
    • Smoking Status
    • Alcohol Consumption
    • Body Mass Index (BMI)
    • Physical Activity Level
    • Family History of AMD

    🩺 Medical History

    • Hypertension
    • Diabetes
    • Cardiovascular Disease
    • Medication Usage (Anti-VEGF, Aspirin, Statins)

    👁️ Clinical Ophthalmic Metrics

    • Visual Acuity (Left and Right Eye)
    • Intraocular Pressure (Left and Right)
    • Retinal Thickness (Left and Right)
    • Drusen Size (Left and Right)
    • Drusen Type (Hard/Soft)
    • Geographic Atrophy Presence
    • Neovascularization Presence
    • Lens Status (Natural, Cataract, IOL)

    🧪 Laboratory Metrics

    • HDL, LDL, Total Cholesterol
    • Triglycerides
    • HbA1c (Glycated Hemoglobin)

    🧠 Imaging-Derived OCT Features

    • Central Macular Thickness (CMT)
    • RPE Elevation
    • Intraretinal Fluid (IRF)
    • Subretinal Fluid (SRF)
    • ILM–RPE Distance
    • Drusen Volume
    • Drusen Reflectivity Index
    • Ellipsoid Zone Integrity
    • Choroidal Thickness
    • Hyperreflective Foci (HRF) Count
    • Radiomic Texture Features
    • Optic Disc-Cup Ratio
    • Retinal Vessel Tortuosity
    • Lesion Size (if segmented)
    • AI-derived Biological Age Estimation

    🧬 Genetic Biomarkers

    • CFH (Y402H)
    • ARMS2 (rs10490924)
    • HTRA1 (rs11200638)
    • C3 (rs2230199)
    • APOE (ε2/ε3/ε4)
    • Polygenic Risk Score (PRS)

    ⏳ Temporal Tracking & Longitudinal Data

    • Time Between Visits
    • Total Visits
    • Visual Acuity Change Over Time
    • Retinal Thickness Change
    • Drusen Growth Rate
    • Anti-VEGF or PDT Treatment Timeline
    • First Diagnosis Date (if applicable)

    🏥 Federated Learning Context

    The dataset simulates federated learning scenarios by including:

    • Client_ID (Five simulated federated hospital sites)
    • Device_Type (OCT/Fundus hardware variability)
    • Acquisition_Frequency (30-minute intervals)
    • Site_Bias (to simulate distribution shifts across centers)

    📌 Use Case

    This dataset is ideal for:

    • Multi-Task Learning: Stage classification, risk prediction, and time-to-onset forecasting
    • Federated and Semi-Supervised Learning
    • Explainable AI for Ophthalmology
    • Privacy-Preserving Clinical AI

  5. Environmental settings of the proposed system.

    • figshare.com
    • plos.figshare.com
    Updated Jul 17, 2025
    Shuvo Biswas; Rafid Mostafiz; Mohammad Shorif Uddin; Muhammad Shahin Uddin (2025). Environmental settings of the proposed system. [Dataset]. http://doi.org/10.1371/journal.pone.0324957.t005
    Available download formats: xls
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Shuvo Biswas; Rafid Mostafiz; Mohammad Shorif Uddin; Muhammad Shahin Uddin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pneumonia, a severe lung infection caused by various viruses, presents significant challenges in diagnosis and treatment due to its similarities with other respiratory conditions. Additionally, the need to protect patient privacy complicates the sharing of sensitive clinical data. This study introduces FLPneXAINet, an effective framework that combines federated learning (FL) with deep learning (DL) and explainable AI (XAI) to securely and accurately predict pneumonia using chest X-ray (CXR) images. We utilized a benchmark dataset from Kaggle, comprising 8,402 CXR images (3,904 normal and 4,498 pneumonia). The dataset was preprocessed and augmented using a cycle-consistent generative adversarial network (CycleGAN) to increase the volume of training data. Three pre-trained DL models (VGG16, NASNetMobile, and MobileNet) were employed to extract features from the augmented dataset. Further, four ensemble DL (EDL) models were used to enhance feature extraction. Feature optimization was performed using recursive feature elimination (RFE), analysis of variance (ANOVA), and random forest (RF) to select the most relevant features. These optimized features were then fed into machine learning (ML) models, including K-nearest neighbor (KNN), naive Bayes (NB), support vector machine (SVM), and RF, for pneumonia prediction. The performance of the models was evaluated in an FL environment, with the EDL network achieving the best results: accuracy 97.61%, F1 score 98.36%, recall 98.13%, and precision 98.59%. The framework's predictions were further validated using two XAI techniques: Local Interpretable Model-Agnostic Explanations (LIME) and Grad-CAM. FLPneXAINet offers a robust solution for healthcare professionals to accurately diagnose pneumonia, ensuring timely treatment while safeguarding patient privacy.
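The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. This is a small sketch, not part of the authors' pipeline: it uses the standard definition of the F1 score as the harmonic mean of precision and recall, and the variable names are illustrative.

```python
# Class counts quoted in the abstract.
normal_images = 3_904
pneumonia_images = 4_498
total_images = normal_images + pneumonia_images

# Share of pneumonia images in the benchmark dataset, in percent.
pneumonia_share = round(100 * pneumonia_images / total_images, 1)

# Reported precision and recall of the best EDL network, in percent.
precision = 98.59
recall = 98.13

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
f1_rounded = round(f1, 2)
```

The class counts add up to the stated 8,402 images, and the harmonic mean of the reported precision and recall rounds to the reported F1 score of 98.36%, so the quoted metrics are mutually consistent.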

  6. NLP_SKIN_DATA_PS_DD

    • kaggle.com
    Updated Jul 4, 2025
    HARINI SHREE R (2025). NLP_SKIN_DATA_PS_DD [Dataset]. http://doi.org/10.34740/kaggle/dsv/12368953
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    HARINI SHREE R
    License

    CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📄 Context

    Skin diseases are among the most common health concerns worldwide, ranging from benign lesions like keratosis to serious conditions such as melanoma. Early and accurate diagnosis plays a vital role in preventing disease progression and improving patient outcomes. This dataset aims to assist in developing AI-driven dermatology tools by providing structured information on various skin diseases, their definitions, patient-described symptoms, and associated clinical images.

    🔍 Sources

    The dataset is compiled from a combination of:

    • Publicly available dermatological image repositories, such as the ISIC (International Skin Imaging Collaboration) archive, which contains labeled dermoscopic images of skin lesions.
    • Clinical literature and dermatology textbooks, used to write concise disease definitions.
    • Simulated patient statements, reflecting typical ways in which patients describe their skin conditions during clinical consultations. These were generated based on clinical case studies and patient interviews found in dermatology research papers.
    • Synthetic aggregation: File names refer to images associated with each disease class, meant for easy integration with machine learning pipelines.

    🌟 Inspiration

    This dataset was inspired by the growing need for:

    • Explainable AI (XAI) in dermatology: Making machine learning models more understandable to clinicians and patients.
    • Bridging the gap between clinical terminology and patient language: Helping AI models learn how real patients describe their symptoms, enhancing the usability of teledermatology tools.
    • Supporting education and research: Assisting medical students, researchers, and AI developers in understanding skin diseases in both clinical and layman contexts.
    • Enabling multi-modal learning: Combining text descriptions, disease definitions, and images to train more robust models that can reason across data types.

    📄 Column Descriptions

    • Disease Class: The name of the skin disease type (e.g., Actinic Keratosis, Melanoma, Benign Keratosis, etc.). There are 9 unique classes.
    • Disease Definition: A clinical description explaining the nature and characteristics of the disease.
    • Major Statement: Simulated patient descriptions or questions that reflect how individuals typically describe their symptoms.
    • File Name: The corresponding image file name related to the disease case.


Explainable AI (XAI) Drilling Dataset

A synthetic dataset for failure mode analysis in drilling, optimized for XAI
