Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032
This is the original XAI Drilling dataset, optimized for XAI purposes; it can be used to evaluate the explanations produced by XAI algorithms. The dataset comprises 20,000 data points, i.e., drilling operations, stored as rows, with 10 features, one binary main failure label, and 4 binary subgroup failure modes stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:
Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.
Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.
Subgroup failures:
- Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence.
- Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from three factors: high feed rate, inadequate cooling, and use of an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips.
- Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation.
- Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.
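The labelling rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' generator: the field names (`build_up_edge`, `compression_chips`, `flank_wear`, `wrong_drill_bit`) and the boolean factor arguments are hypothetical stand-ins, and only the "any subgroup failure" and "at least two of three factors" rules come from the description.

```python
from dataclasses import dataclass


@dataclass
class DrillingRecord:
    # Hypothetical field names for the four binary subgroup failure columns.
    build_up_edge: int
    compression_chips: int
    flank_wear: int
    wrong_drill_bit: int


def main_failure(record: DrillingRecord) -> int:
    """Main failure is 1 when any of the subgroup failure modes is 1."""
    flags = (
        record.build_up_edge,
        record.compression_chips,
        record.flank_wear,
        record.wrong_drill_bit,
    )
    return int(any(flags))


def compression_chips_flag(high_feed_rate: bool,
                           inadequate_cooling: bool,
                           incompatible_bit: bool) -> int:
    """Compression chips failure: at least two of the three factors occur."""
    return int(sum([high_feed_rate, inadequate_cooling, incompatible_bit]) >= 2)


# The per-mode counts quoted in the description.
mode_counts = {"build_up_edge": 215, "compression_chips": 344,
               "flank_wear": 278, "wrong_drill_bit": 300}
total_mode_flags = sum(mode_counts.values())
```

The per-mode counts add up to 1,137, which is more than the roughly 1,000 main failures implied by a 5.0 % rate over 20,000 rows; this suggests, without the description stating it, that some operations trigger more than one failure mode.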
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This folder contains the supplementary information for the conference paper "Explainable AI for Property Prediction in Scrap-Based Steel Production: A Public Data Implementation". The datasets used are publicly available and referenced in the following:
1. Guo, S., Yu, J., Liu, X., Wang, C., & Jiang, Q. (2019). A predicting model for properties of steel using the industrial big data based on machine learning. Computational Materials Science, 160, 95-104. https://doi.org/10.17632/f6zsvbf28y.1
2. Dunn, A., Wang, Q., Ganose, A., Dopp, D., & Jain, A. (2020). Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm. npj Computational Materials, 6(1), 138. https://www.nature.com/articles/s41524-020-00406-3
3. MatNavi: Kong H. MatNavi: Mechanical properties of low-alloy steels; 2022. Accessed: 2025-05-09. https://www.kaggle.com/datasets/konghuanqing/matnavi-mechanical-properties-of-lowalloy-steels
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please note that this is the original dataset with additional information and proper attribution. There is at least one other version of this dataset on Kaggle that was uploaded without permission. Please be fair and attribute the original author. This synthetic dataset is modeled after an existing milling machine and consists of 10,000 data points stored as rows with 14 features in columns.
The machine failure consists of five independent failure modes:
10. tool wear failure (TWF): the tool will be replaced or fail at a randomly selected tool wear time between 200 and 240 min (120 times in our dataset). At this point in time, the tool is replaced 69 times and fails 51 times (randomly assigned).
11. heat dissipation failure (HDF): heat dissipation causes a process failure if the difference between air and process temperature is below 8.6 K and the tool's rotational speed is below 1380 rpm. This is the case for 115 data points.
12. power failure (PWF): the product of torque and rotational speed (in rad/s) equals the power required for the process. If this power is below 3500 W or above 9000 W, the process fails, which is the case 95 times in our dataset.
13. overstrain failure (OSF): if the product of tool wear and torque exceeds 11,000 minNm for the L product variant (12,000 for M, 13,000 for H), the process fails due to overstrain. This is true for 98 data points.
14. random failures (RNF): each process has a chance of 0.1 % to fail regardless of its process parameters. This is the case for only 5 data points, fewer than could be expected for the 10,000 data points in our dataset.
If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method which of the failure modes has caused the process to fail.
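The deterministic rules above (HDF, PWF, OSF) translate directly into code. The following is a minimal Python sketch under stated assumptions: the function and argument names are illustrative and not the dataset's column names, temperatures are taken to be in kelvin, and the randomly assigned TWF and RNF modes are left out apart from the expected RNF count.

```python
import math

# Overstrain thresholds in minNm per product variant, as quoted above.
OSF_LIMITS = {"L": 11_000, "M": 12_000, "H": 13_000}


def power_w(torque_nm: float, speed_rpm: float) -> float:
    """Power in W: torque times rotational speed converted from rpm to rad/s."""
    return torque_nm * speed_rpm * 2 * math.pi / 60


def heat_dissipation_failure(air_temp_k: float, process_temp_k: float,
                             speed_rpm: float) -> bool:
    """HDF: temperature difference below 8.6 K and speed below 1380 rpm."""
    return abs(process_temp_k - air_temp_k) < 8.6 and speed_rpm < 1380


def power_failure(torque_nm: float, speed_rpm: float) -> bool:
    """PWF: required power below 3500 W or above 9000 W."""
    power = power_w(torque_nm, speed_rpm)
    return power < 3500 or power > 9000


def overstrain_failure(tool_wear_min: float, torque_nm: float,
                       variant: str) -> bool:
    """OSF: tool wear times torque exceeds the variant-specific limit."""
    return tool_wear_min * torque_nm > OSF_LIMITS[variant]


# Expected number of random failures at 0.1 % over 10,000 processes.
expected_rnf = 10_000 * 0.001
```

For instance, 40 Nm at 1500 rpm corresponds to about 6283 W, which lies inside the 3500 to 9000 W window, so no power failure is flagged. The expected RNF count of about 10 also makes the remark about the 5 observed random failures concrete.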
This dataset is part of the following publication, please cite when using this dataset: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.
The image of the milling process is the work of Daniel Smyth @ Pexels: https://www.pexels.com/de-de/foto/industrie-herstellung-maschine-werkzeug-10406128/
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
AMDP Dataset – Age-Related Macular Degeneration Progression Dataset
The AMDP dataset is a longitudinal ophthalmic dataset curated from anonymized electronic health records (EHR), diagnostic imaging, and genetic screening reports obtained from patients monitored at multiple ophthalmology centers affiliated with the Moorfields Eye Hospital Network (UK). It spans from January 2021 to February 2025, featuring 30-minute interval measurements to simulate high-frequency patient monitoring across real-world federated deployments.
📊 Dataset Statistics
Total Records: 72,528
Number of Patients: 50
Federated Clients: 5 (multi-site data collection)
Features: 57 (multi-modal clinical, imaging, genetic, and longitudinal)
Targets:
AMD_Stage (0 = No AMD, 1 = Early, 2 = Intermediate, 3 = Late)
AMD_Risk_Score (0.0 to 1.0)
Time_To_Onset (months)
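As a rough plausibility check on the statistics above, the record count can be compared with the stated 30-minute sampling interval. The sketch below assumes, purely for illustration, a single shared timeline starting at midnight on 1 January 2021; the description gives only the month, and it does not say how records are distributed across patients or clients.

```python
from datetime import datetime, timedelta

TOTAL_RECORDS = 72_528
INTERVAL = timedelta(minutes=30)

# Assumption: one shared timeline starting at 2021-01-01 00:00.
ASSUMED_START = datetime(2021, 1, 1)

# Number of 30-minute slots in one day.
records_per_day = timedelta(days=1) // INTERVAL

# Total time covered if every slot holds exactly one record.
span = TOTAL_RECORDS * INTERVAL

# Timestamp of the final record under the assumed start.
last_record = ASSUMED_START + (TOTAL_RECORDS - 1) * INTERVAL
```

Under these assumptions the records cover 1,511 full days at 48 records per day, and the last timestamp falls in February 2025, which agrees with the stated January 2021 to February 2025 range.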
🧩 Feature Categories
🧑⚕️ Demographics & Lifestyle
Age (at each visit)
Sex
Ethnicity
Smoking Status
Alcohol Consumption
Body Mass Index (BMI)
Physical Activity Level
Family History of AMD
🩺 Medical History
Hypertension
Diabetes
Cardiovascular Disease
Medication Usage (Anti-VEGF, Aspirin, Statins)
👁️ Clinical Ophthalmic Metrics
Visual Acuity (Left and Right Eye)
Intraocular Pressure (Left and Right)
Retinal Thickness (Left and Right)
Drusen Size (Left and Right)
Drusen Type (Hard/Soft)
Geographic Atrophy Presence
Neovascularization Presence
Lens Status (Natural, Cataract, IOL)
🧪 Laboratory Metrics
HDL, LDL, Total Cholesterol
Triglycerides
HbA1c (Glycated Hemoglobin)
🧠 Imaging-Derived OCT Features
Central Macular Thickness (CMT)
RPE Elevation
Intraretinal Fluid (IRF)
Subretinal Fluid (SRF)
ILM–RPE Distance
Drusen Volume
Drusen Reflectivity Index
Ellipsoid Zone Integrity
Choroidal Thickness
Hyperreflective Foci (HRF) Count
Radiomic Texture Features
Optic Disc-Cup Ratio
Retinal Vessel Tortuosity
Lesion Size (if segmented)
AI-derived Biological Age Estimation
🧬 Genetic Biomarkers
CFH (Y402H)
ARMS2 (rs10490924)
HTRA1 (rs11200638)
C3 (rs2230199)
APOE (ε2/ε3/ε4)
Polygenic Risk Score (PRS)
⏳ Temporal Tracking & Longitudinal Data
Time Between Visits
Total Visits
Visual Acuity Change Over Time
Retinal Thickness Change
Drusen Growth Rate
Anti-VEGF or PDT Treatment Timeline
First Diagnosis Date (if applicable)
🏥 Federated Learning Context
The dataset simulates federated learning scenarios by including:
Client_ID (Five simulated federated hospital sites)
Device_Type (OCT/Fundus hardware variability)
Acquisition_Frequency (30-minute intervals)
Site_Bias (to simulate distribution shifts across centers)
📌 Use Case
This dataset is ideal for:
Multi-Task Learning: Stage classification, risk prediction, and time-to-onset forecasting
Federated and Semi-Supervised Learning
Explainable AI for Ophthalmology
Privacy-Preserving Clinical AI
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pneumonia, a severe lung infection caused by various viruses, presents significant challenges in diagnosis and treatment due to its similarities with other respiratory conditions. Additionally, the need to protect patient privacy complicates the sharing of sensitive clinical data. This study introduces FLPneXAINet, an effective framework that combines federated learning (FL) with deep learning (DL) and explainable AI (XAI) to securely and accurately predict pneumonia using chest X-ray (CXR) images. We utilized a benchmark dataset from Kaggle, comprising 8,402 CXR images (3,904 normal and 4,498 pneumonia). The dataset was preprocessed and augmented using a cycle-consistent generative adversarial network (CycleGAN) to increase the volume of training data. Three pre-trained DL models (VGG16, NASNetMobile, and MobileNet) were employed to extract features from the augmented dataset. Further, four ensemble DL (EDL) models were used to enhance feature extraction. Feature optimization was performed using recursive feature elimination (RFE), analysis of variance (ANOVA), and random forest (RF) to select the most relevant features. These optimized features were then fed into machine learning (ML) models, including K-nearest neighbor (KNN), naive Bayes (NB), support vector machine (SVM), and RF, for pneumonia prediction. The performance of the models was evaluated in an FL environment, with the EDL network achieving the best results: accuracy 97.61%, F1 score 98.36%, recall 98.13%, and precision 98.59%. The framework’s predictions were further validated using two XAI techniques: Local Interpretable Model-Agnostic Explanations (LIME) and Grad-CAM. FLPneXAINet offers a robust solution for healthcare professionals to accurately diagnose pneumonia, ensuring timely treatment while safeguarding patient privacy.
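The figures quoted in the abstract can be cross-checked with two lines of arithmetic. The sketch below uses the standard definition of the F1 score as the harmonic mean of precision and recall; how the authors aggregated their metrics (per class, per client, or overall) is not stated, so this is only a consistency check on the reported numbers.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)


# Class counts quoted in the abstract.
normal_images = 3_904
pneumonia_images = 4_498
total_images = normal_images + pneumonia_images

# Reported precision and recall of the best EDL network, in percent.
f1_from_reported = f1_score(98.59, 98.13)
```

The class counts sum to the stated 8,402 images, and the harmonic mean of the reported precision and recall rounds to the reported F1 score of 98.36%.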
https://creativecommons.org/publicdomain/zero/1.0/
📄 Context
Skin diseases are among the most common health concerns worldwide, ranging from benign lesions like keratosis to serious conditions such as melanoma. Early and accurate diagnosis plays a vital role in preventing disease progression and improving patient outcomes. This dataset aims to assist in developing AI-driven dermatology tools by providing structured information on various skin diseases, their definitions, patient-described symptoms, and associated clinical images.
🔍 Sources
The dataset is compiled from a combination of:
- Publicly available dermatological image repositories, such as the ISIC (International Skin Imaging Collaboration) archive, which contains labeled dermoscopic images of skin lesions.
- Clinical literature and dermatology textbooks, used to write concise disease definitions.
- Simulated patient statements, reflecting typical ways in which patients describe their skin conditions during clinical consultations. These were generated based on clinical case studies and patient interviews found in dermatology research papers.
- Synthetic aggregation: File names refer to images associated with each disease class, meant for easy integration with machine learning pipelines.
🌟 Inspiration
This dataset was inspired by the growing need for:
- Explainable AI (XAI) in dermatology: Making machine learning models more understandable to clinicians and patients.
- Bridging the gap between clinical terminology and patient language: Helping AI models learn how real patients describe their symptoms, enhancing the usability of teledermatology tools.
- Supporting education and research: Assisting medical students, researchers, and AI developers in understanding skin diseases in both clinical and layman contexts.
- Enabling multi-modal learning: Combining text descriptions, disease definitions, and images to train more robust models that can reason across data types.
📄 Column Descriptions
Disease Class - The name of the skin disease type (e.g., Actinic Keratosis, Melanoma, Benign Keratosis, etc.). There are 9 unique classes.
Disease Definition - A clinical description explaining the nature and characteristics of the disease.
Major Statement - Simulated patient descriptions or questions that reflect how individuals typically describe their symptoms.
File Name - The corresponding image file name related to the disease case.
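To show how the four columns might feed a machine learning pipeline, the sketch below groups image file names by disease class using only the Python standard library. The column names follow the description above; the CSV layout, the placeholder cell contents, and the file names are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows, NOT real dataset content; only the header names
# come from the column descriptions.
SAMPLE_CSV = """Disease Class,Disease Definition,Major Statement,File Name
Melanoma,<clinical definition>,<patient statement>,example_0001.jpg
Melanoma,<clinical definition>,<patient statement>,example_0002.jpg
Actinic Keratosis,<clinical definition>,<patient statement>,example_0003.jpg
"""


def files_by_class(csv_text: str) -> dict:
    """Map each disease class to the list of image file names labelled with it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Disease Class"]].append(row["File Name"])
    return dict(groups)


grouped = files_by_class(SAMPLE_CSV)
```

With the real file, the number of keys in `grouped` would be expected to match the 9 unique classes mentioned in the column description.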