6 datasets found

Explainable AI (XAI) Drilling Dataset
kaggle.com
Updated Aug 24, 2023
Cite
Raphael Wallsberger (2023). Explainable AI (XAI) Drilling Dataset [Dataset]. https://www.kaggle.com/datasets/raphaelwallsberger/xai-drilling-dataset
Dataset updated
Aug 24, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
Raphael Wallsberger
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032

This is the original XAI Drilling dataset. It is optimized for XAI purposes and can be used to evaluate the explanations produced by XAI algorithms. The dataset comprises 20,000 data points (drilling operations) stored as rows, with 10 features, one binary main failure label, and 4 binary subgroup failure modes stored as columns. The main failure rate is about 5.0% for the whole dataset. The features that constitute this dataset are as follows:

ID: Every data point in the dataset is uniquely identifiable, thanks to the ID feature. This ensures traceability and easy referencing, especially when analyzing specific drilling scenarios or anomalies.

Cutting speed vc (m/min): The cutting speed is a pivotal parameter in drilling, influencing the efficiency and quality of the drilling process. It represents the speed at which the drill bit's cutting edge moves through the material.

Spindle speed n (1/min): This feature captures the rotational speed of the spindle, and therefore of the drill bit.

Feed f (mm/rev): Feed denotes the depth the drill bit penetrates into the material with each revolution. There is a balance between speed and precision, with higher feeds leading to faster drilling but potentially compromising hole quality.

Feed rate vf (mm/min): The feed rate is a measure of how quickly the material is fed to the drill bit. It is a determinant of the overall drilling time and influences the heat generated during the process.

Power Pc (kW): The power consumption during drilling can be indicative of the efficiency of the process and the wear state of the drill bit.

Cooling (%): Effective cooling is paramount in drilling, preventing overheating and reducing wear. This ordinal feature captures the cooling level applied, with the listed values representing no cooling (0%), partial cooling (25% and 50%), and high to full cooling (75% and 100%).

Material: The type of material being drilled can significantly influence the drilling parameters and outcomes. This dataset encompasses three primary materials: C45K hot-rolled heat-treatable steel (EN 1.0503), cast iron GJL (EN GJL-250), and aluminum-silicon (AlSi) alloy (EN AC-42000), each presenting its unique challenges and considerations. The three materials are represented as “P (Steel)” for C45K, “K (Cast Iron)” for cast iron GJL and “N (Non-ferrous metal)” for AlSi alloy.

Drill Bit Type: Different materials often require specialized drill bits. This feature categorizes the type of drill bit used, ensuring compatibility with the material and optimizing the drilling process. It consists of three categories based on DIN 1836: “N” for C45K, “H” for cast iron and “W” for AlSi alloy [5].

Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.
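
The speed and feed features above are linked by standard drilling kinematics, which can help when sanity-checking individual rows. The following is a minimal Python sketch, assuming the textbook relations vf = f · n and vc = π · d · n / 1000. Whether the dataset generator follows exactly these relations is not stated in the description, the drill diameter d is not among the listed features (the value below is purely illustrative), and the function names are hypothetical.

```python
import math


def feed_rate_mm_per_min(feed_mm_per_rev: float, spindle_speed_per_min: float) -> float:
    """Feed rate vf (mm/min) from feed f (mm/rev) and spindle speed n (1/min)."""
    return feed_mm_per_rev * spindle_speed_per_min


def cutting_speed_m_per_min(diameter_mm: float, spindle_speed_per_min: float) -> float:
    """Cutting speed vc (m/min) at the drill circumference.

    diameter_mm is an assumed input; the dataset description lists no diameter.
    """
    return math.pi * diameter_mm * spindle_speed_per_min / 1000.0


# Illustrative values only, not taken from the dataset.
vf = feed_rate_mm_per_min(feed_mm_per_rev=0.2, spindle_speed_per_min=800)
vc = cutting_speed_m_per_min(diameter_mm=12.0, spindle_speed_per_min=800)
print(f"vf = {vf:.1f} mm/min, vc = {vc:.2f} m/min")
```

If a row's recorded vf differs noticeably from f · n, that would suggest the generator added noise or used a different relation.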

Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.

Subgroup failures:

- Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence.
- Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from the factors high feed rate, inadequate cooling and using an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips.
- Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation.
- Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.
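
The label logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' generator: the column abbreviations (BEF, CCF, FWF, WDF) and helper names are hypothetical, and the description gives no numeric thresholds for "high feed rate" or "inadequate cooling", so the factors are passed in as ready-made booleans.

```python
# Hypothetical column abbreviations for the four subgroup failure modes.
SUBGROUP_COLUMNS = ("BEF", "CCF", "FWF", "WDF")


def main_failure(row: dict) -> int:
    """Main failure is 1 when any subgroup failure mode is 1."""
    return int(any(row[col] == 1 for col in SUBGROUP_COLUMNS))


def at_least_two(*factors: bool) -> bool:
    """'At least two of the three factors' rule quoted for compression chips."""
    return sum(bool(f) for f in factors) >= 2


# Counts quoted in the description.
subgroup_counts = {"BEF": 215, "CCF": 344, "FWF": 278, "WDF": 300}
total_mode_flags = sum(subgroup_counts.values())
approx_main_failures = round(0.05 * 20_000)  # "about 5.0 %" of 20,000 rows

print(main_failure({"BEF": 0, "CCF": 1, "FWF": 0, "WDF": 1}))
print(total_mode_flags, approx_main_failures)
```

The subgroup counts add up to more than 5% of 20,000 rows, which would be consistent with some rows carrying more than one failure mode, provided the main failure rate really is close to 5.0%.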
Related Data for: Explainable AI for Property Prediction in Scrap-Based...
researchdata.ntu.edu.sg
Updated Jul 8, 2025
Cite
DR-NTU (Data) (2025). Related Data for: Explainable AI for Property Prediction in Scrap-Based Steel Production: A Public Data Implementation [Dataset]. http://doi.org/10.21979/N9/NYRGAA
Available download formats: txt (31), application/x-ipynb+json (1236302), png (162406), pdf (1903754), bin (294769117), tsv (12782)
Unique identifier
https://doi.org/10.21979/N9/NYRGAA
Dataset updated
Jul 8, 2025
Dataset provided by
DR-NTU (Data)
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This folder contains the supplementary information for the conference paper "Explainable AI for Property Prediction in Scrap-Based Steel Production: A Public Data Implementation". The datasets used are publicly available and are referenced below:

1. Guo, S., Yu, J., Liu, X., Wang, C., & Jiang, Q. (2019). A predicting model for properties of steel using the industrial big data based on machine learning. Computational Materials Science, 160, 95-104. https://doi.org/10.17632/f6zsvbf28y.1

2. Dunn, A., Wang, Q., Ganose, A., Dopp, D., & Jain, A. (2020). Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm. npj Computational Materials, 6(1), 138. https://www.nature.com/articles/s41524-020-00406-3

3. MatNavi: Kong H. MatNavi: Mechanical properties of low-alloy steels; 2022. Accessed: 2025-05-09. https://www.kaggle.com/datasets/konghuanqing/matnavi-mechanical-properties-of-lowalloy-steels
Predictive Maintenance Dataset (AI4I 2020)
kaggle.com
Updated Nov 6, 2022
Cite
Stephan Matzka (2022). Predictive Maintenance Dataset (AI4I 2020) [Dataset]. https://www.kaggle.com/datasets/stephanmatzka/predictive-maintenance-dataset-ai4i-2020
Dataset updated
Nov 6, 2022
Dataset provided by
Kaggle http://kaggle.com/
Authors
Stephan Matzka
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Please note that this is the original dataset with additional information and proper attribution. There is at least one other version of this dataset on Kaggle that was uploaded without permission. Please be fair and attribute the original author. This synthetic dataset is modeled after an existing milling machine and consists of 10,000 data points stored as rows with 14 features in columns:

UID: unique identifier ranging from 1 to 10000

product ID: consisting of a letter L, M, or H for low (50% of all products), medium (30%) and high (20%) as product quality variants and a variant-specific serial number

type: just the product type L, M or H from column 2

air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K

process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.

rotational speed [rpm]: calculated from a power of 2860 W, overlaid with a normally distributed noise

torque [Nm]: torque values are normally distributed around 40 Nm with a SD = 10 Nm and no negative values.

tool wear [min]: The quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process.

a 'machine failure' label that indicates whether the machine has failed in this particular data point, i.e., whether any of the following failure modes is true.

The machine failure consists of five independent failure modes:

tool wear failure (TWF): the tool will be replaced or fail at a randomly selected tool wear time between 200 and 240 min (120 times in our dataset). At this point in time, the tool is replaced 69 times and fails 51 times (randomly assigned).

heat dissipation failure (HDF): heat dissipation causes a process failure if the difference between air and process temperature is below 8.6 K and the tool's rotational speed is below 1380 rpm. This is the case for 115 data points.

power failure (PWF): the product of torque and rotational speed (in rad/s) equals the power required for the process. If this power is below 3500 W or above 9000 W, the process fails, which is the case 95 times in our dataset.

overstrain failure (OSF): if the product of tool wear and torque exceeds 11,000 minNm for the L product variant (12,000 for M, 13,000 for H), the process fails due to overstrain. This is true for 98 data points.

random failures (RNF): each process has a chance of 0.1% to fail regardless of its process parameters. This is the case for only 5 data points, less than could be expected for 10,000 data points in our dataset.

If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method which of the failure modes has caused the process to fail.
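
The three deterministic rules above (HDF, PWF, OSF) translate directly into code. The sketch below is an illustration in Python, not the author's generator: the function names are hypothetical, the thresholds are the ones quoted in the description, and the randomly assigned modes (TWF, RNF) are left out.

```python
import math

# Overstrain limits in min*Nm per product variant, as quoted in the description.
OSF_LIMITS_MIN_NM = {"L": 11_000, "M": 12_000, "H": 13_000}


def power_w(torque_nm: float, rotational_speed_rpm: float) -> float:
    """Process power in W: torque times angular speed (rpm converted to rad/s)."""
    return torque_nm * rotational_speed_rpm * 2.0 * math.pi / 60.0


def heat_dissipation_failure(air_temp_k: float, process_temp_k: float,
                             rotational_speed_rpm: float) -> bool:
    """HDF: temperature difference below 8.6 K and speed below 1380 rpm."""
    return (process_temp_k - air_temp_k) < 8.6 and rotational_speed_rpm < 1380


def power_failure(torque_nm: float, rotational_speed_rpm: float) -> bool:
    """PWF: power below 3500 W or above 9000 W."""
    p = power_w(torque_nm, rotational_speed_rpm)
    return p < 3500 or p > 9000


def overstrain_failure(tool_wear_min: float, torque_nm: float, product_type: str) -> bool:
    """OSF: tool wear times torque exceeds the variant-specific limit."""
    return tool_wear_min * torque_nm > OSF_LIMITS_MIN_NM[product_type]


# Illustrative inputs only, not rows from the dataset.
print(round(power_w(40, 1500), 1))          # power at 40 Nm and 1500 rpm
print(heat_dissipation_failure(300.0, 308.0, 1300))
print(overstrain_failure(220, 55, "L"), overstrain_failure(220, 55, "H"))
```

Because the label only records that some mode fired, recomputing the individual rules like this is one way to check whether an explanation method attributes a failure to the right inputs.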

This dataset is part of the following publication, please cite when using this dataset: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.

The image of the milling process is the work of Daniel Smyth @ Pexels: https://www.pexels.com/de-de/foto/industrie-herstellung-maschine-werkzeug-10406128/
AMDP Dataset
kaggle.com
Updated Mar 26, 2025
Cite
DatasetEngineer (2025). AMDP Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11172716
Unique identifier
https://doi.org/10.34740/kaggle/dsv/11172716
Dataset updated
Mar 26, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
DatasetEngineer
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
AMDP Dataset – Age-Related Macular Degeneration Progression Dataset

The AMDP dataset is a longitudinal ophthalmic dataset curated from anonymized electronic health records (EHR), diagnostic imaging, and genetic screening reports obtained from patients monitored at multiple ophthalmology centers affiliated with the Moorfields Eye Hospital Network (UK). It spans from January 2021 to February 2025, featuring 30-minute interval measurements to simulate high-frequency patient monitoring across real-world federated deployments.

📊 Dataset Statistics

Total Records: 72,528

Number of Patients: 50

Federated Clients: 5 (multi-site data collection)

Features: 57 (multi-modal clinical, imaging, genetic, and longitudinal)

Targets:

AMD_Stage (0 = No AMD, 1 = Early, 2 = Intermediate, 3 = Late)

AMD_Risk_Score (0.0 to 1.0)

Time_To_Onset (months)
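
The record count can be cross-checked against the stated time span with a little arithmetic. The Python sketch below assumes, without confirmation from the description, that the records form a single regular 30-minute grid starting at midnight on 1 January 2021; both the start timestamp and the single-grid interpretation are assumptions.

```python
from datetime import datetime, timedelta

TOTAL_RECORDS = 72_528               # from the dataset statistics
INTERVAL = timedelta(minutes=30)     # stated acquisition interval
ASSUMED_START = datetime(2021, 1, 1) # assumption: the description only says "January 2021"

# Number of 30-minute slots in one day, then whole days covered by the records.
records_per_day = timedelta(days=1) // INTERVAL
days_covered, remainder = divmod(TOTAL_RECORDS, records_per_day)

# Timestamp of the final record on the assumed grid.
last_timestamp = ASSUMED_START + (TOTAL_RECORDS - 1) * INTERVAL

print(records_per_day, days_covered, remainder)
print(last_timestamp.isoformat())
```

Under these assumptions the count corresponds to a whole number of days ending in February 2025, which matches the stated span and suggests the rows are indexed by timestamp rather than by patient visit.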

🧩 Feature Categories

🧑‍⚕️ Demographics & Lifestyle

Age (at each visit)

Sex

Ethnicity

Smoking Status

Alcohol Consumption

Body Mass Index (BMI)

Physical Activity Level

Family History of AMD

🩺 Medical History

Hypertension

Diabetes

Cardiovascular Disease

Medication Usage (Anti-VEGF, Aspirin, Statins)

👁️ Clinical Ophthalmic Metrics

Visual Acuity (Left and Right Eye)

Intraocular Pressure (Left and Right)

Retinal Thickness (Left and Right)

Drusen Size (Left and Right)

Drusen Type (Hard/Soft)

Geographic Atrophy Presence

Neovascularization Presence

Lens Status (Natural, Cataract, IOL)

🧪 Laboratory Metrics

HDL, LDL, Total Cholesterol

Triglycerides

HbA1c (Glycated Hemoglobin)

🧠 Imaging-Derived OCT Features

Central Macular Thickness (CMT)

RPE Elevation

Intraretinal Fluid (IRF)

Subretinal Fluid (SRF)

ILM–RPE Distance

Drusen Volume

Drusen Reflectivity Index

Ellipsoid Zone Integrity

Choroidal Thickness

Hyperreflective Foci (HRF) Count

Radiomic Texture Features

Optic Disc-Cup Ratio

Retinal Vessel Tortuosity

Lesion Size (if segmented)

AI-derived Biological Age Estimation

🧬 Genetic Biomarkers

CFH (Y402H)

ARMS2 (rs10490924)

HTRA1 (rs11200638)

C3 (rs2230199)

APOE (ε2/ε3/ε4)

Polygenic Risk Score (PRS)

⏳ Temporal Tracking & Longitudinal Data

Time Between Visits

Total Visits

Visual Acuity Change Over Time

Retinal Thickness Change

Drusen Growth Rate

Anti-VEGF or PDT Treatment Timeline

First Diagnosis Date (if applicable)

🏥 Federated Learning Context

The dataset simulates federated learning scenarios by including:

Client_ID (Five simulated federated hospital sites)

Device_Type (OCT/Fundus hardware variability)

Acquisition_Frequency (30-minute intervals)

Site_Bias (to simulate distribution shifts across centers)

📌 Use Case

This dataset is ideal for:

Multi-Task Learning: Stage classification, risk prediction, and time-to-onset forecasting

Federated and Semi-Supervised Learning

Explainable AI for Ophthalmology

Privacy-Preserving Clinical AI
Environmental settings of the proposed system.
figshare.com
plos.figshare.com
xls
Updated Jul 17, 2025
Cite
Shuvo Biswas; Rafid Mostafiz; Mohammad Shorif Uddin; Muhammad Shahin Uddin (2025). Environmental settings of the proposed system. [Dataset]. http://doi.org/10.1371/journal.pone.0324957.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0324957.t005
Dataset updated
Jul 17, 2025
Dataset provided by
PLOS ONE
Authors
Shuvo Biswas; Rafid Mostafiz; Mohammad Shorif Uddin; Muhammad Shahin Uddin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pneumonia, a severe lung infection caused by various viruses, presents significant challenges in diagnosis and treatment due to its similarities with other respiratory conditions. Additionally, the need to protect patient privacy complicates the sharing of sensitive clinical data. This study introduces FLPneXAINet, an effective framework that combines federated learning (FL) with deep learning (DL) and explainable AI (XAI) to securely and accurately predict pneumonia using chest X-ray (CXR) images. We utilized a benchmark dataset from Kaggle, comprising 8,402 CXR images (3,904 normal and 4,498 pneumonia). The dataset was preprocessed and augmented using a cycle-consistent generative adversarial network (CycleGAN) to increase the volume of training data. Three pre-trained DL models, VGG16, NASNetMobile, and MobileNet, were employed to extract features from the augmented dataset. Further, four ensemble DL (EDL) models were used to enhance feature extraction. Feature optimization was performed using recursive feature elimination (RFE), analysis of variance (ANOVA), and random forest (RF) to select the most relevant features. These optimized features were then fed into machine learning (ML) models, including K-nearest neighbor (KNN), naive Bayes (NB), support vector machine (SVM), and RF, for pneumonia prediction. The performance of the models was evaluated in an FL environment, with the EDL network achieving the best results: accuracy 97.61%, F1 score 98.36%, recall 98.13%, and precision 98.59%. The framework’s predictions were further validated using two XAI techniques: Local Interpretable Model-Agnostic Explanations (LIME) and Grad-CAM. FLPneXAINet offers a robust solution for healthcare professionals to accurately diagnose pneumonia, ensuring timely treatment while safeguarding patient privacy.
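
As a quick consistency check on the reported metrics, the F1 score can be recomputed as the harmonic mean of precision and recall. The short Python sketch below uses only the percentages quoted in the abstract; the helper name is hypothetical, and the check assumes the reported F1 was derived from the same precision and recall rather than averaged separately across clients or folds.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


reported_precision = 98.59  # percent, from the abstract
reported_recall = 98.13     # percent, from the abstract
reported_f1 = 98.36         # percent, from the abstract

recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"recomputed F1 = {recomputed_f1:.2f}%, reported F1 = {reported_f1:.2f}%")
```

The recomputed value rounds to the reported one, so the three figures are mutually consistent.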
NLP_SKIN_DATA_PS_DD
kaggle.com
Updated Jul 4, 2025
Cite
HARINI SHREE R (2025). NLP_SKIN_DATA_PS_DD [Dataset]. http://doi.org/10.34740/kaggle/dsv/12368953
Unique identifier
https://doi.org/10.34740/kaggle/dsv/12368953
Dataset updated
Jul 4, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
HARINI SHREE R
License
CC0 1.0 (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/
Description
📄 Context

Skin diseases are among the most common health concerns worldwide, ranging from benign lesions like keratosis to serious conditions such as melanoma. Early and accurate diagnosis plays a vital role in preventing disease progression and improving patient outcomes. This dataset aims to assist in developing AI-driven dermatology tools by providing structured information on various skin diseases, their definitions, patient-described symptoms, and associated clinical images.

🔍 Sources

The dataset is compiled from a combination of:

Publicly available dermatological image repositories, such as the ISIC (International Skin Imaging Collaboration) archive, which contains labeled dermoscopic images of skin lesions.

Clinical literature and dermatology textbooks, used to write concise disease definitions.

Simulated patient statements, reflecting typical ways in which patients describe their skin conditions during clinical consultations. These were generated based on clinical case studies and patient interviews found in dermatology research papers.

Synthetic aggregation: File names refer to images associated with each disease class, meant for easy integration with machine learning pipelines.

🌟 Inspiration

This dataset was inspired by the growing need for:

Explainable AI (XAI) in dermatology: Making machine learning models more understandable to clinicians and patients.

Bridging the gap between clinical terminology and patient language: Helping AI models learn how real patients describe their symptoms, enhancing the usability of teledermatology tools.

Supporting education and research: Assisting medical students, researchers, and AI developers in understanding skin diseases in both clinical and layman contexts.

Enabling multi-modal learning: Combining text descriptions, disease definitions, and images to train more robust models that can reason across data types.

📄 Column Descriptions

Disease Class - The name of the skin disease type (e.g., Actinic Keratosis, Melanoma, Benign Keratosis, etc.). There are 9 unique classes.

Disease Definition - A clinical description explaining the nature and characteristics of the disease.

Major Statement - Simulated patient descriptions or questions that reflect how individuals typically describe their symptoms.

File Name - The corresponding image file name related to the disease case.