https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Pragyan AI and DS School
Released under CC0: Public Domain
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Preventive Maintenance for Marine Engines: Data-Driven Insights
Introduction:
Marine engine failures can lead to costly downtime, safety risks and operational inefficiencies. This project leverages machine learning to predict maintenance needs, helping ship operators prevent unexpected breakdowns. Using a simulated dataset, we analyze key engine parameters and develop predictive models to classify maintenance status into three categories: Normal, Requires Maintenance, and Critical.
Overview:
This project explores preventive maintenance strategies for marine engines by analyzing operational data and applying machine learning techniques.
Key steps include:
1. Data Simulation: Creating a realistic dataset with engine performance metrics.
2. Exploratory Data Analysis (EDA): Understanding trends and patterns in engine behavior.
3. Model Training & Evaluation: Comparing machine learning models (Decision Tree, Random Forest, XGBoost) to predict maintenance needs.
4. Hyperparameter Tuning: Using GridSearchCV to optimize model performance.
Tools Used:
1. Python: Data processing, analysis and modeling
2. Pandas & NumPy: Data manipulation
3. Scikit-Learn & XGBoost: Machine learning model training
4. Matplotlib & Seaborn: Data visualization
Skills Demonstrated:
✔ Data Simulation & Preprocessing
✔ Exploratory Data Analysis (EDA)
✔ Feature Engineering & Encoding
✔ Supervised Machine Learning (Classification)
✔ Model Evaluation & Hyperparameter Tuning
Key Insights & Findings:
📌 Engine Temperature & Vibration Level: Strong indicators of potential failures.
📌 Random Forest vs. XGBoost: After hyperparameter tuning, both models achieved comparable performance, with Random Forest performing slightly better.
📌 Maintenance Status Distribution: Balanced dataset ensures unbiased model training.
📌 Failure Modes: The most common issues were Mechanical Wear & Oil Leakage, aligning with real-world engine failure trends.
Challenges Faced:
🚧 Simulating Realistic Data: Ensuring the dataset reflects real-world marine engine behavior was a key challenge.
🚧 Model Performance: Accuracy was limited (~35%), which is close to the roughly 33% expected by chance on three balanced classes and reflects the complexity of failure prediction.
🚧 Feature Selection: Identifying the most impactful features required extensive analysis.
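To put the ~35% accuracy figure in context, it helps to compare it with a trivial baseline. The sketch below is a minimal illustration, not part of the original notebook: the label list is a hypothetical, perfectly balanced three-class set, and `majority_baseline_accuracy` is a helper name introduced here.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical balanced label set using the three categories from the text.
labels = ["Normal", "Requires Maintenance", "Critical"] * 100
baseline = majority_baseline_accuracy(labels)
print(f"majority-class baseline: {baseline:.3f}")
```

A model is only informative to the extent that it beats this baseline on held-out data.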
Call to Action:
🔍 Explore the Dataset & Notebook: Try running different models and tweaking hyperparameters.
📊 Extend the Analysis: Incorporate additional sensor data or alternative machine learning techniques.
🚀 Real-World Application: This approach can be adapted for industrial machinery, aircraft engines, and power plants.
Dataset Overview:
This dataset contains 10,000 data points with 14 features each, capturing various aspects of machine operations. Here is a brief description of the columns:
- UID: Unique identifier ranging from 1 to 10,000.
- Product ID: Indicates product quality variant (Low, Medium, High) and a serial number.
- Air Temperature [K]: Generated using a random walk process, normalized to a standard deviation of 2 K around 300 K.
- Process Temperature [K]: Generated using a random walk process, normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.
- Rotational Speed [rpm]: Calculated from a power of 2860 W with normally distributed noise.
- Torque [Nm]: Normally distributed around 40 Nm with σ = 10 Nm, with no negative values.
- Tool Wear [min]: The quality variants (H/M/L) add 5/3/2 minutes of tool wear to the used tool.
- Machine Failure: Indicates whether the machine failed (1) or not (0) due to any of the following failure modes.
Failure Modes:
The machine failure consists of five independent failure modes:
- Tool Wear Failure (TWF): Occurs when tool wear time is between 200 and 240 min.
- Heat Dissipation Failure (HDF): Happens if the difference between air and process temperature is below 8.6 K and rotational speed is below 1380 rpm.
- Power Failure (PWF): Occurs if the product of torque and rotational speed (in rad/s) is below 3500 W or above 9000 W.
- Overstrain Failure (OSF): Occurs if the product of tool wear and torque exceeds 11,000 min·Nm for the L product variant, 12,000 for M, and 13,000 for H.
- Random Failures (RNF): Each process has a 0.1% chance of failure regardless of process parameters.
If at least one of these failure modes is true, the process fails, and the 'machine failure' label is set to 1.
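The deterministic failure rules above can be sketched as a small function. This is an illustrative reading of the description, not the dataset author's generator: the function and variable names are hypothetical, the two sample records are made up, TWF is simplified to "wear inside the 200 to 240 min window", and the random failure mode (RNF) is left out because it does not depend on the inputs.

```python
import math

# Overstrain thresholds in min*Nm per product quality variant, as listed above.
OSF_LIMIT = {"L": 11_000, "M": 12_000, "H": 13_000}


def failure_modes(air_temp_k, process_temp_k, rpm, torque_nm, tool_wear_min, variant):
    """Return a dict of failure-mode flags for one process record."""
    # Mechanical power in W: torque times angular speed in rad/s.
    power_w = torque_nm * rpm * 2 * math.pi / 60
    return {
        # Simplifying assumption: any wear inside the window counts as TWF.
        "TWF": 200 <= tool_wear_min <= 240,
        "HDF": (process_temp_k - air_temp_k) < 8.6 and rpm < 1380,
        "PWF": power_w < 3500 or power_w > 9000,
        "OSF": tool_wear_min * torque_nm > OSF_LIMIT[variant],
    }


def machine_failure(flags):
    """The label is 1 if at least one failure mode is active."""
    return int(any(flags.values()))


# Two made-up records for illustration.
healthy = failure_modes(298.1, 308.6, 1551, 42.8, 0, "M")
overloaded = failure_modes(298.9, 309.1, 1410, 65.7, 191, "L")
print(machine_failure(healthy), machine_failure(overloaded))
```

The second record trips both the power and overstrain rules, which shows that several modes can be active for the same process.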
This dataset was created by Amith M
https://creativecommons.org/publicdomain/zero/1.0/
The dataset could include various features and measurements related to the engine health of vehicles, such as engine RPM, temperature, pressure, and other sensor data. It may also include metadata on the vehicle, such as make, model, year, and mileage.
One potential project using this dataset could be to build a predictive maintenance model for automotive engines. By analyzing the patterns and trends in the data, machine learning algorithms could be trained to predict when an engine is likely to require maintenance or repair. This could help vehicle owners and mechanics proactively address potential issues before they become more severe, leading to better vehicle performance and longer engine lifetimes.
Another potential use for this dataset could be to analyze the performance of different types of engines and vehicles. Researchers could use the data to compare the performance of engines from different manufacturers, for example, or to evaluate the effectiveness of different maintenance strategies. This could help drive innovation and improvements in the automotive industry.
This dataset was created by Complex Infinite Solutions
Released under Other (specified in description)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 43 .wav files of approximately 10 seconds each, sampled at 16 kHz, with the sound of four A2212 BLDC motors in three categories: healthy motor, propeller failure and bearing failure. These audio files may be useful for signal processing and ML applications in predictive maintenance (PdM).
The dataset contains 43 recordings of four different A2212 brushless DC motors under different operating modes. Motors 1 and 2 were recorded both under normal conditions and with a broken propeller fault, while motors 3 and 4 provide data only for the healthy and bearing fault categories, respectively. Each motor was connected to a 30 A Electronic Speed Controller (ESC) supplied by an 11.1 V, 2100 mAh LiPo battery. The motor speed was controlled using an ESP32, which can be done in two ways: the first is to read a potentiometer and write its value to the ESC; the second, more IoT-oriented, is to connect the ESP32 to a Blynk server and use a slider to write a value to the ESC. The second method offers more stability and control over the intervals but depends heavily on internet latency. Ten seconds of audio at 16 kHz were recorded using the open-source software Audacity, and the motor speed was incremented by a fixed amount according to the category. Fig. 1 provides a scheme of the experimental setup used to obtain the audio recordings.
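As a starting point for working with such recordings, the sketch below reads one WAV file with Python's standard `wave` module and reports its sample rate, duration and RMS amplitude. It assumes 16-bit mono PCM files, which the description does not state, and `wav_summary` and the file name in the comment are hypothetical.

```python
import math
import struct
import wave


def wav_summary(path):
    """Return (sample_rate_hz, duration_s, rms) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)
    # Little-endian signed 16-bit samples.
    samples = struct.unpack(f"<{n_frames}h", raw)
    rms = math.sqrt(sum(s * s for s in samples) / n_frames) if n_frames else 0.0
    return rate, n_frames / rate, rms


# Hypothetical usage; the actual file names in the dataset may differ:
# rate, duration, rms = wav_summary("motor1_healthy_01.wav")
```

For a recording matching the description, the sample rate would be 16000 and the duration about 10 seconds.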
Original Source: https://data.mendeley.com/datasets/j4yr5fmhv4/1
Original Publication: https://www.data-in-brief.com/article/S2352-3409(23)00669-8/fulltext
Original Publication citation entry:
@article{estacio2023dataset,
title={Dataset of audio signals from brushless DC motors for predictive maintenance},
author={Estacio, Rommel Stiward Prieto and Montenegro, Diego Alberto Bravo and Rodas, Carlos Felipe Rengifo},
journal={Data in Brief},
volume={50},
pages={109569},
year={2023},
publisher={Elsevier}
}
This dataset is designed for developing predictive maintenance models for industrial equipment. It includes real-time sensor data capturing key operational parameters such as temperature, vibration, pressure, and RPM. The goal is to predict whether maintenance is required based on these metrics. The data can be used for anomaly detection, predictive maintenance modeling, and time series analysis.
Columns:
- Timestamp: The date and time when the data was recorded.
- Temperature (°C): Temperature of the equipment in degrees Celsius.
- Vibration (mm/s): Vibration level measured in millimeters per second.
- Pressure (Pa): Pressure applied to the equipment in Pascals.
- RPM: Rotations per minute of the equipment.
- Maintenance Required: A binary indicator (Yes/No) showing whether maintenance is needed, based on predictive modeling.
Usage:
This dataset is ideal for machine learning enthusiasts and professionals interested in predictive maintenance and industrial IoT applications.
This dataset was created by Hemanth Kumar Akula
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 1209 operational records collected from a conveyor system used in industrial settings. Each record captures critical parameters such as speed (rpm), load (kg), temperature (°C), vibration (m/s²), and current (A) under different operational conditions.
The dataset covers six common conveyor fault types:
Ball Bearing Fault
Central Shaft Fault
Pulley Fault
Drive Motor Fault
Idler Roller Fault
Belt Slippage
The data has been expanded to simulate real-world variability and is ideal for tasks such as predictive maintenance, fault classification, anomaly detection, and machine learning model training.
The dataset enables researchers, engineers, and data scientists to develop and validate algorithms aimed at improving equipment reliability and minimizing downtime.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
There are few datasets on mechanical engineering, in particular ones devoted to applying Machine Learning in industrial environments. This dataset was not yet present on Kaggle, so it is good for the community to have it at hand. It is an elaborated version of Gearbox Fault Diagnosis in which two kinds of data are included: the raw data and an elaborated dataset, obtained by computing the standard deviation of the accelerations over consecutive sets of data points.
The Gearbox Fault Diagnosis data include the vibration dataset recorded using SpectraQuest’s Gearbox Fault Diagnostics Simulator. The data were recorded using 4 vibration sensors placed in four different directions, under loads varying from 0 to 90 percent. Two scenarios are included: 1. Healthy condition and 2. Broken tooth condition
There are two versions:
- Raw datasets, a single file for every gearbox state (broken and healthy), named broken30hz.csv and healthy30hz.csv respectively
- Standard Deviation datasets, computed for the accelerations on consecutive sets of data points
For the Standard Deviation (stdev) datasets there are 3 versions:
- stdev computed every 10 consecutive data points (from raw dataset): broken30hz_stdev_10.csv and healthy30hz_stdev_10.csv
- stdev computed every 100 consecutive data points: broken30hz_stdev_100.csv and healthy30hz_stdev_100.csv
- stdev computed every 1000 consecutive data points: broken30hz_stdev_1000.csv and healthy30hz_stdev_1000.csv
Every single file contains the records of load from 0%-90% in steps of 10%.
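The windowed standard deviation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the windows are taken as non-overlapping, the sample (n-1) standard deviation is used, and an incomplete trailing window is dropped; the description does not specify any of these, and `windowed_stdev` and the sample readings are hypothetical.

```python
from statistics import stdev


def windowed_stdev(values, window):
    """Sample standard deviation of each consecutive, non-overlapping window."""
    return [
        stdev(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


# Made-up acceleration readings for one sensor.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 1.0, 1.0]
print(windowed_stdev(readings, 5))
```

With a window of 10, 100 or 1000 applied per sensor column, this would mirror the three stdev variants, reducing the row count by the same factor.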
Dataset taken from https://openei.org/datasets/dataset/gearbox-fault-diagnosis-data
Learn the basics for applying Machine Learning to Predictive Maintenance in industrial facilities.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Justin Eskind
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed for predicting machine maintenance based on recorded service hours. It provides a comprehensive collection of data related to machine usage patterns and the corresponding maintenance requirements. The dataset includes information on service hours, machine characteristics, and historical maintenance events.
The dataset was curated to address the growing need for predictive maintenance in industrial settings. Understanding the relationship between service hours and maintenance needs is crucial for optimizing operational efficiency and minimizing downtime. By leveraging this dataset, data scientists and engineers can develop models to forecast maintenance requirements, enabling proactive and cost-effective equipment management.
The inspiration behind creating this dataset stems from the challenges industries face in managing machine maintenance effectively. Traditional reactive approaches often result in unexpected downtime and increased maintenance costs. This dataset aims to inspire the development of predictive maintenance solutions, where insights gained from service hours data can empower businesses to schedule maintenance activities strategically, ultimately enhancing overall productivity and reducing operational disruptions.
BoostPressure: Additional pressure generated by the engine's boost system. Indicates how high the additional pressure generated by the system is on the engine.
EngineFuelRate: The rate of fuel consumption by the engine. Indicates how much fuel the engine uses in a given period.
EngineLoad: The load or workload applied to the engine. Indicates the level of load on the engine during operation.
EngineOilPressure: Oil pressure in the engine. Indicates the level of oil pressure in the engine system.
EngineRpm: Revolutions per minute (RPM) of the engine. Indicates how fast the engine is rotating.
GroundSpeed: The speed of the engine on the ground. Indicates how fast the engine is moving on the ground.
HaulDistance: The distance transported or hauled by the machine. Indicates how far material or cargo is transported.
Payload: The load carried by the vehicle or machine. Indicates the weight or load being carried.
TankFuelLevel: The level or volume of fuel in the engine tank. Indicates how full the fuel tank of the engine is.
GearSelect(ByOperator): The gear selected by the operator or driver. Indicates the gear currently used by the vehicle.
ServiceHours: Total operating hours of the engine. Indicates the total hours of engine operation or usage.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset simulates sensor readings from various types of engines to detect engine failures in mechanical systems, particularly in automotive applications. It captures data related to engine performance, fault conditions, and operational modes over a series of time intervals.
Columns:
Time_Stamp:
Type: Datetime
Description: The timestamp of the recorded data. It is generated starting from December 24, 2024, at 10:00 AM, with 5-minute intervals between records. The time-stamped records represent real-time sensor data monitoring.
Temperature (°C):
Type: Float
Description: The engine temperature in degrees Celsius. It falls within the range of 60°C to 120°C, which are typical operational temperatures for engine systems.
RPM (Revolutions per Minute):
Type: Float
Description: The engine's RPM, representing the speed at which the engine's crankshaft rotates. The value is randomly generated within the range of 1000 to 4000 RPM, which is typical for most engines under normal conditions.
Fuel_Efficiency (km/liter):
Type: Float
Description: The fuel efficiency of the engine, expressed in kilometers per liter. It represents how efficiently the engine is consuming fuel during operation, with values ranging from 15 km/l to 30 km/l.
Vibration_X, Vibration_Y, Vibration_Z:
Type: Float
Description: Vibration readings along three axes (X, Y, Z) of the engine. These measurements capture the vibrations from the engine to assess the mechanical health and performance. Each axis is measured on a scale from 0 to 1, with higher values possibly indicating abnormal vibrations that could be linked to engine failure.
Torque (Nm):
Type: Float
Description: The torque produced by the engine, measured in Newton-meters (Nm). It indicates the rotational force generated by the engine, with values ranging from 50 to 200 Nm.
Power_Output (kW):
Type: Float
Description: The power output of the engine, measured in kilowatts (kW). This indicates the rate at which the engine is performing work, with values ranging from 20 to 100 kW.
Fault_Condition:
Type: Integer (0, 1, 2, 3)
Description: The fault condition of the engine, indicating the severity of the fault:
0: Normal
1: Minor Fault
2: Moderate Fault
3: Severe Fault
This column is used as the target variable for classification models to predict engine failure based on the sensor readings.
Operational_Mode:
Type: String
Description: The operational mode of the engine, representing the different states in which the engine is functioning. The values include:
Idle: The engine is running but not under load.
Cruising: The engine is running at normal speed under load.
Heavy Load: The engine is under significant stress or load.
Dataset Characteristics:
Size: The dataset contains 1,000 records, each representing a snapshot of the engine's performance at a specific time.
Fault Conditions: The dataset includes four levels of fault conditions, ranging from normal (0) to severe fault (3), which can help train models to predict and diagnose potential engine failures.
Generated Data: The data is synthetic, designed to simulate typical engine performance and fault scenarios.
Sensor Data: The data includes key operational metrics like temperature, RPM, fuel efficiency, vibration, torque, and power output, all of which can influence the detection of faults in the engine.
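A dataset with the shape described above could be simulated roughly as follows. This is a sketch under assumptions, not the author's generator: independent uniform sampling is assumed for every numeric column (the description only says RPM is randomly generated within its range), fault levels and modes are drawn uniformly, and the function name, seed and dictionary keys are hypothetical.

```python
import random
from datetime import datetime, timedelta

START = datetime(2024, 12, 24, 10, 0)  # first timestamp, per the description
STEP = timedelta(minutes=5)            # interval between records
MODES = ["Idle", "Cruising", "Heavy Load"]


def simulate_records(n=1000, seed=0):
    """Generate n synthetic engine records within the documented value ranges."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        records.append({
            "Time_Stamp": START + i * STEP,
            "Temperature": rng.uniform(60, 120),     # degrees Celsius
            "RPM": rng.uniform(1000, 4000),
            "Fuel_Efficiency": rng.uniform(15, 30),  # km per liter
            "Vibration_X": rng.random(),             # 0 to 1 scale
            "Vibration_Y": rng.random(),
            "Vibration_Z": rng.random(),
            "Torque": rng.uniform(50, 200),          # Nm
            "Power_Output": rng.uniform(20, 100),    # kW
            "Fault_Condition": rng.randint(0, 3),    # 0 = normal ... 3 = severe
            "Operational_Mode": rng.choice(MODES),
        })
    return records


records = simulate_records()
print(records[0]["Time_Stamp"], records[-1]["Time_Stamp"])
```

Under independent sampling the fault label carries no relationship to the sensor values, so a classifier trained on such data would not be expected to beat chance; whether the published dataset encodes any dependency is not stated.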
https://creativecommons.org/publicdomain/zero/1.0/
Federated learning aims to build machine learning models from data sets that are distributed across multiple devices while preventing data leakage (Q. Yang et al. 2019).
source:
smoking https://www.kaggle.com/datasets/kukuroo3/body-signal-of-smoking license = CC0: Public Domain
heart https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset license = CC0: Public Domain
water https://www.kaggle.com/datasets/adityakadiwal/water-potability license = CC0: Public Domain
customer https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis license = CC0: Public Domain
insurance https://www.kaggle.com/datasets/tejashvi14/travel-insurance-prediction-data license = CC0: Public Domain
credit https://www.kaggle.com/datasets/ajay1735/hmeq-data license = CC0: Public Domain
income https://www.kaggle.com/datasets/mastmustu/income license = CC0: Public Domain
machine https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification license = CC0: Public Domain
skin https://www.kaggle.com/datasets/saurabhshahane/lumpy-skin-disease-dataset license = Attribution 4.0 International (CC BY 4.0)
score https://www.kaggle.com/datasets/parisrohan/credit-score-classification?select=train.csv license = CC0: Public Domain
This dataset was created by Tolga Kaplan
https://creativecommons.org/publicdomain/zero/1.0/
The Industrial IoT Fault Detection Dataset for Predictive Maintenance in Automation contains 1000 entries of sensor data collected from machinery in an industrial automation environment. The dataset includes three key sensor measurements: vibration (in mm/s), temperature (in °C), and pressure (in bar), which are crucial for monitoring the health of industrial equipment. The data also includes two derived features: RMS vibration and mean temperature, which help in the classification of potential faults.
The Fault Label column indicates the type of fault present, with possible values including:
0: No Fault
1: Bearing Fault
2: Overheating
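The two derived features mentioned above, RMS vibration and mean temperature, could be computed along these lines. The description does not say over which samples they are aggregated, so the fixed windows of raw readings here are an assumption, and the helper names and values are hypothetical.

```python
import math


def rms(values):
    """Root mean square of a sequence of readings."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def mean(values):
    """Arithmetic mean of a sequence of readings."""
    return sum(values) / len(values)


# Made-up windows of raw sensor readings.
vibration_window = [3.0, 4.0, 3.0, 4.0]   # mm/s
temperature_window = [70.0, 72.0, 74.0]   # degrees Celsius

rms_vibration = rms(vibration_window)
mean_temperature = mean(temperature_window)
print(round(rms_vibration, 3), mean_temperature)
```

RMS weights large excursions more heavily than a plain average, which is one reason it is a common summary for vibration signals.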
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by yuansaijie0604
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed for research on real-time edge computing architectures in Industrial Internet of Things (IIoT) applications. It simulates sensor data, network latency, and predictive maintenance in a smart manufacturing environment. The dataset integrates Fuzzy PID controllers for adaptive industrial process control and includes a target column (Predicted_Failure) for autonomous decision-making.
Key Features:
Time-Series IIoT Sensor Data (Temperature, Pressure, Vibration)
Edge Computing Metrics (Network Latency, Processing Time)
Fuzzy PID Controller Output (Optimized for real-time control)
Maintenance Status (Normal, Warning, Failure)
Predictive Failure Labels (1 = Failure, 0 = No Failure)
Overview: The NASA CMAPSS dataset consists of simulated jet engine sensor readings generated using the Commercial Modular Aero-Propulsion System Simulation (CMAPSS). It’s widely used for research in prognostics, health management, and remaining useful life (RUL) estimation.
Contents:
Training Data: Contains engine cycle information and sensor measurements.
Test Data: Engine cycle data without RUL labels, to be predicted.
RUL Values: Ground truth remaining useful life for the test engines.
Applications:
Ideal for time-series analysis, anomaly detection, and developing machine learning models focused on predictive maintenance.
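Because the training data carry cycle counts rather than explicit RUL labels, a training target is often derived per engine as the last recorded cycle minus the current cycle. The sketch below illustrates that convention under the assumption that each training engine is recorded until failure, so its final cycle has an RUL of zero; the text above does not state this, and `add_rul` and the sample rows are hypothetical.

```python
from collections import defaultdict


def add_rul(rows):
    """rows: iterable of (engine_id, cycle). Returns a list of (engine_id, cycle, rul)."""
    rows = list(rows)
    last_cycle = defaultdict(int)
    # First pass: find the final recorded cycle of every engine.
    for engine_id, cycle in rows:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    # Second pass: RUL is the number of cycles left until that final cycle.
    return [(e, c, last_cycle[e] - c) for e, c in rows]


# Made-up cycle log for two engines.
log = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
print(add_rul(log))
```

For the test engines, which stop before failure, the separately provided RUL values serve as ground truth instead.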
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Kilani Sikiru
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Pragyan AI and DS School
Released under CC0: Public Domain