http://www.gnu.org/licenses/lgpl-3.0.html
This dataset provides information about vibration levels, torque, process temperature, and faults.
The dataset in the image is a spreadsheet containing information about engine performance. The spreadsheet has the following variables:
- UDI: This is likely a unique identifier for each engine.
- Product ID: This could be a specific code or identifier for the engine model.
- Type: This indicates the type of engine, possibly categorized by fuel type (e.g., M - motor, L - liquid).
- Air temperature (K): This is the air temperature in Kelvin around the engine.
- Process temperature [K]: This is the internal temperature of the engine during operation, measured in Kelvin.
- Speed (rpm): This is the rotational speed of the engine in revolutions per minute.
- Torque (Nm): This is the twisting force exerted by the engine, measured in Newton meters.
- Vibration Levels: This could be a measure of the engine's vibration intensity.
- Operational Hours: This is the total number of hours the engine has been operational.
- Failure Type: This indicates the type of failure the engine experienced, if any.
- Rotational: This might be a specific type of failure related to the engine's rotation.
This dataset could be used for various analytical purposes related to engine performance and maintenance. Here are some examples:
- Identifying patterns of engine failure: By analyzing the data, you could identify correlations between specific variables (e.g., air temperature, operational hours) and engine failures. This could help predict potential failures and schedule preventative maintenance.
- Optimizing engine performance: By analyzing the data, you could identify the operating conditions (e.g., temperature, speed) that lead to optimal engine performance. This could help improve fuel efficiency and engine lifespan.
- Comparing engine types: The data could be used to compare the performance and efficiency of different engine types under various operating conditions.
- Building predictive models: The data could be used to train machine learning models to predict engine failures, optimize maintenance schedules, and improve overall engine performance.
The specific value of this dataset depends on the context and the intended use case. For example, if you are only interested in a specific type of engine or a particular type of failure, you might need to filter or subset the data accordingly.
https://www.buymeacoffee.com/ahmet17
This dataset is free on my Kaggle page. However, to support me, you can buy me a coffee :)
Keep in mind that preparing datasets like this can take months of study after long measurement campaigns.
I hope it will be useful for you.
Labeled datasets are required for predictive maintenance. A machine system has many parts that are difficult to replace and maintain. When one of these parts becomes faulty, the trained neural network should be able to predict with high accuracy which part it is. That is why as much data as possible is collected. Some variables may be fully correlated with each other. They are still fed to the neural network, because a change in one parameter over time can unexpectedly change other parameters. For this reason, an artificial intelligence system for predictive maintenance should use an LSTM alongside a DNN.
This dataset was prepared from measurements made on the compressor system feeding the air line of a factory. The compressor is driven by an AC electric motor and is a two-piston, water-cooled, single-stage unit capable of producing compressed air at a maximum of 8 bar.
Measurements were made with high-resolution sensors and an industrial-grade data collector. To prepare a clean dataset, measurement lines with cable-induced noise were deleted.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Nafisur Rahman
Released under CC0: Public Domain
This is an example data source that can be used for building predictive maintenance models. It consists of the following data:
Telemetry Time Series Data (PdM_telemetry.csv): It consists of hourly averages of voltage, rotation, pressure, and vibration collected from 100 machines for the year 2015.
Error (PdM_errors.csv): These are errors encountered by the machines while in operating condition. Since these errors don't shut down the machines, they are not considered failures. The error dates and times are rounded to the closest hour since the telemetry data is collected at an hourly rate.
Maintenance (PdM_maint.csv): If a component of a machine is replaced, that is captured as a record in this table. Components are replaced in two situations:
1. During a regular scheduled visit, the technician replaces the component (Proactive Maintenance).
2. A component breaks down and the technician then performs unscheduled maintenance to replace it (Reactive Maintenance). This is considered a failure, and the corresponding data is captured under Failures.
Maintenance data has both 2014 and 2015 records. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Failures (PdM_failures.csv): Each record represents the replacement of a component due to failure. This data is a subset of the Maintenance data. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Metadata of Machines (PdM_machines.csv): Model type and age of the machines.
This dataset was available as a part of Azure AI Notebooks for Predictive Maintenance. As of 15 October 2020, the notebook (link) is no longer available. However, the data can still be downloaded using the following URLs:
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_telemetry.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_errors.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_maint.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_failures.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_machines.csv
Try to use this data to build Machine Learning models related to Predictive Maintenance.
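As a rough sketch of how the telemetry and failure files might be combined into a labeled table, the snippet below joins them on timestamp and machine ID using only the Python standard library. The join works on exact keys because the failure timestamps are rounded to the hour. The column names (datetime, machineID, failure, and the sensor columns) and the sample rows are assumptions made for illustration; check them against the headers of the downloaded CSV files.

```python
import csv
import io

# Made-up sample rows; the column names are assumed, not confirmed by the description.
telemetry_csv = """datetime,machineID,volt,rotate,pressure,vibration
2015-01-01 06:00:00,1,176.2,418.5,113.1,45.1
2015-01-01 07:00:00,1,162.9,402.7,95.5,43.4
2015-01-01 06:00:00,2,170.0,450.0,100.0,40.0
"""
failures_csv = """datetime,machineID,failure
2015-01-01 07:00:00,1,comp4
"""


def label_telemetry(telemetry_file, failures_file):
    """Attach a 'failure' field to every telemetry row ('none' when no failure)."""
    # Key failures by (hour, machine). If one machine has several component
    # failures in the same hour, this simple dict keeps only the last one.
    failures = {
        (row["datetime"], row["machineID"]): row["failure"]
        for row in csv.DictReader(failures_file)
    }
    labeled = []
    for row in csv.DictReader(telemetry_file):
        key = (row["datetime"], row["machineID"])
        row["failure"] = failures.get(key, "none")
        labeled.append(row)
    return labeled


labeled = label_telemetry(io.StringIO(telemetry_csv), io.StringIO(failures_csv))
for row in labeled:
    print(row["datetime"], row["machineID"], row["failure"])
```

With the real files, the two `io.StringIO` objects would be replaced by files opened with `open(..., newline="")`.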
https://creativecommons.org/publicdomain/zero/1.0/
Since real predictive maintenance datasets are generally difficult to obtain and in particular difficult to publish, we present and provide a synthetic dataset that reflects real predictive maintenance encountered in the industry to the best of our knowledge.
The dataset consists of 10 000 data points stored as rows with 14 features in columns:
- UID: unique identifier ranging from 1 to 10000
- productID: consisting of a letter L, M, or H for low (50% of all products), medium (30%), and high (20%) as product quality variants and a variant-specific serial number
- air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K
- process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K
- rotational speed [rpm]: calculated from a power of 2860 W, overlaid with normally distributed noise
- torque [Nm]: torque values are normally distributed around 40 Nm with σ = 10 Nm and no negative values
- tool wear [min]: the quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process
The dataset also has a 'machine failure' label that indicates whether the machine has failed in this particular data point because any of the following failure modes is true.
UCI : https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset
This dataset is designed for predictive maintenance of aircraft engines and consists of three primary files:
Training Data ("PM_train.csv"): This file contains multiple multivariate time series data, with each time series representing the operational cycles of different aircraft engines of the same type. Each cycle includes 21 sensor readings. Engines start with varying initial wear and manufacturing differences, which are unknown to the user. Initially, engines operate normally and begin to degrade over time. The degradation increases until a predefined threshold is reached, marking the engine as unsafe for further use. The final cycle in each time series indicates the failure point of the engine.
Testing Data ("PM_test.csv"): This file shares the same schema as the training data but does not specify the failure points. For example, an engine might run from cycle 1 to cycle 31 without indicating how many more cycles it can last before failure.
Ground Truth Data ("PM_truth.csv"): This file provides the actual remaining working cycles for the engines in the testing data. For instance, it shows that an engine running from cycle 1 to cycle 31 in the testing data has 112 remaining cycles before failure.
This dataset enables the development and evaluation of predictive maintenance models, allowing for the prediction of engine degradation and failure, thereby enhancing maintenance schedules and ensuring operational safety.
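The relationship between the three files can be illustrated with a small sketch that derives a remaining-useful-life (RUL) value for each cycle. In the training data the last cycle of an engine is its failure point, so RUL is the distance to that last cycle. In the testing data the ground-truth value is added on top. The function names and the `(engine_id, cycle)` tuple layout are hypothetical, chosen for illustration rather than taken from the files.

```python
from collections import defaultdict


def last_cycles(rows):
    """Return {engine_id: highest cycle number seen for that engine}."""
    last = defaultdict(int)
    for engine_id, cycle in rows:
        last[engine_id] = max(last[engine_id], cycle)
    return last


def rul_for_training(rows):
    """Training series end at failure, so RUL = last cycle - current cycle."""
    rows = list(rows)
    last = last_cycles(rows)
    return {(e, c): last[e] - c for e, c in rows}


def rul_for_testing(rows, truth):
    """Testing series stop early; truth[e] is the RUL at the last observed cycle."""
    rows = list(rows)
    last = last_cycles(rows)
    return {(e, c): truth[e] + last[e] - c for e, c in rows}


# One training engine that fails at cycle 5.
train = rul_for_training((1, c) for c in range(1, 6))

# The example from the description: cycles 1 to 31 observed, 112 cycles remaining.
test = rul_for_testing(((7, c) for c in range(1, 32)), truth={7: 112})
print(train[(1, 1)], test[(7, 31)], test[(7, 1)])
```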
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted.
The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column to predict is called failure, with binary value 0 for non-failure and 1 for failure.
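One way to keep track of false positives and false negatives is to threshold the predicted probabilities and count the four confusion-matrix outcomes. The sketch below does this in plain Python; the 0.5 threshold, the function names, and the sample labels and probabilities are all illustrative assumptions, not part of the dataset.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted failure probabilities into 0/1 predictions."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts):
    """Precision penalizes false positives; recall penalizes false negatives."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Made-up labels and model outputs for six devices.
y_true = [0, 0, 1, 1, 0, 1]
probs = [0.1, 0.7, 0.9, 0.4, 0.2, 0.6]

counts = confusion_counts(y_true, classify(probs))
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

Because failures are usually rare in this kind of data, precision and recall tend to be more informative than plain accuracy.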
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The EVIoT-PredictiveMaint Dataset is a comprehensive real-world dataset collected from IoT-enabled electric vehicles (EVs) operating in diverse environments. The dataset captures multi-modal telemetry, environmental conditions, and historical maintenance records at 15-minute intervals over a 5-year period (January 2020 to January 2025). It is specifically designed for multi-horizon predictive maintenance in EV fleet management, supporting federated learning applications for failure prediction, maintenance scheduling, and component health assessment.
With 175,393 records, this dataset is ideal for research in predictive maintenance, failure analysis, and energy optimization in electric vehicle fleets. It includes sensor data, telematics, environmental conditions, and maintenance history to facilitate advanced machine learning models for predicting vehicle reliability and optimizing maintenance strategies.
Features in EVIoT-PredictiveMaint Dataset
The dataset consists of 30+ features categorized into eight major groups:
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Characteristics:
- Type: Multivariate, Time-Series
- Subject Area: Computer Science
- Associated Tasks: Classification, Regression, Causal-Discovery
- Feature Type: Real
- Number of Instances: 10,000
- Number of Features: 6
- Missing Values: No
Description: The AI4I 2020 Predictive Maintenance Dataset is a synthetic dataset designed to mirror real-world predictive maintenance data typically encountered in industrial settings. It provides a valuable resource for developing and testing predictive maintenance models where real datasets are often scarce and challenging to share.
Dataset Information:
- Purpose: To offer a synthetic dataset reflecting real-world predictive maintenance scenarios.
- Funding: Not specified.
- Instances Representation: Each instance represents a data point in a predictive maintenance context.
Variables Table:
- UID (ID, Integer): Unique identifier ranging from 1 to 10,000
- Product ID (ID, Categorical): Product identifier consisting of a letter (L, M, or H) indicating product quality variants (low, medium, high) and a serial number
- Type (Feature, Categorical): Product type
- Air temperature (Feature, Continuous): Measured in Kelvin (K), generated using a random walk process and normalized to a standard deviation of 2 K around 300 K
- Process temperature (Feature, Continuous): Measured in Kelvin (K), generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K
- Rotational speed (Feature, Integer): Measured in revolutions per minute (rpm), calculated from a power of 2860 W with normally distributed noise
- Torque (Feature, Continuous): Measured in Newton meters (Nm), normally distributed around 40 Nm with a standard deviation of 10 Nm, and no negative values
- Tool wear (Feature, Integer): Measured in minutes (min), varies by product quality (H, M, L) adding 5, 3, or 2 minutes respectively
- Machine failure (Target, Integer): Indicates whether the machine failed at this data point
- TWF (Target, Integer): Tool wear failure
Additional Variable Information: The dataset consists of 10,000 data points stored as rows with 14 features in columns. Each row includes:
Failure Mode Details:
- Tool wear failure (TWF): Tool failure or replacement between 200-240 mins, randomly assigned
- Heat dissipation failure (HDF): Failure if temperature difference is below 8.6 K and rotational speed is below 1380 rpm
- Power failure (PWF): Failure if power (torque * rotational speed in rad/s) is below 3500 W or above 9000 W
- Overstrain failure (OSF): Failure if product of tool wear and torque exceeds thresholds (11,000 minNm for L, 12,000 for M, 13,000 for H)
- Random failures (RNF): Each process has a 0.1% chance of failure regardless of parameters
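The three deterministic failure rules above (HDF, PWF, OSF) can be written out as a short function. This is a minimal sketch of the stated thresholds, not the generator used to build the dataset; the function and parameter names are hypothetical, and TWF and RNF are left out because they are randomly assigned.

```python
import math

# Overstrain thresholds in min*Nm per product quality variant, as listed above.
OSF_LIMITS = {"L": 11_000, "M": 12_000, "H": 13_000}


def failure_modes(quality, air_temp_k, process_temp_k, speed_rpm, torque_nm, tool_wear_min):
    """Evaluate the rule-based failure modes for one data point."""
    # Convert rpm to rad/s, then power = torque * angular speed (in watts).
    power_w = torque_nm * speed_rpm * 2 * math.pi / 60
    return {
        "HDF": (process_temp_k - air_temp_k) < 8.6 and speed_rpm < 1380,
        "PWF": power_w < 3500 or power_w > 9000,
        "OSF": tool_wear_min * torque_nm > OSF_LIMITS[quality],
    }


# Illustrative values only, not rows from the dataset.
healthy = failure_modes("L", 300.0, 310.0, 1500, 40.0, 100)
stressed = failure_modes("L", 300.0, 308.0, 1300, 60.0, 200)
print(healthy)
print(stressed)
```

Note that the same tool wear and torque can trip OSF for an L product but not for M or H, since the threshold depends on the quality variant.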
Introductory Paper: "Explainable Artificial Intelligence for Predictive Maintenance Applications" by S. Matzka, 2020, published in the International Conference on Artificial Intelligence for Industries.
This dataset was created by Complex Infinite Solutions
Released under Other (specified in description)
https://creativecommons.org/publicdomain/zero/1.0/
Datasets for performing predictive maintenance and predicting future breakdowns and their root causes. The analysis of these datasets relies on life distributions and on a choice among several maintenance strategies. The different natures of the datasets allow a wide range of analyses, which are presented in the notebook associated with this dataset.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Overview
This dataset contains sensor data collected from various machines, with the aim of predicting machine failures in advance. It includes a variety of sensor readings as well as the recorded machine failures.
Columns Description
- footfall: The number of people or objects passing by the machine.
- tempMode: The temperature mode or setting of the machine.
- AQ: Air quality index near the machine.
- USS: Ultrasonic sensor data, indicating proximity measurements.
- CS: Current sensor readings, indicating the electrical current usage of the machine.
- VOC: Volatile organic compounds level detected near the machine.
- RP: Rotational position or RPM (revolutions per minute) of the machine parts.
- IP: Input pressure to the machine.
- Temperature: The operating temperature of the machine.
- fail: Binary indicator of machine failure (1 for failure, 0 for no failure).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please note that this is the original dataset with additional information and proper attribution. There is at least one other version of this dataset on Kaggle that was uploaded without permission. Please be fair and attribute the original author.
This synthetic dataset is modeled after an existing milling machine and consists of 10 000 data points stored as rows with 14 features in columns.
The machine failure consists of five independent failure modes:
10. tool wear failure (TWF): the tool will be replaced or fail at a randomly selected tool wear time between 200 and 240 mins (120 times in our dataset). At this point in time, the tool is replaced 69 times and fails 51 times (randomly assigned).
11. heat dissipation failure (HDF): heat dissipation causes a process failure if the difference between air and process temperature is below 8.6 K and the tool's rotational speed is below 1380 rpm. This is the case for 115 data points.
12. power failure (PWF): the product of torque and rotational speed (in rad/s) equals the power required for the process. If this power is below 3500 W or above 9000 W, the process fails, which is the case 95 times in our dataset.
13. overstrain failure (OSF): if the product of tool wear and torque exceeds 11,000 minNm for the L product variant (12,000 M, 13,000 H), the process fails due to overstrain. This is true for 98 data points.
14. random failures (RNF): each process has a chance of 0.1 % to fail regardless of its process parameters. This is the case for only 5 data points, less than could be expected for 10,000 data points in our dataset.
If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method which of the failure modes has caused the process to fail.
This dataset is part of the following publication, please cite when using this dataset: S. Matzka, "Explainable Artificial Intelligence for Predictive Maintenance Applications," 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), 2020, pp. 69-74, doi: 10.1109/AI4I49448.2020.00023.
The image of the milling process is the work of Daniel Smyth @ Pexels: https://www.pexels.com/de-de/foto/industrie-herstellung-maschine-werkzeug-10406128/
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 5000 rows of multi-sensor readings collected from petrochemical rotating machinery operating under varying load and environmental conditions.
The dataset includes:
Sensor signals: 3-axis vibration, temperature, electrical current, rotational speed (RPM), and internal pressure.
Time-frequency features: Five features extracted using wavelet packet decomposition.
Labels: Multi-class fault type (no_fault, bearing_fault, rotor_imbalance, misalignment) and binary maintenance requirement flag.
Timestamps: High-resolution time intervals for time-series analysis.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides synthetic data related to vehicle maintenance to help predict whether a vehicle requires maintenance or not based on various features.
This dataset is synthetic and was generated using Python. It is intended for educational and research purposes.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset supports research on predictive maintenance and decision support in smart manufacturing systems. The dataset combines real-time sensor measurements, maintenance cost factors, and decision variables to prioritize equipment servicing based on failure risk and operational constraints.
The framework uses cloud computing to improve system scalability and responsiveness by synchronizing virtual asset models with real-time sensor and inspection data.
Researchers and practitioners can use this dataset to explore proactive maintenance scheduling, asset health diagnostics, and intelligent factory management.
⭐ Key Features
Real-Time Sensor Data: Includes temperature, vibration, pressure, and acoustic signals from simulated manufacturing equipment.
Maintenance Decision Criteria: Factors like inspection duration, technician availability, and downtime cost to support MCDM.
Failure Probability Score: Computed feature for training predictive models (range: 0–1).
Maintenance Priority Label: Target variable (High = 1, Medium = 2, Low = 3) based on failure likelihood and operational risk.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by arpan001
Released under MIT
- UID: unique identifier ranging from 1 to 10000
- productID: consisting of a letter L, M, or H for low (50% of all products), medium (30%), and high (20%) as product quality variants and a variant-specific serial number
- air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K
- process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K
- rotational speed [rpm]: calculated from a power of 2860 W, overlaid with normally distributed noise
- torque [Nm]: torque values are normally distributed around 40 Nm with σ = 10 Nm and no negative values
- tool wear [min]: the quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process
The 'machine failure' label indicates whether the machine has failed in this particular data point because any of the following failure modes is true.
There are two targets. Do not make the mistake of using one of them as a feature, as it will lead to leakage.
- Target: Failure or Not
- Failure Type: Type of Failure
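To avoid the leakage described above, both target columns can be removed from the inputs before any model sees them. The sketch below does this for rows held as dictionaries; the target names "Target" and "Failure Type" come from the description, while the feature column names and the sample row are illustrative assumptions.

```python
TARGET_COLUMNS = ("Target", "Failure Type")


def split_features_targets(rows, target_cols=TARGET_COLUMNS):
    """Separate each row into model inputs and the two targets."""
    features, targets = [], []
    for row in rows:
        # Neither target may stay among the features, even when only
        # one of them is being predicted.
        features.append({k: v for k, v in row.items() if k not in target_cols})
        targets.append({k: row[k] for k in target_cols})
    return features, targets


# Made-up example row; feature names are assumed for illustration.
rows = [
    {"Torque [Nm]": 42.8, "Tool wear [min]": 0, "Target": 0, "Failure Type": "No Failure"},
]
X, y = split_features_targets(rows)
print(X[0])
print(y[0])
```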
Acknowledgments: UCI
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Preventive Maintenance for Marine Engines: Data-Driven Insights
Introduction:
Marine engine failures can lead to costly downtime, safety risks and operational inefficiencies. This project leverages machine learning to predict maintenance needs, helping ship operators prevent unexpected breakdowns. Using a simulated dataset, we analyze key engine parameters and develop predictive models to classify maintenance status into three categories: Normal, Requires Maintenance, and Critical.
Overview
This project explores preventive maintenance strategies for marine engines by analyzing operational data and applying machine learning techniques.
Key steps include:
1. Data Simulation: Creating a realistic dataset with engine performance metrics.
2. Exploratory Data Analysis (EDA): Understanding trends and patterns in engine behavior.
3. Model Training & Evaluation: Comparing machine learning models (Decision Tree, Random Forest, XGBoost) to predict maintenance needs.
4. Hyperparameter Tuning: Using GridSearchCV to optimize model performance.
Tools Used
1. Python: Data processing, analysis and modeling
2. Pandas & NumPy: Data manipulation
3. Scikit-Learn & XGBoost: Machine learning model training
4. Matplotlib & Seaborn: Data visualization
Skills Demonstrated
✔ Data Simulation & Preprocessing
✔ Exploratory Data Analysis (EDA)
✔ Feature Engineering & Encoding
✔ Supervised Machine Learning (Classification)
✔ Model Evaluation & Hyperparameter Tuning
Key Insights & Findings
📌 Engine Temperature & Vibration Level: Strong indicators of potential failures.
📌 Random Forest vs. XGBoost: After hyperparameter tuning, both models achieved comparable performance, with Random Forest performing slightly better.
📌 Maintenance Status Distribution: Balanced dataset ensures unbiased model training.
📌 Failure Modes: The most common issues were Mechanical Wear & Oil Leakage, aligning with real-world engine failure trends.
Challenges Faced
🚧 Simulating Realistic Data: Ensuring the dataset reflects real-world marine engine behavior was a key challenge.
🚧 Model Performance: The accuracy was limited (~35%) due to the complexity of failure prediction.
🚧 Feature Selection: Identifying the most impactful features required extensive analysis.
Call to Action
🔍 Explore the Dataset & Notebook: Try running different models and tweaking hyperparameters.
📊 Extend the Analysis: Incorporate additional sensor data or alternative machine learning techniques.
🚀 Real-World Application: This approach can be adapted for industrial machinery, aircraft engines, and power plants.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by IRFAN ULLAH KHAN
Released under Apache 2.0