License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is designed to simulate supply chain operations in large-scale engineering projects. It integrates realistic data from IoT sensors, digital twins, and blockchain-enabled monitoring systems over the years 2023 to 2024.
It aims to support research in predictive maintenance, resource optimization, secure data exchange, and supply chain transparency through advanced analytics and machine learning.
⭐ Key Features
Time-bound IoT Sensor Data: Includes real-time-like sensor outputs such as temperature and vibration across multiple locations and assets.
Digital Twin Sync Fields: Tracks Condition_Score and Last_Maintenance to simulate digital twin feedback loops.
Operational KPIs: Features supply chain metrics like Resource_Utilization, Delivery_Efficiency, and Downtime_Hours.
Blockchain Contextual Fit: Designed to be compatible with blockchain audit trails and smart contract triggers (e.g., anomaly response, automated logistics payments).
Labeled Targets: SupplyChain_Efficiency_Label classifies overall efficiency into 3 tiers (0: Low, 1: Medium, 2: High) based on predefined KPI thresholds.
Location-aware Simulation: Assets and operations are tagged by realistic geographic locations.
Supply Chain Economics: Captures Inventory_Level and Logistics_Cost for resource allocation analysis.
Year-specific Scope: Covers the period from 2023 to 2024, aligning with recent and ongoing digital transformation trends.
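As an illustration of how a KPI-threshold label of this kind could be reproduced, the sketch below derives a three-tier score from the KPI columns named above. The file name and the threshold values are assumptions for illustration, not the dataset's documented rules.

```python
import pandas as pd

# Hypothetical thresholds -- the dataset's actual KPI cut-offs are not published here.
def efficiency_tier(row):
    score = 0
    if row["Delivery_Efficiency"] > 0.9:   # assumed threshold
        score += 1
    if row["Resource_Utilization"] > 0.8:  # assumed threshold
        score += 1
    if row["Downtime_Hours"] < 5:          # assumed threshold
        score += 1
    return 2 if score >= 3 else (1 if score == 2 else 0)  # 0: Low, 1: Medium, 2: High

df = pd.read_csv("supply_chain_engineering.csv")  # hypothetical file name
df["Predicted_Tier"] = df.apply(efficiency_tier, axis=1)
print(pd.crosstab(df["Predicted_Tier"], df["SupplyChain_Efficiency_Label"]))
```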
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
GitHub page: https://github.com/soarsmu/NICHE
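A minimal sketch of loading NICHE.csv and comparing the two labelled classes. The column names used below are assumptions; check the file's actual header.

```python
import pandas as pd

# Column names here are assumptions for illustration; check NICHE.csv for the actual schema.
niche = pd.read_csv("NICHE.csv")
print(niche.columns.tolist())

# Example: compare engineered vs. non-engineered projects on a numeric column such as stars,
# assuming a "label" column distinguishes the two classes.
if {"label", "stars"}.issubset(niche.columns):
    print(niche.groupby("label")["stars"].describe())
```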
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset provides insights into data engineer salaries and employment attributes for the year 2024. It includes information such as salary, job title, experience level, employment type, employee residence, remote work ratio, company location, and company size.
The dataset allows for analysis of salary trends, employment patterns, and geographic variations in data engineering roles. It can be used by researchers, analysts, and organizations to understand the evolving landscape of data engineering employment and compensation.
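One way to surface the salary trends mentioned above is a simple grouped summary; the file name and column headers below are assumptions about the schema.

```python
import pandas as pd

# Column names follow the description (salary, experience level, company size); exact headers are assumptions.
salaries = pd.read_csv("data_engineer_salaries_2024.csv")

# Median salary by experience level and company size.
summary = (salaries
           .groupby(["experience_level", "company_size"])["salary"]
           .median()
           .unstack())
print(summary)
```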
License: CDLA Sharing 1.0 (https://cdla.io/sharing-1-0/)
This dataset contains a list of data engineering job postings scraped from Glassdoor in the USA (March 2023). It includes details such as the company name, location, job title, job description, estimated salary, company size, company type, company sector, company industry, the year the company was founded, and company revenue. The dataset can be used for exploring data engineering job trends in the USA, analyzing salaries, and identifying the most in-demand skills and qualifications.
You can see the whole project on GitHub.
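For the in-demand-skills angle, a simple keyword scan over the job descriptions is often enough to get started. The file name and the "job_description" column name below are assumptions about this dataset's schema.

```python
import pandas as pd

# Count how many postings mention each skill keyword (column and file names assumed).
jobs = pd.read_csv("glassdoor_data_engineering_2023.csv")

skills = ["python", "sql", "spark", "aws", "airflow", "kafka", "snowflake"]
desc = jobs["job_description"].str.lower().fillna("")
counts = {skill: desc.str.contains(skill, regex=False).sum() for skill in skills}
print(pd.Series(counts).sort_values(ascending=False))
```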
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
We utilized a dataset of Machine Design materials, which includes information on their mechanical properties. The dataset was obtained from the Autodesk Material Library and comprises 15 columns, also referred to as features/attributes. This dataset is a real-world dataset, and it does not contain any random values. However, due to missing values, we only utilized seven of these columns for our ML model. You can access the related GitHub Repository here: https://github.com/purushottamnawale/material-selection-using-machine-learning
To develop an ML model, we employed several Python libraries, including NumPy, pandas, scikit-learn, and graphviz, in addition to other technologies such as Weka, MS Excel, VS Code, Kaggle, Jupyter Notebook, and GitHub. We employed Weka to quickly visualize the data and understand the relationships between the features, without requiring any programming expertise.
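A condensed sketch of the kind of scikit-learn workflow described above. The file name, the target column name ("Use"), and the feature columns are placeholders for the seven usable mechanical-property columns, not the repository's actual code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Placeholder names -- substitute the actual headers from the Autodesk Material Library export.
df = pd.read_csv("material_data.csv").dropna()
feature_cols = [c for c in df.columns if c != "Use"]  # "Use" target column name is an assumption
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df["Use"], test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))

# graphviz export mirrors the visualization workflow mentioned above.
export_graphviz(clf, out_file="tree.dot", feature_names=feature_cols, filled=True)
```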
My problem statement is material selection for an EV chassis. If you have any specific ideas, feel free to implement them and share your code on Kaggle.
A detailed research paper is available at https://iopscience.iop.org/article/10.1088/1742-6596/2601/1/012014
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
This dataset is designed for whole life cycle management of civil engineering projects, integrating Building Information Modeling (BIM) and Artificial Intelligence (AI). It includes comprehensive project data covering cost, schedule, structural health, environmental conditions, resource allocation, safety risks, and drone-based monitoring.
Key Features
- Project Metadata: ID, type (bridge, road, building, etc.), location, and timeline.
- Financial Data: Planned vs. actual cost, cost overruns.
- Scheduling Data: Planned vs. actual duration, schedule deviation.
- Structural Health Monitoring: Vibration levels, crack width, load-bearing capacity.
- Environmental Factors: Temperature, humidity, air quality, weather conditions.
- Resource & Safety Management: Material usage, labor hours, equipment utilization, accident records.
- Drone-Based Monitoring: Image analysis scores, anomaly detection, completion percentage.
- Target Variable: Risk Level (Low, Medium, High) based on cost, schedule, safety, and structural health.

Use Cases
- Predictive Modeling: Train AI models to forecast project risks and optimize decision-making.
- BIM & AI Integration: Leverage real-time IoT and drone data for smart construction management.
- Risk Assessment: Identify early signs of cost overruns, delays, and structural failures.
- Automation & Efficiency: Develop automated maintenance and safety monitoring frameworks.
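A minimal predictive-modeling sketch along the lines of the first use case. Every column name and the file name below are assumptions about the schema, not the dataset's actual headers.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# File name and column headers are assumptions based on the feature list above.
df = pd.read_csv("civil_projects_bim_ai.csv")
features = ["cost_overrun", "schedule_deviation", "vibration_level",
            "crack_width", "accident_count", "anomaly_score"]  # assumed names
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["risk_level"], test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```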
**Summary**
This dataset contains over 2,600 circuit projects scraped from Instructables, focusing on the "Circuits" category. It includes project titles, authors, engagement metrics (views, likes), and the primary component used (Instruments).
**How This Data Was Collected**
I built a web scraper using Python and Selenium to gather all project links (over 2,600 of them) by handling the "Load All" button. The full page source was saved, and I then used BeautifulSoup to parse the HTML and extract the raw data for each project.
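A condensed sketch of that scraping approach, assuming Selenium for the "Load All" handling and BeautifulSoup for parsing. The URL, CSS selectors, and the "/id/" link pattern are assumptions, not the scraper's actual code.

```python
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://www.instructables.com/circuits/projects/")  # listing URL assumed

# Repeatedly click a "Load All"/"Load More" style button until it disappears (selector assumed).
while True:
    buttons = driver.find_elements(By.CSS_SELECTOR, "button.load-more")
    if not buttons:
        break
    buttons[0].click()
    time.sleep(2)  # let the next batch of projects render

html = driver.page_source
driver.quit()

# Parse the saved page source and collect project links ("/id/" pattern assumed).
soup = BeautifulSoup(html, "html.parser")
links = [a["href"] for a in soup.select("a") if "/id/" in a.get("href", "")]
print(len(links), "project links found")
```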
The raw data was very messy. I performed a full data cleaning pipeline in a Colab notebook using Pandas.
Views and Likes were text fields (object) and had to be converted to numeric; missing Likes and Views values were filled with the mean (average) of the respective column. The Views and Likes data is highly right-skewed (skewness of ~9.5). This shows a "viral" effect where a tiny number of superstar projects get the vast majority of all views and likes.
Log-Transformation: Because of the skew, I created log_Views and log_Likes columns. A 2D density plot of these log-transformed columns shows a strong positive correlation (as likes increase, views increase) and that the most "typical" project gets around 30-40 likes and 4,000-5,000 views.
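A short sketch of that log-transformation and density plot; the "Views"/"Likes" column names follow the text above, while the CSV file name is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("instructables_circuits.csv")  # hypothetical file name
df["log_Views"] = np.log1p(df["Views"])
df["log_Likes"] = np.log1p(df["Likes"])

# 2D density (hexbin) of the log-transformed columns, as described above.
plt.hexbin(df["log_Likes"], df["log_Views"], gridsize=40, cmap="viridis")
plt.xlabel("log(Likes)")
plt.ylabel("log(Views)")
plt.colorbar(label="project count")
plt.show()
```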
![2D density plot of log_Views vs. log_Likes](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F29431778%2Fd90e2039f1be11b53308ab7191b10954%2Fdownload%20(1).png?generation=1763013545903998&alt=media)
Top Instruments: I've also analyzed the most popular instruments to see which ones get the most engagement.
![Engagement by top instruments](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F29431778%2F19fca1ce142ddddc1e16a5319a1f4fc5%2Fdownload%20(2).png?generation=1763013562400830&alt=media)
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing, and interpreting data related to the movement of products and services from suppliers to customers.
While searching for the dream house, the buyer looks at various factors, not just at the height of the basement ceiling or the proximity to an east-west railroad.
Using the dataset, find the factors that influence price negotiations while buying a house.
There are 79 explanatory variables describing every aspect of residential homes in Ames, Iowa.
Tasks to be Performed:
1) Download the "PEP1.csv" using the link given in the Feature Engineering project problem statement.
2) For a detailed description of the dataset, download and refer to data_description.txt using the link given in the Feature Engineering project problem statement.

Tasks to Perform
1) Import the necessary libraries
   1.1 Pandas is a Python library for data manipulation and analysis.
   1.2 NumPy is a package that contains a multidimensional array object and several derivative ones.
   1.3 Matplotlib is a Python visualization package for 2D array plots.
   1.4 Seaborn is built on top of Matplotlib. It's used for exploratory data analysis and data visualization.
2) Read the dataset
   2.1 Understand the dataset
   2.2 Print the names of the columns
   2.3 Print the shape of the dataframe
   2.4 Check for null values
   2.5 Print the unique values
   2.6 Select the numerical and categorical variables
3) Descriptive stats and EDA
   3.1 EDA of numerical variables
   3.2 Missing value treatment
   3.3 Identify the skewness and distribution
   3.4 Identify significant variables using a correlation matrix
   3.5 Pair plot for distribution and density

Project Outcome
• The aim of the project is to help understand working with the dataset and performing analysis.
• This project will assess the data and prepare a fresh dataset for training and prediction.
• Create a box plot to identify the variables with outliers.
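A starter sketch for tasks 2 and 3 above. The file name "PEP1.csv" comes from the problem statement; the column selections are generic.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("PEP1.csv")

print(df.shape)                                               # task 2.3
print(df.columns.tolist())                                    # task 2.2
print(df.isnull().sum().sort_values(ascending=False).head(10))  # task 2.4

numeric = df.select_dtypes(include=np.number)                 # task 2.6
categorical = df.select_dtypes(exclude=np.number)

# Correlation matrix to identify significant numeric variables (task 3.4).
sns.heatmap(numeric.corr(), cmap="coolwarm", center=0)
plt.show()

# Box plots to spot outliers (project outcome).
numeric.iloc[:, :6].plot(kind="box", subplots=True, layout=(2, 3), figsize=(12, 6))
plt.tight_layout()
plt.show()
```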
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is part of a capstone project for the Google Data Analytics Certificate. It contains cleaned, merged, and feature-engineered Fitbit data from 35 participants, originally sourced from publicly available Fitabase exports. The goal is to explore user behavior and engagement patterns to inform marketing strategies for Bellabeat, a wellness tech company.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Description: Online Retail II
This dataset contains 525,461 transaction-level records from an online retail store based in the United Kingdom. It captures detailed information about customer purchases, products, pricing, and order timestamps, making it suitable for sales analytics, customer behavior analysis, product performance evaluation, and SQL data engineering projects.
Key Features
- Invoice: A unique identifier for each order. Some invoices may represent returns or cancellations depending on business rules.
- StockCode: Product-level unique code identifying each item sold.
- Description: Text description of the product purchased.
- Quantity: Number of units bought. Negative values typically indicate returns.
- InvoiceDate: Timestamp indicating the exact date and time of the transaction.
- Price: Unit price of the product in the transaction currency.
- Customer ID: Unique identifier assigned to each registered customer. Missing values may indicate guest or unregistered buyers.
- Country: The country where the customer is located, enabling regional and international sales analysis.
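A small analysis sketch using the columns listed above; the CSV file name is an assumption.

```python
import pandas as pd

retail = pd.read_csv("online_retail_ii.csv", parse_dates=["InvoiceDate"])  # file name assumed

retail["Revenue"] = retail["Quantity"] * retail["Price"]
returns = retail[retail["Quantity"] < 0]          # negative quantities typically indicate returns
monthly = retail.set_index("InvoiceDate")["Revenue"].resample("M").sum()

print("Return rows:", len(returns))
print(monthly.head())
```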
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset is synthetically generated fake data designed to simulate a realistic e-commerce environment.
Its purpose is to provide large-scale relational datasets for practicing database operations, analytics, and testing tools like DuckDB, Pandas, and SQL engines. It is ideal for benchmarking, educational projects, and data engineering experiments.
Customers:
- (int): Unique identifier for each customer
- (string): Customer full name
- (string): Customer email address
- (string): Customer gender ('Male', 'Female', 'Other')
- (date): Date the customer signed up
- (string): Customer country of residence

Products:
- (int): Unique identifier for each product
- (string): Name of the product
- (string): Product category (e.g., Electronics, Books)
- (float): Price per unit
- (int): Available stock count
- (string): Product brand name

Orders:
- (int): Unique identifier for each order
- (int): ID of the customer who placed the order (foreign key to Customers)
- (date): Date when the order was placed
- (float): Total amount for the order
- (string): Payment method used (Credit Card, PayPal, etc.)
- (string): Country where the order is shipped

Order Items:
- (int): Unique identifier for each order item
- (int): ID of the order this item belongs to (foreign key to Orders)
- (int): ID of the product ordered (foreign key to Products)
- (int): Number of units ordered
- (float): Price per unit at order time

Reviews:
- (int): Unique identifier for each review
- (int): ID of the reviewed product (foreign key to Products)
- (int): ID of the customer who wrote the review (foreign key to Customers)
- (int): Rating score (1 to 5)
- (string): Text content of the review
- (date): Date the review was written

![ERD](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9179978%2F7681afe8fc52a116ff56a2a4e179ad19%2FEDR.png?generation=1754741998037680&alt=media)
The script saves two folders inside the specified output path:
csv/ # CSV files
parquet/ # Parquet files
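A quick sketch of querying the generated files with DuckDB, as suggested above. The folder layout follows the description; the table file name ("orders.parquet") is an assumption about the generated schema.

```python
import duckdb

con = duckdb.connect()

# Read one of the generated Parquet tables directly (file name assumed).
result = con.execute("""
    SELECT *
    FROM read_parquet('parquet/orders.parquet')
    LIMIT 5
""").fetchdf()
print(result)
```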
MIT License
Please read the description file of the dataset. The work I did was adjusting the data into an acceptable file format by Kaggle standards.
1 - instance - instance indicator
1 - component - component number (integer)
2 - sup - support in the machine where measure was taken (1..4)
3 - cpm - frequency of the measure (integer)
4 - mis - measure (real)
5 - misr - earlier measure (real)
6 - dir - filter, type of the measure and direction: {vo=no filter, velocity, horizontal, va=no filter, velocity, axial, vv=no filter, velocity, vertical, ao=no filter, amplitude, horizontal, aa=no filter, amplitude, axial, av=no filter, amplitude, vertical, io=filter, velocity, horizontal, ia=filter, velocity, axial, iv=filter, velocity, vertical}
7 - omega - rpm of the machine (integer, the same for components of one example)
8 - class - classification (1..6, the same for components of one example)
9 - comb. class - combined faults
10 - other class - other faults occurring
Data Source: https://archive.ics.uci.edu/ml/datasets/Mechanical+Analysis
Robotics for All Data Science With Python Week 3 Class 1 Review Projects
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains a synthetic simulation of cloud resource usage and carbon emissions, designed for experimentation, analysis, and forecasting in sustainability and data engineering projects.
Included Tables:
- projects → Metadata about projects/teams.
- services → Metadata about cloud services (Compute, Storage, AI, etc.).
- emission_factors → Regional grid carbon intensity (gCO₂ per kWh).
- service_energy_coefficients → Conversion rates from usage units to kWh.
- daily_usage → Raw service usage (per project × service × region × day).
- daily_emissions → Carbon emissions derived from usage × regional emission factors.
- service_cost_coefficients → Conversion rates from usage units to cost (USD per unit).
- daily_cost_emissions → Integrated fact table combining usage, energy, cost, and emissions for analysis.
Features:
- Simulated seasonality (weekend dips/spikes, holiday surges, quarter-end growth).
- Regional variations in carbon intensity (e.g., coal-heavy vs. renewable grids).
- Multiple projects and services for multi-dimensional analysis.
- Directly importable into BigQuery for analytics & forecasting.
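A sketch of how the daily_emissions fact might be derived from daily_usage, service_energy_coefficients, and emission_factors. The join keys, column names, and file names are assumptions about this synthetic schema.

```python
import pandas as pd

# Column and file names are assumptions; adjust to the actual exports.
usage = pd.read_csv("daily_usage.csv")                     # per project x service x region x day
energy = pd.read_csv("service_energy_coefficients.csv")   # usage units -> kWh
factors = pd.read_csv("emission_factors.csv")              # gCO2 per kWh by region

df = (usage
      .merge(energy, on="service_id")
      .merge(factors, on="region"))

df["energy_kwh"] = df["usage_amount"] * df["kwh_per_unit"]
df["emissions_gco2"] = df["energy_kwh"] * df["gco2_per_kwh"]
print(df[["project_id", "service_id", "region", "emissions_gco2"]].head())
```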
Use Cases:
- Explore sustainability analytics at scale.
- Build carbon footprint dashboards.
- Run AI/ML forecasting on emissions data.
- Practice SQL, data modeling, and visualization.
⚠️ Note: All data is synthetic and created for educational/demo purposes. It does not represent actual cloud provider emissions.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This comprehensive pharmaceutical synthetic dataset contains 1,393 records of synthetic drug information with 15 columns, designed for data science projects focusing on healthcare analytics, drug safety analysis, and pharmaceutical research. The dataset simulates real-world pharmaceutical data with appropriate variety and realistic constraints for machine learning applications.
| Attribute | Value |
|---|---|
| Total Records | 1,393 |
| Total Columns | 15 |
| File Format | CSV |
| Data Types | Mixed (intentional for data cleaning practice) |
| Domain | Pharmaceutical/Healthcare |
| Use Case | ML Training, Data Analysis, Healthcare Research |
| Column Name | Data Type | Unique Values | Description | Example Values |
|---|---|---|---|---|
| drug_name | Object | 1,283 unique | Pharmaceutical drug names with realistic naming patterns | "Loxozepam32", "Amoxparin43", "Virazepam10" |
| manufacturer | Object | 10 unique | Major pharmaceutical companies | Pfizer Inc., AstraZeneca, Johnson & Johnson |
| drug_class | Object | 10 unique | Therapeutic drug classifications | Antibiotic, Analgesic, Antidepressant, Vaccine |
| indications | Object | 10 unique | Medical conditions the drug treats | "Pain relief", "Bacterial infections", "Depression treatment" |
| side_effects | Object | 434 unique | Combination of side effects (1-3 per drug) | "Nausea, Dizziness", "Headache, Fatigue, Rash" |
| administration_route | Object | 7 unique | Method of drug delivery | Oral, Intravenous, Topical, Inhalation, Sublingual |
| contraindications | Object | 10 unique | Medical warnings for drug usage | "Pregnancy", "Heart disease", "Liver disease" |
| warnings | Object | 10 unique | Safety instructions and precautions | "Take with food", "Avoid alcohol", "Monitor blood pressure" |
| batch_number | Object | 1,393 unique | Manufacturing batch identifiers | "xr691zv", "Ye266vU", "Rm082yX" |
| expiry_date | Object | 782 unique | Drug expiration dates (YYYY-MM-DD) | "2025-12-13", "2027-03-09", "2026-10-06" |
| side_effect_severity | Object | 3 unique | Severity classification | Mild, Moderate, Severe |
| approval_status | Object | 3 unique | Regulatory approval status | Approved, Pending, Rejected |
| Column Name | Data Type | Range | Mean | Std Dev | Description |
|---|---|---|---|---|---|
| approval_year | Float/String* | 1990-2024 | 2006.7 | 10.0 | FDA/regulatory approval year |
| dosage_mg | Float/String* | 10-990 mg | 499.7 | 290.0 | Medication strength in milligrams |
| price_usd | Float/String* | $2.32-$499.24 | $251.12 | $144.81 | Drug price in US dollars |
*Intentionally stored as mixed types for data cleaning practice
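A cleaning sketch for the intentionally mixed-type columns noted above; the CSV file name is an assumption.

```python
import pandas as pd

drugs = pd.read_csv("pharma_synthetic.csv")  # file name assumed

for col in ["approval_year", "dosage_mg", "price_usd"]:
    # Strip stray characters such as "$" before conversion; invalid entries become NaN.
    drugs[col] = pd.to_numeric(
        drugs[col].astype(str).str.replace(r"[^0-9.\-]", "", regex=True),
        errors="coerce")

drugs["expiry_date"] = pd.to_datetime(drugs["expiry_date"], errors="coerce")
print(drugs[["approval_year", "dosage_mg", "price_usd"]].describe())
```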
| Manufacturer | Count | Percentage |
|---|---|---|
| Pfizer Inc. | 170 | 12.2% |
| AstraZeneca | ~140 | ~10.0% |
| Merck & Co. | ~140 | ~10.0% |
| Johnson & Johnson | ~140 | ~10.0% |
| GlaxoSmithKline | ~140 | ~10.0% |
| Others | ~623 | ~44.8% |
| Drug Class | Count | Most Common |
|---|---|---|
| Anti-inflammatory | 154 | ✓ |
| Antibiotic | ~140 | |
| Antidepressant | ~140 | |
| Antiviral | ~140 | |
| Vaccine | ~140 | |
| Others | ~679 |
| Severity | Count | Percentage |
|---|---|---|
| Severe | 488 | 35.0% |
| Moderate | ~453 | ~32.5% |
| Mild | ~452 | ~32.5% |
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is designed to facilitate the development of deep learning-based models for real-time risk assessment in tunnel engineering projects. The data contains critical engineering parameters, geotechnical properties, and sensor-based monitoring data collected or simulated under various tunneling conditions. Each record corresponds to specific tunneling conditions and is labeled with a risk level to indicate the likelihood of structural failure or hazardous events.
Dataset Content The dataset contains 1000 samples (modifiable based on requirements), with each row representing a unique tunneling scenario. The key features include:
- 0 = Low Risk (Safe conditions)
- 1 = Medium Risk (Moderate risk, monitoring required)
- 2 = High Risk (High likelihood of structural stress)
- 3 = Critical Risk (Failure scenario or hazardous condition)

The risk levels are assigned based on threshold values for tunnel displacement and settlement, which are essential indicators of tunnel stability.
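A hedged sketch of that threshold-based labeling; the column names, the file name, and the threshold values are assumptions, not the dataset's documented rules.

```python
import pandas as pd

# Illustrative thresholds only -- replace with the dataset's actual cut-offs.
def risk_level(displacement_mm, settlement_mm):
    if displacement_mm > 50 or settlement_mm > 40:   # assumed critical thresholds
        return 3
    if displacement_mm > 30 or settlement_mm > 25:   # assumed high-risk thresholds
        return 2
    if displacement_mm > 15 or settlement_mm > 10:   # assumed medium-risk thresholds
        return 1
    return 0

df = pd.read_csv("tunnel_risk.csv")  # hypothetical file name
df["risk_label"] = [risk_level(d, s)
                    for d, s in zip(df["displacement_mm"], df["settlement_mm"])]
print(df["risk_label"].value_counts())
```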
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This synthetically created dataset contains 1,555 conversational voice search queries captured across multiple devices, languages, and user intents. The dataset simulates realistic voice command interactions for machine learning and analytics projects in the conversational AI domain.
| Column Name | Data Type | Description | Example Values | Null % |
|---|---|---|---|---|
| query_id | Integer | Unique identifier for each voice search query | 1, 2, 3... 1555 | 0% |
| user_id | String (UUID) | Unique user identifier | bdd640fb-0667-4ad1-9c80-317fa3b1799d | 0% |
| timestamp | DateTime | When the query was made | 2025-04-17 19:27:32 | 0% |
| device_type | String | Device used for voice search | smartphone, smart speaker, smartwatch, tablet, car assistant | 0% |
| query_text | String | The actual voice search text | "What's the weather like today?", "Call Mom" | 0% |
| language | String | Language of the query | English, Spanish, Mandarin, Hindi, French | 0% |
| intent | String | Query category/purpose | information, navigation, command, entertainment, shopping | 0% |
| location | String | User's geographical location | New York, Los Angeles, London, Delhi, Shanghai, Paris, Tokyo | 0% |
| query_duration_sec | Float | Duration of voice query in seconds | 1.05 to 12.71 seconds | 0% |
| num_words | Float* | Number of words in the query | 2.0 to 7.0 | 0% |
| is_successful | Object | Whether query returned results | True, False, None | ~15% |
| confidence_score | String* | Speech recognition confidence (0.5-1.0) | "0.87", "0.61", "1.0" | 0% |
| device_os_version | String | Operating system version | iOS 14, iOS 15, Android 10, Android 11, None | ~20% |
| Intent | Description | Sample Queries |
|---|---|---|
| Information | Knowledge/fact-seeking queries | "How tall is the Eiffel Tower?", "What's the weather like today?" |
| Navigation | Location/direction requests | "Directions to nearest gas station", "Find nearest coffee shop" |
| Command | Device/app control instructions | "Set an alarm for 7 AM", "Turn off the lights", "Call Mom" |
| Entertainment | Media/content requests | "Play latest movie trailers", "Show me comedy shows" |
| Shopping | Purchase/commerce related | "Order me a pizza", "Buy new headphones", "Track my Amazon order" |
| Device Type | Usage Context |
|---|---|
| Smartphone | Mobile, on-the-go queries |
| Smart Speaker | Home-based voice commands |
| Smartwatch | Quick, hands-free interactions |
| Tablet | Casual browsing and queries |
| Car Assistant | In-vehicle voice commands |
| Language | Primary Locations | Use Case |
|---|---|---|
| English | New York, Los Angeles, London | Global communication |
| Spanish | Los Angeles, New York | Hispanic markets |
| Mandarin | Shanghai, global cities | Chinese user base |
| Hindi | Delhi, global cities | Indian diaspora |
| French | Paris, global cities | European markets |
Data quality notes:
- num_words is stored as float instead of int
- confidence_score is stored as string instead of float
- is_successful and device_os_version contain missing values (see the Null % column above)
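A cleaning sketch for the quirks noted above; the CSV file name is an assumption, and the boolean handling assumes pandas parses the True/False tokens as booleans.

```python
import pandas as pd

voice = pd.read_csv("voice_search_queries.csv")  # file name assumed

voice["num_words"] = voice["num_words"].astype(int)                    # float -> int (column has no nulls)
voice["confidence_score"] = pd.to_numeric(voice["confidence_score"], errors="coerce")
voice["device_os_version"] = voice["device_os_version"].fillna("Unknown")
voice["is_successful"] = voice["is_successful"].astype("boolean")      # nullable boolean, keeps missing values

print(voice.dtypes)
```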
Project Status: Proof-of-Concept (POC) - Capstone Project
This project demonstrates a proof-of-concept system for detecting financial document anomalies within core SAP FI/CO data, specifically leveraging the New General Ledger table (FAGLFLEXA) and document headers (BKPF). It addresses the challenge that standard SAP reporting and rule-based checks often struggle to identify subtle, complex, or novel irregularities in high-volume financial postings.
The solution employs a Hybrid Anomaly Detection strategy, combining unsupervised Machine Learning models with expert-defined SAP business rules. Findings are prioritized using a multi-faceted scoring system and presented via an interactive dashboard built with Streamlit for efficient investigation.
This project was developed as a capstone, showcasing the application of AI/ML techniques to enhance financial controls within an SAP context, bridging deep SAP domain knowledge with modern data science practices.
Author: Anitha R (https://www.linkedin.com/in/anithaswamy)
Dataset Origin: Kaggle SAP Dataset by Sunitha Siva. License: Other (specified in description) - no description available.
Financial integrity is critical. Undetected anomalies in SAP FI/CO postings can lead to:
* Inaccurate financial reporting
* Significant reconciliation efforts
* Potential audit failures or compliance issues
* Masking of operational errors or fraud
Standard SAP tools may not catch all types of anomalies, especially complex or novel patterns. This project explores how AI/ML can augment traditional methods to provide more robust and efficient financial monitoring.
Key elements of the approach:
* Focused on FAGLFLEXA for reliability.
* Engineered features (FE_... columns) to quantify potential deviations from normalcy based on EDA and SAP knowledge.
* Combined model anomaly counts (Model_Anomaly_Count) and HRF counts (HRF_Count) into a Priority_Tier for focusing investigation efforts.
* Generated a Review_Focus text description summarizing why an item was flagged.

The project followed a structured approach:
* Data preparation: worked from BKPF and FAGLFLEXA data extracts, discarded BSEG due to imbalances, and removed duplicates.
* Feature engineering: produced the engineered dataset sap_engineered_features.csv.
* (For detailed methodology, please refer to the Comprehensive_Project_Report.pdf in the /docs folder, if included.)
Libraries:
joblib==1.4.2
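A minimal sketch of the hybrid scoring idea described above, assuming an IsolationForest as the unsupervised model and a single illustrative business rule. Column names, thresholds, and the tiering logic are assumptions, not the project's actual implementation.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative only: column names and thresholds are assumptions.
df = pd.read_csv("sap_engineered_features.csv")
feature_cols = [c for c in df.columns if c.startswith("FE_")]

# Unsupervised model flag: -1 from IsolationForest marks an outlier.
iso = IsolationForest(contamination=0.01, random_state=42)
df["Model_Anomaly_Count"] = (iso.fit_predict(df[feature_cols]) == -1).astype(int)

# Example high-risk flag (HRF): unusually large posting amount (column and threshold assumed).
df["HRF_Count"] = (df.get("amount", pd.Series(0, index=df.index)) > 1_000_000).astype(int)

# Simple priority tiering from the two counts.
total = df["Model_Anomaly_Count"] + df["HRF_Count"]
df["Priority_Tier"] = total.map({0: "Low", 1: "Medium", 2: "High"})
print(df["Priority_Tier"].value_counts())
```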
Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found by the title "Online Retail".
"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."
Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
Image from stocksnap.io.
Analyses for this dataset could include time series, clustering, classification and more.
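An RFM (recency/frequency/monetary) starter for the clustering-style analyses suggested above. The column names follow the UCI "Online Retail" schema; the Excel file name is an assumption.

```python
import pandas as pd

retail = pd.read_excel("Online Retail.xlsx")  # file name assumed
retail["Revenue"] = retail["Quantity"] * retail["UnitPrice"]

# Recency, frequency, and monetary value per customer, measured from the last invoice date.
snapshot = retail["InvoiceDate"].max()
rfm = retail.dropna(subset=["CustomerID"]).groupby("CustomerID").agg(
    recency_days=("InvoiceDate", lambda s: (snapshot - s.max()).days),
    frequency=("InvoiceNo", "nunique"),
    monetary=("Revenue", "sum"),
)
print(rfm.describe())
```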