16 datasets found

mlops-dataset
kaggle.com
zip
Updated Nov 12, 2025
Cite
Daniil Krizhanovskyi (2025). mlops-dataset [Dataset]. https://www.kaggle.com/datasets/daniilkrizhanovskyi/mlops-dataset
Available download formats: zip (4024 bytes)
Dataset updated
Nov 12, 2025
Authors
Daniil Krizhanovskyi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Daniil Krizhanovskyi

Released under MIT

Contents
MLops Pipeline
kaggle.com
zip
Updated Nov 20, 2024
Cite
Amy Okey (2024). MLops Pipeline [Dataset]. https://www.kaggle.com/datasets/amyokey/mlops-pipeline
Available download formats: zip (8831585 bytes)
Dataset updated
Nov 20, 2024
Authors
Amy Okey
License
https://www.licenses.ai/ai-licenses
Description
Dataset

This dataset was created by Amy Okey

Released under RAIL (specified in description)

Contents
mlops-phase1
kaggle.com
zip
Updated Jun 30, 2023
Cite
Under The Hood (2023). mlops-phase1 [Dataset]. https://www.kaggle.com/datasets/hinetabi/mlops-phase1
Available download formats: zip (10589173 bytes)
Dataset updated
Jun 30, 2023
Authors
Under The Hood
Description
Dataset

This dataset was created by Under The Hood

Contents
boston MLOps Assignment
kaggle.com
zip
Updated Dec 1, 2024
Cite
Shubham Raj (2024). boston MLOps Assignment [Dataset]. https://www.kaggle.com/datasets/shubham14p3/boston-mlops-assignment/data
Available download formats: zip (16746 bytes)
Dataset updated
Dec 1, 2024
Authors
Shubham Raj
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Shubham Raj

Released under MIT

Contents
BATCH 8 MLOPS PROJECT
kaggle.com
zip
Updated Oct 7, 2024
Cite
Saibhargav Ch (2024). BATCH 8 MLOPS PROJECT [Dataset]. https://www.kaggle.com/datasets/saibhargavch/batch-8-mlops-project/code
Available download formats: zip (263060 bytes)
Dataset updated
Oct 7, 2024
Authors
Saibhargav Ch
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Saibhargav Ch

Released under Apache 2.0

Contents
MLOps-Task
kaggle.com
zip
Updated May 18, 2020
Cite
Jhagdu (2020). MLOps-Task [Dataset]. https://www.kaggle.com/jhagdu/mlopstask
Available download formats: zip (32680540 bytes)
Dataset updated
May 18, 2020
Authors
Jhagdu
Description
Dataset

This dataset was created by Jhagdu

Contents
extNPH-MLOps
kaggle.com
zip
Updated Jul 3, 2023
Cite
Jurgen (2023). extNPH-MLOps [Dataset]. https://www.kaggle.com/datasets/jurgendn/mlops-aihub-extnph
Available download formats: zip (432508609 bytes)
Dataset updated
Jul 3, 2023
Authors
Jurgen
Description
Dataset

This dataset was created by Jurgen

Contents
the mlops_whitepaper.pdf
kaggle.com
zip
Updated Apr 20, 2025
Cite
Mohsin Dahri (2025). the mlops_whitepaper.pdf [Dataset]. https://www.kaggle.com/datasets/mohsindahri/the-mlops-whitepaper-pdf
Available download formats: zip (11900334 bytes)
Dataset updated
Apr 20, 2025
Authors
Mohsin Dahri
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Mohsin Dahri

Released under CC0: Public Domain

Contents
indian_liver_patientmissing MLOps Assignment
kaggle.com
zip
Updated Dec 4, 2024
Cite
Shubham Raj (2024). indian_liver_patientmissing MLOps Assignment [Dataset]. https://www.kaggle.com/datasets/shubham14p3/indian-liver-patientmissing-mlops-assignment
Available download formats: zip (7915 bytes)
Dataset updated
Dec 4, 2024
Authors
Shubham Raj
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Shubham Raj

Released under MIT

Contents
Arabic Mental Health MLOps
kaggle.com
zip
Updated Nov 22, 2025
Cite
Ahmed Taha Shiha (2025). Arabic Mental Health MLOps [Dataset]. https://www.kaggle.com/datasets/ahmedtahashiha/arabic-mental-health-mlops/discussion
Available download formats: zip (533396375 bytes)
Dataset updated
Nov 22, 2025
Authors
Ahmed Taha Shiha
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
MLOps project for Arabic mental health text classification
Synthetic_Data_Satellite_Health
kaggle.com
zip
Updated Mar 2, 2024
Cite
JeffDJeffD (2024). Synthetic_Data_Satellite_Health [Dataset]. https://www.kaggle.com/datasets/jeffdjeffd/synthetic-data-satellite-health
Available download formats: zip (458476 bytes)
Dataset updated
Mar 2, 2024
Authors
JeffDJeffD
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The purpose of this (imaginary) study is to detect faulty satellites in order to prevent communication interruptions. Understand how confirmation bias can lead to misleading results by working with a synthetic data set that has a developed story line.

Problem Statement:

You are leading a team conducting a study that will allow the space agency to predict the health status of satellites from telemetry data, enabling proactive maintenance and optimal performance in space missions.

Stakeholders' Concerns:

Accurately predicting satellite health to minimize the risk of mission failures and optimize satellite usage.

Identifying the most critical factors that affect satellite health to focus on improving those aspects during the satellite design and maintenance process.

Reducing the rate of false positives and false negatives in predictions to avoid unnecessary maintenance efforts and ensure that actual issues are addressed promptly.

Misclassification Costs (estimation):

False Positive (predicting a malfunction when the component is healthy):
Unnecessary maintenance check: $5,000
Unwarranted component replacement: $50,000

False Negative (predicting a component is healthy when it is malfunctioning):
Data loss or degradation: $100,000
Partial mission failure: $500,000
Total mission failure or satellite loss: $300,000,000
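The cost figures above can be turned into a single number for comparing models. The following is a minimal sketch, not the dataset author's method: the function name, the dictionary keys, and the decision to charge one flat cost per error type are all assumptions made for illustration. Note that "positive" in the cost list means a predicted malfunction, while the target column encodes healthy as 1, so a false positive is a true label of 1 predicted as 0.

```python
# Cost figures are taken from the description; the dictionary keys and the
# idea of applying one flat cost per error type are illustrative assumptions.
FALSE_POSITIVE_COSTS = {
    "maintenance_check": 5_000,
    "component_replacement": 50_000,
}
FALSE_NEGATIVE_COSTS = {
    "data_loss": 100_000,
    "partial_mission_failure": 500_000,
    "total_mission_failure": 300_000_000,
}


def total_misclassification_cost(y_true, y_pred, fp_cost, fn_cost):
    """Sum the cost of every false positive and false negative.

    Labels follow the data dictionary: 1 = healthy, 0 = unhealthy.
    A false positive is a healthy satellite (1) predicted as
    malfunctioning (0); a false negative is the reverse.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    false_positives = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_negatives = sum(1 for t, p in pairs if t == 0 and p == 1)
    return false_positives * fp_cost + false_negatives * fn_cost
```

For example, one false positive priced as a maintenance check plus one false negative priced as data loss costs 5,000 + 100,000 = 105,000 dollars. Because the false-negative costs are so much larger, a model chosen on this metric will usually tolerate more false alarms than one chosen on accuracy alone.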

Team Focus:

Thoroughly exploring the data to understand the relationships between various telemetry variables and satellite health.

Ensuring the model is accurate and reliable by selecting appropriate algorithms, performing feature engineering, and validating the model's performance using relevant metrics.

Identifying and addressing any data quality issues, such as missing values and incorrect data.

Investigating the importance of each variable in the prediction task and communicating these insights to stakeholders for better decision-making.

Data Dictionary

time_since_launch (days)
Range: 0 to 3650
Description: Time since the satellite was launched.

orbital_altitude (km)
Range: 300 to 2000
Description: Altitude of the satellite's orbit.

battery_voltage (V)
Range: 20 to 30
Description: Satellite's battery voltage.

solar_panel_temperature (°C)
Range: -50 to 50
Description: Temperature of the satellite's solar panels.

attitude_control_error (degrees)
Range: 0 to 5
Description: Error in the satellite's attitude control system.

data_transmission_rate (Mbps)
Range: 10 to 100
Description: Rate of data transmission from the satellite to the ground station.

thermal_control_status (0 or 1)
Range: 0 (not working) or 1 (working)
Description: Binary flag indicating if the thermal control system is working or not.

satellite_health (0 or 1)
Range: 0 (unhealthy) or 1 (healthy)
Description: Target variable - binary flag indicating if the satellite is healthy or unhealthy.
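
Since the description mentions missing values and incorrect data, the ranges in the data dictionary can double as a validation rule set. The sketch below is one way to do that with plain Python. It assumes each row is a dict keyed by the column names listed above, that the ranges are inclusive, and that a missing value is represented as None; none of this is confirmed by the dataset itself, and `find_violations` is a hypothetical helper name.

```python
# Ranges copied from the data dictionary; treating them as inclusive is an assumption.
RANGES = {
    "time_since_launch": (0, 3650),
    "orbital_altitude": (300, 2000),
    "battery_voltage": (20, 30),
    "solar_panel_temperature": (-50, 50),
    "attitude_control_error": (0, 5),
    "data_transmission_rate": (10, 100),
}
BINARY_COLUMNS = ("thermal_control_status", "satellite_health")


def find_violations(row):
    """Return (column, value) pairs that are missing or outside the documented range.

    `row` is assumed to be a dict of numeric values keyed by column name.
    """
    violations = []
    for column, (low, high) in RANGES.items():
        value = row.get(column)
        # NaN also fails this comparison, so it is reported as a violation.
        if value is None or not (low <= value <= high):
            violations.append((column, value))
    for column in BINARY_COLUMNS:
        value = row.get(column)
        if value not in (0, 1):
            violations.append((column, value))
    return violations
```

Running this over every row before modeling gives a quick count of how many records need cleaning and which columns are most affected.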
Google AI Whitepaper knowledgebase
kaggle.com
zip
Updated Apr 20, 2025
Cite
malavp1703 (2025). Google AI Whitepaper knowledgebase [Dataset]. https://www.kaggle.com/datasets/malavp1703/google-ai-whitepaer-knowledgebase
Available download formats: zip (62415619 bytes)
Dataset updated
Apr 20, 2025
Authors
malavp1703
Description
This dataset is a group of whitepapers shared by Google in its AI workshop. It is a knowledgebase on various GenAI topics including prompt engineering, vector databases, embeddings, RAG, Agents, Agent companions, fine-tuning and the use of MLOps in GenAI planning.
Resume and Job Description
kaggle.com
zip
Updated Jul 19, 2025
Cite
Sayyed Faizan95 (2025). Resume and Job Description [Dataset]. https://www.kaggle.com/datasets/sayyedfaizan95/resume-and-job-description
Available download formats: zip (63737 bytes)
Dataset updated
Jul 19, 2025
Authors
Sayyed Faizan95
Description
📄 Synthetic Resume Dataset (1,200 Records)

🧠 Overview

This dataset contains 1,200 high-quality, synthetic resume records carefully generated to simulate real-world professional profiles. It is ideal for HR analytics, recruitment model training, resume screening systems, and academic research.

The dataset is divided into:

✅ 60% IT Jobs (720 records)

✅ 40% Non-IT Jobs (480 records)

Each entry mimics realistic resume attributes across a wide range of roles—from fresh graduates to experienced professionals.

📊 Features

Each record includes the following 14 structured fields:

Name, Age, Gender

Education_Level, Field_of_Study, Degrees, Institute_Name, Graduation_Year

Experience_Years, Current_Job_Title, Previous_Job_Titles

Skills, Certifications, Target_Job_Description
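
A loader can check a file's header against the 14 fields listed above before using it. This is a minimal sketch that assumes the column names in the file match the names in the description exactly, which the listing does not confirm; `check_header` is a hypothetical helper.

```python
# Field names as listed in the description; exact spelling in the file is an assumption.
EXPECTED_FIELDS = [
    "Name", "Age", "Gender",
    "Education_Level", "Field_of_Study", "Degrees", "Institute_Name", "Graduation_Year",
    "Experience_Years", "Current_Job_Title", "Previous_Job_Titles",
    "Skills", "Certifications", "Target_Job_Description",
]


def check_header(header):
    """Return (missing, unexpected) column names relative to the listed fields."""
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    unexpected = [name for name in header if name not in EXPECTED_FIELDS]
    return missing, unexpected
```

An empty pair of lists means the header matches the documented schema; anything else points to renamed, dropped, or extra columns.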

👨‍💻 IT Job Titles (sample):

Data Scientist, Cloud Engineer, Prompt Engineer, DevOps Engineer, Full-Stack Developer, MLOps Specialist, Quantum Computing Specialist, and more.

🧑‍⚕️ Non-IT Job Titles (sample):

Nurse, Dentist, Financial Analyst, Mental Health Practitioner, Business Development Manager, Customer Success Manager, and others.

🔍 Key Highlights

✅ 100% synthetic data – no privacy concerns

✅ Zero missing values

✅ Realistic skill and certification pairings based on job roles

✅ Professionally written target job descriptions

✅ Balanced age (21–45) and experience (0–10 years) distribution

✅ Includes both freshers (36.8%) and working professionals (63.2%)
Solana Price Prediction
kaggle.com
zip
Updated Jul 31, 2025
Cite
Dhaivat N Jambudia (2025). Solana Price Prediction [Dataset]. https://www.kaggle.com/datasets/dhaivatnjambudia/solana-price-prediction
Available download formats: zip (56507 bytes)
Dataset updated
Jul 31, 2025
Authors
Dhaivat N Jambudia
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This data was extracted from the CoinGecko API and contains price and volume features. It requires a lot of feature engineering, which makes it good practice for anyone who wants to build an MLOps or data science project. This is a static dataset; for a live pipeline, see my repo, where I have built an ETL pipeline that stores the data in MongoDB: https://github.com/Excergic/SOL-Price-Prediction Thank you.
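
As an illustration of the kind of feature engineering a price and volume series invites, the sketch below derives period-over-period returns and a trailing moving average using only the standard library. The function names and the choice of these two features are assumptions for illustration; they are not taken from the dataset or the linked repository.

```python
def pct_returns(prices):
    """Simple period-over-period returns; the first entry has no predecessor."""
    if not prices:
        return []
    return [None] + [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def rolling_mean(values, window):
    """Trailing moving average; None until a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result
```

Both functions keep the output the same length as the input, so the derived columns line up row by row with the original price and volume values.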
Social Power NBA
kaggle.com
zip
Updated Aug 1, 2017
Cite
Noah Gift (2017). Social Power NBA [Dataset]. https://www.kaggle.com/noahgift/social-power-nba
Available download formats: zip (1397766 bytes)
Dataset updated
Aug 1, 2017
Authors
Noah Gift
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

This data set contains combined on-court performance data for NBA players in the 2016-2017 season, alongside salary, Twitter engagement, and Wikipedia traffic data.

Further information can be found in a series of articles for IBM developerWorks: "Explore valuation and attendance using data science and machine learning" and "Exploring the individual NBA players".

Slides from a talk about this dataset, given at Strata in March 2018:

https://www.slideshare.net/noahgift/social-power-andinfluenceinthenba-89807740?qid=3f9f835a-f3d7-4174-8a8c-c97f9c82e614&v=&b=&from_search=1

Further reading on this dataset is available in the book Pragmatic AI: An Introduction to Cloud-Based Machine Learning (Chapter 6 or the full book) and in lesson 9 of Essential Machine Learning and AI with Python and Jupyter Notebook.

Followup Items

You can watch a breakdown of using cluster analysis on the Pragmatic AI YouTube channel

Learn to deploy a Kaggle project as a production machine learning service (sklearn + Flask + container) by reading Python for DevOps: Learn Ruthlessly Effective Automation, Chapter 14: MLOps and Machine Learning Engineering.

Use social media to predict a winning season with this notebook: https://github.com/noahgift/core-stats-datascience/blob/master/Lesson2_7_Trends_Supervized_Learning.ipynb

Learn to use the cloud for data analysis.

Acknowledgement

Data sources include ESPN, Basketball-Reference, Twitter, FiveThirtyEight, and Wikipedia. The source code for this dataset (in Python and R) can be found on GitHub. Links to more writing can be found at noahgift.com.

Inspiration

Do NBA fans know more about who the best players are, or do owners?

What is the true worth of the social media presence of athletes in the NBA?
UCI Heart Disease - Explainable AI Project Assets
kaggle.com
zip
Updated Nov 18, 2025
Cite
Ariyan_Pro (2025). UCI Heart Disease - Explainable AI Project Assets [Dataset]. https://www.kaggle.com/datasets/ariyannadeem/uci-heart-disease-explainable-ai-project-assets
Available download formats: zip (1051043 bytes)
Dataset updated
Nov 18, 2025
Authors
Ariyan_Pro
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Medical-Grade Explainable AI Project Assets

This dataset contains comprehensive assets for a production-ready Explainable AI (XAI) heart disease prediction system achieving 94.1% accuracy with full model transparency.

📊 CONTEXT: Healthcare AI faces a critical "black box" problem where models make predictions without explanations. This project demonstrates how to build trustworthy medical AI using SHAP and LIME for real-time explainability.

🎯 PROJECT GOAL: Create a clinically deployable AI system that not only predicts heart disease with high accuracy but also provides interpretable explanations for each prediction, enabling doctor-AI collaboration.

🚀 KEY FEATURES:
- 94.1% prediction accuracy (XGBoost + Optuna)
- Real-time SHAP & LIME explanations
- FastAPI backend with medical validation
- Gradio clinical dashboard
- Full MLOps pipeline (MLflow tracking)
- 4-Layer enterprise architecture

📁 ASSETS INCLUDED:
- heart_clean.csv: clinical dataset ready for analysis
- SHAP summary plots for global explainability
- Performance metrics and visualizations
- Architecture diagrams
- Model evaluation results

🔗 COMPANION RESOURCES:
- Live Demo: https://huggingface.co/spaces/Ariyan-Pro/HeartDisease-Predictor
- Notebook: https://www.kaggle.com/code/ariyannadeem/heart-disease-prediction-with-explainable-ai
- Source Code: https://github.com/Ariyan-Pro/ExplainableAI-HeartDisease

Perfect for learning medical AI implementation, explainable AI techniques, and production deployment.

mlops-dataset


2 scholarly articles cite this dataset (View in Google Scholar)
