MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Daniil Krizhanovskyi
Released under MIT
https://www.licenses.ai/ai-licenses
This dataset was created by Amy Okey
Released under RAIL (specified in description)
This dataset was created by Under The Hood
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Shubham Raj
Released under MIT
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Saibhargav Ch
Released under Apache 2.0
This dataset was created by Jurgen
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mohsin Dahri
Released under CC0: Public Domain
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Shubham Raj
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
MLOps project for Arabic mental health text classification
https://creativecommons.org/publicdomain/zero/1.0/
The purpose of this (imaginary) study is to detect faulty satellites in order to prevent communication interruptions. Understand how confirmation bias can lead to misleading results by working with a synthetic data set that has a developed storyline.
You are leading a team conducting a study that will allow the space agency to predict the health status of satellites from telemetry data, enabling proactive maintenance and ensuring optimal performance in space missions.
Accurately predicting satellite health to minimize the risk of mission failures and optimize satellite usage.
Identifying the most critical factors that affect satellite health to focus on improving those aspects during the satellite design and maintenance process.
Reducing the rate of false positives and false negatives in predictions to avoid unnecessary maintenance efforts and ensure that actual issues are addressed promptly.
False Positive (predicting a malfunction when the component is healthy): Unnecessary maintenance check: $5,000; Unwarranted component replacement: $50,000.
False Negative (predicting a component is healthy when it is malfunctioning): Data loss or degradation: $100,000; Partial mission failure: $500,000; Total mission failure or satellite loss: $300,000,000.
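The cost figures above can be turned into a rough cost-weighted score for a set of predictions. The following is a minimal sketch, not the study's own method: the dictionary keys, function names, and sample labels are hypothetical, and it assumes the analyst picks one cost scenario per error type. Labels follow the data dictionary (1 = healthy, 0 = unhealthy), so a "positive" prediction here means a predicted malfunction.

```python
# Dollar costs per error type, copied from the cost list above.
# The dictionary keys are hypothetical names chosen for this sketch.
FALSE_POSITIVE_COSTS = {
    "maintenance_check": 5_000,
    "component_replacement": 50_000,
}
FALSE_NEGATIVE_COSTS = {
    "data_loss": 100_000,
    "partial_mission_failure": 500_000,
    "total_mission_failure": 300_000_000,
}


def count_errors(y_true, y_pred):
    """Count false positives and false negatives.

    Labels: 1 = healthy, 0 = unhealthy. A false positive is a healthy
    satellite predicted as unhealthy; a false negative is the reverse.
    """
    false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return false_positives, false_negatives


def total_error_cost(y_true, y_pred, fp_scenario, fn_scenario):
    """Total dollar cost of the prediction errors under one chosen scenario pair."""
    fp, fn = count_errors(y_true, y_pred)
    return fp * FALSE_POSITIVE_COSTS[fp_scenario] + fn * FALSE_NEGATIVE_COSTS[fn_scenario]


# Hypothetical labels, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
cost = total_error_cost(y_true, y_pred, "maintenance_check", "data_loss")
print(cost)  # 1 false positive + 1 false negative -> 105000
```

Because the most expensive false negative costs several orders of magnitude more than any false positive, a score like this will usually favor models that flag malfunctions aggressively.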
Thoroughly exploring the data to understand the relationships between various telemetry variables and satellite health.
Ensuring the model is accurate and reliable by selecting appropriate algorithms, performing feature engineering, and validating the model's performance using relevant metrics.
Identifying and addressing any data quality issues, such as missing values and incorrect data.
Investigating the importance of each variable in the prediction task and communicating these insights to stakeholders for better decision-making.
DATA DICTIONARY:
Range: 0 to 3650 Description: Time since the satellite was launched.
Range: 300 to 2000 Description: Altitude of the satellite's orbit.
Range: 20 to 30 Description: Satellite's battery voltage.
Range: -50 to 50 Description: Temperature of the satellite's solar panels.
Range: 0 to 5 Description: Error in the satellite's attitude control system.
Range: 10 to 100 Description: Rate of data transmission from the satellite to the ground station.
Range: 0 (not working) or 1 (working) Description: Binary flag indicating if the thermal control system is working or not.
Range: 0 (unhealthy) or 1 (healthy) Description: Target variable - binary flag indicating if the satellite is healthy or unhealthy
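The ranges in the data dictionary give a simple way to flag missing or incorrect values. The sketch below is one possible check under stated assumptions: the column names are hypothetical placeholders, since the field names are not shown in the listing above, and each record is assumed to be a plain dictionary.

```python
# Column names are hypothetical placeholders; only the ranges come
# from the data dictionary above.
EXPECTED_RANGES = {
    "time_since_launch": (0, 3650),
    "orbital_altitude": (300, 2000),
    "battery_voltage": (20, 30),
    "solar_panel_temperature": (-50, 50),
    "attitude_control_error": (0, 5),
    "data_transmission_rate": (10, 100),
}
BINARY_COLUMNS = ("thermal_control_status", "satellite_health")


def find_quality_issues(row):
    """Return a list of (column, problem) pairs for one record (a dict)."""
    issues = []
    for column, (low, high) in EXPECTED_RANGES.items():
        value = row.get(column)
        if value is None:
            issues.append((column, "missing"))
        elif not low <= value <= high:
            issues.append((column, "out of range"))
    for column in BINARY_COLUMNS:
        value = row.get(column)
        if value is None:
            issues.append((column, "missing"))
        elif value not in (0, 1):
            issues.append((column, "not binary"))
    return issues


# Hypothetical record: battery voltage is too high and one flag is absent.
record = {
    "time_since_launch": 1200,
    "orbital_altitude": 550,
    "battery_voltage": 45,
    "solar_panel_temperature": -10,
    "attitude_control_error": 0.4,
    "data_transmission_rate": 80,
    "satellite_health": 1,
}
print(find_quality_issues(record))
# [('battery_voltage', 'out of range'), ('thermal_control_status', 'missing')]
```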
This dataset is a group of whitepapers shared by Google in its AI workshop. It is a knowledge base on various GenAI topics, including prompt engineering, vector databases, embeddings, RAG, agents, agent companions, fine-tuning, and the use of MLOps in GenAI planning.
This dataset contains 1,200 high-quality, synthetic resume records carefully generated to simulate real-world professional profiles. It is ideal for HR analytics, recruitment model training, resume screening systems, and academic research.
The dataset is divided into:
Each entry mimics realistic resume attributes across a wide range of roles, from fresh graduates to experienced professionals.
Each record includes the following 14 structured fields:
Name, Age, Gender
Education_Level, Field_of_Study, Degrees, Institute_Name, Graduation_Year
Experience_Years, Current_Job_Title, Previous_Job_Titles
Skills, Certifications, Target_Job_Description
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This data was extracted from the CoinGecko API and includes price and volume features. It requires a lot of feature engineering, which makes it good practice for anyone who wants to build an MLOps or data science project. This is a static data set; if you want to build a real pipeline, see my repo, where I have built a real ETL pipeline with the data stored in MongoDB. Repo: https://github.com/Excergic/SOL-Price-Prediction Thank you.
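As an illustration of the kind of feature engineering that price and volume data invites, here is a minimal sketch using only the standard library. It is not taken from the linked repo: the function names and the sample series are hypothetical, and real data would need its own choice of window sizes and features.

```python
def pct_returns(prices):
    """Period-over-period fractional change; the first entry has no predecessor."""
    return [None] + [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def rolling_mean(values, window):
    """Trailing mean over `window` points; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


# Hypothetical price and volume series, for illustration only.
prices = [100.0, 110.0, 99.0, 108.9]
volumes = [10.0, 20.0, 30.0, 40.0]

returns = pct_returns(prices)                 # one derived feature per row
avg_volume = rolling_mean(volumes, window=2)  # smoothed volume feature
print(returns)
print(avg_volume)
```

Both helpers return a list the same length as the input, with `None` where no value can be computed yet, so the derived features stay aligned with the original rows.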
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data set contains combined on-court performance data for NBA players in the 2016-2017 season, alongside salary, Twitter engagement, and Wikipedia traffic data.
Further information can be found in a series of articles for IBM developerWorks: "Explore valuation and attendance using data science and machine learning" and "Exploring the individual NBA players".
Slides are available from a talk about this dataset given at Strata in March 2018.
For further reading on this dataset, see Chapter 6 of the book Pragmatic AI: An Introduction to Cloud-Based Machine Learning, or the full book, and watch Lesson 9 of Essential Machine Learning and AI with Python and Jupyter Notebook.
You can watch a breakdown of using cluster analysis on the Pragmatic AI YouTube channel.
Learn to deploy a Kaggle project as a production machine learning service (sklearn + Flask + container) by reading Python for DevOps: Learn Ruthlessly Effective Automation, Chapter 14: MLOps and Machine Learning Engineering.
Use social media to predict a winning season with this notebook: https://github.com/noahgift/core-stats-datascience/blob/master/Lesson2_7_Trends_Supervized_Learning.ipynb
Learn to use the cloud for data analysis.
Data sources include ESPN, Basketball-Reference, Twitter, FiveThirtyEight, and Wikipedia. The source code for this dataset (in Python and R) can be found on GitHub. Links to more writing can be found at noahgift.com.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Medical-Grade Explainable AI Project Assets
This dataset contains comprehensive assets for a production-ready Explainable AI (XAI) heart disease prediction system achieving 94.1% accuracy with full model transparency.
CONTEXT: Healthcare AI faces a critical "black box" problem where models make predictions without explanations. This project demonstrates how to build trustworthy medical AI using SHAP and LIME for real-time explainability.
PROJECT GOAL: Create a clinically deployable AI system that not only predicts heart disease with high accuracy but also provides interpretable explanations for each prediction, enabling doctor-AI collaboration.
KEY FEATURES:
- 94.1% prediction accuracy (XGBoost + Optuna)
- Real-time SHAP & LIME explanations
- FastAPI backend with medical validation
- Gradio clinical dashboard
- Full MLOps pipeline (MLflow tracking)
- 4-layer enterprise architecture
ASSETS INCLUDED:
- heart_clean.csv - Clinical dataset ready for analysis
- SHAP summary plots for global explainability
- Performance metrics and visualizations
- Architecture diagrams
- Model evaluation results
COMPANION RESOURCES:
- Live Demo: https://huggingface.co/spaces/Ariyan-Pro/HeartDisease-Predictor
- Notebook: https://www.kaggle.com/code/ariyannadeem/heart-disease-prediction-with-explainable-ai
- Source Code: https://github.com/Ariyan-Pro/ExplainableAI-HeartDisease
Perfect for learning medical AI implementation, explainable AI techniques, and production deployment.