License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides comprehensive air quality and climate measurements spanning from ice core reconstructions to modern direct measurements, focusing on urban environments worldwide.
Beijing, Berlin, Chicago, Dallas, Delhi, Houston, Lagos, London, Los Angeles, Mexico City, Mumbai, New York, Paris, Philadelphia, Phoenix, San Antonio, San Diego, San Jose, São Paulo, Tokyo
├── co2_emissions.csv # Direct CO2 measurements from Mauna Loa Lab
├── air_quality_global.csv # PM2.5 and NO2 data from 20 cities worldwide
├── urban_climate.csv # Climate variables for the same cities
├── ice_core_co2.csv # Historical CO2 from ice cores
└── metadata.json # Complete metadata of dataset
Urban Air Quality Dataset v1.0 (2025).
Data compiled for Kaggle community use. Questions and feedback welcome.
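The file layout listed above can be checked and loaded with a few lines of standard-library Python. This is only a minimal sketch: the file names come from the listing, but the helper names `missing_files` and `load_dataset` are hypothetical, and nothing is assumed about the columns inside each CSV (rows are keyed by whatever header each file carries).

```python
import csv
import json
from pathlib import Path

# File names as they appear in the dataset's file listing.
EXPECTED_FILES = [
    "co2_emissions.csv",
    "air_quality_global.csv",
    "urban_climate.csv",
    "ice_core_co2.csv",
    "metadata.json",
]


def missing_files(root):
    """Return the expected file names that are absent from `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_dataset(root):
    """Load every expected file that exists under `root`.

    CSV files become lists of dicts keyed by their own header row;
    metadata.json becomes whatever JSON value it contains.
    """
    root = Path(root)
    loaded = {}
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file():
            continue  # skip files that were not downloaded
        with path.open(newline="", encoding="utf-8") as handle:
            if path.suffix == ".json":
                loaded[name] = json.load(handle)
            else:
                loaded[name] = list(csv.DictReader(handle))
    return loaded
```

Calling `missing_files` first makes it obvious when a download is incomplete. Note that `csv.DictReader` returns every value as a string, so numeric columns still need converting before analysis.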
The "Asthma Disease Prediction" 😮‍💨 dataset is a comprehensive collection of anonymized health records and patient data, meticulously curated for predictive modeling and research purposes. It includes vital patient information, environmental factors, and medical history, enabling the development of advanced machine learning models to forecast asthma onset, severity, and treatment outcomes. This dataset serves as a valuable resource for improving early diagnosis and management of asthma, ultimately enhancing the quality of care for affected individuals 🤩.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This synthetic dataset simulates health records of individuals with varying levels of asthma severity. It is designed to support predictive modeling, classification, and exploratory analysis in the healthcare domain.
The dataset contains patient-level data such as demographics, lifestyle factors, environmental exposures, and medical indicators that are known to influence asthma risk and severity.
Use cases include:
- Asthma severity prediction
- Health risk scoring
- Impact analysis of factors like pollution, BMI, or smoking
- Educational machine learning tasks
Since the data is fully synthetic, it is safe for public use and contains no personal or sensitive information.
| Column Name | Description |
|---|---|
| Age | Age of the individual in years |
| Gender | Gender of the individual (0 = Female, 1 = Male) |
| BMI | Body Mass Index - a measure of body fat based on height and weight |
| Smoking_Status | Whether the individual is a current smoker (0 = No, 1 = Yes) |
| Exposure_PM25 | Exposure to PM2.5 air pollution level (micrograms per cubic meter) |
| Physical_Activity | Frequency of physical activity per week |
| Family_History | Family history of asthma (0 = No, 1 = Yes) |
| Medication_Use | Whether the person uses asthma medication (0 = No, 1 = Yes) |
| Allergy_Score | Composite allergy score based on known allergens (0–10 scale) |
| Asthma_Attacks | Number of asthma attacks in the past year |
| Hospital_Visits | Number of hospital visits related to respiratory issues |
| Comorbidities | Number of co-existing chronic conditions |
| Lung_Function_FEV1 | Forced Expiratory Volume in 1 second (percent of predicted value) |
| Quality_of_Life | Self-reported health-related quality of life (1–5 scale) |
| Asthma_Severity | Asthma severity level (0 = Mild, 1 = Moderate, 2 = Severe) |
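
The encodings documented in the column table can be turned into a simple row check. The sketch below is illustrative rather than part of the dataset: the column names, codes, and scale bounds come from the table, while the helper `validate_record` is hypothetical and assumes each row has already been converted from CSV strings to numbers.

```python
# Allowed codes for the categorical columns, taken from the column table.
ALLOWED_CODES = {
    "Gender": {0, 1},
    "Smoking_Status": {0, 1},
    "Family_History": {0, 1},
    "Medication_Use": {0, 1},
    "Asthma_Severity": {0, 1, 2},
}

# Inclusive ranges for the two columns that document a scale.
SCALE_RANGES = {
    "Allergy_Score": (0, 10),
    "Quality_of_Life": (1, 5),
}

# Human-readable labels for the target column.
SEVERITY_LABELS = {0: "Mild", 1: "Moderate", 2: "Severe"}


def validate_record(record):
    """Return a list of problems found in one row (empty list means OK)."""
    problems = []
    for column, allowed in ALLOWED_CODES.items():
        value = record.get(column)
        if value not in allowed:
            problems.append(f"{column}: {value!r} not in {sorted(allowed)}")
    for column, (low, high) in SCALE_RANGES.items():
        value = record.get(column)
        if not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{column}: {value!r} outside {low}-{high}")
    return problems
```

Running a check like this before training catches rows where a code falls outside the documented values, and `SEVERITY_LABELS` maps model predictions back to readable severity names.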
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides a comprehensive analysis of onion seed quality based on various physical, genetic, and environmental attributes. It is designed to facilitate research in seed viability prediction, germination success, and agricultural optimization.
Dataset Overview
- Total Attributes: 12
- Key Features: Physical properties (weight, size, color), environmental conditions (soil pH, temperature, humidity), genetic markers, and germination rate.
- Target Variable: Seed_Quality (categorical or numerical, indicating the overall quality of the seed).

Features Explanation
- Seed_ID: Unique identifier for each seed sample.
- Seed_Weight (mg): Weight of the seed in milligrams.
- Seed_Size (mm): Diameter of the seed in millimeters.
- Seed_Color: Categorical attribute indicating the seed's color.
- Moisture_Content (%): Percentage of moisture present in the seed.
- Genetic_Marker_1 & Genetic_Marker_2: Genetic indicators related to seed quality.
- Soil_pH: Acidity or alkalinity of the soil where the seed was tested.
- Temperature (°C): Environmental temperature at the time of testing.
- Humidity (%): Relative humidity in the surrounding environment.
- Germination_Rate (%): Percentage of seeds successfully germinating under given conditions.
- Seed_Quality: Indicator of seed quality based on germination performance and overall characteristics.

Potential Applications
- Seed Quality Prediction: Developing machine learning models to classify high-quality seeds.
- Agricultural Optimization: Identifying ideal environmental conditions for improved germination.
- Genetic Analysis: Studying the impact of genetic markers on seed viability.
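
As a rough illustration of the "Agricultural Optimization" use case described above, the sketch below averages germination rate within fixed-width bands of an environmental feature. It rests on several assumptions: the dictionary keys `Temperature` and `Germination_Rate` are simplified stand-ins for the real column headers, the helper `mean_germination_by_band` is hypothetical, and the sample rows are made up purely to make the example runnable.

```python
from collections import defaultdict
from statistics import mean


def mean_germination_by_band(rows, feature, band_width):
    """Average Germination_Rate within fixed-width bands of `feature`.

    Returns {band_start: mean_rate}, with bands in ascending order.
    """
    grouped = defaultdict(list)
    for row in rows:
        # Floor the feature value to the start of its band.
        band_start = (row[feature] // band_width) * band_width
        grouped[band_start].append(row["Germination_Rate"])
    return {band: mean(rates) for band, rates in sorted(grouped.items())}


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"Temperature": 18.0, "Germination_Rate": 70.0},
    {"Temperature": 19.5, "Germination_Rate": 74.0},
    {"Temperature": 22.0, "Germination_Rate": 88.0},
    {"Temperature": 24.0, "Germination_Rate": 90.0},
    {"Temperature": 31.0, "Germination_Rate": 60.0},
]

by_band = mean_germination_by_band(sample_rows, "Temperature", band_width=5)
# Band whose rows have the highest average germination rate.
best_band = max(by_band, key=by_band.get)
```

The same function could be pointed at soil pH or humidity by changing `feature` and `band_width`. Banding is a deliberately simple choice; it only describes the data and does not establish that a condition causes better germination.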
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
About Dataset

Overview
This dataset has been carefully synthesized to support research in lung cancer survival prediction, enabling the development of models that estimate:
- Whether a patient is likely to survive at least one year post-diagnosis (Binary Classification).
- The probability of survival based on clinical and lifestyle factors (Regression Analysis).

The dataset is designed for machine learning and deep learning applications in medical AI, oncology research, and predictive healthcare.

Dataset Generation Process
The dataset was generated using a combination of real-world epidemiological insights, medical literature, and statistical modeling. The feature distributions and relationships have been modeled to reflect real-world clinical scenarios.

Medical References & Sources
The dataset structure is based on well-established lung cancer risk factors and survival indicators documented in leading medical research and clinical guidelines:
- World Health Organization (WHO) reports on lung cancer epidemiology.
- National Cancer Institute (NCI) & American Cancer Society (ACS) guidelines on lung cancer risk factors and treatment outcomes.
- The IASLC Lung Cancer Staging Project (8th Edition): Standard reference for lung cancer staging.
- Harrison's Principles of Internal Medicine (20th Edition): Provides an in-depth review of lung cancer diagnosis and treatment.
- Lung Cancer: Principles and Practice (2022, Oxford University Press): Clinical insights into lung cancer detection, treatment, and survival factors.

Features of the Dataset
Each record in the dataset represents an individual's clinical condition, lifestyle risk factors, and survival outcome. The dataset includes the following features:

1. Patient Demographics
- Age: A key risk factor for lung cancer progression and survival.
- Gender: Male and female lung cancer survival rates can differ.
- Residence: Urban vs. Rural (impact of environmental factors).

2. Risk Factors & Lifestyle Indicators
These factors have been linked to lung cancer risk in epidemiological studies:
- Smoking Status: Current Smoker, Former Smoker, Never Smoked.
- Air Pollution Exposure: Low, Moderate, High.
- Biomass Fuel Use: Yes/No. Associated with household air pollution.
- Factory Exposure: Yes/No. Industrial exposure increases lung cancer risk.
- Family History: Yes/No. Genetic predisposition to lung cancer.
- Diet Habit: Vegetarian, Non-Vegetarian, Mixed. Nutritional impact on cancer progression.

3. Symptoms (Primary Predictors)
These are key clinical indicators associated with lung cancer detection and severity:
- Hemoptysis (Coughing Blood)
- Chest Pain
- Fatigue & Weakness
- Chronic Cough
- Unexplained Weight Loss

4. Tumor Characteristics & Clinical Features
- Tumor Size (mm): The size of the detected tumor.
- Histology Type: Adenocarcinoma, Squamous Cell Carcinoma, Small Cell Carcinoma.
- Cancer Stage: Stage I to Stage IV.

5. Treatment & Healthcare Facility
- Treatment Received: Surgery, Chemotherapy, Radiation, Targeted Therapy.
- Hospital Type: Private, Government, Medical College.

6. Target Variables (Predicted Outcomes)
- Survival (Binary): 1 (Yes) if the patient survives at least 1 year, 0 (No) otherwise.
- Survival Probability (%) (can be derived): Estimated probability of survival within one year.

Why Is This Dataset Valuable?

Balanced Data Distribution
- Designed to ensure a representative distribution of lung cancer survival cases.
- Intended to reduce model bias and improve generalization in predictive models.

Medically-Inspired Feature Engineering
- Features are derived from real-world lung cancer risk factors documented in medical literature.
- Incorporates both lifestyle and clinical indicators to enhance predictive accuracy. No real patient data is used; the records simulate a biomedical setting.

Diverse Risk Factors Considered
- Smoking, air pollution, and genetic history as primary lung cancer contributors.
- Symptom severity and tumor histology influence survival rates.

Scalability & ML Suitability
- Ideal for classification and regression tasks in machine learning.
- Can be used with deep learning (TensorFlow, PyTorch), ML models (XGBoost, Random Forest, SVM), and explainable AI techniques like SHAP and LIME.

Dataset Usage & Applications
This dataset is highly useful for multiple healthcare AI applications, including:
- Predictive Analytics: Early detection of high-risk lung cancer patients.
- Healthcare Chatbots: AI-powered risk assessment tools.
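
Most of the features described above are categorical, so they need numeric encoding before they can feed a classifier. The sketch below shows one possible encoding under stated assumptions: the dictionary keys and category strings mirror the wording of the feature description, but the actual column headers and value spellings in the file may differ, and the helpers `one_hot` and `encode_patient` are hypothetical.

```python
# Category values as worded in the feature description (assumed spellings).
SMOKING_LEVELS = ["Current Smoker", "Former Smoker", "Never Smoked"]
STAGE_ORDER = {"Stage I": 1, "Stage II": 2, "Stage III": 3, "Stage IV": 4}
YES_NO = {"Yes": 1, "No": 0}


def one_hot(value, levels):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in levels:
        raise ValueError(f"unexpected category: {value!r}")
    return [1 if value == level else 0 for level in levels]


def encode_patient(patient):
    """Turn one patient record (a dict of strings) into a numeric list.

    Smoking status has no natural order, so it is one-hot encoded.
    Cancer stage is ordered, so it maps to the integers 1 through 4.
    """
    features = []
    features += one_hot(patient["Smoking Status"], SMOKING_LEVELS)
    features.append(STAGE_ORDER[patient["Cancer Stage"]])
    features.append(YES_NO[patient["Family History"]])
    features.append(float(patient["Tumor Size (mm)"]))
    return features
```

Treating stage as ordinal keeps the information that Stage IV is more advanced than Stage I, whereas one-hot encoding avoids implying an order among smoking categories. Libraries such as scikit-learn provide equivalent encoders; the plain-Python version here is only meant to make the idea concrete.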