5 datasets found
  1. Urban Air Quality & Climate Dataset (1958-2025)

    • kaggle.com
    Updated Sep 20, 2025
    Cite
    Cosmic Black (2025). Urban Air Quality & Climate Dataset (1958-2025) [Dataset]. https://www.kaggle.com/datasets/krishd123/urban-air-quality-and-climate-dataset-1958-2025
    Dataset provided by
    Kaggle
    Authors
    Cosmic Black
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Urban Air Quality and Climate Dataset (1958-2025)

    Overview

    This dataset provides comprehensive air quality and climate measurements spanning from ice core reconstructions to modern direct measurements, focusing on urban environments worldwide.

    Dataset Components

    1. CO2 Measurements (co2_emissions.csv)

    2. Air Quality Data (air_quality_global.csv)

    3. Urban Climate Data (urban_climate.csv)

    4. Ice Core CO2 Data (ice_core_co2.csv)

    Data Quality

    • All measurements include quality flags and uncertainty estimates
    • Missing data is explicitly coded
    • Seasonal and long-term trends reflect real-world patterns
    • Data sources are clearly documented for each measurement

    Usage Recommendations

    • Ideal for air quality trend analysis and forecasting
    • Suitable for climate-pollution interaction studies
    • Perfect for machine learning model training and validation
    • Comprehensive enough for urban planning and policy research
    • Includes both modern measurements and historical context

    Cities Included

    Beijing, Berlin, Chicago, Dallas, Delhi, Houston, Lagos, London, Los Angeles, Mexico City, Mumbai, New York, Paris, Philadelphia, Phoenix, San Antonio, San Diego, San Jose, São Paulo, Tokyo

    File Structure

    ├── co2_emissions.csv         # Direct CO2 measurements from Mauna Loa Lab
    ├── air_quality_global.csv    # PM2.5 and NO2 data from 20 cities worldwide
    ├── urban_climate.csv         # Climate variables for the same cities
    ├── ice_core_co2.csv          # Historical CO2 from ice cores
    ├── metadata.json             # Complete metadata of dataset
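
A minimal sketch for checking that a local download contains the five files listed above. The file names come from the listing; the function name `missing_files` and the idea of a local download directory are assumptions made for illustration, not part of the dataset.

```python
from pathlib import Path

# File names as listed in the dataset's "File Structure" section.
EXPECTED_FILES = (
    "co2_emissions.csv",
    "air_quality_global.csv",
    "urban_climate.csv",
    "ice_core_co2.csv",
    "metadata.json",
)


def missing_files(data_dir):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files` on the download folder returns an empty list when every listed file is present, which makes it a cheap guard to run before any analysis.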
    

    Citation

    Urban Air Quality Dataset v1.0 (2025).

    Contact

    Data compiled for Kaggle community use. Questions and feedback welcome.

  2. Asthma Disease Prediction

    • kaggle.com
    Updated Sep 10, 2023
    Cite
    Deepayan Thakur (2023). Asthma Disease Prediction [Dataset]. https://www.kaggle.com/datasets/deepayanthakur/asthma-disease-prediction
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Deepayan Thakur
    Description

    The "Asthma Disease Prediction" 😮‍💨 dataset is a comprehensive collection of anonymized health records and patient data, meticulously curated for predictive modeling and research purposes. It includes vital patient information, environmental factors, and medical history, enabling the development of advanced machine learning models to forecast asthma onset, severity, and treatment outcomes. This dataset serves as a valuable resource for improving early diagnosis and management of asthma, ultimately enhancing the quality of care for affected individuals 🤩.

  3. 🫁 Asthma Risk & Severity Dataset

    • kaggle.com
    Updated Jul 17, 2025
    Cite
    Sumedh1507 (2025). 🫁 Asthma Risk & Severity Dataset [Dataset]. https://www.kaggle.com/datasets/sumedh1507/asthma-dataset
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sumedh1507
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This synthetic dataset simulates health records of individuals with varying levels of asthma severity. It is designed to support predictive modeling, classification, and exploratory analysis in the healthcare domain.

    The dataset contains patient-level data such as demographics, lifestyle factors, environmental exposures, and medical indicators that are known to influence asthma risk and severity.

    Use cases include:

    • Asthma severity prediction
    • Health risk scoring
    • Impact analysis of factors like pollution, BMI, or smoking
    • Educational machine learning tasks

    Since the data is fully synthetic, it is safe for public use and contains no personal or sensitive information.

    Column Name          Description
    Age                  Age of the individual in years
    Gender               Gender of the individual (0 = Female, 1 = Male)
    BMI                  Body Mass Index, a measure of body fat based on height and weight
    Smoking_Status       Whether the individual is a current smoker (0 = No, 1 = Yes)
    Exposure_PM25        Exposure to PM2.5 air pollution level (micrograms per cubic meter)
    Physical_Activity    Frequency of physical activity per week
    Family_History       Family history of asthma (0 = No, 1 = Yes)
    Medication_Use       Whether the person uses asthma medication (0 = No, 1 = Yes)
    Allergy_Score        Composite allergy score based on known allergens (0–10 scale)
    Asthma_Attacks       Number of asthma attacks in the past year
    Hospital_Visits      Number of hospital visits related to respiratory issues
    Comorbidities        Number of co-existing chronic conditions
    Lung_Function_FEV1   Forced Expiratory Volume in 1 second (percent of predicted value)
    Quality_of_Life      Self-reported health-related quality of life (1–5 scale)
    Asthma_Severity      Asthma severity level (0 = Mild, 1 = Moderate, 2 = Severe)

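
The codes and scales in the column descriptions can be turned into a simple sanity check. The sketch below is one possible approach, assuming each record has already been parsed into a dict of numeric values keyed by the column names above; `validate_row` and the constant names are hypothetical helpers, not something the dataset ships.

```python
# Allowed codes and ranges, taken from the column descriptions.
CODED_COLUMNS = {
    "Gender": {0, 1},
    "Smoking_Status": {0, 1},
    "Family_History": {0, 1},
    "Medication_Use": {0, 1},
    "Asthma_Severity": {0, 1, 2},
}
RANGED_COLUMNS = {
    "Allergy_Score": (0, 10),
    "Quality_of_Life": (1, 5),
}
SEVERITY_LABELS = {0: "Mild", 1: "Moderate", 2: "Severe"}


def validate_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for column, allowed in CODED_COLUMNS.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: unexpected code {row.get(column)!r}")
    for column, (low, high) in RANGED_COLUMNS.items():
        value = row.get(column)
        if value is None or not low <= value <= high:
            problems.append(f"{column}: {value!r} outside {low}-{high}")
    return problems
```

`SEVERITY_LABELS` maps the target codes back to the readable labels given in the description, which is handy when reporting predictions.
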
  4. Onion Seed Quality Dataset

    • kaggle.com
    Updated Mar 19, 2025
    Cite
    Ziya (2025). Onion Seed Quality Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/onion-seed-quality-dataset
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ziya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comprehensive analysis of onion seed quality based on various physical, genetic, and environmental attributes. It is designed to facilitate research in seed viability prediction, germination success, and agricultural optimization.

    Dataset Overview

    • Total Attributes: 12
    • Key Features: Physical properties (weight, size, color), environmental conditions (soil pH, temperature, humidity), genetic markers, and germination rate.
    • Target Variable: Seed_Quality (categorical or numerical, indicating the overall quality of the seed).

    Features Explanation

    • Seed_ID: Unique identifier for each seed sample.
    • Seed_Weight (mg): Weight of the seed in milligrams.
    • Seed_Size (mm): Diameter of the seed in millimeters.
    • Seed_Color: Categorical attribute indicating the seed's color.
    • Moisture_Content (%): Percentage of moisture present in the seed.
    • Genetic_Marker_1 & Genetic_Marker_2: Genetic indicators related to seed quality.
    • Soil_pH: Acidity or alkalinity of the soil where the seed was tested.
    • Temperature (°C): Environmental temperature at the time of testing.
    • Humidity (%): Relative humidity in the surrounding environment.
    • Germination_Rate (%): Percentage of seeds successfully germinating under given conditions.
    • Seed_Quality: Indicator of seed quality based on germination performance and overall characteristics.

    Potential Applications

    • Seed Quality Prediction: Developing machine learning models to classify high-quality seeds.
    • Agricultural Optimization: Identifying ideal environmental conditions for improved germination.
    • Genetic Analysis: Studying the impact of genetic markers on seed viability.
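
Since the description names Seed_Quality as the target and Seed_ID as an identifier, a first step for modeling could look like the sketch below. It assumes the CSV header uses the attribute names exactly as written in the description, which the listing does not confirm; `split_features_target` is a hypothetical helper.

```python
import csv
import io

TARGET = "Seed_Quality"   # target variable named in the description
ID_COLUMN = "Seed_ID"     # unique identifier, excluded from the features


def split_features_target(csv_text):
    """Parse CSV text and return (feature_rows, targets)."""
    features, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        targets.append(row.pop(TARGET))
        row.pop(ID_COLUMN, None)  # drop the identifier so it is not used as a feature
        features.append(row)
    return features, targets
```

Values come back as strings from `csv.DictReader`, so numeric columns such as Soil_pH would still need converting before training a model.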

  5. Lung cancer Bangladesh

    • kaggle.com
    Updated Mar 15, 2025
    Cite
    NISHAT VASKER (2025). Lung cancer Bangladesh [Dataset]. http://doi.org/10.34740/kaggle/dsv/11035259
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NISHAT VASKER
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Bangladesh
    Description

    About Dataset

    📌 Overview

    This dataset has been carefully synthesized to support research in lung cancer survival prediction, enabling the development of models that estimate:

    • Whether a patient is likely to survive at least one year post-diagnosis (binary classification).
    • The probability of survival based on clinical and lifestyle factors (regression analysis).

    The dataset is designed for machine learning and deep learning applications in medical AI, oncology research, and predictive healthcare.

    📜 Dataset Generation Process

    The dataset was generated using a combination of real-world epidemiological insights, medical literature, and statistical modeling. The feature distributions and relationships have been carefully modeled to reflect real-world clinical scenarios, ensuring biomedical validity.

    📖 Medical References & Sources

    The dataset structure is based on well-established lung cancer risk factors and survival indicators documented in leading medical research and clinical guidelines:

    • World Health Organization (WHO) reports on lung cancer epidemiology.
    • National Cancer Institute (NCI) & American Cancer Society (ACS) guidelines on lung cancer risk factors and treatment outcomes.
    • The IASLC Lung Cancer Staging Project (8th Edition): Standard reference for lung cancer staging.
    • Harrison's Principles of Internal Medicine (20th Edition): Provides an in-depth review of lung cancer diagnosis and treatment.
    • Lung Cancer: Principles and Practice (2022, Oxford University Press): Clinical insights into lung cancer detection, treatment, and survival factors.

    🔬 Features of the Dataset

    Each record in the dataset represents an individual's clinical condition, lifestyle risk factors, and survival outcome. The dataset includes the following features:

    1. Patient Demographics

    • Age → A key risk factor for lung cancer progression and survival.
    • Gender → Male and female lung cancer survival rates can differ.
    • Residence → Urban vs. Rural (impact of environmental factors).

    2. Risk Factors & Lifestyle Indicators

    These factors have been linked to lung cancer risk in epidemiological studies:

    • Smoking Status → (Current Smoker, Former Smoker, Never Smoked).
    • Air Pollution Exposure → (Low, Moderate, High).
    • Biomass Fuel Use → (Yes/No) – Associated with household air pollution.
    • Factory Exposure → (Yes/No) – Industrial exposure increases lung cancer risk.
    • Family History → (Yes/No) – Genetic predisposition to lung cancer.
    • Diet Habit → (Vegetarian, Non-Vegetarian, Mixed) – Nutritional impact on cancer progression.

    3. Symptoms (Primary Predictors)

    These are key clinical indicators associated with lung cancer detection and severity:

    • Hemoptysis (Coughing Blood)
    • Chest Pain
    • Fatigue & Weakness
    • Chronic Cough
    • Unexplained Weight Loss

    4. Tumor Characteristics & Clinical Features

    • Tumor Size (mm) → The size of the detected tumor.
    • Histology Type → (Adenocarcinoma, Squamous Cell Carcinoma, Small Cell Carcinoma).
    • Cancer Stage → (Stage I to Stage IV).

    5. Treatment & Healthcare Facility

    • Treatment Received → (Surgery, Chemotherapy, Radiation, Targeted Therapy).
    • Hospital Type → (Private, Government, Medical College).

    6. Target Variables (Predicted Outcomes)

    • Survival (Binary) → 1 (Yes) if the patient survives at least 1 year, 0 (No) otherwise.
    • Survival Probability (%) (can be derived) → Estimated probability of survival within one year.

    ⚡ Why This Dataset is Valuable?

    ✅ Balanced Data Distribution

    • Designed to ensure a representative distribution of lung cancer survival cases.
    • Prevents model bias and improves generalization in predictive models.

    ✅ Medically-Inspired Feature Engineering

    • Features are derived from real-world lung cancer risk factors, validated through medical literature.
    • Incorporates both lifestyle and clinical indicators to enhance predictive accuracy. (No real patient data is used; the records simulate a biomedical setting.)

    ✅ Diverse Risk Factors Considered

    • Smoking, air pollution, and genetic history as primary lung cancer contributors.
    • Symptom severity and tumor histology influence survival rates.

    ✅ Scalability & ML Suitability

    • Ideal for classification and regression tasks in machine learning.
    • Can be used with deep learning (TensorFlow, PyTorch), ML models (XGBoost, Random Forest, SVM), and explainable AI techniques like SHAP and LIME.

    📂 Dataset Usage & Applications

    This dataset is highly useful for multiple healthcare AI applications, including:

    • 🩺 Predictive Analytics → Early detection of high-risk lung cancer patients.
    • 🤖 Healthcare Chatbots → AI-powered risk assessment tools.
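
Several of the features described for this dataset are ordered categories (Cancer Stage, Air Pollution Exposure) or Yes/No flags, so a model usually needs them as numbers. The sketch below shows one way to do that with ordinal encoding. The column names and the exact category strings ("Stage II", "Stage III", "Yes") are assumptions inferred from the description, and `encode_record` is a hypothetical helper.

```python
# Category orderings as given in the description; the exact strings stored
# in the file are an assumption.
STAGE_ORDER = ["Stage I", "Stage II", "Stage III", "Stage IV"]
POLLUTION_ORDER = ["Low", "Moderate", "High"]


def ordinal_encode(value, order):
    """Map a category to its 0-based position in order."""
    return order.index(value)


def encode_record(record):
    """Encode a few categorical fields of one patient record as integers."""
    return {
        "cancer_stage": ordinal_encode(record["Cancer Stage"], STAGE_ORDER),
        "air_pollution": ordinal_encode(
            record["Air Pollution Exposure"], POLLUTION_ORDER
        ),
        "biomass_fuel": 1 if record["Biomass Fuel Use"] == "Yes" else 0,
    }
```

Ordinal encoding keeps the natural ordering of stages and exposure levels; unordered fields such as Histology Type would be better served by one-hot encoding.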


Urban Air Quality & Climate Dataset (1958-2025)

67 years of CO2, PM2.5 & NO2 measurements + 2000-year ice core reconstruction
