96 datasets found
  1. Lung Cancer Mortality Datasets v2

    • kaggle.com
    zip
    Updated Jun 1, 2024
    Cite
    MasterDataSan (2024). Lung Cancer Mortality Datasets v2 [Dataset]. https://www.kaggle.com/datasets/masterdatasan/lung-cancer-mortality-datasets-v2
    Explore at:
    Available download formats: zip (81127029 bytes)
    Dataset updated
    Jun 1, 2024
    Authors
    MasterDataSan
    Description

    This dataset contains data about lung cancer mortality. It is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It is designed to facilitate the analysis of various factors that may influence cancer prognosis and treatment outcomes. The database includes a range of demographic, medical, and treatment-related variables, capturing essential details about each patient's condition and history.

    Key components of the database include:

    Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

    Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

    Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

    Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

    The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

    • id: A unique identifier for each patient in the dataset.
    • age: The age of the patient at the time of diagnosis.
    • gender: The gender of the patient (e.g., male, female).
    • country: The country or region where the patient resides.
    • diagnosis_date: The date on which the patient was diagnosed with lung cancer.
    • cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV).
    • family_history: Indicates whether there is a family history of cancer (e.g., yes, no).
    • smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker).
    • bmi: The Body Mass Index of the patient at the time of diagnosis.
    • cholesterol_level: The cholesterol level of the patient (value).
    • hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no).
    • asthma: Indicates whether the patient has asthma (e.g., yes, no).
    • cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no).
    • other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no).
    • treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined).
    • end_treatment_date: The date on which the patient completed their cancer treatment or died.
    • survived: Indicates whether the patient survived (e.g., yes, no).
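
    As a rough illustration of how the columns described above might be used, the sketch below derives a treatment duration from diagnosis_date and end_treatment_date and a survival share from survived. The two sample rows, the ISO date format, and the lowercase yes/no coding are assumptions made for the example; they are not taken from the actual file.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows. Only the column names come from the description;
# the values and the YYYY-MM-DD date format are assumptions.
SAMPLE_CSV = """id,age,gender,diagnosis_date,cancer_stage,treatment_type,end_treatment_date,survived
1,64,male,2021-01-01,Stage I,chemotherapy,2021-01-31,no
2,50,female,2021-03-01,Stage III,surgery,2021-06-09,yes
"""


def treatment_days(row):
    """Days between diagnosis and the end of treatment (or death)."""
    start = date.fromisoformat(row["diagnosis_date"])
    end = date.fromisoformat(row["end_treatment_date"])
    return (end - start).days


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Map each patient id to the length of their treatment window in days.
durations = {row["id"]: treatment_days(row) for row in rows}

# Share of patients whose 'survived' flag is 'yes'.
survival_rate = sum(row["survived"] == "yes" for row in rows) / len(rows)

print(durations, survival_rate)
```

    With the real download, the in-memory sample would be replaced by the extracted CSV file, and the date parsing may need adjusting if the file uses a different format.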

    This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

    Good luck Gakusei!

  2. Lung-Cancer-Risk-Dataset

    • kaggle.com
    Updated Aug 23, 2025
    Cite
    Mikey-TraceGod (2025). Lung-Cancer-Risk-Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/12844025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mikey-TraceGod
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Lung Cancer Risk Dataset

    Overview

    This dataset contains 50,000 patient profiles designed for lung cancer risk analysis and machine learning applications. The dataset is clean, preprocessed, and ready for immediate use in classification tasks, statistical analysis, and data visualization.

    • Rows: 50,000
    • Columns: 11
    • File: preprocessed_lung_cancer_dataset.csv
    • License: CC0: Public Domain

    Dataset Description

    The dataset includes patient profiles with features based on established lung cancer risk factors such as smoking history, environmental exposures, and chronic lung conditions. All data is synthetic and designed to reflect realistic risk factor distributions while maintaining patient privacy.

    Features

    Column                     Type     Description                                      Values/Range
    patient_id                 Integer  Unique patient identifier                        100000-149999
    age                        Integer  Patient age in years                             18-100
    gender                     String   Patient gender                                   'Male', 'Female'
    pack_years                 Float    Smoking exposure (years × packs per day)         0-100
    radon_exposure             String   Residential radon exposure level                 'Low', 'Medium', 'High'
    asbestos_exposure          String   Occupational asbestos exposure history           'Yes', 'No'
    secondhand_smoke_exposure  String   Passive smoking exposure                         'Yes', 'No'
    copd_diagnosis             String   Chronic obstructive pulmonary disease diagnosis  'Yes', 'No'
    alcohol_consumption        String   Alcohol consumption pattern                      'None', 'Moderate', 'Heavy'
    family_history             String   Family history of lung cancer                    'Yes', 'No'
    lung_cancer                String   Target variable: Lung cancer diagnosis           'Yes', 'No'
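
    The feature table above can be turned into a simple row check. The sketch below is one possible way to do that, not the dataset author's tooling: the ranges and allowed values are copied from the table, while the function names (pack_years, validate) and the sample rows are hypothetical.

```python
# Numeric ranges and allowed categories, copied from the feature table.
RANGES = {
    "patient_id": (100000, 149999),
    "age": (18, 100),
    "pack_years": (0, 100),
}
ALLOWED = {
    "gender": {"Male", "Female"},
    "radon_exposure": {"Low", "Medium", "High"},
    "asbestos_exposure": {"Yes", "No"},
    "secondhand_smoke_exposure": {"Yes", "No"},
    "copd_diagnosis": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "Heavy"},
    "family_history": {"Yes", "No"},
    "lung_cancer": {"Yes", "No"},
}


def pack_years(packs_per_day, years_smoked):
    """Smoking exposure as defined in the table: years x packs per day."""
    return packs_per_day * years_smoked


def validate(row):
    """Return the names of columns whose values fall outside the table."""
    problems = []
    for column, (low, high) in RANGES.items():
        if not low <= float(row[column]) <= high:
            problems.append(column)
    for column, allowed in ALLOWED.items():
        if row[column] not in allowed:
            problems.append(column)
    return problems


# Hypothetical rows, with values as strings as they would come from a CSV.
good_row = {
    "patient_id": "100000", "age": "55", "gender": "Female",
    "pack_years": str(pack_years(1.5, 20)), "radon_exposure": "Low",
    "asbestos_exposure": "No", "secondhand_smoke_exposure": "Yes",
    "copd_diagnosis": "No", "alcohol_consumption": "Moderate",
    "family_history": "No", "lung_cancer": "No",
}
bad_row = dict(good_row, age="17", radon_exposure="Extreme")

print(validate(good_row), validate(bad_row))
```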

    Data Quality

    • Complete: No missing values or duplicates
    • Clean: All values within realistic ranges
    • Balanced Features: Realistic distribution of risk factors
    • Target Distribution: Approximately 25% positive cases, reflecting real-world lung cancer prevalence

    Use Cases

    • Binary classification modeling
    • Risk factor correlation analysis
    • Data visualization and exploratory analysis
    • Machine learning pipeline development
    • Statistical hypothesis testing
  3. Lung Cancer Dataset

    • kaggle.com
    Updated May 6, 2025
    Cite
    Aman_Kumar094 (2025). Lung Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/amankumar094/lung-cancer-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 6, 2025
    Dataset provided by
    Kaggle
    Authors
    Aman_Kumar094
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains data about lung cancer mortality and is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It covers 800,000 individuals, with 16 well-structured columns on lung cancer diagnosis, treatment, and outcomes. This large-scale dataset is designed to aid researchers, data scientists, and healthcare professionals in studying patterns, building predictive models, and enhancing early detection and treatment strategies.

    🌍 The Societal Impact of Lung Cancer

    Lung cancer is not just a disease — it's a global crisis that steals time, health, and hope from millions of people every year. As the #1 cause of cancer deaths worldwide, it takes more lives annually than breast, colon, and prostate cancer combined.

    But behind every statistic is a story:

    A parent who never saw their child graduate.

    A worker who had to leave their job too soon.

    A community that lost a leader, a friend, a neighbor.

    Why does this matter? Lung cancer often goes undetected until it's too late. It’s aggressive, silent, and devastating — especially in underserved areas where early detection is rare and treatment options are limited. It doesn’t just affect patients. It affects families, economies, and healthcare systems on a massive scale.

    This dataset represents more than numbers. It represents 800,000 real-world stories — people who can help us unlock patterns, train models, and advance life-saving research.

    By working with this data, you're not just analyzing a dataset — you're stepping into the fight against one of humanity’s deadliest diseases.

    Let’s turn insight into impact. (😊 The above description was generated with the help of AI. I just wanted to share this dataset, that's all. Thank you.)

  4. lung-cancer

    • huggingface.co
    Updated Jun 24, 2022
    Cite
    Nate Raw (2022). lung-cancer [Dataset]. https://huggingface.co/datasets/nateraw/lung-cancer
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 24, 2022
    Authors
    Nate Raw
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Lung Cancer

      Dataset Summary
    

    An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their risk status. The data was collected from the online lung cancer prediction system website.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.
    
  5. National Lung Screening Trial

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    dicom, docx, n/a +2
    Updated Sep 24, 2021
    Cite
    The Cancer Imaging Archive (2021). National Lung Screening Trial [Dataset]. http://doi.org/10.7937/TCIA.HMQ8-J677
    Explore at:
    Available download formats: docx, svs, dicom, n/a, sas, zip, and doc
    Dataset updated
    Sep 24, 2021
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 24, 2021
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Demographic Summary of Available Imaging

    Characteristic    Value (N = 26254)
    Age (years)       Mean ± SD: 61.4 ± 5
                      Median (IQR): 60 (57-65)
                      Range: 43-75
    Sex               Male: 15512 (59%)
                      Female: 10742 (41%)
    Race              White: 23969 (91.3%)
                      Black: 1135 (4.3%)
                      Asian: 547 (2.1%)
                      American Indian/Alaska Native: 88 (0.3%)
                      Native Hawaiian/Other Pacific Islander: 87 (0.3%)
                      Unknown: 428 (1.6%)
    Ethnicity         Not Available

    Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.

    Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.

    Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
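
    The headline figures in the Results paragraph can be approximately re-derived from the per-100,000 person-year rates it quotes. The short sketch below does that arithmetic; the variable names are illustrative, and because the inputs are the rounded rates from the text, the recomputed mortality reduction comes out near, not exactly at, the published 20.0%.

```python
# Rates per 100,000 person-years, as quoted in the Results paragraph.
ct_incidence, radiography_incidence = 645, 572
ct_deaths, radiography_deaths = 247, 309

# Incidence rate ratio: low-dose CT group relative to radiography group.
rate_ratio = ct_incidence / radiography_incidence

# Relative reduction in lung-cancer mortality with low-dose CT.
relative_reduction = (radiography_deaths - ct_deaths) / radiography_deaths

print(f"rate ratio: {rate_ratio:.2f}")
print(f"relative mortality reduction: {relative_reduction:.1%}")
```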

    Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).

    Data Availability: A summary of the National Lung Screening Trial and its available datasets is provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), contracted by the National Cancer Institute (NCI) as keepers and statistical analyzers of the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and Histopathology images collected during the trial. (These were previously restricted.)

  6. Lung Cancer Mortality

    • data.lacounty.gov
    • geohub.lacity.org
    • +2 more
    Updated Dec 20, 2023
    Cite
    County of Los Angeles (2023). Lung Cancer Mortality [Dataset]. https://data.lacounty.gov/maps/lacounty::lung-cancer-mortality
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    County of Los Angeles
    Description

    Death rate has been age-adjusted by the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.

    Lung cancer is a leading cause of cancer-related death in the US. People who smoke have the greatest risk of lung cancer, though lung cancer can also occur in people who have never smoked. Most cases are due to long-term tobacco smoking or exposure to secondhand tobacco smoke. Cities and communities can take an active role in curbing tobacco use and reducing lung cancer by adopting policies to regulate tobacco retail; reducing exposure to secondhand smoke in outdoor public spaces, such as parks, restaurants, or in multi-unit housing; and improving access to tobacco cessation programs and other preventive services.

    For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
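
    Age adjustment of the kind mentioned above is commonly done by direct standardization: each age group's death rate is weighted by that group's share of a standard population. The sketch below illustrates the calculation only. The age groups, rates, and standard-population counts are made up for the example and are not the actual 2000 U.S. standard population or Los Angeles County figures.

```python
def age_adjusted_rate(age_specific_rates, standard_population):
    """Directly standardized rate: weighted mean of age-specific rates,
    with weights proportional to the standard population in each group."""
    total = sum(standard_population.values())
    weighted = sum(
        age_specific_rates[group] * count
        for group, count in standard_population.items()
    )
    return weighted / total


# Hypothetical deaths per 100,000 in each age group (illustrative only).
rates = {"0-44": 1.0, "45-64": 40.0, "65+": 200.0}

# Hypothetical standard population counts (NOT the real 2000 U.S. standard).
standard = {"0-44": 600, "45-64": 300, "65+": 100}

adjusted = age_adjusted_rate(rates, standard)
print(f"age-adjusted rate: {adjusted:.1f} per 100,000")
```

    Using the same standard population for every area is what makes the resulting rates comparable across places with different age structures.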

  7. Lung Cancer Risk & Prediction Dataset

    • kaggle.com
    zip
    Updated Feb 11, 2025
    Cite
    Ankush Panday (2025). Lung Cancer Risk & Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/ankushpanday1/lung-cancer-risk-and-prediction-dataset
    Explore at:
    Available download formats: zip (16114231 bytes)
    Dataset updated
    Feb 11, 2025
    Authors
    Ankush Panday
    License

    https://cdla.io/permissive-1-0/

    Description

    This dataset helps understand and predict lung cancer risks based on health, environment, and lifestyle factors. It includes details about smoking habits, pollution exposure, healthcare access, and survival chances.

    Doctors, researchers, and data scientists can use it to find patterns in lung cancer cases and improve early detection.

    Columns Breakdown (25 Features)

    • Country – The country where the patient resides
    • Age – Patient’s age (randomized between 30-90)
    • Gender – Male/Female
    • Smoking_Status – Smoker, Non-Smoker, Former Smoker
    • Second_Hand_Smoke – Yes/No
    • Air_Pollution_Exposure – Low, Medium, High
    • Occupation_Exposure – Yes/No (Factory, Mining, etc.)
    • Rural_or_Urban – Rural/Urban
    • Socioeconomic_Status – Low, Middle, High
    • Healthcare_Access – Good, Limited, Poor
    • Insurance_Coverage – Yes/No
    • Screening_Availability – Yes/No
    • Stage_at_Diagnosis – I, II, III, IV
    • Cancer_Type – NSCLC, SCLC
    • Mutation_Type – EGFR, ALK, KRAS, None
    • Treatment_Access – Full, Partial, None
    • Clinical_Trial_Access – Yes/No
    • Language_Barrier – Yes/No
    • Mortality_Risk – Probability (0.0 - 1.0)
    • 5_Year_Survival_Probability – Probability (0.0 - 1.0)
    • Delay_in_Diagnosis – Yes/No
    • Family_History – Yes/No
    • Indoor_Smoke_Exposure – Yes/No
    • Tobacco_Marketing_Exposure – Yes/No
    • Final_Prediction – Lung Cancer (Yes/No)

  8. Data from: Dataset description.

    • plos.figshare.com
    xls
    Updated Aug 27, 2024
    Cite
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd (2024). Dataset description. [Dataset]. http://doi.org/10.1371/journal.pone.0305035.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Among many types of cancers, to date, lung cancer remains one of the deadliest cancers around the world. Many researchers, scientists, doctors, and people from other fields continuously contribute to this subject regarding early prediction and diagnosis. One of the significant problems in prediction is the black-box nature of machine learning models. Though the detection rate is comparatively satisfactory, people have yet to learn how a model came to that decision, causing trust issues among patients and healthcare workers. This work uses multiple machine learning models on a numerical dataset of lung cancer-relevant parameters and compares performance and accuracy. After comparison, each model has been explained using different methods. The main contribution of this research is to give logical explanations of why the model reached a particular decision to achieve trust. This research has also been compared with a previous study that worked with a similar dataset and took expert opinions regarding their proposed model. We also showed that our research achieved better results than their proposed model and specialist opinion using hyperparameter tuning, having an improved accuracy of almost 100% in all four models.

  9. Lung Cancer Risk & Trends Across 25 Countries

    • kaggle.com
    Updated Feb 6, 2025
    Cite
    Ankush Panday (2025). Lung Cancer Risk & Trends Across 25 Countries [Dataset]. http://doi.org/10.34740/kaggle/dsv/10680778
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    Kaggle
    Authors
    Ankush Panday
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset provides valuable insights into lung cancer cases, risk factors, smoking trends, and healthcare access across 25 of the world's most populated countries. It includes 220,632 individuals with details on their age, gender, smoking history, cancer diagnosis, environmental exposure, and survival rates. The dataset is useful for medical research, predictive modeling, and policy-making to understand lung cancer patterns globally.

  10. Data from: County-level cumulative environmental quality associated with...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). County-level cumulative environmental quality associated with cancer incidence. [Dataset]. https://catalog.data.gov/dataset/county-level-cumulative-environmental-quality-associated-with-cancer-incidence
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Population based cancer incidence rates were abstracted from National Cancer Institute, State Cancer Profiles for all available counties in the United States for which data were available. This is a national county-level database of cancer data that are collected by state public health surveillance systems. All-site cancer is defined as any type of cancer that is captured in the state registry data, though non-melanoma skin cancer is not included. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for years 2006–2010 were available for 2687 of 3142 (85.5%) counties in the U.S. Counties for which there are fewer than 16 reported cases in a specific area-sex-race category are suppressed to ensure confidentiality and stability of rate estimates; this accounted for 14 counties in our study. Two states, Kansas and Virginia, do not provide data because of state legislation and regulations which prohibit the release of county level data to outside entities. Data from Michigan does not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data is not available for three states, Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate for all counties was 453.7 per 100,000 persons. We selected 2006–2010 as it is subsequent in time to the EQI exposure data which was constructed to represent the years 2000–2005. We also gathered data for the three leading causes of cancer for males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as an exposure metric as an indicator of cumulative environmental exposures at the county-level representing the period 2000 to 2005. A complete description of the datasets used in the EQI are provided in Lobdell et al. and methods used for index construction are described by Messer et al. 
    The EQI was developed for the period 2000–2005 because it was the time period for which the most recent data were available when index construction was initiated. The EQI includes variables representing each of the environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic environment includes 12 variables representing socioeconomics and crime.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

    This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).

  11. National Lung Cancer Audit State of the Nation Report 2025 - Dataset -...

    • ckan.publishing.service.gov.uk
    Updated Apr 11, 2025
    Cite
    ckan.publishing.service.gov.uk (2025). National Lung Cancer Audit State of the Nation Report 2025 - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/national-lung-cancer-audit-state-of-the-nation-report-2025
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    The National Lung Cancer Audit (NLCA) evaluates how the care received by people diagnosed with lung cancer in England and Wales compares with recommended practice and provides information that supports healthcare providers, commissioners, and regulators to improve the care for patients. The NLCA reports a set of process and outcome measures that cover important aspects of the care pathway for people diagnosed with lung cancer. In the NLCA State of the Nation report 2025, we give an overview of the patterns of care and outcomes for 37,750 people diagnosed with lung cancer in England in 2023. A separate section describes results for 2,334 people diagnosed in Wales in 2023. The report summarises the performance of lung cancer services in 2023 and compares this to the situation in 2020, 2021 and 2022.

  12. Result comparison among literature and this work.

    • plos.figshare.com
    xls
    Updated Aug 27, 2024
    Cite
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd (2024). Result comparison among literature and this work. [Dataset]. http://doi.org/10.1371/journal.pone.0305035.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Among many types of cancers, to date, lung cancer remains one of the deadliest cancers around the world. Many researchers, scientists, doctors, and people from other fields continuously contribute to this subject regarding early prediction and diagnosis. One of the significant problems in prediction is the black-box nature of machine learning models. Though the detection rate is comparatively satisfactory, people have yet to learn how a model came to that decision, causing trust issues among patients and healthcare workers. This work uses multiple machine learning models on a numerical dataset of lung cancer-relevant parameters and compares performance and accuracy. After comparison, each model has been explained using different methods. The main contribution of this research is to give logical explanations of why the model reached a particular decision to achieve trust. This research has also been compared with a previous study that worked with a similar dataset and took expert opinions regarding their proposed model. We also showed that our research achieved better results than their proposed model and specialist opinion using hyperparameter tuning, having an improved accuracy of almost 100% in all four models.

  13. [MI] Rapid Cancer Registration Data

    • digital.nhs.uk
    Updated Nov 27, 2025
    Cite
    (2025). [MI] Rapid Cancer Registration Data [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mi-rapid-cancer-registration-data
    Dataset updated
    Nov 27, 2025
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Description

    Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.

  14. Skew values for each lung cancer risk factor.

    • plos.figshare.com
    xls
    Updated Jul 19, 2023
    Cite
    Demeke Endalie; Wondmagegn Taye Abebe (2023). Skew values for each lung cancer risk factor. [Dataset]. http://doi.org/10.1371/journal.pdig.0000308.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    PLOS Digital Health
    Authors
    Demeke Endalie; Wondmagegn Taye Abebe
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer is a broad term that refers to a wide range of diseases that can affect any part of the human body. To minimize the number of cancer deaths and to prepare an appropriate health policy on cancer spread mitigation, scientifically supported knowledge of cancer causes is critical. As a result, in this study, we analyzed lung cancer risk factors that lead to a highly severe cancer case using a decision tree-based ranking algorithm. This feature relevance ranking algorithm computes the weight of each feature of the dataset by using split points to improve detection accuracy, and each risk factor is weighted based on the number of observations that occur for it on the decision tree. Coughing of blood, air pollution, and obesity are the most severe lung cancer risk factors out of nine, with a weight of 39%, 21%, and 14%, respectively. We also proposed a machine learning model that uses Extreme Gradient Boosting (XGBoost) to detect lung cancer severity levels in lung cancer patients. We used a dataset of 1000 lung cancer patients and 465 individuals free from lung cancer from Tikur Ambesa (Black Lion) Hospital in Addis Ababa, Ethiopia, to assess the performance of the proposed model. The proposed cancer severity level detection model achieved 98.9%, 99%, and 98.9% accuracy, precision, and recall, respectively, for the testing dataset. The findings can assist governments and non-governmental organizations in making lung cancer-related policy decisions.

  15. Lung Cancer : CT Slice Images, Metadata(Synthetic)

    • kaggle.com
    zip
    Updated Jul 20, 2025
    Cite
    Leela Naveen Kumar (2025). Lung Cancer : CT Slice Images, Metadata(Synthetic) [Dataset]. https://www.kaggle.com/datasets/leelanaveenkumar/lung-cancer-ct-slice-images-metadatasynthetic
    Download format: zip (156657624 bytes)
    Authors
    Leela Naveen Kumar
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🩺 Lung CT Scans with Metadata for Smoking and Lung Cancer Classification

    This dataset contains 1,097 anonymized CT scan images of lungs, originally sourced from the IQ-OTH/NCCD Lung Cancer Dataset (Hamdallah, 2020), which is released under the CC0: Public Domain license.

    The dataset has been enhanced with synthetically generated metadata to support research in:

    • Lung cancer classification using deep learning
    • The impact of smoking status on cancer diagnosis
    • Metadata-integrated CNN model training and explainability

    📂 Dataset Contents

    • images/ – 1,097 CT scan images in JPG format
    • metadata/metadata.csv – A structured CSV file with the following fields:
      • patient_id: File name (e.g., Patient (1))
      • age: Simulated age (e.g., 45)
      • gender: Male / Female
      • smoking_status: Never Smoked / Former Smoker / Current Smoker
      • cancer_diagnosis: 0 = Normal, 1 = Lung Cancer
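
    The field list above can be read with a few lines of standard-library Python. This is a minimal sketch, not the dataset author's code: the sample rows are made up, the header spelling is assumed to match the field names listed, and the `images/<patient_id>.jpg` path pattern is an assumption based on the "JPG format" note.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows that follow the field list above; the real
# metadata/metadata.csv may differ in header spelling or value formats.
SAMPLE_CSV = """patient_id,age,gender,smoking_status,cancer_diagnosis
Patient (1),45,Male,Current Smoker,1
Patient (2),62,Female,Never Smoked,0
Patient (3),58,Male,Former Smoker,1
"""


def load_metadata(handle):
    """Parse metadata rows, converting age and cancer_diagnosis to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["age"] = int(row["age"])
        row["cancer_diagnosis"] = int(row["cancer_diagnosis"])
        # Assumption: each image is stored as images/<patient_id>.jpg
        row["image_path"] = f"images/{row['patient_id']}.jpg"
        rows.append(row)
    return rows


def class_counts(rows):
    """Count rows per cancer_diagnosis label (0 = Normal, 1 = Lung Cancer)."""
    return Counter(row["cancer_diagnosis"] for row in rows)


rows = load_metadata(io.StringIO(SAMPLE_CSV))
counts = class_counts(rows)
print(counts)
```

    To run it on the real file, pass `open("metadata/metadata.csv", newline="")` instead of the in-memory sample.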

    🧠 Use Case

    This dataset was prepared for an MSc dissertation titled:

    Using Deep Learning to Analyze the Impact of Smoking on Lung Cancer

    It is intended for use in:

    • Deep learning experiments
    • CNN and metadata fusion models
    • Medical image classification
    • Interpretability analysis (e.g., Grad-CAM)

    👤 Class Distribution

    • 536 patients are normal (label 0)
    • 561 patients are cancerous (label 1)

    🔖 License

    🔗 Citation

    If you use this dataset in research or teaching, please cite:

    Guduru, L. N. K. (2025). Lung CT Scans with Metadata for Smoking and Lung Cancer Classification [Dataset]. Kaggle.
    Source images from: Hamdallah, A. (2020). IQ-OTH/NCCD Lung Cancer Dataset. Kaggle.

    ⚠️ Disclaimer

    • All metadata was synthetically generated and does not represent real individuals.
    • The dataset is intended strictly for academic and research use only.

  16. Data from: BARD1 serum autoantibodies for the detection of lung cancer

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 7, 2017
    Cite
    Dome, Balazs; Hegedus, Balazs; Pilyugin, Maxim; Descloux, Pascaline; Janes, Samuel; Irminger-Finger, Irmgard; Laszlo, Viktoria; André, Pierre-Alain; Laurent, Geoffrey J.; Bianco, Andrea; Sardy, Sylvain (2017). BARD1 serum autoantibodies for the detection of lung cancer [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001742310
    Authors
    Dome, Balazs; Hegedus, Balazs; Pilyugin, Maxim; Descloux, Pascaline; Janes, Samuel; Irminger-Finger, Irmgard; Laszlo, Viktoria; André, Pierre-Alain; Laurent, Geoffrey J.; Bianco, Andrea; Sardy, Sylvain
    Description

    Purpose: Currently the screening for lung cancer for risk groups is based on Computed Tomography (CT) or low dose CT (LDCT); however, the lung cancer death rate has not decreased significantly with people undergoing LDCT. We aimed to develop a simple reliable blood test for early detection of all types of lung cancer based on the immunogenicity of aberrant forms of BARD1 that are specifically upregulated in lung cancer.

    Methods: ELISA assays were performed with a panel of BARD1 epitopes to detect serum levels of antibodies against BARD1 epitopes. We tested 194 blood samples from healthy donors and lung cancer patients with a panel of 40 BARD1 antigens. Using fitted Lasso logistic regression we determined the optimal combination of BARD1 antigens to be used in ELISA for discriminating lung cancer from healthy controls. Random selection of samples for training sets or validation sets was applied to validate the accuracy of our test.

    Results: Fitted Lasso logistic regression models predict high accuracy of the BARD1 autoimmune antibody test with an AUC = 0.96. Validation in independent samples provided an AUC = 0.86, and identical AUCs were obtained for combined stages 1–3 and late stage 4 lung cancers. The BARD1 antibody test is highly specific for lung cancer and not breast or ovarian cancer.

    Conclusion: The BARD1 lung cancer test shows higher sensitivity and specificity than previously published blood tests for lung cancer detection and/or diagnosis or CT scans, and it could detect all types and all stages of lung cancer. This BARD1 lung cancer test could therefore be further developed as i) a screening test for early detection of lung cancers in high-risk groups, and ii) a diagnostic aid in complementing CT scan.
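
    For readers unfamiliar with the AUC figures quoted in this description, the sketch below shows one common way to compute an AUC from classifier scores: the fraction of (positive, negative) pairs in which the positive sample scores higher, with ties counted as one half. It is only an illustration; the labels and scores are made up, and it does not reproduce the authors' Lasso logistic regression or their data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for a positive sample (e.g. lung cancer), 0 for a negative one.
    scores: classifier outputs, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

    The pairwise loop is quadratic in the number of samples, which is fine for a study of a few hundred samples; a rank-based formula would be preferable for large datasets.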

  17. Compendium – Mortality from lung cancer

    • digital.nhs.uk
    Updated Jul 21, 2021
    + more versions
    Cite
    (2021). Compendium – Mortality from lung cancer [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/compendium-mortality/current/mortality-from-lung-cancer
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Description

    To reduce deaths from lung cancer. For information on the definitions of what these indicators include, please see the relevant specification. From 2016 onwards, mortality counts within the Compendium Mortality Indicator set are based on a bespoke extract taken from the Primary Care Mortality Database (PCMD) maintained by NHS Digital. PCMD is updated monthly using a file of death records from ONS and is continually subject to amendment. It is already well established that late registrations have a small impact on counts. This bespoke extract may be taken at a different time to that of the mortality data published by ONS and as such this may cause some small differences between ONS and NHS Digital mortality figures for a given year.

  18. Lung Cancer

    • kaggle.com
    zip
    Updated Jul 15, 2022
    Cite
    Ms. Nancy Al Aswad (2022). Lung Cancer [Dataset]. https://www.kaggle.com/nancyalaswad90/lung-cancer
    Download format: zip (2046 bytes)
    Authors
    Ms. Nancy Al Aswad
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    What is Lung Cancer Dataset?

    An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on that risk. The data was collected from the website of an online lung cancer prediction system.


    Acknowledgments

    When we use this dataset in our research, we credit the authors as :

    • License : CC BY 4.0.

    • Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small Number of Samples and Design Method of Classifier on the Plane", Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991. It is published for reuse in Google research datasets.

    The main idea behind uploading this dataset is to practice data analysis with my students. I work in a college and want my students to apply what we study to a big dataset. It may not be up to date, and I mention the collection years, but it is a good data resource for practice.

  19. Five-year survival from breast, lung and colorectal cancer (NHSOF 1.4.iv) -...

    • ckan.publishing.service.gov.uk
    Updated Aug 4, 2015
    + more versions
    Cite
    (2015). Five-year survival from breast, lung and colorectal cancer (NHSOF 1.4.iv) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/five-year-survival-from-breast-lung-and-colorectal-cancer-nhsof-1-4-iv
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive five years after diagnosis. ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html A time series for five-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.ii, 1.4.iv and 1.4.vi) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.

    Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.

    Current version updated: May-14
    Next version due: To be confirmed

  20. Table_1_The presence of autoantibodies is associated with improved overall...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Dec 19, 2023
    Cite
    Zhao, Huijuan; Ouyang, Libo; Zheng, Peiming; Chen, Lianlian; Li, Gang; Wang, Rong; Cai, Jun; Jing, Keying (2023). Table_1_The presence of autoantibodies is associated with improved overall survival in lung cancer patients.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000942824
    Authors
    Zhao, Huijuan; Ouyang, Libo; Zheng, Peiming; Chen, Lianlian; Li, Gang; Wang, Rong; Cai, Jun; Jing, Keying
    Description

    Objective: Autoantibodies have been reported to be associated with cancers. As a biomarker, autoantibodies have been widely used in the early screening of lung cancer. However, the correlation between autoantibodies and the prognosis of lung cancer patients is poorly understood, especially in the Asian population. This retrospective study investigated the association between the presence of autoantibodies and outcomes in patients with lung cancer.

    Methods: A total of 264 patients diagnosed with lung cancer were tested for autoantibodies in Henan Provincial People’s Hospital from January 2017 to June 2022. The general clinical data of these patients were collected, and after screening out those who met the exclusion criteria, 151 patients were finally included in the study. The Cox proportional hazards model was used to analyze the effect of autoantibodies on the outcomes of patients with lung cancer. The Kaplan-Meier curve was used to analyze the relationship between autoantibodies and the overall survival of patients with lung cancer.

    Results: Compared to lung cancer patients without autoantibodies, those with autoantibodies had an associated reduced risk of death (HRs: 0.45, 95% CIs 0.27~0.77), independent of gender, age, smoking history, pathological type, and pathological stage of lung cancer. Additionally, the association was found to be more significant by subgroup analysis in male patients, younger patients, and patients with small cell lung cancer. Furthermore, lung cancer patients with autoantibodies had significantly longer survival time than those without autoantibodies.

    Conclusion: The presence of autoantibodies is an independent indicator of good prognosis in patients with lung cancer, providing a new biomarker for prognostic evaluation in patients with lung cancer.
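
    The Kaplan-Meier curve mentioned in this description can be illustrated with a short product-limit estimator. The sketch below is a generic textbook version written for this listing, not the study's analysis: the follow-up times and event flags are invented, and the time unit is left abstract.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with at least one death.

    durations: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve


# Invented example: six patients, two of them censored.
durations = [5, 8, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 4))
```

    Comparing two groups, as the study does for patients with and without autoantibodies, would mean calling the function once per group and plotting both step curves.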


Lung Cancer Mortality Datasets v2

Dataset of lung cancer with time observation during the treatment period

Download format: zip (81127029 bytes)
Dataset updated
Jun 1, 2024
Authors
MasterDataSan
Description

This dataset contains data about lung cancer Mortality. This database is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It is designed to facilitate the analysis of various factors that may influence cancer prognosis and treatment outcomes. The database includes a range of demographic, medical, and treatment-related variables, capturing essential details about each patient's condition and history.

Key components of the database include:

Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

• id: A unique identifier for each patient in the dataset.
• age: The age of the patient at the time of diagnosis.
• gender: The gender of the patient (e.g., male, female).
• country: The country or region where the patient resides.
• diagnosis_date: The date on which the patient was diagnosed with lung cancer.
• cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV).
• family_history: Indicates whether there is a family history of cancer (e.g., yes, no).
• smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker).
• bmi: The Body Mass Index of the patient at the time of diagnosis.
• cholesterol_level: The cholesterol level of the patient (value).
• hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no).
• asthma: Indicates whether the patient has asthma (e.g., yes, no).
• cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no).
• other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no).
• treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined).
• end_treatment_date: The date on which the patient completed their cancer treatment or died.
• survived: Indicates whether the patient survived (e.g., yes, no).
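
As a rough sketch of how these columns might be used, the example below groups patients by cancer_stage and reports the survival rate and the mean number of days between diagnosis_date and end_treatment_date. Several details are assumptions rather than facts about the download: the sample rows are invented, dates are assumed to be in ISO `YYYY-MM-DD` form, and survived is assumed to be encoded as yes/no as in the description (the actual file may use another encoding such as 0/1).

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Invented rows using a subset of the columns described above.
SAMPLE_CSV = """id,cancer_stage,diagnosis_date,end_treatment_date,survived
1,Stage I,2016-04-05,2017-09-10,yes
2,Stage III,2023-04-20,2024-06-17,no
3,Stage III,2023-04-05,2024-04-09,yes
4,Stage I,2016-02-05,2017-04-23,yes
"""


def summarize(handle):
    """Return per-stage patient count, survival rate, and mean treatment days."""
    totals = defaultdict(lambda: {"n": 0, "survived": 0, "days": 0})
    for row in csv.DictReader(handle):
        # Assumption: both date columns are ISO formatted (YYYY-MM-DD).
        start = date.fromisoformat(row["diagnosis_date"])
        end = date.fromisoformat(row["end_treatment_date"])
        stage = totals[row["cancer_stage"]]
        stage["n"] += 1
        # Assumption: survived is the text "yes" or "no".
        stage["survived"] += row["survived"].strip().lower() == "yes"
        stage["days"] += (end - start).days
    return {
        name: {
            "patients": s["n"],
            "survival_rate": s["survived"] / s["n"],
            "mean_days": s["days"] / s["n"],
        }
        for name, s in totals.items()
    }


summary = summarize(io.StringIO(SAMPLE_CSV))
for stage_name, stats in sorted(summary.items()):
    print(stage_name, stats)
```

Because the data is artificially generated, any rates computed this way describe the generator, not real patients.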

This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

Good luck Gakusei!
