91 datasets found
  1. Lung Cancer

    • kaggle.com
    Updated Jul 15, 2022
    Cite
    Ms. Nancy Al Aswad (2022). Lung Cancer [Dataset]. https://www.kaggle.com/datasets/nancyalaswad90/lung-cancer/discussion?sort=undefined
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ms. Nancy Al Aswad
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    What is Lung Cancer Dataset?

    An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on that risk. The data was collected from the website of an online lung cancer prediction system.

    Acknowledgments

    When we use this dataset in our research, we credit the authors as:

    • License: CC BY 4.0.

    • Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small Number of Samples and Design Method of Classifier on the Plane", Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991. It is published for reuse in Google research datasets.

    The main idea behind uploading this dataset is to practice data analysis with my students: I work at a college and want my students to apply what we study to a large dataset. It may not be up to date, and I mention the collection years, but it is a good resource for practice.

  2. Lung Cancer Diagnostic Tests Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Lung Cancer Diagnostic Tests Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-lung-cancer-diagnostic-tests-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Lung Cancer Diagnostic Tests Market Outlook



    The lung cancer diagnostic tests market size was valued at USD 2.5 billion in 2023 and is projected to reach USD 6.1 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 10.5% during the forecast period. This substantial growth can be attributed to the rising prevalence of lung cancer globally, advancements in diagnostic technologies, and increasing awareness regarding early detection and treatment of lung cancer. The growing aging population and the high incidence of smoking, which is a leading cause of lung cancer, further propel the demand for diagnostic tests.
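    The growth figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The sketch below is illustrative only: the function names are hypothetical, and it assumes nine yearly compounding periods between the 2023 valuation and the 2032 projection, which the report does not state explicitly.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate over `periods` yearly steps."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value: float, rate: float, periods: int) -> float:
    """Value reached after compounding `rate` for `periods` years."""
    return start_value * (1 + rate) ** periods


# Figures quoted in the description (USD billions).
# ASSUMPTION: 2023 -> 2032 is treated as 9 compounding periods.
implied_rate = cagr(2.5, 6.1, 9)
projected_2032 = project(2.5, 0.105, 9)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"Projection at 10.5%: USD {projected_2032:.2f} billion")
```

    Under that assumption the implied rate comes out slightly below the quoted 10.5%, and compounding at 10.5% lands slightly above USD 6.1 billion, so the three published figures agree up to rounding.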



    The increasing prevalence of lung cancer is one of the primary drivers of market growth. Lung cancer remains the leading cause of cancer-related deaths worldwide, necessitating the development of more accurate and early diagnostic methods. With advancements in medical technology, such as molecular diagnostics and non-invasive imaging techniques, the accuracy and efficiency of lung cancer diagnosis have significantly improved. These innovations not only enhance the detection rate but also facilitate personalized treatment plans, thereby improving patient outcomes.



    Furthermore, government initiatives and funding for cancer research play a crucial role in market expansion. Many countries are investing heavily in cancer research, leading to the development of new diagnostic tools and techniques. For instance, organizations such as the National Cancer Institute (NCI) in the United States provide substantial grants for lung cancer research, fostering innovations in diagnostics. In addition, public awareness campaigns and screening programs conducted by healthcare organizations and governments encourage early diagnosis, which is vital for successful treatment and survival rates.



    The integration of artificial intelligence (AI) and machine learning in diagnostic tools is another significant factor contributing to market growth. AI algorithms can analyze medical images with high precision, aiding radiologists in identifying lung cancer at earlier stages. Moreover, AI-driven software can evaluate large datasets from genetic and molecular tests, providing insights into the most effective treatment options based on individual patient profiles. This technological advancement not only enhances the accuracy of diagnostics but also reduces the time required for analysis, thereby increasing the efficiency of healthcare services.



    The EGFR Mutation Test is a pivotal advancement in the realm of lung cancer diagnostics, offering a more personalized approach to treatment. This test specifically identifies mutations in the Epidermal Growth Factor Receptor (EGFR) gene, which are often present in non-small cell lung cancer (NSCLC) patients. By detecting these mutations, healthcare providers can tailor therapies that target the specific genetic alterations, thereby improving treatment efficacy and patient outcomes. The growing adoption of EGFR Mutation Tests underscores the shift towards precision medicine, where treatments are increasingly customized based on individual genetic profiles. This approach not only enhances the effectiveness of therapies but also minimizes adverse effects, as treatments are more accurately aligned with the patient's unique genetic makeup.



    Regionally, North America holds the largest share of the lung cancer diagnostic tests market, followed by Europe and Asia Pacific. The dominance of North America can be attributed to the presence of advanced healthcare infrastructure, high healthcare expenditure, and a robust research landscape. The Asia Pacific region, however, is expected to witness the highest growth rate during the forecast period, driven by increasing healthcare investments, growing awareness about lung cancer, and rising incidences of the disease in countries like China and India. The growing middle-class population and improving healthcare access in these countries further support market growth.



    Test Type Analysis



    The lung cancer diagnostic tests market is segmented by test type into imaging tests, sputum cytology, tissue biopsy, molecular tests, and others. Imaging tests are one of the most commonly used diagnostic methods for lung cancer detection. Techniques such as X-rays, CT scans, and PET scans provide detailed visuals of the lungs, helping in identifying abnormal growths or tumors. The non-invasive nature of these tests and their ability to provide quick results make them a preferred choice among healthcare

  3. lung cancer data.xlsx

    • figshare.com
    xlsx
    Updated Jan 19, 2025
    Cite
    Jehan Al-Musawi; Farah Al-Shadeedi; Nabaa Shakir; Sabreen Ibrahim (2025). lung cancer data.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.28235576.v1
    Available download formats: xlsx
    Dataset updated
    Jan 19, 2025
    Dataset provided by
    figshare
    Authors
    Jehan Al-Musawi; Farah Al-Shadeedi; Nabaa Shakir; Sabreen Ibrahim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Objective: To identify the socioepidemiologic and histopathologic patterns of lung cancer patients in the Middle Euphrates region.

    Patients and Methods: This study analyzed medical information from lung cancer patients at the Middle Euphrates Cancer Center in Iraq from January 2018 to December 2023. Demographic information (age, gender, residency, and education level) as well as clinical details (histopathological categorization) were obtained. All confirmed lung cancer cases were included, while cases with inadequate data or a non-lung-cancer diagnosis were excluded. The data were analyzed using IBM SPSS Statistics (version 26). The data were summarized using descriptive statistics, and chi-square tests were used to identify associations between categorical variables at a significance level of p < 0.05. Ethical approval was obtained from the relevant institutional review board.

    Results: A total of 1162 patients were included, with a mean age at diagnosis of 64.47 ± 11.45 years. The majority of patients were over 60 years (64.4%), followed by those aged 40–60 years (34%); the least affected group was under 40 years (1.6%). Males accounted for the majority of cases (68%) and females for about 32%, with a male:female ratio that fluctuates around 2:1. Illiterate patients and those with low education levels represented the largest proportion, accounting for about 87.9% of the study population. Squamous Cell Carcinoma (SCC) was the most frequent subtype (41.7%), followed closely by Adenocarcinoma (AC) at 37% and Small Cell Lung Cancer (SCLC) at 10.5%. Although SCC was the predominant subtype overall, AC incidence increased over time (from 31.7% in 2018 to 41.4% in 2023), with predominance in females and in younger and more highly educated groups. While the percentage of SCLC and other less common subgroups remained relatively stable over time, there was a significant reduction in NSCLC-NOS diagnoses (from 11.1% in 2018 to 3.2% in 2023).

    Conclusions: In Iraq, specifically in the Middle Euphrates region, lung cancer is a major public health issue in the older age groups. The two main subtypes, SCC and AC, are the main contributors, with a clear increase in AC cases in recent years. The shifting trends indicate the urgent need for improved screening strategies, focused preventative initiatives, and customized treatment plans in view of changing risk profiles.

  4. National Lung Screening Trial

    • cancerimagingarchive.net
    • dev.cancerimagingarchive.net
    dicom, docx, n/a +2
    Updated Sep 24, 2021
    Cite
    The Cancer Imaging Archive (2021). National Lung Screening Trial [Dataset]. http://doi.org/10.7937/TCIA.HMQ8-J677
    Available download formats: docx, svs, dicom, n/a, sas, zip, and doc
    Dataset updated
    Sep 24, 2021
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 24, 2021
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Demographic Summary of Available Imaging

    Characteristic    Value (N = 26254)
    Age (years)       Mean ± SD: 61.4 ± 5
                      Median (IQR): 60 (57-65)
                      Range: 43-75
    Sex               Male: 15512 (59%)
                      Female: 10742 (41%)
    Race              White: 23969 (91.3%)
                      Black: 1135 (4.3%)
                      Asian: 547 (2.1%)
                      American Indian/Alaska Native: 88 (0.3%)
                      Native Hawaiian/Other Pacific Islander: 87 (0.3%)
                      Unknown: 428 (1.6%)
    Ethnicity         Not Available

    Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.

    Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.

    Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
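    The rate ratio and the relative mortality reduction reported above follow directly from the per-100,000 person-year rates. The sketch below recomputes both from the rounded rates given in the text; the function names are hypothetical, and because the published figures were presumably derived from unrounded rates, small differences in the last digit are expected.

```python
def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two event rates expressed in the same units."""
    return rate_a / rate_b


def relative_reduction(rate_intervention: float, rate_control: float) -> float:
    """Fractional reduction of the intervention rate relative to control."""
    return 1 - rate_intervention / rate_control


# Rates per 100,000 person-years, as quoted in the Results paragraph.
incidence_ratio = rate_ratio(645, 572)               # low-dose CT vs. radiography
mortality_reduction = relative_reduction(247, 309)   # low-dose CT vs. radiography

print(f"Incidence rate ratio: {incidence_ratio:.2f}")
print(f"Relative reduction in lung cancer mortality: {mortality_reduction:.1%}")
```

    Both values land close to the reported rate ratio of 1.13 and the reported 20.0% reduction in lung cancer mortality.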

    Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).

    Data Availability: A summary of the National Lung Screening Trial and its available datasets is provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), contracted by the National Cancer Institute (NCI) as keepers and statistical analyzers of the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and histopathology images collected during the trial. (These were previously restricted.)

  5. Lung cancer Bangladesh

    • kaggle.com
    Updated Mar 15, 2025
    Cite
    NISHAT VASKER (2025). Lung cancer Bangladesh [Dataset]. http://doi.org/10.34740/kaggle/dsv/11035259
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NISHAT VASKER
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Bangladesh
    Description

    About Dataset

    📌 Overview

    This dataset has been carefully synthesized to support research in lung cancer survival prediction, enabling the development of models that estimate:

    • Whether a patient is likely to survive at least one year post-diagnosis (Binary Classification).
    • The probability of survival based on clinical and lifestyle factors (Regression Analysis).

    The dataset is designed for machine learning and deep learning applications in medical AI, oncology research, and predictive healthcare.

    📜 Dataset Generation Process

    The dataset was generated using a combination of real-world epidemiological insights, medical literature, and statistical modeling. The feature distributions and relationships have been carefully modeled to reflect real-world clinical scenarios, ensuring biomedical validity.

    📖 Medical References & Sources

    The dataset structure is based on well-established lung cancer risk factors and survival indicators documented in leading medical research and clinical guidelines:

    • World Health Organization (WHO) reports on lung cancer epidemiology.
    • National Cancer Institute (NCI) & American Cancer Society (ACS) guidelines on lung cancer risk factors and treatment outcomes.
    • The IASLC Lung Cancer Staging Project (8th Edition): Standard reference for lung cancer staging.
    • Harrison’s Principles of Internal Medicine (20th Edition): Provides an in-depth review of lung cancer diagnosis and treatment.
    • Lung Cancer: Principles and Practice (2022, Oxford University Press): Clinical insights into lung cancer detection, treatment, and survival factors.

    🔬 Features of the Dataset

    Each record in the dataset represents an individual’s clinical condition, lifestyle risk factors, and survival outcome. The dataset includes the following features:

    1️⃣ Patient Demographics

    • Age → A key risk factor for lung cancer progression and survival.
    • Gender → Male and female lung cancer survival rates can differ.
    • Residence → Urban vs. Rural (impact of environmental factors).

    2️⃣ Risk Factors & Lifestyle Indicators

    These factors have been linked to lung cancer risk in epidemiological studies:

    • Smoking Status → (Current Smoker, Former Smoker, Never Smoked).
    • Air Pollution Exposure → (Low, Moderate, High).
    • Biomass Fuel Use → (Yes/No) – Associated with household air pollution.
    • Factory Exposure → (Yes/No) – Industrial exposure increases lung cancer risk.
    • Family History → (Yes/No) – Genetic predisposition to lung cancer.
    • Diet Habit → (Vegetarian, Non-Vegetarian, Mixed) – Nutritional impact on cancer progression.

    3️⃣ Symptoms (Primary Predictors)

    These are key clinical indicators associated with lung cancer detection and severity:

    • Hemoptysis (Coughing Blood)
    • Chest Pain
    • Fatigue & Weakness
    • Chronic Cough
    • Unexplained Weight Loss

    4️⃣ Tumor Characteristics & Clinical Features

    • Tumor Size (mm) → The size of the detected tumor.
    • Histology Type → (Adenocarcinoma, Squamous Cell Carcinoma, Small Cell Carcinoma).
    • Cancer Stage → (Stage I to Stage IV).

    5️⃣ Treatment & Healthcare Facility

    • Treatment Received → (Surgery, Chemotherapy, Radiation, Targeted Therapy).
    • Hospital Type → (Private, Government, Medical College).

    6️⃣ Target Variables (Predicted Outcomes)

    • Survival (Binary) → 1 (Yes) if the patient survives at least 1 year, 0 (No) otherwise.
    • Survival Probability (%) (Can be derived) → Estimated probability of survival within one year.

    ⚡ Why Is This Dataset Valuable?

    ✅ Balanced Data Distribution

    • Designed to ensure a representative distribution of lung cancer survival cases.
    • Prevents model bias and improves generalization in predictive models.

    ✅ Medically-Inspired Feature Engineering

    • Features are derived from real-world lung cancer risk factors, validated through medical literature.
    • Incorporates both lifestyle and clinical indicators to enhance predictive accuracy. (No real person data is used; the records are synthetic and only model a biomedical environment.)

    ✅ Diverse Risk Factors Considered

    • Smoking, air pollution, and genetic history as primary lung cancer contributors.
    • Symptom severity and tumor histology influence survival rates.

    ✅ Scalability & ML Suitability

    • Ideal for classification and regression tasks in machine learning.
    • Can be used with deep learning (TensorFlow, PyTorch), ML models (XGBoost, Random Forest, SVM), and explainable AI techniques like SHAP and LIME.

    📂 Dataset Usage & Applications

    This dataset is highly useful for multiple healthcare AI applications, including:

    • 🩺 Predictive Analytics → Early detection of high-risk lung cancer patients.
    • 🤖 Healthcare Chatbots → AI-powered risk assessment tools.

  6. NCI State Lung Cancer Incidence Rates

    • hub.arcgis.com
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    Updated Jan 2, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Cancer Institute (2020). NCI State Lung Cancer Incidence Rates [Dataset]. https://hub.arcgis.com/maps/NCI::nci-state-lung-cancer-incidence-rates
    Dataset updated
    Jan 2, 2020
    Dataset authored and provided by
    National Cancer Institute
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains Cancer Incidence data for Lung Cancer (All Stages^) including: Age-Adjusted Rate, Confidence Interval, Average Annual Count, and Trend field information for US States for the average 5 year span from 2016 to 2020.

    Data are segmented by sex (Both Sexes, Male, and Female) and age (All Ages, Ages Under 50, Ages 50 & Over, Ages Under 65, and Ages 65 & Over), with field names and aliases describing the sex and age group tabulated.

    For more information, visit statecancerprofiles.cancer.gov

    Data Notations

    State Cancer Registries may provide more current or more local data.

    Trend

    • Rising when 95% confidence interval of average annual percent change is above 0.
    • Stable when 95% confidence interval of average annual percent change includes 0.
    • Falling when 95% confidence interval of average annual percent change is below 0.

    † Incidence rates (cases per 100,000 population per year) are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Rates calculated using SEER*Stat. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used for SEER and NPCR incidence rates.

    ‡ Incidence Trend data come from different sources. Due to different years of data availability, most of the trends are AAPCs based on APCs but some are APCs calculated in SEER*Stat. Please refer to the source for each area for additional information.

    Rates and trends are computed using different standards for malignancy. For more information see malignant.

    ^ All Stages refers to any stage in the Surveillance, Epidemiology, and End Results (SEER) summary stage.

    Data Source Field Key

    (1) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
    (5) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
    (6) Source: National Program of Cancer Registries SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention (based on the 2022 submission).
    (7) Source: SEER November 2022 submission.
    (8) Source: Incidence data provided by the SEER Program. AAPCs are calculated by the Joinpoint Regression Program and are based on APCs. Data are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used with SEER November 2022 data.

    Some data are not available, see Data Not Available for combinations of geography, cancer site, age, and race/ethnicity.

    Data for the United States does not include data from Nevada.

    Data for the United States does not include Puerto Rico.
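    The Rising / Stable / Falling trend notation in this description amounts to a three-way check on the 95% confidence interval of the average annual percent change. A minimal sketch of that rule follows; the function name and the example interval values are hypothetical and are not taken from the dataset.

```python
def classify_trend(ci_low: float, ci_high: float) -> str:
    """Label a trend from the 95% CI of the average annual percent change.

    Rising  -> the whole interval lies above 0
    Falling -> the whole interval lies below 0
    Stable  -> the interval includes 0
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > 0:
        return "Rising"
    if ci_high < 0:
        return "Falling"
    return "Stable"


# Hypothetical intervals, for illustration only.
for low, high in [(0.4, 1.9), (-2.1, -0.3), (-0.5, 0.8)]:
    print(f"CI [{low}, {high}] -> {classify_trend(low, high)}")
```

    An interval with an endpoint exactly at 0 is treated here as including 0 and therefore as Stable; the source does not spell out that boundary case, so this is an assumption.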

  7. Lung Cancer Death Rate (per 100,000), New Jersey, by year: Beginning 2010

    • data.wu.ac.at
    • healthdata.nj.gov
    application/excel +5
    Updated May 23, 2018
    Cite
    Loretta Kelly (2018). Lung Cancer Death Rate (per 100,000), New Jersey, by year: Beginning 2010 [Dataset]. https://data.wu.ac.at/odso/healthdata_nj_gov/aWE3Ny1jdHFy
    Available download formats: xlsx, csv, xml, application/xml+rdf, application/excel, json
    Dataset updated
    May 23, 2018
    Dataset provided by
    Loretta Kelly
    Area covered
    New Jersey
    Description

    Rate: Number of deaths due to cancer of the trachea, bronchus, and lung per 100,000 Population.

    Definition: Number of deaths per 100,000 with malignant neoplasm (cancer) of the trachea, bronchus, and lung as the underlying cause (ICD-10 codes: C33-C34).

    Data Sources:

    (1) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed Mortality File. CDC WONDER On-line Database accessed at http://wonder.cdc.gov/cmf-icd10.html

    (2) Death Certificate Database, Office of Vital Statistics and Registry, New Jersey Department of Health

    (3) Population Estimates, State Data Center, New Jersey Department of Labor and Workforce Development

  8. Data from: Temporal trends in lung cancer survival: a population-based study...

    • tandf.figshare.com
    • figshare.com
    tiff
    Updated Jun 2, 2023
    Cite
    Lukas Löfling; Shahram Bahmanyar; Helle Kieler; Mats Lambe; Gunnar Wagenius (2023). Temporal trends in lung cancer survival: a population-based study [Dataset]. http://doi.org/10.6084/m9.figshare.17158139.v1
    Available download formats: tiff
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Lukas Löfling; Shahram Bahmanyar; Helle Kieler; Mats Lambe; Gunnar Wagenius
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Lung cancer is the number one cancer-related cause of death in Sweden and worldwide. In most countries, five-year survival estimates vary between 10% and 20% with evidence of improved survival over time. Over the last decades, the management of lung cancer has changed including the introduction of national guidelines, new diagnostic procedures and treatments. This study aimed to investigate temporal trends in lung cancer survival both overall and in subgroups defined by established prognostic factors (i.e., sex, stage, histopathology and smoking history). We estimated one-, two-, and five-year relative survival, and excess mortality, in patients diagnosed with squamous cell carcinoma or adenocarcinoma of the lung between 1995 and 2016 in Sweden. We used population-based information available in a national lung cancer research database (LCBaSe) generated by cross-linkage between the Swedish National Lung Cancer Register and several Swedish health and sociodemographic registers. We included 36,935 patients diagnosed with squamous cell carcinoma or adenocarcinoma of the lung between 1995 and 2016. The overall one-, two- and five-year survival estimates increased between 1995 and 2016, from 38% to 53%, 21% to 37%, and 14% to 24%, respectively. Over the study period, we also found improved survival in subgroups, for example in patients with stages III-IV disease, patients with adenocarcinoma, and never-smokers. The excess mortality decreased over the study period, both overall and in all subgroups. Lung cancer survival increased over time in the overall lung cancer population. Of special note was evidence of improved survival in patients with stage IV disease. Our results corroborate a previously observed global trend of improved survival in patients with lung cancer.

  9. Data from: County-level cumulative environmental quality associated with...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). County-level cumulative environmental quality associated with cancer incidence. [Dataset]. https://catalog.data.gov/dataset/county-level-cumulative-environmental-quality-associated-with-cancer-incidence
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Population based cancer incidence rates were abstracted from National Cancer Institute, State Cancer Profiles for all available counties in the United States for which data were available. This is a national county-level database of cancer data that are collected by state public health surveillance systems. All-site cancer is defined as any type of cancer that is captured in the state registry data, though non-melanoma skin cancer is not included. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for years 2006–2010 were available for 2687 of 3142 (85.5%) counties in the U.S. Counties for which there are fewer than 16 reported cases in a specific area-sex-race category are suppressed to ensure confidentiality and stability of rate estimates; this accounted for 14 counties in our study. Two states, Kansas and Virginia, do not provide data because of state legislation and regulations which prohibit the release of county level data to outside entities. Data from Michigan does not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data is not available for three states, Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate for all counties was 453.7 per 100,000 persons. We selected 2006–2010 as it is subsequent in time to the EQI exposure data which was constructed to represent the years 2000–2005. We also gathered data for the three leading causes of cancer for males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as an exposure metric as an indicator of cumulative environmental exposures at the county-level representing the period 2000 to 2005. A complete description of the datasets used in the EQI are provided in Lobdell et al. and methods used for index construction are described by Messer et al. 

    The EQI was developed for the period 2000–2005 because it was the time period for which the most recent data were available when index construction was initiated. The EQI includes variables representing each of the environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic environment includes 12 variables representing socioeconomics and crime.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

    This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).
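    Two of the rules in this description are easy to express in code: the share of counties with available rates, and the suppression of categories with fewer than 16 reported cases. The sketch below is illustrative only; the constant and function names are hypothetical and do not come from the EPA or State Cancer Profiles tooling.

```python
from typing import Optional

# Threshold quoted in the description: categories with fewer than
# 16 reported cases are suppressed.
SUPPRESSION_THRESHOLD = 16


def publishable_count(cases: int) -> Optional[int]:
    """Return the case count, or None when it must be suppressed."""
    if cases < 0:
        raise ValueError("cases must be non-negative")
    return cases if cases >= SUPPRESSION_THRESHOLD else None


# County coverage quoted in the description: 2687 of 3142 counties.
coverage = 2687 / 3142

print(f"County coverage: {coverage:.1%}")
print(publishable_count(15), publishable_count(16))
```

    The coverage ratio reproduces the 85.5% figure given in the text, and the threshold check treats exactly 16 cases as reportable, matching the "fewer than 16" wording.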

  10. Lung Cancer Mortality

    • data.lacounty.gov
    • geohub.lacity.org
    Updated Dec 20, 2023
    Cite
    County of Los Angeles (2023). Lung Cancer Mortality [Dataset]. https://data.lacounty.gov/datasets/lung-cancer-mortality/about
    Explore at:
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    County of Los Angeles
    Description

    Death rate has been age-adjusted by the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.

    Lung cancer is a leading cause of cancer-related death in the US. People who smoke have the greatest risk of lung cancer, though lung cancer can also occur in people who have never smoked. Most cases are due to long-term tobacco smoking or exposure to secondhand tobacco smoke. Cities and communities can take an active role in curbing tobacco use and reducing lung cancer by adopting policies to regulate tobacco retail; reducing exposure to secondhand smoke in outdoor public spaces, such as parks, restaurants, or in multi-unit housing; and improving access to tobacco cessation programs and other preventive services.

    For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
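
The age adjustment mentioned in this description is commonly done by direct standardization: each age group's death rate is weighted by that group's share of a standard population. The sketch below shows the arithmetic only. The age groups, death counts, populations, and weights are hypothetical placeholders, not the actual 2000 U.S. standard population weights or Los Angeles County data.

```python
def age_adjusted_rate(strata, standard_weights):
    """Directly standardized rate per 100,000.

    strata: {age_group: (deaths, population)}
    standard_weights: {age_group: weight}; weights are normalized here.
    """
    total_weight = sum(standard_weights.values())
    rate = 0.0
    for age_group, (deaths, population) in strata.items():
        age_specific_rate = deaths / population
        rate += age_specific_rate * (standard_weights[age_group] / total_weight)
    return rate * 100_000


# Hypothetical inputs for illustration only.
strata = {
    "0-44": (2, 100_000),
    "45-64": (30, 60_000),
    "65+": (120, 40_000),
}
standard_weights = {"0-44": 0.60, "45-64": 0.25, "65+": 0.15}

adjusted = age_adjusted_rate(strata, standard_weights)
print(round(adjusted, 1))
```

Because the same weights are applied to every area, rates adjusted this way can be compared across areas with different age structures.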

  11. CDC Cancer Deaths (Lung and Colon)

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Apr 1, 2021
    Cite
    (2021). CDC Cancer Deaths (Lung and Colon) [Dataset]. https://catalog.data.gov/zh_CN/dataset/cdc-cancer-deaths-lung-and-colon
    Explore at:
    Dataset updated
    Apr 1, 2021
    Description

    This map service portrays the number of deaths per 100,000 people per square mile from lung and colon cancer. It displays the distribution of lung and colon cancer across the United States. Pop-ups show attributes such as state name, county name, number of colon or lung cancer deaths, and square miles per area.

    Lung Cancer: Death due to malignant neoplasm of the trachea, bronchus and lung.

    Colon Cancer: Death due to malignant neoplasm of the colon, rectum and anus.

    This data was sourced from: Community Health Status Indicators. Other Health Datapalooza focused content that may interest you: Health Datapalooza.

  12. Data from The Lung Image Database Consortium (LIDC) and Image Database...

    • cancerimagingarchive.net
    • dev.cancerimagingarchive.net
    dicom, n/a, xls, xlsx +1
    Cite
    The Cancer Imaging Archive, Data from The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans [Dataset]. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
    Explore at:
    Available download formats: xlsx, xls, n/a, xml and zip, dicom
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 21, 2020
    Dataset funded by
    National Cancer Institute http://www.cancer.gov/
    Description

    The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.

    Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥3 mm," "nodule <3 mm," and "non-nodule ≥3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.

    Note: The TCIA team strongly encourages users to review pylidc and the standardized DICOM representation of the TCIA LIDC-IDRI annotations (DICOM-LIDC-IDRI-Nodules), which covers the annotations/segmentations included in this dataset, before developing custom tools to analyze the XML version.

  13. National Lung Screening Trial (NLST) Dataset

    • paperswithcode.com
    Cite
    National Lung Screening Trial (NLST) Dataset [Dataset]. https://paperswithcode.com/dataset/national-lung-screening-trial-nlst
    Explore at:
    Description

    The National Lung Screening Trial (NLST) was a randomized controlled trial conducted by the Lung Screening Study group (LSS) and the American College of Radiology Imaging Network (ACRIN) to determine whether screening for lung cancer with low-dose helical computed tomography (CT) reduces mortality from lung cancer in high-risk individuals relative to screening with chest radiography. Approximately 54,000 participants were enrolled between August 2002 and April 2004. Data collection has ended, and information is complete through December 31, 2009. NLST has the ClinicalTrials.gov registration number NCT00047385.

  14. Data from: Dataset description.

    • plos.figshare.com
    xls
    Updated Aug 27, 2024
    + more versions
    Cite
    Dataset description. [Dataset]. https://plos.figshare.com/articles/dataset/Dataset_description_/26034390
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Among many types of cancers, to date, lung cancer remains one of the deadliest cancers around the world. Many researchers, scientists, doctors, and people from other fields continuously contribute to this subject regarding early prediction and diagnosis. One of the significant problems in prediction is the black-box nature of machine learning models. Though the detection rate is comparatively satisfactory, people have yet to learn how a model came to that decision, causing trust issues among patients and healthcare workers. This work uses multiple machine learning models on a numerical dataset of lung cancer-relevant parameters and compares performance and accuracy. After comparison, each model has been explained using different methods. The main contribution of this research is to give logical explanations of why the model reached a particular decision to achieve trust. This research has also been compared with a previous study that worked with a similar dataset and took expert opinions regarding their proposed model. We also showed that our research achieved better results than their proposed model and specialist opinion using hyperparameter tuning, having an improved accuracy of almost 100% in all four models.

  15. Cancer Registration: Frequency of lung Cancer tumours Diagnosed in 2015-2016...

    • cloud.csiss.gmu.edu
    • data.europa.eu
    xls
    Updated Dec 31, 2019
    Cite
    United Kingdom (2019). Cancer Registration: Frequency of lung Cancer tumours Diagnosed in 2015-2016 by CCG and Route to Diagnosis [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/http-www-ncin-org-uk-view-rid-3950
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 31, 2019
    Dataset provided by
    United Kingdom
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    National Cancer Registration And Analysis Service (NCRAS). (2019). Cancer Registration: Frequency of lung Cancer tumours Diagnosed in 2015-2016 by CCG and Route to Diagnosis (2015 -2016) [Dataset]. Public Health England. https://doi.org/10.25503/7gpv-d753

    Aggregated data on lung cancer tumours (ICD-10 C33-C34) diagnosed between 2015 and 2016 in the English resident population.

    Data within the file:

    • PATIENTS (Count of tumours)
    • CCG_NAME (Name of resident CCG)
    • CCG_ROUTE (Name of Route to Diagnosis)
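
A file with the three columns listed above can be summarized by summing tumour counts per Route to Diagnosis. The following is a minimal sketch, assuming the published xls file has been exported to CSV with exactly those column names; the CCG names, route names, and counts in the sample are hypothetical and do not come from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the column names from the description.
SAMPLE_CSV = """CCG_NAME,CCG_ROUTE,PATIENTS
Example CCG 1,Emergency presentation,40
Example CCG 1,GP referral,25
Example CCG 2,Emergency presentation,30
"""


def tumours_by_route(handle):
    """Sum the PATIENTS column (count of tumours) for each CCG_ROUTE value."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[row["CCG_ROUTE"]] += int(row["PATIENTS"])
    return dict(totals)


totals = tumours_by_route(io.StringIO(SAMPLE_CSV))
for route, count in sorted(totals.items()):
    print(route, count)
```

The function accepts any file-like object, so the same call works on an opened CSV export in place of the in-memory sample.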

  16. Synthetic Lung Cancer Patient Records Prediction Dataset

    • opendatabay.com
    Updated Apr 26, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Lung Cancer Patient Records Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/768c8887-e9fb-4be4-8047-819d4dcb9c6c
    Explore at:
    Dataset updated
    Apr 26, 2025
    Dataset provided by
    Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
    Authors
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    This synthetic Lung Cancer Risk Prediction Dataset is designed for educational and research purposes in the fields of data science, public health, and cancer research. It contains essential health and lifestyle indicators such as smoking habits, chronic diseases, and respiratory symptoms, which can be used to analyze and predict the risk of lung cancer. The dataset is ideal for building predictive models, conducting risk assessments, and exploring the relationships between lifestyle factors and lung health.

    Dataset Features

    • Gender: The biological sex of the individual (Male/Female).
    • Age: The age of the individual in years.
    • Smoking: Whether the individual smokes (Yes/No).
    • Yellow Fingers: Whether the individual has yellow fingers (Yes/No).
    • Anxiety: Whether the individual has anxiety (Yes/No).
    • Peer Pressure: Whether the individual experiences peer pressure (Yes/No).
    • Chronic Disease: Whether the individual has any chronic diseases (Yes/No).
    • Fatigue: Whether the individual experiences fatigue (Yes/No).
    • Allergy: Whether the individual has allergies (Yes/No).
    • Wheezing: Whether the individual experiences wheezing (Yes/No).
    • Alcohol Consuming: Whether the individual consumes alcohol (Yes/No).
    • Coughing: Whether the individual experiences coughing (Yes/No).
    • Shortness of Breath: Whether the individual experiences shortness of breath (Yes/No).
    • Swallowing Difficulty: Whether the individual experiences difficulty swallowing (Yes/No).
    • Chest Pain: Whether the individual experiences chest pain (Yes/No).
    • Lung Cancer: Binary classification indicating lung cancer risk:
    • YES: At risk of lung cancer.
    • NO: Not at risk of lung cancer.

    Distribution

    Usage

    This dataset is ideal for various lung cancer-related applications:

    • Lung Cancer Risk Prediction: Develop machine learning models to classify individuals as at risk or not at risk of lung cancer.
    • Risk Factor Analysis: Identify key factors contributing to lung cancer risks and prioritize lifestyle interventions.
    • Predictive Modeling: Build predictive models using health and lifestyle indicators to assess lung health.
    • Public Health Research: Study the relationships between health metrics, lifestyle factors, and lung cancer risks.
    • Preventive Healthcare: Inform public health campaigns and individual preventive measures.

    Coverage

    This synthetic dataset is anonymized, ensuring compliance with data privacy standards. It is designed for research and learning purposes, providing diverse health conditions and demographic data for analysis and model building.
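
Before any model building, the Yes/No features listed under Dataset Features need a numeric encoding. The sketch below shows one simple way to do that. It assumes each record is a dictionary keyed by the feature names exactly as written in the list above; the actual column headers in the downloadable file may differ, and the example record is invented for illustration.

```python
# Feature names copied from the Dataset Features list; the real file's
# column headers are an assumption and may be spelled differently.
BINARY_FEATURES = [
    "Smoking", "Yellow Fingers", "Anxiety", "Peer Pressure",
    "Chronic Disease", "Fatigue", "Allergy", "Wheezing",
    "Alcohol Consuming", "Coughing", "Shortness of Breath",
    "Swallowing Difficulty", "Chest Pain",
]


def encode(record):
    """Turn one record into (feature_vector, label) with 0/1 indicators."""
    features = [1 if record["Gender"] == "Male" else 0, int(record["Age"])]
    features += [1 if record[name] == "Yes" else 0 for name in BINARY_FEATURES]
    label = 1 if record["Lung Cancer"] == "YES" else 0
    return features, label


# Hypothetical record for illustration only.
example_record = {name: "No" for name in BINARY_FEATURES}
example_record.update(
    {"Gender": "Male", "Age": "63", "Smoking": "Yes", "Lung Cancer": "YES"}
)

features, label = encode(example_record)
print(features, label)
```

The resulting vectors can be passed to any classifier; keeping the feature order fixed in `BINARY_FEATURES` makes the encoding reproducible across records.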

    License

    CC0 (Public Domain)

    Who Can Use It

    • Data Science Practitioners: For practicing data preprocessing, classification, and regression tasks related to lung health.
    • Healthcare Professionals and Researchers: To explore relationships between health metrics and lung cancer risks.
    • Public Health Analysts: To understand trends and develop interventions for reducing lung cancer risks.
    • Policy Makers and Regulators: For data-driven decision-making in preventive healthcare policies.

  17. Cancer incidence, by selected sites of cancer and sex, three-year average,...

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Jan 17, 2023
    + more versions
    Cite
    Statistics Canada (2023). Cancer incidence, by selected sites of cancer and sex, three-year average, Canada, provinces, territories and health regions (2015 boundaries) [Dataset]. https://open.canada.ca/data/en/dataset/0112f88b-c08f-4a7a-8379-1c88b57c7412
    Explore at:
    Available download formats: csv, xml, html
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    This table contains 30810 series, with data for years 2001/2003 - 2013/2015 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (158 items: Canada; Newfoundland and Labrador; Eastern Regional Health Authority, Newfoundland and Labrador; Central Regional Health Authority, Newfoundland and Labrador; ...);  Sex (3 items: Both sexes; Males; Females);  Selected sites of cancer (ICD-O-3) (5 items: All invasive primary cancer sites (including in situ bladder); Colon, rectum and rectosigmoid junction cancer; Bronchus and lung cancer; Female breast cancer; ...);  Characteristics (13 items: Number of new cancer cases; Cancer incidence (rate per 100,000 population); Low 95% confidence interval, cancer incidence (rate per 100,000 population); High 95% confidence interval, cancer incidence (rate per 100,000 population); ...).

  18. Years of Life Lost (YLL): Lung cancer

    • data.europa.eu
    • cloud.csiss.gmu.edu
    • +1more
    html
    Updated Oct 11, 2021
    Cite
    NHS Digital (2021). Years of Life Lost (YLL): Lung cancer [Dataset]. https://data.europa.eu/set/data/years_of_life_lost_yll_-_lung_cancer
    Explore at:
    Available download formats: html
    Dataset updated
    Oct 11, 2021
    Dataset provided by
    National Health Service https://www.nhs.uk/
    NHS Digital https://digital.nhs.uk/
    Authors
    NHS Digital
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    Years of Life Lost (YLL) as a result of death from lung cancer - Directly age-Standardised Rates (DSR) per 100,000 population

    Source: Office for National Statistics (ONS)
    Publisher: Information Centre (IC) - Clinical and Health Outcomes Knowledge Base
    Geographies: Local Authority District (LAD), Government Office Region (GOR), National, Primary Care Trust (PCT), Strategic Health Authority (SHA)
    Geographic coverage: England
    Time coverage: 2005-07, 2007
    Type of data: Administrative data

  19. Impact of Time-to-Treatment Initiation on Survival in Single Primary...

    • figshare.com
    txt
    Updated Jul 2, 2023
    Cite
    Jun Teng (2023). Impact of Time-to-Treatment Initiation on Survival in Single Primary Non-Small Cell Lung Cancer:a Population-based Study.csv [Dataset]. http://doi.org/10.6084/m9.figshare.23614482.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 2, 2023
    Dataset provided by
    figshare
    Authors
    Jun Teng
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We analyzed NSCLC data from the Surveillance, Epidemiology, and End Results database, focusing on lung adenocarcinoma (LUAD) and lung squamous carcinoma (LUSC).

  20. Dataset from A Randomized, Open-label Study to Explore the Correlation of...

    • data.niaid.nih.gov
    Updated Mar 9, 2025
    Cite
    Dataset from A Randomized, Open-label Study to Explore the Correlation of Biomarkers With Response Rate in Chemo-naive Patients With Advanced or Recurrent Non-squamous Non-small Cell Lung Cancer Who Receive Treatment With Avastin in Addition to Carboplatin-based Chemotherapy [Dataset]. https://data.niaid.nih.gov/resources?id=VIVLI_c793cdf0-93aa-4821-ba70-ee9a2629dc9c
    Explore at:
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    Roche Holding AG http://roche.com/
    Authors
    Clinical Trials
    Area covered
    Germany, Poland, Czech Republic, Hong Kong, Hungary, Russian Federation, Italy, Australia, France, Spain
    Variables measured
    Evaluating Response To Treatment
    Description

    This study will explore the correlation of biomarkers with response rate, and the overall efficacy and safety, of Avastin in combination with carboplatin-based chemotherapy in patients with advanced or recurrent non-squamous non-small cell lung cancer. Patients will be randomized to one of 2 groups, to receive either Avastin 7.5mg/kg iv on day 1 of each 3 week cycle, or Avastin 15mg/kg iv on day 1 of each 3 week cycle; all patients will also receive treatment with carboplatin and either gemcitabine or paclitaxel for a maximum of 6 cycles. The anticipated time on study treatment is until disease progression, and the target sample size is 100-500 individuals.

