Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
**Description**
This dataset contains lung cancer mortality data: a collection of patient information focused on individuals diagnosed with cancer. It covers 800,000 individuals, with 16 well-structured columns describing lung cancer diagnosis, treatment, and outcomes. This large-scale dataset is designed to aid researchers, data scientists, and healthcare professionals in studying patterns, building predictive models, and enhancing early detection and treatment strategies.
🌍 The Societal Impact of Lung Cancer
Lung cancer is not just a disease — it's a global crisis that steals time, health, and hope from millions of people every year. As the #1 cause of cancer deaths worldwide, it takes more lives annually than breast, colon, and prostate cancer combined.
But behind every statistic is a story:
A parent who never saw their child graduate.
A worker who had to leave their job too soon.
A community that lost a leader, a friend, a neighbor.
Why does this matter? Lung cancer often goes undetected until it's too late. It’s aggressive, silent, and devastating — especially in underserved areas where early detection is rare and treatment options are limited. It doesn’t just affect patients. It affects families, economies, and healthcare systems on a massive scale.
This dataset represents more than numbers. It represents 800,000 real-world stories — people who can help us unlock patterns, train models, and advance life-saving research.
By working with this data, you're not just analyzing a dataset — you're stepping into the fight against one of humanity’s deadliest diseases.
Let’s turn insight into impact. (😊 The above description was generated with the help of AI. I just wanted to share this dataset, that's all. Thank you.)
Death rate has been age-adjusted by the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.
Lung cancer is a leading cause of cancer-related death in the US. People who smoke have the greatest risk of lung cancer, though lung cancer can also occur in people who have never smoked. Most cases are due to long-term tobacco smoking or exposure to secondhand tobacco smoke. Cities and communities can take an active role in curbing tobacco use and reducing lung cancer by adopting policies to regulate tobacco retail; reducing exposure to secondhand smoke in outdoor public spaces, such as parks, restaurants, or in multi-unit housing; and improving access to tobacco cessation programs and other preventive services.
For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
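Age adjustment of this kind is usually done by direct standardization: each age group's death rate is weighted by that group's share of a standard population, and the weighted rates are summed. The sketch below illustrates the arithmetic only. The three age groups, the counts, and the weights are hypothetical placeholders, not the real 2000 U.S. standard population weights or any Los Angeles County data.

```python
def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Directly standardized death rate per `per` people.

    deaths, population: counts for each age group in the study area.
    standard_weights: share of each age group in the standard population.
    """
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must cover the same age groups")
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, p, w in zip(deaths, population, standard_weights):
        # Age-specific rate, weighted by the standard population share.
        rate += (d / p) * (w / total_weight)
    return rate * per


# Hypothetical inputs for three age groups (illustration only).
deaths = [2, 30, 120]
population = [100_000, 80_000, 40_000]
weights = [0.6, 0.3, 0.1]

crude = sum(deaths) / sum(population) * 100_000
adjusted = age_adjusted_rate(deaths, population, weights)
print(f"crude: {crude:.1f}, age-adjusted: {adjusted:.1f} per 100,000")
```

In this made-up example the adjusted rate is lower than the crude rate because the study area is older than the standard population, which is the kind of distortion age adjustment is meant to remove.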
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Lung Cancer
Dataset Summary
An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their risk status. The data was collected from the website "online lung cancer prediction system".
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.
This map service portrays the number of deaths per 100,000 people per square mile from lung and colon cancer. It displays the distribution of lung and colon cancer across the United States. Pop-ups show attributes such as state name, county name, number of colon or lung cancer deaths, and square miles per area.
Lung Cancer: Death due to malignant neoplasm of the trachea, bronchus and lung.
Colon Cancer: Death due to malignant neoplasm of the colon, rectum and anus.
This data was sourced from: Community Health Status Indicators
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Characteristic | Value (N = 26254) |
---|---|
Age (years) | Mean ± SD: 61.4 ± 5; Median (IQR): 60 (57-65); Range: 43-75 |
Sex | Male: 15512 (59%); Female: 10742 (41%) |
Race | White: 23969 (91.3%) |
Ethnicity | Not Available |
Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.
Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.
Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
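The headline figures in the Results follow from the reported rates by simple arithmetic, which can be checked as below. This is a minimal sketch using only the rounded rates quoted above, so the last digit can differ slightly from the published values, which were presumably computed from unrounded data. The function names are illustrative.

```python
def relative_reduction(rate_intervention, rate_control):
    """Fractional reduction of the intervention rate relative to control."""
    return (rate_control - rate_intervention) / rate_control


def rate_ratio(rate_a, rate_b):
    """Ratio of two rates expressed on the same scale."""
    return rate_a / rate_b


# Lung cancer deaths per 100,000 person-years: low-dose CT vs. radiography.
mortality_reduction = relative_reduction(247, 309)

# Lung cancer incidence per 100,000 person-years: low-dose CT vs. radiography.
incidence_ratio = rate_ratio(645, 572)

print(f"relative mortality reduction: about {mortality_reduction:.0%}")
print(f"incidence rate ratio: {incidence_ratio:.2f}")
```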
Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).
Data Availability: A summary of the National Lung Screening Trial and its available datasets are provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), contracted by the National Cancer Institute (NCI) as keepers and statistical analyzers of the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and Histopathology images collected during the trial. (These previously were restricted.)
This dataset contains information on patients with lung cancer, including their age, gender, air pollution exposure, alcohol use, dust allergy, occupational hazards, genetic risk, chronic lung disease, balanced diet, obesity, smoking, passive smoker, chest pain, coughing of blood, fatigue, weight loss, shortness of breath, wheezing, swallowing difficulty, clubbing of finger nails and snoring.
Lung cancer is the leading cause of cancer death worldwide, accounting for 1.59 million deaths in 2018. The majority of lung cancer cases are attributed to smoking, but exposure to air pollution is also a risk factor. A new study has found that air pollution may be linked to an increased risk of lung cancer, even in nonsmokers.
The study, which was published in the journal Nature Medicine, looked at data from over 462,000 people in China who were followed for an average of six years. The participants were divided into two groups: those who lived in areas with high levels of air pollution and those who lived in areas with low levels of air pollution.
The researchers found that the people in the high-pollution group were more likely to develop lung cancer than those in the low-pollution group. They also found that the risk was higher in nonsmokers than smokers, and that the risk increased with age.
While this study does not prove that air pollution causes lung cancer, it does suggest that there may be a link between the two. More research is needed to confirm these findings and to determine what effect different types and levels of air pollution may have on lung cancer risk.
- predicting the likelihood of a patient developing lung cancer
- identifying risk factors for lung cancer
- determining the most effective treatment for a patient with lung cancer
License
See the dataset description for more information.
File: cancer patient data sets.csv

| Column name | Description |
|:-----------------------------|:--------------------------------------------------------------------|
| Age | The age of the patient. (Numeric) |
| Gender | The gender of the patient. (Categorical) |
| Air Pollution | The level of air pollution exposure of the patient. (Categorical) |
| Alcohol use | The level of alcohol use of the patient. (Categorical) |
| Dust Allergy | The level of dust allergy of the patient. (Categorical) |
| OccuPational Hazards | The level of occupational hazards of the patient. (Categorical) |
| Genetic Risk | The level of genetic risk of the patient. (Categorical) |
| chronic Lung Disease | The level of chronic lung disease of the patient. (Categorical) |
| Balanced Diet | The level of balanced diet of the patient. (Categorical) |
| Obesity | The level of obesity of the patient. (Categorical) |
| Smoking | The level of smoking of the patient. (Categorical) |
| Passive Smoker | The level of passive smoker of the patient. (Categorical) |
| Chest Pain | The level of chest pain of the patient. (Categorical) |
| Coughing of Blood | The level of coughing of blood of the patient. (Categorical) |
| Fatigue | The level of fatigue of the patient. (Categorical) |
| Weight Loss | The level of weight loss of the patient. (Categorical) |
| Shortness of Breath | The level of shortness of breath of the patient. (Categorical) |
| Wheezing | The level of wheezing of the patient. (Categorical) |
| Swallowing Difficulty | The level of swallowing difficulty of the patient. (Categorical) |
| Clubbing of Finger Nails | The level of clubbing of finger nails of the patient. (Categorical) |
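A file with the columns listed above can be summarized with Python's standard `csv` module. The sketch below runs on a tiny inline sample so it is self-contained. The sample rows and the numeric coding of the categorical levels are hypothetical, since the description does not say how the levels are encoded.

```python
import csv
import io
from collections import Counter

# Hypothetical sample using three of the documented column names.
# The values are placeholders, not rows from the real file.
SAMPLE = """Age,Gender,Smoking
33,1,3
17,1,2
35,2,7
"""


def summarize(handle):
    """Return (mean age, count of patients per Smoking level)."""
    ages = []
    smoking_levels = Counter()
    for row in csv.DictReader(handle):
        ages.append(int(row["Age"]))
        smoking_levels[row["Smoking"]] += 1
    mean_age = sum(ages) / len(ages) if ages else float("nan")
    return mean_age, smoking_levels


mean_age, smoking_levels = summarize(io.StringIO(SAMPLE))
print(f"mean age: {mean_age:.1f}")
print(dict(smoking_levels))
```

To run the same summary on the real file, pass `open("cancer patient data sets.csv", newline="")` instead of the `io.StringIO` sample, assuming the header row uses exactly the column names in the table.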
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
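Random rounding to base 5 can be sketched as follows. The description above does not spell out the exact procedure, so this assumes the common unbiased scheme in which a count is rounded up with probability proportional to its remainder, so that the expected value of the rounded count equals the original. Treat the function as an illustration, not the publisher's implementation.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round a non-negative integer count to a multiple of `base`.

    Assumed scheme: round up with probability remainder/base, otherwise
    round down, which keeps the rounded value unbiased in expectation.
    """
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


# A count of 12 becomes 15 with probability 2/5 and 10 with probability 3/5.
print(random_round(12))
```

Because each published count can be off by up to 4 in either direction, small cells and sums of many rounded cells should be read with that tolerance in mind.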
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
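The letter-to-diagnosis convention can be turned into a small lookup. The sketch below assumes, hypothetically, that the class letter is the first character after the last hyphen in the subject ID (for example `Lung_Dx-A0001`). The description only says the ID contains the letter, so check the actual ID format before relying on this.

```python
# Letter codes as stated in the dataset description.
HISTOLOGY_BY_LETTER = {
    "A": "Adenocarcinoma",
    "B": "Small Cell Carcinoma",
    "E": "Large Cell Carcinoma",
    "G": "Squamous Cell Carcinoma",
}


def histology_from_id(subject_id):
    """Map a subject ID to its histology label, or None if unrecognized.

    Assumption: the class letter directly follows the last hyphen.
    """
    suffix = subject_id.rsplit("-", 1)[-1]
    letter = suffix[:1].upper()
    return HISTOLOGY_BY_LETTER.get(letter)


print(histology_from_id("Lung_Dx-A0001"))
```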
The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. The reconstructions were made in 2mm-slice-thick and lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning mode includes plain, contrast and 3D reconstruction.
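A window width and level pair defines the range of Hounsfield units mapped to the display: the visible range runs from level minus half the width to level plus half the width. The sketch below applies the two settings quoted above. The function names are illustrative, and the linear mapping to a 0 to 1 scale is a simplification of what a DICOM viewer does.

```python
def window_bounds(width, level):
    """Return the (low, high) HU range shown by a window setting."""
    return level - width / 2, level + width / 2


def apply_window(hu, width, level):
    """Clip an HU value to the window and scale it linearly to [0, 1]."""
    low, high = window_bounds(width, level)
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


# Settings quoted in the text (width and level in HU).
MEDIASTINUM = {"width": 350, "level": 40}
LUNG = {"width": 1400, "level": -700}

print(window_bounds(**MEDIASTINUM))  # (-135.0, 215.0)
print(window_bounds(**LUNG))         # (-1400.0, 0.0)
```

With the lung window, soft tissue at around 40 HU is clipped to the top of the scale, which is why the mediastinum is read on its own narrower window.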
Before the examination, the patient underwent fasting for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44MBq/kg, 0.12mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79MBq (295.8±64.8MBq) and 27-171min (70.4±24.9 minutes), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180mAs,120kV,1.0pitch). Each study comprised one CT volume, one PET volume and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1mm × 1mm, the PET resolution was 200 × 200 pixels at 4.07mm × 4.07mm, with a slice thickness and an interslice distance of 1mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scanning were acquired from the base of the skull to mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1mm.
The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four radiologists performed a verification, resulting in all five radiologists reviewing each annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
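PASCAL VOC annotation files are plain XML, so the bounding boxes can also be read with Python's standard library alone. The sketch below parses an inline sample in the usual VOC layout (`object` / `name` / `bndbox`). The file name, label, and coordinates in the sample are hypothetical, and the real annotation files may carry additional fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the standard PASCAL VOC layout.
SAMPLE_XML = """<annotation>
  <filename>example.dcm</filename>
  <size><width>512</width><height>512</height><depth>1</depth></size>
  <object>
    <name>A</name>
    <bndbox>
      <xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>200</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return one dict per annotated object with its label and box corners."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        if bnd is None:
            continue  # skip objects without a bounding box
        box = {"label": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Coordinates are sometimes written as floats; normalize to int.
            box[key] = int(float(bnd.findtext(key)))
        boxes.append(box)
    return boxes


boxes = read_boxes(SAMPLE_XML)
print(boxes)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.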
Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, which resulted in a mean average precision (mAP) of around 0.87 on the validation set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: Currently the screening for lung cancer for risk groups is based on Computed Tomography (CT) or low dose CT (LDCT); however, the lung cancer death rate has not decreased significantly with people undergoing LDCT. We aimed to develop a simple reliable blood test for early detection of all types of lung cancer based on the immunogenicity of aberrant forms of BARD1 that are specifically upregulated in lung cancer.
Methods: ELISA assays were performed with a panel of BARD1 epitopes to detect serum levels of antibodies against BARD1 epitopes. We tested 194 blood samples from healthy donors and lung cancer patients with a panel of 40 BARD1 antigens. Using fitted Lasso logistic regression we determined the optimal combination of BARD1 antigens to be used in ELISA for discriminating lung cancer from healthy controls. Random selection of samples for training sets or validation sets was applied to validate the accuracy of our test.
Results: Fitted Lasso logistic regression models predict high accuracy of the BARD1 autoimmune antibody test with an AUC = 0.96. Validation in independent samples provided an AUC = 0.86, and identical AUCs were obtained for combined stages 1–3 and late stage 4 lung cancers. The BARD1 antibody test is highly specific for lung cancer and not breast or ovarian cancer.
Conclusion: The BARD1 lung cancer test shows higher sensitivity and specificity than previously published blood tests for lung cancer detection and/or diagnosis or CT scans, and it could detect all types and all stages of lung cancer. This BARD1 lung cancer test could therefore be further developed as i) a screening test for early detection of lung cancers in high-risk groups, and ii) a diagnostic aid in complementing CT scan.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive one year after diagnosis.
ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html
A time series for one-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.i, 1.4.iii and 1.4.v) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.
Purpose
This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.
Current version updated: Feb-14
Next version due: To be confirmed
https://creativecommons.org/publicdomain/zero/1.0/
About Dataset

📌 Overview
This dataset has been carefully synthesized to support research in lung cancer survival prediction, enabling the development of models that estimate:
- Whether a patient is likely to survive at least one year post-diagnosis (Binary Classification).
- The probability of survival based on clinical and lifestyle factors (Regression Analysis).
The dataset is designed for machine learning and deep learning applications in medical AI, oncology research, and predictive healthcare.

📜 Dataset Generation Process
The dataset was generated using a combination of real-world epidemiological insights, medical literature, and statistical modeling. The feature distributions and relationships have been carefully modeled to reflect real-world clinical scenarios, ensuring biomedical validity.

📖 Medical References & Sources
The dataset structure is based on well-established lung cancer risk factors and survival indicators documented in leading medical research and clinical guidelines:
- World Health Organization (WHO) Reports on lung cancer epidemiology.
- National Cancer Institute (NCI) & American Cancer Society (ACS) guidelines on lung cancer risk factors and treatment outcomes.
- The IASLC Lung Cancer Staging Project (8th Edition): Standard reference for lung cancer staging.
- Harrison’s Principles of Internal Medicine (20th Edition): Provides an in-depth review of lung cancer diagnosis and treatment.
- Lung Cancer: Principles and Practice (2022, Oxford University Press): Clinical insights into lung cancer detection, treatment, and survival factors.

🔬 Features of the Dataset
Each record in the dataset represents an individual’s clinical condition, lifestyle risk factors, and survival outcome. The dataset includes the following features:

1️⃣ Patient Demographics
- Age → A key risk factor for lung cancer progression and survival.
- Gender → Male and female lung cancer survival rates can differ.
- Residence → Urban vs. Rural (impact of environmental factors).

2️⃣ Risk Factors & Lifestyle Indicators
These factors have been linked to lung cancer risk in epidemiological studies:
- Smoking Status → (Current Smoker, Former Smoker, Never Smoked).
- Air Pollution Exposure → (Low, Moderate, High).
- Biomass Fuel Use → (Yes/No). Associated with household air pollution.
- Factory Exposure → (Yes/No). Industrial exposure increases lung cancer risk.
- Family History → (Yes/No). Genetic predisposition to lung cancer.
- Diet Habit → (Vegetarian, Non-Vegetarian, Mixed). Nutritional impact on cancer progression.

3️⃣ Symptoms (Primary Predictors)
These are key clinical indicators associated with lung cancer detection and severity:
- Hemoptysis (Coughing Blood)
- Chest Pain
- Fatigue & Weakness
- Chronic Cough
- Unexplained Weight Loss

4️⃣ Tumor Characteristics & Clinical Features
- Tumor Size (mm) → The size of the detected tumor.
- Histology Type → (Adenocarcinoma, Squamous Cell Carcinoma, Small Cell Carcinoma).
- Cancer Stage → (Stage I to Stage IV).

5️⃣ Treatment & Healthcare Facility
- Treatment Received → (Surgery, Chemotherapy, Radiation, Targeted Therapy).
- Hospital Type → (Private, Government, Medical College).

6️⃣ Target Variables (Predicted Outcomes)
- Survival (Binary) → 1 (Yes) if the patient survives at least 1 year, 0 (No) otherwise.
- Survival Probability (%) (Can be derived) → Estimated probability of survival within one year.

⚡ Why Is This Dataset Valuable?

✅ Balanced Data Distribution
- Designed to ensure a representative distribution of lung cancer survival cases.
- Prevents model bias and improves generalization in predictive models.

✅ Medically-Inspired Feature Engineering
- Features are derived from real-world lung cancer risk factors, validated through medical literature.
- Incorporates both lifestyle and clinical indicators to enhance predictive accuracy. (No real patient data is used; the dataset simulates a biomedical environment.)

✅ Diverse Risk Factors Considered
- Smoking, air pollution, and genetic history as primary lung cancer contributors.
- Symptom severity and tumor histology influence survival rates.

✅ Scalability & ML Suitability
- Ideal for classification and regression tasks in machine learning.
- Can be used with deep learning (TensorFlow, PyTorch), ML models (XGBoost, Random Forest, SVM), and explainable AI techniques like SHAP and LIME.

📂 Dataset Usage & Applications
This dataset is highly useful for multiple healthcare AI applications, including:
- 🩺 Predictive Analytics → Early detection of high-risk lung cancer patients.
- 🤖 Healthcare Chatbots → AI-powered risk assessment tools.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The National Lung Cancer Audit (NLCA) evaluates how the care received by people diagnosed with lung cancer in England and Wales compares with recommended practice and provides information that supports healthcare providers, commissioners, and regulators to improve the care for patients. The NLCA reports a set of process and outcome measures that cover important aspects of the care pathway for people diagnosed with lung cancer. In the NLCA State of the Nation report 2025, we give an overview of the patterns of care and outcomes for 37,750 people diagnosed with lung cancer in England in 2023. A separate section describes results for 2,334 people diagnosed in Wales in 2023. The report summarises the performance of lung cancer services in 2023 and compares this to the situation in 2020, 2021 and 2022.
Pneumonia is the world’s leading cause of death among children under 5 years of age.
Pneumonia killed approximately 2,400 children a day in 2015.
Pneumonia killed an estimated 880,000 children under the age of five in 2016.
More than 150,000 people are estimated to die from lung cancer each year.
Infections, including pneumonia, are the second most common cause of death in people with lung cancer.
“The physician workforce shortages that our nation is facing are being felt even more acutely as we mobilize on the front lines to combat the COVID-19 national emergency.” --David J. Skorton, MD, AAMC president and CEO
The demographic that is going to suffer most from this shortage is patients over age 65: "While the national population is projected to grow by 10.4% during the 15 years covered by the study, the over-65 population is expected to grow by 45.1%."
For the original dataset, click here.
For the sorted dataset needed to run this notebook, click here.
CONTENT: 5856 Posterior to Anterior (PA) Chest X-ray images from pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.
PROCESS: “For the analysis of chest X-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.” (page 12)
Here's a link to an example project using this dataset: https://github.com/Luv2bnanook44/flatiron_phase4_project
This dataset was preprocessed from this Kaggle dataset from Paul Mooney: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive five years after diagnosis.
ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html
A time series for five-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.ii, 1.4.iv and 1.4.vi) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.
Purpose
This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.
Current version updated: May-14
Next version due: To be confirmed
This table contains 33048 series, with data for years 2000/2002 - 2010/2012 (not all combinations necessarily have data for all years), and was last released on 2016-03-16. This table contains data described by the following dimensions (not all combinations are available): Geography (36 items: Total, census metropolitan areas; St. John's, Newfoundland and Labrador; Halifax, Nova Scotia; Moncton, New Brunswick; ...), Sex (3 items: Both sexes; Males; Females), Indicators (2 items: Mortality; Potential years of life lost), Selected causes of death (ICD-10) (17 items: Total, all causes of death; All malignant neoplasms (cancers); Colorectal cancer; Lung cancer; ...), Characteristics (9 items: Number; Low 95% confidence interval, number; High 95% confidence interval, number; Rate; ...).
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This data shows the percentage of adults (age 18 and over) who are current smokers. Smoking is the single biggest cause of preventable death and illnesses, and big inequalities exist between and within communities. Smoking is a major risk factor for many diseases, such as lung cancer, chronic obstructive pulmonary disease (COPD, bronchitis and emphysema) and heart disease. It is also associated with cancers in other organs. Smoking is a modifiable lifestyle risk factor. Preventing people from starting smoking is important in reducing the health harms and inequalities. This data is based on the Office for National Statistics (ONS) Annual Population Survey (APS). The percentage of adults is not age-standardised. In this dataset particularly at district level there may be inherent statistical uncertainty in some data values. Thus as with many other datasets, this data should be used together with other data and resources to obtain a fuller picture. Data source: Office for Health Improvement and Disparities (OHID) Public Health Outcomes Framework (PHOF) indicator 92443 (Number 15). This data is updated annually.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
One-year and five-year net survival for adults (15-99) in England diagnosed with one of 29 common cancers, by age and sex.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset presents the footprint of cancer mortality statistics in Australia for all cancers combined and the 6 top cancer groupings (colorectal, leukaemia, lung, lymphoma, melanoma of the skin and pancreas) and their respective ICD-10 codes. The data spans the years 2009-2013 and is aggregated to Greater Capital City Statistical Areas (GCCSA) from the 2011 Australian Statistical Geography Standard (ASGS).
Mortality data refer to the number of deaths due to cancer in a given time period. Cancer deaths data are sourced from the Australian Institute of Health and Welfare (AIHW) 2013 National Mortality Database (NMD).
For further information about this dataset, please visit:
Please note:
AURIN has spatially enabled the original data.
Due to changes in geographic classifications over time, long-term trends are not available.
Values assigned to "n.p." in the original data have been removed from the data.
The Australian and jurisdictional totals include people who could not be assigned a GCCSA. The number of people who could not be assigned a GCCSA is less than 1% of the total.
The Australian total also includes residents of Other Territories (Cocos (Keeling) Islands, Christmas Island and Jervis Bay Territory).
Cause of Death Unit Record File data are provided to the AIHW by the Registries of Births, Deaths and Marriages and the National Coronial Information System (managed by the Victorian Department of Justice) and include cause of death coded by the Australian Bureau of Statistics (ABS). The data are maintained by the AIHW in the NMD.
Year refers to year of occurrence of death for years up to and including 2012, and year of registration of death for 2013. Deaths registered in 2011 and earlier are based on the final version of cause of death data; deaths registered in 2012 and 2013 are based on revised and preliminary versions, respectively and are subject to further revision by the ABS.
Cause of death information is based on underlying cause of death and is classified according to the International Classification of Diseases and Related Health Problems (ICD). Deaths registered in 1997 onwards are classified according to the 10th revision (ICD-10).
Colorectal deaths presented are underestimates. For further information, refer to "Complexities in the measurement of bowel cancer in Australia" in Causes of Death, Australia (ABS cat. no. 3303.0).
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This data shows the percentage of adults (age 18 and over) who are current smokers.
Smoking is the single biggest cause of preventable death and illnesses, and big inequalities exist between and within communities. Smoking is a major risk factor for many diseases, such as lung cancer, chronic obstructive pulmonary disease (COPD, bronchitis and emphysema) and heart disease. It is also associated with cancers in other organs.
Smoking is a modifiable lifestyle risk factor. Preventing people from starting smoking is important in reducing the health harms and inequalities.
This data is based on the Office for National Statistics (ONS) Annual Population Survey (APS). In this dataset particularly at district level there may be inherent statistical uncertainty in some data values. Thus as with many other datasets, this data should be used together with other data and resources to obtain a fuller picture.
Data source: Public Health England, Public Health Outcomes Framework (PHOF) indicator 2.14. This data is updated annually.