100+ datasets found

Lung Cancer Mortality Datasets v2
kaggle.com
zip
Updated Jun 1, 2024
Cite
MasterDataSan (2024). Lung Cancer Mortality Datasets v2 [Dataset]. https://www.kaggle.com/datasets/masterdatasan/lung-cancer-mortality-datasets-v2
Explore at:
Available download formats: zip (81127029 bytes)
Dataset updated
Jun 1, 2024
Authors
MasterDataSan
Description
This dataset contains data about lung cancer mortality. It is a comprehensive collection of patient information focused on individuals diagnosed with cancer, designed to support analysis of the factors that may influence prognosis and treatment outcomes. It includes demographic, medical, and treatment-related variables that capture each patient's condition and history.

Key components of the database include:

Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

id: A unique identifier for each patient in the dataset.
age: The age of the patient at the time of diagnosis.
gender: The gender of the patient (e.g., male, female).
country: The country or region where the patient resides.
diagnosis_date: The date on which the patient was diagnosed with lung cancer.
cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV).
family_history: Indicates whether there is a family history of cancer (e.g., yes, no).
smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker).
bmi: The Body Mass Index of the patient at the time of diagnosis.
cholesterol_level: The cholesterol level of the patient (value).
hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no).
asthma: Indicates whether the patient has asthma (e.g., yes, no).
cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no).
other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no).
treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined).
end_treatment_date: The date on which the patient completed their cancer treatment or died.
survived: Indicates whether the patient survived (e.g., yes, no).
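
As a rough illustration of how the columns described above might be used, the sketch below computes the time between diagnosis_date and end_treatment_date and a survival rate per cancer_stage. The sample rows are made up, and the ISO date format and lowercase yes/no spellings are assumptions; the real file may differ.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample rows using a subset of the documented column names.
# The date format (YYYY-MM-DD) and category spellings are assumptions.
SAMPLE_CSV = """id,age,cancer_stage,diagnosis_date,end_treatment_date,survived
1,64,Stage I,2016-04-05,2017-09-10,yes
2,50,Stage III,2023-04-20,2024-06-17,no
3,65,Stage III,2023-04-05,2024-04-09,yes
"""

def treatment_days(row):
    """Days between diagnosis and the end of treatment (or death)."""
    start = date.fromisoformat(row["diagnosis_date"])
    end = date.fromisoformat(row["end_treatment_date"])
    return (end - start).days

def survival_rate_by_stage(rows):
    """Fraction of rows with survived == 'yes', grouped by cancer_stage."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row["cancer_stage"]] += 1
        if row["survived"].strip().lower() == "yes":
            survivors[row["cancer_stage"]] += 1
    return {stage: survivors[stage] / totals[stage] for stage in totals}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
durations = [treatment_days(r) for r in rows]
rates = survival_rate_by_stage(rows)
```

Because the data is described as artificially generated, any rates computed this way describe the generator, not real patients.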

This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

Good luck Gakusei!

Breast Cancer Dataset [Wisconsin Diagnostic UCI]

kaggle.com

zip

Updated Jan 22, 2024

Cite

Abhinav Mangalore (2024). Breast Cancer Dataset [Wisconsin Diagnostic UCI] [Dataset]. https://www.kaggle.com/datasets/abhinavmangalore/breast-cancer-dataset-wisconsin-diagnostic-uci

Explore at:

Available download formats: zip (49831 bytes)

Dataset updated

Jan 22, 2024

Authors

Abhinav Mangalore

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

Wisconsin

Description

This dataset is taken from the UCI Machine Learning Repository (Link: https://data.world/health/breast-cancer-wisconsin) by the Donor: Nick Street

The main idea and inspiration behind the upload was to provide datasets for Machine Learning as practice and reference for my peers at college. The main purpose is to analyze data and experiment with different machine learning ideas and techniques for this binary classification task. As such, this dataset is a very useful resource to practice on.

Breast cancer occurs when breast cells mutate and become cancerous cells that multiply and form tumors. It accounts for 25% of all cancer cases and affected over 2.1 million people in 2015 alone. Breast cancer typically affects women and people assigned female at birth (AFAB) age 50 and older, but it can also affect men and people assigned male at birth (AMAB), as well as younger women. Healthcare providers may treat breast cancer with surgery to remove tumors or treatment to kill cancerous cells.

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at http://www.cs.wisc.edu/~street/images/

The task: To classify whether the tumor is benign (B) or malignant (M).

Relevant information

Separating plane described above was obtained using
Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
Construction Via Linear Programming." Proceedings of the 4th
Midwest Artificial Intelligence and Cognitive Science Society,
pp. 97-101, 1992], a classification method which uses linear
programming to construct a decision tree. Relevant features
were selected using an exhaustive search in the space of 1-4
features and 1-3 separating planes.

The actual linear program used to obtain the separating plane
in the 3-dimensional space is that described in:
[K. P. Bennett and O. L. Mangasarian: "Robust Linear
Programming Discrimination of Two Linearly Inseparable Sets",
Optimization Methods and Software 1, 1992, 23-34].


This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

Number of instances: 569

Number of attributes: 32 (ID, diagnosis, 30 real-valued input features)
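
A minimal sketch of parsing one record with the layout stated above (ID, diagnosis, then 30 real-valued features). It assumes comma-separated rows with the ID first and the B/M label second; the Kaggle copy may add a header row or order columns differently, and the example row is invented.

```python
from typing import NamedTuple

N_FEATURES = 30  # per the description: 30 real-valued input features

class WdbcRecord(NamedTuple):
    sample_id: str
    malignant: bool   # True for "M" (malignant), False for "B" (benign)
    features: tuple

def parse_wdbc_line(line: str) -> WdbcRecord:
    """Parse one comma-separated row: ID, diagnosis, then 30 floats."""
    fields = line.strip().split(",")
    if len(fields) != 2 + N_FEATURES:
        raise ValueError(f"expected {2 + N_FEATURES} fields, got {len(fields)}")
    sample_id, diagnosis = fields[0], fields[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unknown diagnosis label: {diagnosis!r}")
    values = tuple(float(x) for x in fields[2:])
    return WdbcRecord(sample_id, diagnosis == "M", values)

# Made-up row for illustration only; values are not taken from the dataset.
example_line = "1001,M," + ",".join(str(0.5 + i) for i in range(N_FEATURES))
record = parse_wdbc_line(example_line)
```

Checking the field count up front makes a mismatched file layout fail loudly instead of silently shifting features.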

Original Creators:

Dr. William H. Wolberg, General Surgery Dept., University of
Wisconsin, Clinical Sciences Center, Madison, WI 53792
wolberg@eagle.surgery.wisc.edu

W. Nick Street, Computer Sciences Dept., University of
Wisconsin, 1210 West Dayton St., Madison, WI 53706
street@cs.wisc.edu 608-262-6619

Olvi L. Mangasarian, Computer Sciences Dept., University of
Wisconsin, 1210 West Dayton St., Madison, WI 53706
olvi@cs.wisc.edu

Donor: Nick Street

Date: November 1995

Past Usage:

first usage:

W.N. Street, W.H. Wolberg and O.L. Mangasarian 
Nuclear feature extraction for breast tumor diagnosis.
IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science
and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.

OR literature:

O.L. Mangasarian, W.N. Street and W.H. Wolberg. 
Breast cancer diagnosis and prognosis via linear programming. 
Operations Research, 43(4), pages 570-577, July-August 1995.

Medical literature:

W.H. Wolberg, W.N. Street, and O.L. Mangasarian. 
Machine learning techniques to diagnose breast cancer from
fine-needle aspirates. 
Cancer Letters 77 (1994) 163-171.

W.H. Wolberg, W.N. Street, and O.L. Mangasarian. 
Image analysis and machine learning applied to breast cancer
diagnosis and prognosis. 
Analytical and Quantitative Cytology and Histology, Vol. 17
No. 2, pages 77-87, April 1995. 

W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. 
Computerized breast cancer diagnosis and prognosis from fine
needle aspirates. 
Archives of Surgery 1995;130:511-516.

W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. 
Computer-derived nuclear features distinguish malignant from
benign breast cytology. 
Human Pathology, 26:792--796, 1995.

Appendix Cancer Prediction Dataset
kaggle.com
zip
Updated Feb 4, 2025
Cite
Ankush Panday (2025). Appendix Cancer Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/ankushpanday1/appendix-cancer-prediction-dataset
Explore at:
Available download formats: zip (7343922 bytes)
Dataset updated
Feb 4, 2025
Authors
Ankush Panday
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains clinical, demographic, and lifestyle data for 260,000 individuals from 25 countries. Designed for healthcare research and predictive modeling, it includes diverse variables relevant to appendix cancer diagnosis and risk factors. The dataset can support machine learning tasks, statistical analysis, and exploratory data studies in oncology and public health domains.
Lung Cancer Dataset
kaggle.com
Updated May 6, 2025
Cite
Aman_Kumar094 (2025). Lung Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/amankumar094/lung-cancer-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 6, 2025
Dataset provided by
Kaggle
Authors
Aman_Kumar094
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset contains data about lung cancer mortality and is a comprehensive collection of patient information focused on individuals diagnosed with cancer. It covers 800,000 individuals in 16 well-structured columns related to lung cancer diagnosis, treatment, and outcomes. This large-scale dataset is designed to aid researchers, data scientists, and healthcare professionals in studying patterns, building predictive models, and enhancing early detection and treatment strategies.

🌍 The Societal Impact of Lung Cancer

Lung cancer is not just a disease — it's a global crisis that steals time, health, and hope from millions of people every year. As the #1 cause of cancer deaths worldwide, it takes more lives annually than breast, colon, and prostate cancer combined.

But behind every statistic is a story:

A parent who never saw their child graduate.

A worker who had to leave their job too soon.

A community that lost a leader, a friend, a neighbor.

Why does this matter? Lung cancer often goes undetected until it's too late. It’s aggressive, silent, and devastating — especially in underserved areas where early detection is rare and treatment options are limited. It doesn’t just affect patients. It affects families, economies, and healthcare systems on a massive scale.

This dataset represents more than numbers. It represents 800,000 real-world stories — people who can help us unlock patterns, train models, and advance life-saving research.

By working with this data, you're not just analyzing a dataset — you're stepping into the fight against one of humanity’s deadliest diseases.

Let’s turn insight into impact. (😊 The above description was generated with the help of AI. I just wanted to share this dataset, that's all. Thank you.)
[MI] Rapid Cancer Registration Data
digital.nhs.uk
Updated Nov 27, 2025
Cite
(2025). [MI] Rapid Cancer Registration Data [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mi-rapid-cancer-registration-data
Explore at:
Dataset updated
Nov 27, 2025
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Description
Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.
One-year survival from all cancers (NHSOF 1.4.i) - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Aug 4, 2015
+ more versions
Cite
ckan.publishing.service.gov.uk (2015). One-year survival from all cancers (NHSOF 1.4.i) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/one-year-survival-from-all-cancers-nhsof-1-4-i
Explore at:
Dataset updated
Aug 4, 2015
Dataset provided by
CKAN: https://ckan.org/
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
A measure of the number of adults diagnosed with any type of cancer in a year who are still alive one year after diagnosis.
Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with any type of cancer.
Current version updated: Feb-17
Next version due: Feb-18
Cancer survival in England - adults diagnosed
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Aug 12, 2019
Cite
Office for National Statistics (2019). Cancer survival in England - adults diagnosed [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/cancersurvivalratescancersurvivalinenglandadultsdiagnosed
Explore at:
Available download formats: xlsx
Dataset updated
Aug 12, 2019
Dataset provided by
Office for National Statistics: http://www.ons.gov.uk/
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
One-year and five-year net survival for adults (15-99) in England diagnosed with one of 29 common cancers, by age and sex.
Number and rates of new cases of primary cancer, by cancer type, age group...
www150.statcan.gc.ca
datasets.ai
+2more
Updated May 19, 2021
+ more versions
Cite
Government of Canada, Statistics Canada (2021). Number and rates of new cases of primary cancer, by cancer type, age group and sex [Dataset]. http://doi.org/10.25318/1310011101-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310011101-eng
Dataset updated
May 19, 2021
Dataset provided by
Statistics Canada: https://statcan.gc.ca/en
Area covered
Canada
Description
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
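
The random rounding mentioned above can be sketched as follows. The description does not state the exact scheme, so this assumes the common unbiased variant, where a count is rounded up with probability equal to its remainder divided by the base; the function name and signature are illustrative only.

```python
import random
from typing import Optional

def random_round(count: int, base: int = 5,
                 rng: Optional[random.Random] = None) -> int:
    """Randomly round a non-negative count to an adjacent multiple of `base`.

    Assumed scheme: round up with probability remainder/base, otherwise
    round down, so the expected value of the result equals the true count.
    """
    rng = rng if rng is not None else random.Random()
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base; left unchanged
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower

# Example: a true count of 13 is published as either 10 or 15.
published = random_round(13, rng=random.Random(42))
```

One consequence for analysis is that published counts do not necessarily sum to the published totals, since each cell is rounded independently under this scheme.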
Five-year survival from all cancers (NHSOF 1.4.ii) - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Aug 4, 2015
+ more versions
Cite
ckan.publishing.service.gov.uk (2015). Five-year survival from all cancers (NHSOF 1.4.ii) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/five-year-survival-from-all-cancers-nhsof-1-4-ii
Explore at:
Dataset updated
Aug 4, 2015
Dataset provided by
CKAN: https://ckan.org/
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
A measure of the number of adults diagnosed with any type of cancer in a year who are still alive five years after diagnosis.
Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with any type of cancer.
Current version updated: Feb-17
Next version due: Feb-18
Breast Cancer Dataset - Dataset - CKAN
data.poltekkes-smg.ac.id
Updated Oct 7, 2024
Cite
(2024). Breast Cancer Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/breast-cancer-dataset
Explore at:
Dataset updated
Oct 7, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description: Breast cancer is the most common cancer amongst women in the world. It accounts for 25% of all cancer cases, and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area. The key challenge in its detection is how to classify tumors as malignant (cancerous) or benign (non-cancerous). We ask you to complete the analysis of classifying these tumors using machine learning (with SVMs) and the Breast Cancer Wisconsin (Diagnostic) Dataset.
Acknowledgements: This dataset has been referred from Kaggle.
Objective: Understand the dataset and clean it up (if required). Build classification models to predict whether the cancer type is malignant or benign. Also fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.
lung-cancer
huggingface.co
Updated Jun 24, 2022
+ more versions
Cite
Nate Raw (2022). lung-cancer [Dataset]. https://huggingface.co/datasets/nateraw/lung-cancer
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 24, 2022
Authors
Nate Raw
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for Lung Cancer

Dataset Summary

An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on that risk. The data was collected from the website of an online lung cancer prediction system.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.
breast cancer
figshare.com
txt
Updated Mar 28, 2022
Cite
Ariel Silva (2022). breast cancer [Dataset]. http://doi.org/10.6084/m9.figshare.19441766.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.19441766.v1
Dataset updated
Mar 28, 2022
Dataset provided by
Figshare: http://figshare.com/
Authors
Ariel Silva
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Cancer affects people of different ages, ethnicities and sexes. Collecting and storing data from these people assists in the development, understanding and analysis of statistics on the disease. In Brazil, the oncology hospital units, which receive patients diagnosed with cancer, store the information in a national database called the Hospital Registry of Cancer (RHC). The following variables were selected: age, sex, race, alcohol consumption, tobacco consumption and cancer staging.
Five-year survival from breast, lung and colorectal cancer (NHSOF 1.4.iv) -...
ckan.publishing.service.gov.uk
Updated Aug 4, 2015
+ more versions
Cite
(2015). Five-year survival from breast, lung and colorectal cancer (NHSOF 1.4.iv) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/five-year-survival-from-breast-lung-and-colorectal-cancer-nhsof-1-4-iv
Explore at:
Dataset updated
Aug 4, 2015
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
A measure of the number of adults diagnosed with breast, lung or colorectal cancer in a year who are still alive five years after diagnosis.
ONS still publish survival percentages for individual types of cancers. These can be found at: http://www.ons.gov.uk/ons/rel/cancer-unit/cancer-survival/cancer-survival-in-england--patients-diagnosed-2007-2011-and-followed-up-to-2012/index.html
A time series for five-year survival figures for breast, lung and colorectal cancer individually (previous NHS Outcomes Framework indicators 1.4.ii, 1.4.iv and 1.4.vi) is still published and can be found under the link 'Indicator data - previous methodology (.xls)' below.
Purpose: This indicator attempts to capture the success of the NHS in preventing people from dying once they have been diagnosed with breast, lung or colorectal cancer.
Current version updated: May-14
Next version due: To be confirmed
Mortality rate from oral cancer, all ages - WMCA
cityobservatory.birmingham.gov.uk
csv, excel, geojson +1
Updated Nov 3, 2025
Cite
(2025). Mortality rate from oral cancer, all ages - WMCA [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/mortality-rate-from-oral-cancer-all-ages-wmca/
Explore at:
Available download formats: csv, geojson, json, excel
Dataset updated
Nov 3, 2025
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Age-standardised rate of mortality from oral cancer (ICD-10 codes C00-C14) in persons of all ages and sexes per 100,000 population.

Rationale
Over the last decade in the UK (between 2003-2005 and 2012-2014), oral cancer mortality rates have increased by 20% for males and 19% for females (1). Five year survival rates are 56%. Most oral cancers are triggered by tobacco and alcohol, which together account for 75% of cases (2). Cigarette smoking is associated with an increased risk of the more common forms of oral cancer. The risk among cigarette smokers is estimated to be 10 times that for non-smokers. More intense use of tobacco increases the risk, while ceasing to smoke for 10 years or more reduces it to almost the same as that of non-smokers (3). Oral cancer mortality rates can be used in conjunction with registration data to inform service planning as well as comparing survival rates across areas of England to assess the impact of public health prevention policies such as smoking cessation.

References:
(1) Cancer Research Campaign. Cancer Statistics: Oral – UK. London: CRC, 2000.
(2) Blot WJ, McLaughlin JK, Winn DM et al. Smoking and drinking in relation to oral and pharyngeal cancer. Cancer Res 1988; 48: 3282-7.
(3) La Vecchia C, Tavani A, Franceschi S et al. Epidemiology and prevention of oral cancer. Oral Oncology 1997; 33: 302-12.

Definition of numerator
All cancer mortality for lip, oral cavity and pharynx (ICD-10 C00-C14) in the respective calendar years aggregated into quinary age bands (0-4, 5-9,…, 85-89, 90+). This does not include secondary cancers or recurrences. Data are reported according to the calendar year in which the cancer was diagnosed.
Counts of deaths for years up to and including 2019 have been adjusted where needed to take account of the MUSE ICD-10 coding change introduced in 2020. Detailed guidance on the MUSE implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/causeofdeathcodinginmortalitystatisticssoftwarechanges/january2020
Counts of deaths for years up to and including 2013 have been double adjusted by applying comparability ratios from both the IRIS coding change and the MUSE coding change where needed to take account of both the MUSE ICD-10 coding change and the IRIS ICD-10 coding change introduced in 2014. The detailed guidance on the IRIS implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/impactoftheimplementationofirissoftwareforicd10causeofdeathcodingonmortalitystatisticsenglandandwales/2014-08-08
Counts of deaths for years up to and including 2010 have been triple adjusted by applying comparability ratios from the 2011 coding change, the IRIS coding change and the MUSE coding change where needed to take account of the MUSE ICD-10 coding change, the IRIS ICD-10 coding change and the ICD-10 coding change introduced in 2011. The detailed guidance on the 2011 implementation is available at https://webarchive.nationalarchives.gov.uk/ukgwa/20160108084125/http://www.ons.gov.uk/ons/guide-method/classifications/international-standard-classifications/icd-10-for-mortality/comparability-ratios/index.html

Definition of denominator
Population-years (aggregated populations for the three years) for people of all ages, aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+)
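
For readers unfamiliar with age standardisation, the sketch below shows one way such a rate could be computed from the numerator and denominator defined above. It assumes direct standardisation against a reference population; the description does not name the method or the standard population, so the weights and the three toy age bands here are hypothetical.

```python
def age_standardised_rate(deaths, population_years, standard_weights,
                          per=100_000):
    """Directly age-standardised rate per `per` population.

    Each argument has one entry per age band, in the same order.
    The rate is the weighted average of the age-specific rates,
    weighted by the (assumed) standard population.
    """
    if not (len(deaths) == len(population_years) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age band")
    weighted_sum = 0.0
    for d, p, w in zip(deaths, population_years, standard_weights):
        if p <= 0:
            raise ValueError("population-years must be positive")
        weighted_sum += w * (d / p)   # weight times age-specific rate
    return per * weighted_sum / sum(standard_weights)

# Toy example with three hypothetical age bands (not real figures).
deaths = [2, 10, 30]
population_years = [100_000, 50_000, 20_000]
standard_weights = [50, 30, 20]
rate = age_standardised_rate(deaths, population_years, standard_weights)
```

With these toy inputs the age-specific rates are 2, 20 and 150 per 100,000, and the weighted result is 37 per 100,000.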

Lung-Cancer-Risk-Dataset

kaggle.com

Updated Aug 23, 2025

Cite

Mikey-TraceGod (2025). Lung-Cancer-Risk-Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/12844025

Explore at:

Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Unique identifier

https://doi.org/10.34740/kaggle/dsv/12844025

Dataset updated

Aug 23, 2025

Dataset provided by

Kaggle: http://kaggle.com/

Authors

Mikey-TraceGod

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Lung Cancer Risk Dataset

Overview

This dataset contains 50,000 patient profiles designed for lung cancer risk analysis and machine learning applications. The dataset is clean, preprocessed, and ready for immediate use in classification tasks, statistical analysis, and data visualization.

Rows: 50,000
Columns: 11
File: preprocessed_lung_cancer_dataset.csv
License: CC0: Public Domain

Dataset Description

The dataset includes patient profiles with features based on established lung cancer risk factors such as smoking history, environmental exposures, and chronic lung conditions. All data is synthetic and designed to reflect realistic risk factor distributions while maintaining patient privacy.

Features

Column	Type	Description	Values/Range
patient_id	Integer	Unique patient identifier	100000-149999
age	Integer	Patient age in years	18-100
gender	String	Patient gender	'Male', 'Female'
pack_years	Float	Smoking exposure (years × packs per day)	0-100
radon_exposure	String	Residential radon exposure level	'Low', 'Medium', 'High'
asbestos_exposure	String	Occupational asbestos exposure history	'Yes', 'No'
secondhand_smoke_exposure	String	Passive smoking exposure	'Yes', 'No'
copd_diagnosis	String	Chronic obstructive pulmonary disease diagnosis	'Yes', 'No'
alcohol_consumption	String	Alcohol consumption pattern	'None', 'Moderate', 'Heavy'
family_history	String	Family history of lung cancer	'Yes', 'No'
lung_cancer	String	Target variable: Lung cancer diagnosis	'Yes', 'No'
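
The feature table above can be turned into a small validation sketch. The allowed values and ranges are copied from the table; the helper names are hypothetical, and the record is assumed to have numeric columns already converted from text.

```python
# Allowed categories and numeric ranges, taken from the feature table.
ALLOWED = {
    "gender": {"Male", "Female"},
    "radon_exposure": {"Low", "Medium", "High"},
    "asbestos_exposure": {"Yes", "No"},
    "secondhand_smoke_exposure": {"Yes", "No"},
    "copd_diagnosis": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "Heavy"},
    "family_history": {"Yes", "No"},
    "lung_cancer": {"Yes", "No"},
}
RANGES = {"patient_id": (100000, 149999), "age": (18, 100), "pack_years": (0, 100)}

def pack_years(years_smoked: float, packs_per_day: float) -> float:
    """Smoking exposure as defined in the table: years x packs per day."""
    return years_smoked * packs_per_day

def validate(record: dict) -> list:
    """Return the names of columns whose values fall outside the table."""
    problems = []
    for col, (low, high) in RANGES.items():
        value = record.get(col)
        if value is None or not (low <= value <= high):
            problems.append(col)
    for col, allowed in ALLOWED.items():
        if record.get(col) not in allowed:
            problems.append(col)
    return problems

# Invented record for illustration; not a row from the dataset.
example = {
    "patient_id": 100001, "age": 57, "gender": "Female",
    "pack_years": pack_years(20, 1.5), "radon_exposure": "Low",
    "asbestos_exposure": "No", "secondhand_smoke_exposure": "Yes",
    "copd_diagnosis": "No", "alcohol_consumption": "Moderate",
    "family_history": "No", "lung_cancer": "No",
}
issues = validate(example)
```

A check like this is a cheap way to confirm a downloaded copy matches the documented schema before modeling.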

Data Quality

Complete: No missing values or duplicates
Clean: All values within realistic ranges
Balanced Features: Realistic distribution of risk factors
Target Distribution: Approximately 25% positive cases, reflecting real-world lung cancer prevalence

Use Cases

Binary classification modeling
Risk factor correlation analysis
Data visualization and exploratory analysis
Machine learning pipeline development
Statistical hypothesis testing

Urinary biomarkers for pancreatic cancer - Dataset - CKAN
data.poltekkes-smg.ac.id
Updated Oct 8, 2024
Cite
(2024). Urinary biomarkers for pancreatic cancer - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/urinary-biomarkers-for-pancreatic-cancer
Explore at:
Dataset updated
Oct 8, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Can a simple urine test detect one of the deadliest cancers?

About Dataset
This is a brand-new (!) dataset from an open-access paper published December 10, 2020. The paper and the full dataset are open-access (CC-BY), so please give attribution to the original authors in your work.

Background
Pancreatic cancer is an extremely deadly type of cancer. Once diagnosed, the five-year survival rate is less than 10%. However, if pancreatic cancer is caught early, the odds of surviving are much better. Unfortunately, many cases of pancreatic cancer show no symptoms until the cancer has spread throughout the body. A diagnostic test to identify people with pancreatic cancer could be enormously helpful.

The paper
In a paper by Silvana Debernardi and colleagues, published in 2020 in the journal PLOS Medicine, a multi-national team of researchers sought to develop an accurate diagnostic test for the most common type of pancreatic cancer, called pancreatic ductal adenocarcinoma or PDAC. They gathered a series of biomarkers from the urine of three groups of patients:
Healthy controls
Patients with non-cancerous pancreatic conditions, like chronic pancreatitis
Patients with pancreatic ductal adenocarcinoma
When possible, these patients were age- and sex-matched. The goal was to develop an accurate way to identify patients with pancreatic cancer.

The data
The key features are four urinary biomarkers: creatinine, LYVE1, REG1B, and TFF1.
Creatinine is a metabolic waste product that is often used as an indicator of kidney function.
LYVE1 is lymphatic vessel endothelial hyaluronan receptor 1, a protein that may play a role in tumor metastasis.
REG1B is a protein that may be associated with pancreas regeneration.
TFF1 is trefoil factor 1, which may be related to regeneration and repair of the urinary tract.
Age and sex, both included in the dataset, may also play a role in who gets pancreatic cancer.
The dataset includes a few other biomarkers as well, but these were not measured in all patients (they were collected partly to measure how various blood biomarkers compared to urine biomarkers). I have not changed any of the data from the paper, other than renaming the columns for easy importing and use. The file Debernardi et al 2020 data.csv contains the raw data, while the file Debernardi et al 2020 documentation.csv contains a detailed documentation of what each column represents (as well as the original column names from the paper).

Prediction task
The goal in this dataset is predicting diagnosis, and more specifically, differentiating between 3 (pancreatic cancer) versus 2 (non-cancerous pancreas condition) and 1 (healthy). The dataset includes information on stage of pancreatic cancer, and diagnosis for non-cancerous patients, but remember: these won't be available to a predictive model. The goal, after all, is to predict the presence of disease before it's diagnosed, not after!

Acknowledgements
I would like to thank the authors of this paper, for graciously sharing their raw data with the research community.
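
The prediction task described above (3 versus 1 and 2, with post-diagnosis fields withheld) could be set up along these lines. The diagnosis codes come from the description; the column names "diagnosis", "stage" and "benign_sample_diagnosis" are assumptions for illustration and should be checked against the documentation file.

```python
# Diagnosis codes as stated in the description.
DIAGNOSIS_LABELS = {
    1: "healthy",
    2: "non-cancerous pancreas condition",
    3: "pancreatic cancer",
}

# Hypothetical names for columns that are only known after diagnosis.
LEAKY_COLUMNS = ("stage", "benign_sample_diagnosis")

def is_cancer(diagnosis: int) -> int:
    """Binary target: 1 for pancreatic cancer (code 3), 0 for codes 1 and 2."""
    if diagnosis not in DIAGNOSIS_LABELS:
        raise ValueError(f"unexpected diagnosis code: {diagnosis!r}")
    return 1 if diagnosis == 3 else 0

def to_features_and_target(row: dict):
    """Split a row into model inputs and the binary target, dropping
    the target itself and any post-diagnosis columns from the inputs."""
    target = is_cancer(int(row["diagnosis"]))
    features = {
        key: value for key, value in row.items()
        if key != "diagnosis" and key not in LEAKY_COLUMNS
    }
    return features, target

# Invented row for illustration; values are not from the paper.
row = {"age": 61, "sex": "F", "creatinine": 1.1, "LYVE1": 2.3,
       "REG1B": 50.0, "TFF1": 400.0, "diagnosis": 3, "stage": "IIB"}
features, target = to_features_and_target(row)
```

Dropping the post-diagnosis columns before training is what keeps the model from learning the answer from fields it would never see in practice.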
IDC Breast Cancer Dataset Descriptions.
plos.figshare.com
xls
Updated Sep 3, 2025
Cite
Mudhafar Jalil Jassim Ghrabat; Arkan A. Ghaib; Auhood Al-Hossenat; Zaid Ameen Abduljabbar; Vincent Omollo Nyangaresi; Junchao Ma; Abdulla J. Y. Aldarwish; Iman Qays Abduljaleel; Dhafer G. Honi; Husam A. Neamah (2025). IDC Breast Cancer Dataset Descriptions. [Dataset]. http://doi.org/10.1371/journal.pone.0329078.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0329078.t005
Dataset updated
Sep 3, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Mudhafar Jalil Jassim Ghrabat; Arkan A. Ghaib; Auhood Al-Hossenat; Zaid Ameen Abduljabbar; Vincent Omollo Nyangaresi; Junchao Ma; Abdulla J. Y. Aldarwish; Iman Qays Abduljaleel; Dhafer G. Honi; Husam A. Neamah
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Breast cancer is highlighted in recent research as one of the most prevalent types of cancer. Timely identification is essential for improving patient outcomes and decreasing fatality rates. Using computer-assisted detection and diagnosis early on may greatly improve the chances of recovery by accurately predicting outcomes and supporting suitable treatment plans. Grading breast cancer properly, especially evaluating nuclear atypia, is difficult owing to faults and inconsistencies in slide preparation and the intricate nature of tissue patterns. This work explores the capability of deep learning to extract features from histopathology images of breast cancer. The research introduces a new method, a SMOTE-based Convolutional Neural Network (CNN), to detect areas affected by Invasive Ductal Carcinoma (IDC) in whole-slide images. The experiments used a dataset of 162 individuals with IDC, split into training (113 images) and testing (49 images) groups. Every model was tested individually. The SMO_CNN model we developed achieved testing and training accuracies of 98.95% and 99.20% respectively, surpassing the CNN, VGG19, and ResNet50 models. The results highlight the effectiveness of the model in detecting IDC-affected tissue areas, showing promise for improving breast cancer diagnosis and treatment planning.
One year survival from all cancers - ICP Outcomes Framework - Birmingham and...
cityobservatory.birmingham.gov.uk
csv, excel, geojson +1
Updated Sep 10, 2025
Cite
(2025). One year survival from all cancers - ICP Outcomes Framework - Birmingham and Solihull [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/one-year-survival-from-all-cancers-icp-outcomes-framework-birmingham-and-solihull/
Available download formats: excel, csv, geojson, json
Dataset updated
Sep 10, 2025
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Area covered
Solihull
Description
This dataset provides insights into one-year survival rates from all cancers, serving as a key indicator of early cancer outcomes. It measures the proportion of individuals diagnosed with an invasive cancer who survive for at least one year following their diagnosis. The dataset includes all invasive tumours classified under ICD-10 codes C00 to C97, excluding non-melanoma skin cancer (C44). It supports analysis across different population groups and geographies, including ethnicity, deprivation levels, and the Birmingham and Solihull (BSol) area.

Rationale

Improving one-year survival rates is a critical goal in cancer care, as it reflects the effectiveness of early diagnosis and initial treatment. This indicator helps monitor progress in reducing early mortality from cancer and supports targeted interventions to improve outcomes.

Numerator

The numerator includes individuals who were diagnosed with a specific type of cancer and died from the same type of cancer within one year of diagnosis. Only invasive cancers are included, as defined by ICD-10 codes C00 to C97, excluding non-melanoma skin cancer (C44). Data is sourced from the National Cancer Registration and Analysis Service (NCRAS).

Denominator

The denominator comprises all individuals diagnosed with an invasive cancer (ICD-10 codes C00 to C97, excluding C44) within a five-year period. This data is also sourced from the National Cancer Registration and Analysis Service (NCRAS).

Caveats

This dataset uses a simplified methodology that differs from the national calculation of one-year cancer survival. As a result, the figures presented here may not align with nationally published statistics. However, this approach enables the provision of survival data disaggregated by ethnicity, deprivation, and local geographies such as BSol, which is not always possible with national data.
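
The numerator and denominator defined above can be combined into a survival percentage. The dataset description does not spell out the formula, so the sketch below rests on an assumption: since the numerator counts deaths within one year, survival is taken as the complement of deaths over diagnoses. The function name and the example counts are illustrative only.

```python
def one_year_survival(deaths_within_one_year, diagnosed):
    """Percentage of diagnosed patients still alive one year after diagnosis.

    Assumes survival is the complement of (deaths / diagnoses); the published
    indicator may apply further adjustments that are not described here.
    """
    if diagnosed <= 0:
        raise ValueError("diagnosed must be a positive count")
    if not 0 <= deaths_within_one_year <= diagnosed:
        raise ValueError("deaths must lie between 0 and the number diagnosed")
    return (diagnosed - deaths_within_one_year) / diagnosed * 100


# Invented counts for illustration, not figures from the dataset.
print(one_year_survival(250, 1000))
```

Because this is a simplified calculation, results from it should not be expected to match nationally published survival statistics.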

External references

For more information, visit the National Cancer Registration and Analysis Service (NCRAS).

Click here to explore more from the Birmingham and Solihull Integrated Care Partnerships Outcome Framework.
DataSheet_1_Triple-negative breast cancer survival prediction:...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 10, 2024
Cite
Yu Qiu; Yan Chen; Haoyang Shen; Shuixin Yan; Jiadi Li; Weizhu Wu (2024). DataSheet_1_Triple-negative breast cancer survival prediction: population-based research using the SEER database and an external validation cohort.xls [Dataset]. http://doi.org/10.3389/fonc.2024.1388869.s001
Available download formats: xls
Unique identifier
https://doi.org/10.3389/fonc.2024.1388869.s001
Dataset updated
Jun 10, 2024
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Yu Qiu; Yan Chen; Haoyang Shen; Shuixin Yan; Jiadi Li; Weizhu Wu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

Triple-negative breast cancer (TNBC) is linked to a poorer outlook, heightened aggressiveness relative to other breast cancer variants, and limited treatment choices. The absence of conventional treatment methods makes TNBC patients susceptible to metastasis. The objective of this research was to assess the clinical and pathological traits of TNBC patients, predict the influence of risk elements on their outlook, and create a prediction model to assist doctors in treating TNBC patients and enhancing their prognosis.

Methods

We included 23,394 individuals with complete baseline clinical data and survival information who were diagnosed with primary TNBC between 2010 and 2015 based on the SEER database. External validation utilised a group from The Affiliated Lihuili Hospital of Ningbo University. Independent risk factors linked to TNBC prognosis were identified through univariate, multivariate, and least absolute shrinkage and selection operator regression methods. These characteristics were chosen as parameters to develop 3- and 5-year overall survival (OS) and breast cancer-specific survival (BCSS) nomogram models. Model accuracy was assessed using calibration curves, consistency indices (C-indices), receiver operating characteristic curves (ROCs), and decision curve analyses (DCAs). Finally, TNBC patients were divided into groups of high, medium, and low risk, employing the nomogram model for conducting a Kaplan-Meier survival analysis.

Results

In the training cohort, variables such as age at diagnosis, marital status, grade, T stage, N stage, M stage, surgery, radiation, and chemotherapy were linked to OS and BCSS. For the nomogram, the C-indices stood at 0.762, 0.747, and 0.764 in forecasting OS across the training, internal validation, and external validation groups, respectively. Additionally, the C-index values for the training, internal validation, and external validation groups in BCSS prediction stood at 0.793, 0.755, and 0.811, in that order. The findings revealed that the calibration of our nomogram model was successful, and the time-variant ROC curves highlighted its effectiveness in clinical settings. Ultimately, the clinical DCA showcased the prospective clinical advantages of the suggested model. Furthermore, the online version was simple to use, and nomogram classification may enhance the differentiation of TNBC prognosis and distinguish risk groups more accurately.

Conclusion

These nomograms are precise tools for assessing risk in patients with TNBC and forecasting survival. They can help doctors identify prognostic markers and create more effective treatment plans for patients with TNBC, providing more accurate assessments of their 3- and 5-year OS and BCSS.
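
The C-indices quoted in the abstract measure how often a model ranks patients correctly by risk. As a rough illustration, the sketch below computes Harrell's concordance index, one common definition. The abstract does not say which estimator the authors used, so this is an assumption, and the four example patients are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient who
    experienced the event earlier was given the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when patient i had an observed event
            # strictly before patient j's last known time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented follow-up times (months), event flags (1 = died, 0 = censored),
# and model risk scores.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.1, 0.2]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of roughly 0.75 to 0.81 in context.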
Data set - What Defines Quality of Life for Older Patients Diagnosed with...
nde-dev.biothings.io
data.niaid.nih.gov
Updated Oct 5, 2022
Cite
Seghers, PAL (2022). Data set - What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_7062210
Dataset updated
Oct 5, 2022
Dataset provided by
Jolina A. Kregting
Siri Rostoft
Seghers, PAL
Shane O'Hanlon
Marije E. Hamaker
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data set from: What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study

Abstract of the study: The treatment of cancer can have a significant impact on quality of life in older patients and this needs to be taken into account in decision making. However, quality of life can consist of many different components with varying importance between individuals. We set out to assess how older patients with cancer define quality of life and the components that are most significant to them. This was a single-centre, qualitative interview study. Patients aged 70 years or older with cancer were asked to answer open-ended questions: What makes life worthwhile? What does quality of life mean to you? What could affect your quality of life? Subsequently, they were asked to choose the five most important determinants of quality of life from a predefined list: cognition, contact with family or with community, independence, staying in your own home, helping others, having enough energy, emotional well-being, life satisfaction, religion and leisure activities. Afterwards, answers to the open-ended questions were independently categorized by two authors. The proportion of patients mentioning each category in the open-ended questions were compared to the predefined questions. Overall, 63 patients (median age 76 years) were included. When asked, “What makes life worthwhile?”, patients identified social functioning (86%) most frequently. Moreover, to define quality of life, patients most frequently mentioned categories in the domains of physical functioning (70%) and physical health (48%). Maintaining cognition was mentioned in 17% of the open-ended questions and it was the most commonly chosen option from the list of determinants (72% of respondents). In conclusion, physical functioning, social functioning, physical health and cognition are important components in quality of life. When discussing treatment options, the impact of treatment on these aspects should be taken into consideration.

Reference of research paper: Seghers PAL, Kregting JA, van Huis-Tanja LH, Soubeyran P, O'Hanlon S, Rostoft S, Hamaker ME, Portielje JEA. What Defines Quality of Life for Older Patients Diagnosed with Cancer? A Qualitative Study. Cancers. 2022; 14(5):1123. https://doi.org/10.3390/cancers14051123

Content of the data set: The first tab describes what questions were asked, the second tab shows all individual anonymised answers to the open questions, and the fourth tab shows the definitions that were used to classify all answers. Q1-Q4 show how the answers were categorised.

Lung Cancer Mortality Datasets v2

Dataset of lung cancer with time observation during the treatment period


id: A unique identifier for each patient in the dataset.
age: The age of the patient at the time of diagnosis.
gender: The gender of the patient (e.g., male, female).
country: The country or region where the patient resides.
diagnosis_date: The date on which the patient was diagnosed with lung cancer.
cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV).
family_history: Indicates whether there is a family history of cancer (e.g., yes, no).
smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker).
bmi: The Body Mass Index of the patient at the time of diagnosis.
cholesterol_level: The cholesterol level of the patient (value).
hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no).
asthma: Indicates whether the patient has asthma (e.g., yes, no).
cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no).
other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no).
treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined).
end_treatment_date: The date on which the patient completed their cancer treatment or died.
survived: Indicates whether the patient survived (e.g., yes, no).
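
The two date columns listed above can be combined into an observation period per patient. The sketch below is one way to do that with the standard library. It assumes the dates are stored as ISO strings (YYYY-MM-DD), which the description does not confirm, and the two records are invented for illustration.

```python
from datetime import date


def observation_days(record):
    """Days between diagnosis and the end of treatment (or death)."""
    start = date.fromisoformat(record["diagnosis_date"])
    end = date.fromisoformat(record["end_treatment_date"])
    return (end - start).days


# Invented records using the column names from the list above;
# the ISO date format and the "yes"/"no" encoding are assumptions.
records = [
    {"id": "1", "diagnosis_date": "2020-01-01", "end_treatment_date": "2020-01-31", "survived": "no"},
    {"id": "2", "diagnosis_date": "2021-03-01", "end_treatment_date": "2021-03-11", "survived": "yes"},
]
durations = [observation_days(r) for r in records]
print(durations)
```

Because end_treatment_date marks either completion of treatment or death, the resulting duration is best read together with the survived column rather than on its own.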

This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

Good luck Gakusei!
