100+ datasets found

Lung Cancer Mortality Datasets v2
kaggle.com
zip
Updated Jun 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MasterDataSan (2024). Lung Cancer Mortality Datasets v2 [Dataset]. https://www.kaggle.com/datasets/masterdatasan/lung-cancer-mortality-datasets-v2
Explore at:
zip(81127029 bytes)Available download formats
Dataset updated
Jun 1, 2024
Authors
MasterDataSan
Description
This dataset contains data about lung cancer Mortality. This database is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It is designed to facilitate the analysis of various factors that may influence cancer prognosis and treatment outcomes. The database includes a range of demographic, medical, and treatment-related variables, capturing essential details about each patient's condition and history.

Key components of the database include:

Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

id: A unique identifier for each patient in the dataset. age: The age of the patient at the time of diagnosis. gender: The gender of the patient (e.g., male, female). country: The country or region where the patient resides. diagnosis_date: The date on which the patient was diagnosed with lung cancer. cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV). family_history: Indicates whether there is a family history of cancer (e.g., yes, no). smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker). bmi: The Body Mass Index of the patient at the time of diagnosis. cholesterol_level: The cholesterol level of the patient (value). hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no). asthma: Indicates whether the patient has asthma (e.g., yes, no). cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no). other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no). treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined). end_treatment_date: The date on which the patient completed their cancer treatment or died. survived: Indicates whether the patient survived (e.g., yes, no).

This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

Good luck Gakusei!
Cancer Rates by U.S. State
kaggle.com
zip
Updated Dec 26, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heemali Chaudhari (2022). Cancer Rates by U.S. State [Dataset]. https://www.kaggle.com/datasets/heemalichaudhari/cancer-rates-by-us-state
Explore at:
zip(219237 bytes)Available download formats
Dataset updated
Dec 26, 2022
Authors
Heemali Chaudhari
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
In the following maps, the U.S. states are divided into groups based on the rates at which people developed or died from cancer in 2013, the most recent year for which incidence data are available.

The rates are the numbers out of 100,000 people who developed or died from cancer each year.

Incidence Rates by State The number of people who get cancer is called cancer incidence. In the United States, the rate of getting cancer varies from state to state.

*Rates are per 100,000 and are age-adjusted to the 2000 U.S. standard population.

‡Rates are not shown if the state did not meet USCS publication criteria or if the state did not submit data to CDC.

†Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2013 Incidence and Mortality Web-based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2016. Available at: http://www.cdc.gov/uscs.

Death Rates by State Rates of dying from cancer also vary from state to state.

*Rates are per 100,000 and are age-adjusted to the 2000 U.S. standard population.

†Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2013 Incidence and Mortality Web-based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2016. Available at: http://www.cdc.gov/uscs.

Source: https://www.cdc.gov/cancer/dcpc/data/state.htm
Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER)...
catalog.data.gov
data.virginia.gov
+3more
Updated Jul 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Cancer Institute (NCI), National Institutes of Health (NIH) (2025). Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER) Registries Limited-Use [Dataset]. https://catalog.data.gov/dataset/cancer-incidence-surveillance-epidemiology-and-end-results-seer-registries-limited-use
Explore at:
Dataset updated
Jul 16, 2025
Dataset provided by
National Cancer Institutehttp://www.cancer.gov/
Description
SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.
Cancer data of United States of America
kaggle.com
zip
Updated Apr 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tanisha1604 (2024). Cancer data of United States of America [Dataset]. https://www.kaggle.com/datasets/tanisha1604/cancer-data-of-united-states-of-america
Explore at:
zip(346754 bytes)Available download formats
Dataset updated
Apr 18, 2024
Authors
Tanisha1604
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Area covered
United States
Description
About Dataset

The dataset contains 2 .csv files This file contains various demographic and health-related data for different regions. Here's a brief description of each column:

File 1st

avganncount: Average number of cancer cases diagnosed annually.

avgdeathsperyear: Average number of deaths due to cancer per year.

target_deathrate: Target death rate due to cancer.

incidencerate: Incidence rate of cancer.

medincome: Median income in the region.

popest2015: Estimated population in 2015.

povertypercent: Percentage of population below the poverty line.

studypercap: Per capita number of cancer-related clinical trials conducted.

binnedinc: Binned median income.

medianage: Median age in the region.

pctprivatecoveragealone: Percentage of population covered by private health insurance alone.

pctempprivcoverage: Percentage of population covered by employee-provided private health insurance.

pctpubliccoverage: Percentage of population covered by public health insurance.

pctpubliccoveragealone: Percentage of population covered by public health insurance only.

pctwhite: Percentage of White population.

pctblack: Percentage of Black population.

pctasian: Percentage of Asian population.

pctotherrace: Percentage of population belonging to other races.

pctmarriedhouseholds: Percentage of married households. birthrate: Birth rate in the region.

File 2nd

This file contains demographic information about different regions, including details about household size and geographical location. Here's a description of each column:

statefips: The FIPS code representing the state.

countyfips: The FIPS code representing the county or census area within the state.

avghouseholdsize: The average household size in the region.

geography: The geographical location, typically represented as the county or census area name followed by the state name.

Each row in the file represents a specific region, providing details about household size and geographical location. This information can be used for various demographic analyses and studies.
H
SEER Cancer Statistics Database
dataverse.harvard.edu
data.niaid.nih.gov
Updated Jul 11, 2011
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harvard Dataverse (2011). SEER Cancer Statistics Database [Dataset]. http://doi.org/10.7910/DVN/C9KBBC
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/C9KBBC
Dataset updated
Jul 11, 2011
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Users can access data about cancer statistics in the United States including but not limited to searches by type of cancer and race, sex, ethnicity, age at diagnosis, and age at death. Background Surveillance Epidemiology and End Results (SEER) database’s mission is to provide information on cancer statistics to help reduce the burden of disease in the U.S. population. The SEER database is a project to the National Cancer Institute. The SEER database collects information on incidence, prevalence, and survival from specific geographic areas representing 28 percent of the United States population. User functionality Users can access a variety of reso urces. Cancer Stat Fact Sheets allow users to look at summaries of statistics by major cancer type. Cancer Statistic Reviews are available from 1975-2008 in table format. Users are also able to build their own tables and graphs using Fast Stats. The Cancer Query system provides more flexibility and a larger set of cancer statistics than F ast Stats but requires more input from the user. State Cancer Profiles include dynamic maps and graphs enabling the investigation of cancer trends at the county, state, and national levels. SEER research data files and SEER*Stat software are available to download through your Internet connection (SEER*Stat’s client-server mode) or via discs shipped directly to you. A signed data agreement form is required to access the SEER data Data Notes Data is available in different formats depending on which type of data is accessed. Some data is available in table, PDF, and html formats. Detailed information about the data is available under “Data Documentation and Variable Recodes”.
Number and rates of new cases of primary cancer, by cancer type, age group...
www150.statcan.gc.ca
datasets.ai
+2more
Updated May 19, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Government of Canada, Statistics Canada (2021). Number and rates of new cases of primary cancer, by cancer type, age group and sex [Dataset]. http://doi.org/10.25318/1310011101-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310011101-eng
Dataset updated
May 19, 2021
Dataset provided by
Statistics Canadahttps://statcan.gc.ca/en
Area covered
Canada
Description
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
d
[MI] Rapid Cancer Registration Data
digital.nhs.uk
Updated Nov 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). [MI] Rapid Cancer Registration Data [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mi-rapid-cancer-registration-data
Explore at:
Dataset updated
Nov 27, 2025
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditionshttps://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Description
Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.
c
Multimodal Head and Neck cancer dataset
cancerimagingarchive.net
n/a, svs and png
Updated Nov 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Cancer Imaging Archive (2025). Multimodal Head and Neck cancer dataset [Dataset]. http://doi.org/10.7937/rcty-5h16
Explore at:
svs and png, n/aAvailable download formats
Unique identifier
https://doi.org/10.7937/rcty-5h16
Dataset updated
Nov 18, 2025
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Nov 18, 2025
Dataset funded by
National Cancer Institutehttp://www.cancer.gov/
Description
Abstract
HANCOCK is a comprehensive, monocentric dataset of 763 head and neck cancer patients, including diverse data modalities. It contains histopathology imaging (whole-slide images of H&E-stained primary tumors and tissue microarrays with immunohistochemical staining) alongside structured clinical data (demographics, tumor pathology characteristics, laboratory blood measurements) and textual data (de-identified surgery reports and medical histories). All patients were treated curatively, and data span diagnoses from 2005–2019. This multimodal collection enables research into integrative analyses – for example, combining histologic features with clinical parameters for outcome prediction. Early analyses have demonstrated that fusing these modalities improves prognostic modeling compared to single-source data, and that leveraging histology with foundation models can enhance endpoint prediction. HANCOCK aims to facilitate precision oncology studies by providing a large public resource for developing and benchmarking multimodal machine learning methods in head and neck cancer.
Introduction
Head and neck cancer (HNC) is a prevalent malignancy with poor outcomes – it is the 7th most common cancer globally and carries a 5-year survival of only ~25–60% despite modern treatments. Improving patient prognosis may require personalized, multimodal therapy decisions, using information from pathology, clinical, and other data sources. However, progress in multimodal prediction has been limited by the lack of large public datasets that integrate these diverse data types. To our knowledge, existing HNC datasets are either small or incomplete; for example, a radiomics study included 288 oropharyngeal cases, and a proteomics-focused set with imaging had only 122 cases. The Cancer Genome Atlas (TCGA) provides multi-omics for >500 HNC cases, but lacks crucial data like pathology reports, blood tests, or comprehensive imaging for each patient. These limitations hinder robust multimodal research.
HANCOCK was created to address this gap. It aggregates 763 patients’ data from a single academic center, capturing a real-world, uniformly treated cohort. The dataset uniquely combines whole slide histopathology images, tissue microarray images, detailed clinical parameters, pathology reports, and lab values in one resource. By curating and harmonizing these modalities, HANCOCK enables researchers to explore complex data interdependencies and develop multimodal predictive models. The patient population reflects typical HNC demographics – 80% male, median age 61, with 72% being former or current smokers – aligning with expected epidemiology and supporting generalizability. In summary, HANCOCK is an unprecedented multimodal HNC dataset that can fuel research in machine learning, prognostic biomarker discovery, and integrative oncology, ultimately advancing personalized head and neck cancer care.
Methods
The following sections describe how the HANCOCK data were collected, processed, and prepared for public sharing.
Subject Inclusion and Exclusion Criteria
Patients included in HANCOCK were those diagnosed with head and neck cancer between 2005 and 2019 at University Hospital Erlangen (Germany) who underwent a curative-intent initial treatment (surgery and/or definitive therapy). This encompasses cancers of the oral cavity, oropharynx, hypopharynx, and larynx. Patients treated palliatively or with recurrent/metastatic disease at presentation were excluded to focus on first-course, curative treatments. The cohort consists of 763 patients (approximately 80% male, 20% female) with a median age of 61 years. Notably, ~72% have a history of tobacco use, which is consistent with real-world HNC risk factors. The distribution of tumor subsites and stages reflects typical HNC presentation, and thus the dataset is broadly representative of the general HNC patient population. Being a single-center dataset, there is limited geographic diversity; however, the homogeneous data acquisition and treatment context reduce variability in data quality. No significant selection biases were introduced aside from the exclusion of non-curative cases – all major HNC subsite cases over the inclusion period were captured, providing a comprehensive real-world sample. Ethical approval was obtained for this retrospective data collection and sharing (Ethics Committee vote #23-22-Br), and all data were fully de-identified prior to release.
Data Acquisition
Histopathology: Tissue specimens from the primary tumors (and involved lymph nodes, if present) were obtained from the pathology archives. All samples were formalin-fixed and paraffin-embedded (FFPE) and stained with hematoxylin and eosin (H&E) following routine protocols. Digital whole-slide imaging was performed on these histology slides. A total of 709 H&E slides of primary tumor tissue (701 patients had one slide, 8 patients had two slides) were scanned at high resolution using a 3DHISTECH P1000 scanner at an effective 82.44× magnification (0.1213 µm/pixel). Additionally, 396 H&E slides of lymph node metastases were scanned, using two systems: an Aperio Leica GT450 at 40× (0.2634 µm/pixel) and the 3DHISTECH P1000 at ~51× (0.1945 µm/pixel). (Multiple scanners were utilized over the course of the project; all resulting images were cross-verified for quality.) The digital whole slide images (WSIs) are provided in the pyramidal Aperio SVS format, a TIFF-based format compatible with standard viewers.
In addition to full slides, tissue microarrays (TMAs) were constructed from each patient’s tumor block to sample important regions. For each case, two cylindrical core biopsies (diameter 1.5 mm) were taken – one from the tumor center and one from the invasive tumor front. These cores were assembled into TMA blocks and stained on separate slides with a panel of eight stains: H&E plus immunohistochemical (IHC) markers targeting various immune cells and tumor biomarkers. The IHC markers include CD3, CD8, CD56, CD68, CD163, PD-L1, and MHC-1, which label T cells (CD3, CD8), natural killer cells (CD56), monocytes/macrophages (CD68, CD163), and a tumor immune checkpoint ligand (PD-L1), as well as MHC class I expression. Each core appears on up to 8 stained TMA slides (one per stain), yielding up to 16 TMA images per patient (two cores × eight stains). In the dataset, TMA images are provided for both the tumor-center and tumor-front cores; these too are digitized high-resolution images (consistent microscope settings, ~40×). The combination of WSIs and TMAs yields a rich imaging dataset: 701 patients have at least one primary tumor WSI (62 patients lack WSIs due to unavailable tissue), and all patients have TMA core images unless the tumor block was exhausted. This imaging data offers both broad tissue context from WSIs and targeted cellular detail from TMAs. Manual tumor region annotations are also included for the primary tumor WSIs (see Data Analysis below).
Clinical and Pathology Data: A wide array of non-imaging data was extracted from hospital information systems and pathology reports for each patient. Key demographic variables (age, sex, etc.) and tumor pathology details were collected, including primary tumor site, histologic subtype, grade, TNM stage, resection margin status, depth of invasion, perineural and lymphovascular invasion, and nodal metastasis status. These pathology parameters were recorded in a structured format for each case. Standard clinical coding systems were used where applicable: e.g., diagnoses are coded with ICD-10 codes and procedures with OPS codes (the German procedure classification system). The dataset includes these codes for each patient’s conditions and treatments. Comprehensive laboratory blood test results at diagnosis or pre-treatment were also compiled, covering complete blood counts, coagulation measures, electrolytes, kidney function, C-reactive protein, and other relevant analytes. Reference ranges for each lab parameter are provided alongside the values to indicate whether a result was normal or abnormal. Most patients have a full panel of these lab results, though some values are missing if a test was not clinically indicated; the dataset notes availability per patient. All structured data have been cleaned and validated – for example, harmonizing category values and checking consistency (e.g. TNM stages align with recorded tumor sites).
Textual Data (Surgical Reports and Histories): Unstructured clinical text was also included to add rich context on treatment details. Surgery reports (operative notes) from the primary tumor resection and associated medical history summaries were retrieved from the hospital’s electronic records. For each patient, the operative report from their first definitive surgery and the corresponding
Lung Cancer Dataset
kaggle.com
Updated May 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aman_Kumar094 (2025). Lung Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/amankumar094/lung-cancer-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 6, 2025
Dataset provided by
Kaggle
Authors
Aman_Kumar094
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
** Description**

This dataset contains data about lung cancer Mortality and is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. This dataset contains comprehensive information on 800,000 individuals related to lung cancer diagnosis, treatment, and outcomes. With 16 well-structured columns. This large-scale dataset is designed to aid researchers, data scientists, and healthcare professionals in studying patterns, building predictive models, and enhancing early detection and treatment strategies.

🌍 The Societal Impact of Lung Cancer

Lung cancer is not just a disease — it's a global crisis that steals time, health, and hope from millions of people every year. As the #1 cause of cancer deaths worldwide, it takes more lives annually than breast, colon, and prostate cancer combined.

But behind every statistic is a story:

A parent who never saw their child graduate.

A worker who had to leave their job too soon.

A community that lost a leader, a friend, a neighbor.

Why does this matter? Lung cancer often goes undetected until it's too late. It’s aggressive, silent, and devastating — especially in underserved areas where early detection is rare and treatment options are limited. It doesn’t just affect patients. It affects families, economies, and healthcare systems on a massive scale.

This dataset represents more than numbers. It represents 800,000 real-world stories — people who can help us unlock patterns, train models, and advance life-saving research.

By working with this data, you're not just analyzing a dataset — you're stepping into the fight against one of humanity’s deadliest diseases.

Let’s turn insight into impact. (😊The above descriptions is generated with the help of AI, Just wanted to share this dataset That all. Thank you)
a
Cancer (in persons of all ages): England
hub.arcgis.com
data.catchmentbasedapproach.org
Updated Apr 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Rivers Trust (2021). Cancer (in persons of all ages): England [Dataset]. https://hub.arcgis.com/datasets/c5c07229db684a65822fdc9a29388b0b
Explore at:
Dataset updated
Apr 6, 2021
Dataset authored and provided by
The Rivers Trust
Area covered

Description
SUMMARYThis analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of cancer (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.ANALYSIS METHODOLOGYThe analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to cancer (in persons of all ages).This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.The percentage of each MSOA’s population (all ages) with cancer was estimated. This was achieved by calculating a weighted average based on:The percentage of the MSOA area that was covered by each GP practice’s catchment areaOf the GPs that covered part of that MSOA: the percentage of registered patients that have that illness The estimated percentage of each MSOA’s population with cancer was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with cancer, within the relevant age range.Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:A) the PERCENTAGE of the population within that MSOA who are estimated to have cancerB) the NUMBER of people within that MSOA who are estimated to have cancerAn average of scores A & B was taken, and converted to a relative score between 1 and 0 (1= worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have cancer, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from cancer, and where those people make up a large percentage of the population, indicating there is a real issue with cancer within the population and the investment of resources to address that issue could have the greatest benefits.LIMITATIONS1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of cancer, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of cancer.TO BE VIEWED IN COMBINATION WITH:This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:Health and wellbeing statistics (GP-level, England): Missing data and potential outliersLevels of obesity, inactivity and associated illnesses (England): Missing dataDOWNLOADING THIS DATATo access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.DATA SOURCESThis dataset was produced using:Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.Population data: Mid-2019 (June 30) Population Estimates for Middle Layer Super Output Areas in England and Wales. © Office for National Statistics licensed under the Open Government Licence v3.0. © Crown Copyright 2020.COPYRIGHT NOTICEThe reproduction of this data must be accompanied by the following statement:© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021. © Crown Copyright 2020.CaBA HEALTH & WELLBEING EVIDENCE BASEThis dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
p
Breast Cancer Dataset - Dataset - CKAN
data.poltekkes-smg.ac.id
Updated Oct 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Breast Cancer Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/breast-cancer-dataset
Explore at:
Dataset updated
Oct 7, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description: Breast cancer is the most common cancer amongst women in the world. It accounts for 25% of all cancer cases, and affected over 2.1 Million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area. The key challenges against it’s detection is how to classify tumors into malignant (cancerous) or benign(non cancerous). We ask you to complete the analysis of classifying these tumors using machine learning (with SVMs) and the Breast Cancer Wisconsin (Diagnostic) Dataset. Acknowledgements: This dataset has been referred from Kaggle. Objective: Understand the Dataset & cleanup (if required). Build classification models to predict whether the cancer type is Malignant or Benign. Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.
a
Breast Cancer Mortality
egis-lacounty.hub.arcgis.com
data.lacounty.gov
+3more
Updated Dec 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
County of Los Angeles (2023). Breast Cancer Mortality [Dataset]. https://egis-lacounty.hub.arcgis.com/datasets/breast-cancer-mortality
Explore at:
Dataset updated
Dec 19, 2023
Dataset authored and provided by
County of Los Angeles
Area covered

Description
Death rate has been age-adjusted to the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.Obesity can increase an individual’s lifetime risk of breast cancer. Promoting healthy food retail and physical activity and improving access to preventive care services are important measures that cities and communities can take to prevent breast cancer.For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
f
Data Sheet 1_Causes of death after oral cancer diagnosis: a population based...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Nov 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shao, Li; Shen, Chao; Shao, Xuwen; Jiang, Zhenyu; Zhou, Jie (2024). Data Sheet 1_Causes of death after oral cancer diagnosis: a population based study.zip [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001449122
Explore at:
Dataset updated
Nov 12, 2024
Authors
Shao, Li; Shen, Chao; Shao, Xuwen; Jiang, Zhenyu; Zhou, Jie
Description
BackgroundNowadays, the number of oral cancer survivors is increasing, emphasizing the importance of thoroughly understanding diverse causes of death in oral cancer survivors. Our study aimed to investigate the distribution of causes of death after oral cancer diagnosis.MethodsEligible patients were identified between 2004 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) database. We calculated the number of deaths in different demographic and clinicopathological variables during each follow-up period. Standardized mortality ratios(SMRs) were generated for each cause of death after oral cancer diagnosis.ResultsA total of 30538 patients diagnosed with oral cancer were included, and 17654 deaths were reported during follow-up period. 27.08% of deaths were caused by non-caner reasons. The proportion of non-cancer related deaths increased with the extension of survival time, and non-cancer death accounted for 57.93% of all deaths when followed up more than 10 years. The most common non-cancer cause of death was cardiovascular disease (SMR 4.68#; 95%CI 4.46-4.92).ConclusionsNon-cancer causes of death should not be ignored in oral cancer patients. For oral cancer survivors, multidisciplinary follow-up strategy should be recommended to achieve longer survival time.
i
SEER Breast Cancer Data
ieee-dataport.org
data.niaid.nih.gov
Updated Jul 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
jing teng (2025). SEER Breast Cancer Data [Dataset]. https://ieee-dataport.org/open-access/seer-breast-cancer-data
Explore at:
Dataset updated
Jul 29, 2025
Authors
jing teng
Description
examined regional LNs
r
Cancer Incidence och mortality in a population based investigation in the...
researchdata.se
datasets.ai
Updated Oct 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Håkan Olsson (2024). Cancer Incidence och mortality in a population based investigation in the southern health care region - Cost for health care [Dataset]. https://researchdata.se/en/catalogue/dataset/ext0119-1
Explore at:
Dataset updated
Oct 16, 2024
Dataset provided by
Lund University
Authors
Håkan Olsson
Time period covered
2000 - 2007
Description
All individuals diagnosed with cancer from 2000 to 2007 were identified in the Cancer Register of Southern Sweden, but only individuals who were also identified in the Population Register of Scania were included in this cohort. Age- and gender-matched controls were identified in the Population Register of Scania. The controls were reconciled with the cancer registry in southern Sweden so that they had no prior diagnosis of cancer and with the Population Register of Scania that they were alive at time of diagnosis to the matched case. Also spouses to cancer patients were used as controls.

For each individual, healthcare costs were monitored related to the date of diagnosis. Costs for outpatient care, inpatient care, number of days in hospital and medications were included. Costs were also calculated for the controls.

Other information available about the individuals in the cohort are age, sex, domicile, type of tumor and medication.

Purpose:

To study the health cost per individual in relation to mortality and comorbidity.
u
Cancer death rates by county, 2019-2023 - Dataset - Healthy Communities Data...
midb.uspatial.umn.edu
Updated Oct 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Cancer death rates by county, 2019-2023 - Dataset - Healthy Communities Data Portal [Dataset]. https://midb.uspatial.umn.edu/hcdp/dataset/cancer-death-rates-by-county-2019-2023
Explore at:
Dataset updated
Oct 24, 2025
Description
Cancer death rates by county, all races (includes Hispanic/Latino), all sexes, all ages, 2019-2023. Death data were provided by the National Vital Statistics System. Death rates (deaths per 100,000 population per year) are age-adjusted to the 2000 US standard population (20 age groups: <1, 1-4, 5-9, ... , 80-84, 85-89, 90+). Rates calculated using SEER*Stat. Population counts for denominators are based on Census populations as modified by the National Cancer Institute. The US Population Data File is used for mortality data. The Average Annual Percent Change is based onthe APCs calculated by the Joinpoint Regression Program (Version 4.9.0.0). Due to data availability issues, the time period used in the calculation of the joinpoint regression model may differ for selected counties. Counties with a (3) after their name may have their joinpoint regresssion model calculated using a different time period due to data availability issues.
Deaths from All Cancers - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jul 28, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ckan.publishing.service.gov.uk (2017). Deaths from All Cancers - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/deaths-from-all-cancers
Explore at:
Dataset updated
Jul 28, 2017
Dataset provided by
CKANhttps://ckan.org/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This data shows premature deaths (Age under 75) from all Cancers, numbers and rates by gender, as 3-year moving-averages. Cancers are a major cause of premature deaths. Inequalities exist in cancer rates between the most deprived areas and the most affluent areas. Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies Age-specific rates to a Standard (European) population to cancel out possible effects on crude rates due to different age structures among populations, thus enabling direct comparisons of rates. A limitation on using mortalities as a proxy for prevalence of health conditions is that mortalities may give an incomplete view of health conditions in an area, as ill-health might not lead to premature death. Data source: Office for Health Improvement and Disparities (OHID), indicator ID 40501, E05a. This data is updated annually.
d
Deaths from All Cancers - Dataset - Datopian CKAN instance
demo.dev.datopian.com
Updated Oct 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Deaths from All Cancers - Dataset - Datopian CKAN instance [Dataset]. https://demo.dev.datopian.com/dataset/lcc--deaths-from-all-cancers
Explore at:
Dataset updated
Oct 7, 2025
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This data shows premature deaths (Age under 75) from all Cancers, numbers and rates by gender, as 3-year moving-averages. Cancers are a major cause of premature deaths. Inequalities exist in cancer rates between the most deprived areas and the most affluent areas. Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies Age-specific rates to a Standard (European) population to cancel out possible effects on crude rates due to different age structures among populations, thus enabling direct comparisons of rates. A limitation on using mortalities as a proxy for prevalence of health conditions is that mortalities may give an incomplete view of health conditions in an area, as ill-health might not lead to premature death. Data source: Office for Health Improvement and Disparities (OHID), indicator ID 40501, E05a. This data is updated annually.
Years of Life Lost (YLL): Breast cancer - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Feb 9, 2010
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ckan.publishing.service.gov.uk (2010). Years of Life Lost (YLL): Breast cancer - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/years_of_life_lost_yll_-_breast_cancer
Explore at:
Dataset updated
Feb 9, 2010
Dataset provided by
CKANhttps://ckan.org/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Years of Life Lost (YLL) as a result of death from breast cancer - Directly age-Standardised Rates (DSR) per 100,000 population Source: Office for National Statistics (ONS) Publisher: Information Centre (IC) - Clinical and Health Outcomes Knowledge Base Geographies: Local Authority District (LAD), Government Office Region (GOR), National, Primary Care Trust (PCT), Strategic Health Authority (SHA) Geographic coverage: England Time coverage: 2005-07, 2007 Type of data: Administrative data
h
lung-cancer
huggingface.co
Updated Jun 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nate Raw (2022). lung-cancer [Dataset]. https://huggingface.co/datasets/nateraw/lung-cancer
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 24, 2022
Authors
Nate Raw
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for Lung Cancer

Dataset Summary

The effectiveness of cancer prediction system helps the people to know their cancer risk with low cost and it also helps the people to take the appropriate decision based on their cancer risk status. The data is collected from the website online lung cancer prediction system .

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.

Facebook

Twitter

Click to copy link

Link copied

Cite

MasterDataSan (2024). Lung Cancer Mortality Datasets v2 [Dataset]. https://www.kaggle.com/datasets/masterdatasan/lung-cancer-mortality-datasets-v2

Lung Cancer Mortality Datasets v2

Dataset of lung cancer with time observation durring theatment period

Explore at:

zip(81127029 bytes)Available download formats

Dataset updated

Jun 1, 2024

Authors

MasterDataSan

Description

This dataset contains data about lung cancer Mortality. This database is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It is designed to facilitate the analysis of various factors that may influence cancer prognosis and treatment outcomes. The database includes a range of demographic, medical, and treatment-related variables, capturing essential details about each patient's condition and history.

Key components of the database include:

Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

id: A unique identifier for each patient in the dataset. age: The age of the patient at the time of diagnosis. gender: The gender of the patient (e.g., male, female). country: The country or region where the patient resides. diagnosis_date: The date on which the patient was diagnosed with lung cancer. cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV). family_history: Indicates whether there is a family history of cancer (e.g., yes, no). smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker). bmi: The Body Mass Index of the patient at the time of diagnosis. cholesterol_level: The cholesterol level of the patient (value). hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no). asthma: Indicates whether the patient has asthma (e.g., yes, no). cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no). other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no). treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined). end_treatment_date: The date on which the patient completed their cancer treatment or died. survived: Indicates whether the patient survived (e.g., yes, no).

This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

Good luck Gakusei!

Clear search

Close search

Google apps

Main menu

Lung Cancer Mortality Datasets v2

Cancer Rates by U.S. State

Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER)...

Cancer data of United States of America

About Dataset

File 1st

File 2nd

SEER Cancer Statistics Database

Number and rates of new cases of primary cancer, by cancer type, age group...

[MI] Rapid Cancer Registration Data

Multimodal Head and Neck cancer dataset

Abstract

Introduction

Methods

Subject Inclusion and Exclusion Criteria

Data Acquisition

Lung Cancer Dataset

Cancer (in persons of all ages): England

Breast Cancer Dataset - Dataset - CKAN

Breast Cancer Mortality

Data Sheet 1_Causes of death after oral cancer diagnosis: a population based...

SEER Breast Cancer Data

Cancer Incidence och mortality in a population based investigation in the...

Cancer death rates by county, 2019-2023 - Dataset - Healthy Communities Data...

Deaths from All Cancers - Dataset - data.gov.uk

Deaths from All Cancers - Dataset - Datopian CKAN instance

Years of Life Lost (YLL): Breast cancer - Dataset - data.gov.uk

lung-cancer

Lung Cancer Mortality Datasets v2

Dataset of lung cancer with time observation durring theatment period