66 datasets found

Cancer Rates by U.S. State
kaggle.com
zip
Updated Dec 26, 2022
Cite
Heemali Chaudhari (2022). Cancer Rates by U.S. State [Dataset]. https://www.kaggle.com/datasets/heemalichaudhari/cancer-rates-by-us-state
Explore at:
Available download formats: zip (219237 bytes)
Dataset updated
Dec 26, 2022
Authors
Heemali Chaudhari
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
In the following maps, the U.S. states are divided into groups based on the rates at which people developed or died from cancer in 2013, the most recent year for which incidence data are available.

The rates are the numbers out of 100,000 people who developed or died from cancer each year.

Incidence Rates by State The number of people who get cancer is called cancer incidence. In the United States, the rate of getting cancer varies from state to state.

*Rates are per 100,000 and are age-adjusted to the 2000 U.S. standard population.

‡Rates are not shown if the state did not meet USCS publication criteria or if the state did not submit data to CDC.

†Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2013 Incidence and Mortality Web-based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2016. Available at: http://www.cdc.gov/uscs.

Death Rates by State Rates of dying from cancer also vary from state to state.

*Rates are per 100,000 and are age-adjusted to the 2000 U.S. standard population.

†Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2013 Incidence and Mortality Web-based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2016. Available at: http://www.cdc.gov/uscs.

Source: https://www.cdc.gov/cancer/dcpc/data/state.htm
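The footnotes above say the rates are per 100,000 and age-adjusted to a standard population. A minimal sketch of direct age standardization follows, to show how an adjusted rate can differ from a crude one. The three age groups, counts, and weights are made-up illustration values, not the actual 2000 U.S. standard population weights or any figures from this dataset, and `age_adjusted_rate` is a hypothetical helper name.

```python
def age_adjusted_rate(cases, population, std_weights, per=100_000):
    """Directly standardized rate: weighted sum of age-specific rates.

    cases, population, std_weights each hold one entry per age group.
    Weights are normalized here, so they do not need to sum to 1.
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    rate = 0.0
    for c, p, w in zip(cases, population, std_weights):
        rate += (c / p) * (w / total_weight)
    return rate * per


# Hypothetical three-group example (young, middle, old); values are invented.
cases = [10, 100, 400]
population = [200_000, 100_000, 50_000]
weights = [0.5, 0.3, 0.2]  # illustrative weights, NOT the real standard

crude = sum(cases) / sum(population) * 100_000
adjusted = age_adjusted_rate(cases, population, weights)
print(f"crude: {crude:.1f} per 100,000, age-adjusted: {adjusted:.1f} per 100,000")
```

Because the adjusted rate applies the same weights to every state, differences between states reflect age-specific rates rather than differences in age structure.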
Lung Cancer Dataset
kaggle.com
zip
Updated Jun 26, 2024
Cite
Akash Nath (2024). Lung Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/akashnath29/lung-cancer-dataset
Explore at:
Available download formats: zip (70442 bytes)
Dataset updated
Jun 26, 2024
Authors
Akash Nath
License
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Lung Cancer Dataset

Introduction:

Lung cancer remains one of the most prevalent and deadly forms of cancer worldwide, posing significant challenges for early detection and effective treatment. To contribute to the global effort in understanding and combating this disease, we are excited to introduce our comprehensive Lung Cancer Dataset, now available on Kaggle.

Scientific Overview:

This dataset is an invaluable asset in the realm of Health Care, providing a structured foundation for the development of cancer detection models. This dataset exemplifies the variety of symptoms of Lung Cancer. Each category within the dataset—'GENDER', 'AGE', 'SMOKING', 'YELLOW_FINGERS', 'ANXIETY', 'PEER_PRESSURE', 'CHRONIC_DISEASE', 'FATIGUE', 'ALLERGY', 'WHEEZING', 'ALCOHOL_CONSUMING', 'COUGHING', 'SHORTNESS_OF_BREATH', 'SWALLOWING_DIFFICULTY', 'CHEST_PAIN'—has been carefully curated to encompass a diverse range of symptoms, ensuring that the resulting models are versatile and accurate. This scientific approach not only enhances the dataset's diversity to record symptoms of lung cancer but also contributes to the broader field of AI-driven health technologies, pushing the boundaries of what health care assistants can achieve.
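As a quick sanity check before modeling, the column names listed above can be compared against the header of a downloaded file. The sketch below uses only the standard library. It assumes the file is a CSV whose header uses these names, possibly with different case or with spaces instead of underscores; the real file may differ, and the description does not name the target column, so none is included here. `missing_columns` is a hypothetical helper.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "GENDER", "AGE", "SMOKING", "YELLOW_FINGERS", "ANXIETY",
    "PEER_PRESSURE", "CHRONIC_DISEASE", "FATIGUE", "ALLERGY", "WHEEZING",
    "ALCOHOL_CONSUMING", "COUGHING", "SHORTNESS_OF_BREATH",
    "SWALLOWING_DIFFICULTY", "CHEST_PAIN",
]


def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    # Normalize case and spacing so "Chest Pain" matches "CHEST_PAIN".
    present = {name.strip().upper().replace(" ", "_") for name in header}
    return [name for name in expected if name not in present]


# Tiny invented sample with only three of the listed columns.
sample = "GENDER,AGE,SMOKING\nM,60,1\n"
print(missing_columns(sample))
```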

Dataset Composition

The Lung Cancer Dataset includes a diverse array of symptoms essential for comprehensive analysis and model development. The primary categories of data are as follows:

1. Patient Demographics

Age: Provides the age at diagnosis, enabling analysis of age-related incidence and outcomes. Gender: Includes information on patient gender, facilitating gender-based studies. Smoking Status: Categorized as current smoker, former smoker, or non-smoker, this data is critical for evaluating the impact of smoking on lung cancer risk and progression.

2. Medical History

Comorbidities: Details additional health issues such as chronic obstructive pulmonary disease (COPD), which are relevant for treatment planning and prognosis.

3. Clinical Data

Vital Signs: Records of blood pressure, heart rate, respiratory rate, and other vital signs at diagnosis and during treatment.

Implementation Guide for the Lung Cancer Dataset:

Data Integration

Dataset Acquisition: Obtain the Lung Cancer Dataset. Data Exploration: Familiarize yourself with the structure and contents of the dataset, including symptoms and conclusions related to different conditions.

Preprocessing

Data Cleaning: Remove any irrelevant or redundant entries, and ensure consistency in formatting across the dataset. Tokenization: Break down the symptoms and conclusions into tokens or individual words to facilitate analysis and model training. Normalization: Standardize the text data by converting it to lowercase and removing punctuation or special characters as needed.

Model Training

Choose a Framework: Select a suitable machine learning or natural language processing framework such as TensorFlow, PyTorch, or spaCy. Model Selection: Decide on the type of model to use, such as recurrent neural networks (RNNs), transformers, or sequence-to-sequence models, based on the complexity of the dataset and the desired level of accuracy. Training Process: Train the chosen model using the preprocessed dataset, adjusting hyperparameters as necessary to optimize performance. Evaluation: Assess the performance of the trained model using appropriate metrics such as accuracy, precision, recall, and F1-score.
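The evaluation metrics named above (accuracy, precision, recall, F1-score) can be computed directly from predicted and true labels. The sketch below is framework-independent and assumes a binary task with the positive class encoded as 1; the label lists are invented for illustration, and `classification_metrics` is a hypothetical helper rather than part of any library.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels: 1 = lung cancer, 0 = no lung cancer.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For a screening-style task, recall is often the metric to watch, since a false negative (a missed case) usually costs more than a false positive.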

Deployment

Integration: Integrate the trained model into a chatbot or virtual assistant application using programming languages like Python or JavaScript. User Interface Design: Design an intuitive user interface that allows users to interact with the chatbot and receive responses related to lung cancer. Testing: Conduct thorough testing of the deployed chatbot to ensure functionality, accuracy, and responsiveness in providing relevant results. Feedback Mechanism: Implement a feedback mechanism to gather user feedback and improve the chatbot's performance over time.

Continuous Improvement

Monitoring: Continuously monitor the chatbot's performance and user interactions to identify areas for improvement. Data Updates: Periodically update the dataset with new symptoms to ensure accuracy. Model Refinement: Fine-tune the model based on user feedback and additional training data to enhance the chatbot's effectiveness and accuracy in detecting lung cancer. By following this implementation guide, developers can effectively leverage the Lung Cancer Dataset to build and deploy AI-driven chatbots and virtual assistants that offer accurate predictions to users worldwide.

Potential Applications

The extensive nature of the Lung Cancer Dataset supports a wide range of scientific and clinical applications:

Machine Learning Models: Facilitates the development of predictive algorithms for early detection, prognosis, and personalized t...
CDC WONDER: Cancer Statistics
catalog.data.gov
healthdata.gov
+4 more
Updated Jul 29, 2025
+ more versions
Cite
Centers for Disease Control and Prevention, Department of Health & Human Services (2025). CDC WONDER: Cancer Statistics [Dataset]. https://catalog.data.gov/dataset/cdc-wonder-cancer-statistics
Explore at:
Dataset updated
Jul 29, 2025
Dataset provided by
United States Department of Health and Human Services http://www.hhs.gov/
Centers for Disease Control and Prevention http://www.cdc.gov/
Description
The United States Cancer Statistics (USCS) online databases in WONDER provide cancer incidence and mortality data for the United States for the years since 1999, by year, state and metropolitan areas (MSA), age group, race, ethnicity, sex, childhood cancer classifications and cancer site. Report case counts, deaths, crude and age-adjusted incidence and death rates, and 95% confidence intervals for rates. The USCS data are the official federal statistics on cancer incidence from registries having high-quality data and cancer mortality statistics for 50 states and the District of Columbia. USCS are produced by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), in collaboration with the North American Association of Central Cancer Registries (NAACCR). Mortality data are provided by the Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Vital Statistics System (NVSS).
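To make the reported quantities concrete, here is a rough sketch of a crude rate per 100,000 with a 95% confidence interval. It uses a normal approximation to a Poisson count, which is only one common textbook approach; the description above does not say which interval method WONDER applies, and other methods are generally preferred for small counts. The count and population are invented, and `crude_rate_with_ci` is a hypothetical helper.

```python
import math


def crude_rate_with_ci(count, population, per=100_000, z=1.96):
    """Crude rate and an approximate 95% CI (normal approximation, Poisson count)."""
    if population <= 0 or count < 0:
        raise ValueError("population must be positive and count non-negative")
    rate = count / population * per
    # Standard error of the rate when the count is treated as Poisson.
    se = math.sqrt(count) / population * per
    lower = max(0.0, rate - z * se)
    upper = rate + z * se
    return rate, lower, upper


# Invented example: 400 cases in a population of 1,000,000.
rate, lower, upper = crude_rate_with_ci(400, 1_000_000)
print(f"{rate:.2f} per 100,000 (95% CI {lower:.2f} to {upper:.2f})")
```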
Lung Cancer Mortality Datasets v2
kaggle.com
zip
Updated Jun 1, 2024
Cite
MasterDataSan (2024). Lung Cancer Mortality Datasets v2 [Dataset]. https://www.kaggle.com/datasets/masterdatasan/lung-cancer-mortality-datasets-v2
Explore at:
Available download formats: zip (81127029 bytes)
Dataset updated
Jun 1, 2024
Authors
MasterDataSan
Description
This dataset contains data about lung cancer mortality. The database is a comprehensive collection of patient information, specifically focused on individuals diagnosed with cancer. It is designed to facilitate the analysis of various factors that may influence cancer prognosis and treatment outcomes. The database includes a range of demographic, medical, and treatment-related variables, capturing essential details about each patient's condition and history.

Key components of the database include:

Demographic Information: Basic details about the patients such as age, gender, and country of residence. This helps in understanding the distribution of cancer cases across different populations and regions.

Medical History: Information about each patient’s medical background, including family history of cancer, smoking status, Body Mass Index (BMI), cholesterol levels, and the presence of other health conditions such as hypertension, asthma, cirrhosis, and other cancers. This section is crucial for identifying potential risk factors and comorbidities.

Cancer Diagnosis: Detailed data about the cancer diagnosis itself, including the date of diagnosis and the stage of cancer at the time of diagnosis. This helps in tracking the progression and severity of the disease.

Treatment Details: Information regarding the type of treatment each patient received, the end date of the treatment, and the outcome (whether the patient survived or not). This is essential for evaluating the effectiveness of different treatment approaches.

The structure of the database allows for in-depth analysis and research, making it possible to identify patterns, correlations, and potential causal relationships between various factors and cancer outcomes. It is a valuable resource for medical researchers, epidemiologists, and healthcare providers aiming to improve cancer treatment and patient care.

id: A unique identifier for each patient in the dataset. age: The age of the patient at the time of diagnosis. gender: The gender of the patient (e.g., male, female). country: The country or region where the patient resides. diagnosis_date: The date on which the patient was diagnosed with lung cancer. cancer_stage: The stage of lung cancer at the time of diagnosis (e.g., Stage I, Stage II, Stage III, Stage IV). family_history: Indicates whether there is a family history of cancer (e.g., yes, no). smoking_status: The smoking status of the patient (e.g., current smoker, former smoker, never smoked, passive smoker). bmi: The Body Mass Index of the patient at the time of diagnosis. cholesterol_level: The cholesterol level of the patient (value). hypertension: Indicates whether the patient has hypertension (high blood pressure) (e.g., yes, no). asthma: Indicates whether the patient has asthma (e.g., yes, no). cirrhosis: Indicates whether the patient has cirrhosis of the liver (e.g., yes, no). other_cancer: Indicates whether the patient has had any other type of cancer in addition to the primary diagnosis (e.g., yes, no). treatment_type: The type of treatment the patient received (e.g., surgery, chemotherapy, radiation, combined). end_treatment_date: The date on which the patient completed their cancer treatment or died. survived: Indicates whether the patient survived (e.g., yes, no).
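A subset of the fields listed above can be modeled as a typed record, which makes derived values such as the time from diagnosis to end of treatment easy to compute. This is a sketch under assumptions: the date format (ISO `YYYY-MM-DD`) and the `yes`/`no` encoding of `survived` are guesses based on the examples in the description, not confirmed from the file, and `PatientRecord` and `parse_record` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PatientRecord:
    """A few of the documented fields; the remaining ones are omitted for brevity."""
    id: int
    age: float
    cancer_stage: str
    diagnosis_date: date
    end_treatment_date: date
    survived: bool

    @property
    def days_diagnosis_to_end(self) -> int:
        # end_treatment_date is the date treatment ended or the patient died.
        return (self.end_treatment_date - self.diagnosis_date).days


def parse_record(row: dict) -> PatientRecord:
    """Convert one CSV row (string values) into a PatientRecord."""
    return PatientRecord(
        id=int(row["id"]),
        age=float(row["age"]),
        cancer_stage=row["cancer_stage"],
        # Assumes ISO-formatted dates; adjust if the file uses another format.
        diagnosis_date=date.fromisoformat(row["diagnosis_date"]),
        end_treatment_date=date.fromisoformat(row["end_treatment_date"]),
        # Assumes "yes"/"no"; also accepts "1"/"true" as positive values.
        survived=row["survived"].strip().lower() in {"yes", "1", "true"},
    )


# Invented row for illustration.
row = {
    "id": "1", "age": "64", "cancer_stage": "Stage II",
    "diagnosis_date": "2020-01-01", "end_treatment_date": "2020-03-01",
    "survived": "yes",
}
record = parse_record(row)
print(record.days_diagnosis_to_end)
```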

This dataset contains artificially generated data with as close a representation of reality as possible. This data is free to use without any licence required.

Good luck Gakusei!
Cervical Cancer Risk Classification - Dataset - CKAN
data.poltekkes-smg.ac.id
Updated Oct 7, 2024
+ more versions
Cite
(2024). Cervical Cancer Risk Classification - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/cervical-cancer-risk-classification
Explore at:
Dataset updated
Oct 7, 2024
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cervical Cancer Risk Factors for Biopsy: This Dataset is Obtained from UCI Repository and kindly acknowledged! This file contains a List of Risk Factors for Cervical Cancer leading to a Biopsy Examination! About 11,000 new cases of invasive cervical cancer are diagnosed each year in the U.S. However, the number of new cervical cancer cases has been declining steadily over the past decades. Although it is the most preventable type of cancer, each year cervical cancer kills about 4,000 women in the U.S. and about 300,000 women worldwide. In the United States, cervical cancer mortality rates plunged by 74% from 1955 - 1992 thanks to increased screening and early detection with the Pap test. AGE Fifty percent of cervical cancer diagnoses occur in women ages 35 - 54, and about 20% occur in women over 65 years of age. The median age of diagnosis is 48 years. About 15% of women develop cervical cancer between the ages of 20 - 30. Cervical cancer is extremely rare in women younger than age 20. However, many young women become infected with multiple types of human papilloma virus, which then can increase their risk of getting cervical cancer in the future. Young women with early abnormal changes who do not have regular examinations are at high risk for localized cancer by the time they are age 40, and for invasive cancer by age 50. SOCIOECONOMIC AND ETHNIC FACTORS Although the rate of cervical cancer has declined among both Caucasian and African-American women over the past decades, it remains much more prevalent in African-Americans -- whose death rates are twice as high as Caucasian women. Hispanic American women have more than twice the risk of invasive cervical cancer as Caucasian women, also due to a lower rate of screening. These differences, however, are almost certainly due to social and economic differences. Numerous studies report that high poverty levels are linked with low screening rates. 
In addition, lack of health insurance, limited transportation, and language difficulties hinder a poor woman’s access to screening services. HIGH SEXUAL ACTIVITY Human papilloma virus (HPV) is the main risk factor for cervical cancer. In adults, the most important risk factor for HPV is sexual activity with an infected person. Women most at risk for cervical cancer are those with a history of multiple sexual partners, sexual intercourse at age 17 years or younger, or both. A woman who has never been sexually active has a very low risk for developing cervical cancer. Sexual activity with multiple partners increases the likelihood of many other sexually transmitted infections (chlamydia, gonorrhea, syphilis). Studies have found an association between chlamydia and cervical cancer risk, including the possibility that chlamydia may prolong HPV infection. FAMILY HISTORY Women have a higher risk of cervical cancer if they have a first-degree relative (mother, sister) who has had cervical cancer. USE OF ORAL CONTRACEPTIVES Studies have reported a strong association between cervical cancer and long-term use of oral contraception (OC). Women who take birth control pills for more than 5 - 10 years appear to have a much higher risk of HPV infection (up to four times higher) than those who do not use OCs. (Women taking OCs for fewer than 5 years do not have a significantly higher risk.) The reasons for this risk from OC use are not entirely clear. Women who use OCs may be less likely to use a diaphragm, condoms, or other methods that offer some protection against sexually transmitted diseases, including HPV. Some research also suggests that the hormones in OCs might help the virus enter the genetic material of cervical cells. HAVING MANY CHILDREN Studies indicate that having many children increases the risk for developing cervical cancer, particularly in women infected with HPV. 
SMOKING Smoking is associated with a higher risk for precancerous changes (dysplasia) in the cervix and for progression to invasive cervical cancer, especially for women infected with HPV. IMMUNOSUPPRESSION Women with weak immune systems (such as those with HIV/AIDS) are more susceptible to acquiring HPV. Immunocompromised patients are also at higher risk for having cervical precancer develop rapidly into invasive cancer. DIETHYLSTILBESTROL (DES) From 1938 - 1971, diethylstilbestrol (DES), an estrogen-related drug, was widely prescribed to pregnant women to help prevent miscarriages. The daughters of these women face a higher risk for cervical cancer. DES is no longer prescribed.
Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER)...
catalog.data.gov
data.virginia.gov
+3 more
Updated Jul 16, 2025
Cite
National Cancer Institute (NCI), National Institutes of Health (NIH) (2025). Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER) Registries Limited-Use [Dataset]. https://catalog.data.gov/dataset/cancer-incidence-surveillance-epidemiology-and-end-results-seer-registries-limited-use
Explore at:
Dataset updated
Jul 16, 2025
Dataset provided by
National Cancer Institute http://www.cancer.gov/
Description
SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.
Appendix Cancer Prediction Dataset
kaggle.com
zip
Updated Feb 4, 2025
Cite
Ankush Panday (2025). Appendix Cancer Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/ankushpanday1/appendix-cancer-prediction-dataset
Explore at:
Available download formats: zip (7343922 bytes)
Dataset updated
Feb 4, 2025
Authors
Ankush Panday
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains clinical, demographic, and lifestyle data for 260,000 individuals from 25 countries. Designed for healthcare research and predictive modeling, it includes diverse variables relevant to appendix cancer diagnosis and risk factors. The dataset can support machine learning tasks, statistical analysis, and exploratory data studies in oncology and public health domains.
Breast Cancer Dataset - Dataset - CKAN
data.poltekkes-smg.ac.id
Updated Oct 7, 2024
Cite
(2024). Breast Cancer Dataset - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/breast-cancer-dataset
Explore at:
Dataset updated
Oct 7, 2024
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Breast cancer is the most common cancer among women worldwide. It accounts for 25% of all cancer cases and affected over 2.1 million people in 2015 alone. It starts when cells in the breast begin to grow out of control. These cells usually form tumors that can be seen via X-ray or felt as lumps in the breast area. The key challenge in its detection is how to classify tumors as malignant (cancerous) or benign (non-cancerous). We ask you to complete the analysis of classifying these tumors using machine learning (with SVMs) and the Breast Cancer Wisconsin (Diagnostic) Dataset. Acknowledgements: This dataset has been referred from Kaggle. Objective: Understand the dataset and clean it up (if required). Build classification models to predict whether the cancer type is malignant or benign. Also fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.
lung-cancer
huggingface.co
Updated Jun 24, 2022
+ more versions
Cite
Nate Raw (2022). lung-cancer [Dataset]. https://huggingface.co/datasets/nateraw/lung-cancer
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 24, 2022
Authors
Nate Raw
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for Lung Cancer

Dataset Summary

An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their risk status. The data was collected from the website of an online lung cancer prediction system.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.
Multimodal Head and Neck cancer dataset
cancerimagingarchive.net
n/a, svs and png
Updated Nov 18, 2025
Cite
The Cancer Imaging Archive (2025). Multimodal Head and Neck cancer dataset [Dataset]. http://doi.org/10.7937/rcty-5h16
Explore at:
Available download formats: svs and png, n/a
Unique identifier
https://doi.org/10.7937/rcty-5h16
Dataset updated
Nov 18, 2025
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Nov 18, 2025
Dataset funded by
National Cancer Institute http://www.cancer.gov/
Description
Abstract
HANCOCK is a comprehensive, monocentric dataset of 763 head and neck cancer patients, including diverse data modalities. It contains histopathology imaging (whole-slide images of H&E-stained primary tumors and tissue microarrays with immunohistochemical staining) alongside structured clinical data (demographics, tumor pathology characteristics, laboratory blood measurements) and textual data (de-identified surgery reports and medical histories). All patients were treated curatively, and data span diagnoses from 2005–2019. This multimodal collection enables research into integrative analyses – for example, combining histologic features with clinical parameters for outcome prediction. Early analyses have demonstrated that fusing these modalities improves prognostic modeling compared to single-source data, and that leveraging histology with foundation models can enhance endpoint prediction. HANCOCK aims to facilitate precision oncology studies by providing a large public resource for developing and benchmarking multimodal machine learning methods in head and neck cancer.
Introduction
Head and neck cancer (HNC) is a prevalent malignancy with poor outcomes – it is the 7th most common cancer globally and carries a 5-year survival of only ~25–60% despite modern treatments. Improving patient prognosis may require personalized, multimodal therapy decisions, using information from pathology, clinical, and other data sources. However, progress in multimodal prediction has been limited by the lack of large public datasets that integrate these diverse data types. To our knowledge, existing HNC datasets are either small or incomplete; for example, a radiomics study included 288 oropharyngeal cases, and a proteomics-focused set with imaging had only 122 cases. The Cancer Genome Atlas (TCGA) provides multi-omics for >500 HNC cases, but lacks crucial data like pathology reports, blood tests, or comprehensive imaging for each patient. These limitations hinder robust multimodal research.
HANCOCK was created to address this gap. It aggregates 763 patients’ data from a single academic center, capturing a real-world, uniformly treated cohort. The dataset uniquely combines whole slide histopathology images, tissue microarray images, detailed clinical parameters, pathology reports, and lab values in one resource. By curating and harmonizing these modalities, HANCOCK enables researchers to explore complex data interdependencies and develop multimodal predictive models. The patient population reflects typical HNC demographics – 80% male, median age 61, with 72% being former or current smokers – aligning with expected epidemiology and supporting generalizability. In summary, HANCOCK is an unprecedented multimodal HNC dataset that can fuel research in machine learning, prognostic biomarker discovery, and integrative oncology, ultimately advancing personalized head and neck cancer care.
Methods
The following sections describe how the HANCOCK data were collected, processed, and prepared for public sharing.
Subject Inclusion and Exclusion Criteria
Patients included in HANCOCK were those diagnosed with head and neck cancer between 2005 and 2019 at University Hospital Erlangen (Germany) who underwent a curative-intent initial treatment (surgery and/or definitive therapy). This encompasses cancers of the oral cavity, oropharynx, hypopharynx, and larynx. Patients treated palliatively or with recurrent/metastatic disease at presentation were excluded to focus on first-course, curative treatments. The cohort consists of 763 patients (approximately 80% male, 20% female) with a median age of 61 years. Notably, ~72% have a history of tobacco use, which is consistent with real-world HNC risk factors. The distribution of tumor subsites and stages reflects typical HNC presentation, and thus the dataset is broadly representative of the general HNC patient population. Being a single-center dataset, there is limited geographic diversity; however, the homogeneous data acquisition and treatment context reduce variability in data quality. No significant selection biases were introduced aside from the exclusion of non-curative cases – all major HNC subsite cases over the inclusion period were captured, providing a comprehensive real-world sample. Ethical approval was obtained for this retrospective data collection and sharing (Ethics Committee vote #23-22-Br), and all data were fully de-identified prior to release.
Data Acquisition
Histopathology: Tissue specimens from the primary tumors (and involved lymph nodes, if present) were obtained from the pathology archives. All samples were formalin-fixed and paraffin-embedded (FFPE) and stained with hematoxylin and eosin (H&E) following routine protocols. Digital whole-slide imaging was performed on these histology slides. A total of 709 H&E slides of primary tumor tissue (701 patients had one slide, 8 patients had two slides) were scanned at high resolution using a 3DHISTECH P1000 scanner at an effective 82.44× magnification (0.1213 µm/pixel). Additionally, 396 H&E slides of lymph node metastases were scanned, using two systems: an Aperio Leica GT450 at 40× (0.2634 µm/pixel) and the 3DHISTECH P1000 at ~51× (0.1945 µm/pixel). (Multiple scanners were utilized over the course of the project; all resulting images were cross-verified for quality.) The digital whole slide images (WSIs) are provided in the pyramidal Aperio SVS format, a TIFF-based format compatible with standard viewers.
In addition to full slides, tissue microarrays (TMAs) were constructed from each patient’s tumor block to sample important regions. For each case, two cylindrical core biopsies (diameter 1.5 mm) were taken – one from the tumor center and one from the invasive tumor front. These cores were assembled into TMA blocks and stained on separate slides with a panel of eight stains: H&E plus immunohistochemical (IHC) markers targeting various immune cells and tumor biomarkers. The IHC markers include CD3, CD8, CD56, CD68, CD163, PD-L1, and MHC-1, which label T cells (CD3, CD8), natural killer cells (CD56), monocytes/macrophages (CD68, CD163), and a tumor immune checkpoint ligand (PD-L1), as well as MHC class I expression. Each core appears on up to 8 stained TMA slides (one per stain), yielding up to 16 TMA images per patient (two cores × eight stains). In the dataset, TMA images are provided for both the tumor-center and tumor-front cores; these too are digitized high-resolution images (consistent microscope settings, ~40×). The combination of WSIs and TMAs yields a rich imaging dataset: 701 patients have at least one primary tumor WSI (62 patients lack WSIs due to unavailable tissue), and all patients have TMA core images unless the tumor block was exhausted. This imaging data offers both broad tissue context from WSIs and targeted cellular detail from TMAs. Manual tumor region annotations are also included for the primary tumor WSIs (see Data Analysis below).
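The scanner resolutions and core dimensions above translate into pixel sizes with simple unit conversion. The sketch below estimates how many pixels a 1.5 mm core diameter would span at each whole-slide resolution quoted in the text. This is illustrative only: the pixel size of the TMA images themselves is not stated (only "~40×"), so these numbers should not be read as actual TMA image dimensions. `microns_to_pixels` is a hypothetical helper.

```python
def microns_to_pixels(length_um, microns_per_pixel):
    """Convert a physical length in micrometers to a length in pixels."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return length_um / microns_per_pixel


CORE_DIAMETER_UM = 1.5 * 1000  # 1.5 mm core diameter, in micrometers

# Resolutions (micrometers per pixel) as quoted for the whole-slide scans.
RESOLUTIONS_UM_PER_PX = {
    "3DHISTECH P1000, primary tumor": 0.1213,
    "Aperio Leica GT450, lymph node": 0.2634,
    "3DHISTECH P1000, lymph node": 0.1945,
}

for scanner, mpp in RESOLUTIONS_UM_PER_PX.items():
    pixels = microns_to_pixels(CORE_DIAMETER_UM, mpp)
    print(f"{scanner}: about {pixels:.0f} px across a 1.5 mm core")

# Two cores (tumor center, invasive front) times eight stains per core.
MAX_TMA_IMAGES_PER_PATIENT = 2 * 8
```

At full resolution a single core already spans several thousand pixels, so tiling or downsampling is usually needed before feeding such images to a model.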
Clinical and Pathology Data: A wide array of non-imaging data was extracted from hospital information systems and pathology reports for each patient. Key demographic variables (age, sex, etc.) and tumor pathology details were collected, including primary tumor site, histologic subtype, grade, TNM stage, resection margin status, depth of invasion, perineural and lymphovascular invasion, and nodal metastasis status. These pathology parameters were recorded in a structured format for each case. Standard clinical coding systems were used where applicable: e.g., diagnoses are coded with ICD-10 codes and procedures with OPS codes (the German procedure classification system). The dataset includes these codes for each patient’s conditions and treatments. Comprehensive laboratory blood test results at diagnosis or pre-treatment were also compiled, covering complete blood counts, coagulation measures, electrolytes, kidney function, C-reactive protein, and other relevant analytes. Reference ranges for each lab parameter are provided alongside the values to indicate whether a result was normal or abnormal. Most patients have a full panel of these lab results, though some values are missing if a test was not clinically indicated; the dataset notes availability per patient. All structured data have been cleaned and validated – for example, harmonizing category values and checking consistency (e.g. TNM stages align with recorded tumor sites).
Textual Data (Surgical Reports and Histories): Unstructured clinical text was also included to add rich context on treatment details. Surgery reports (operative notes) from the primary tumor resection and associated medical history summaries were retrieved from the hospital’s electronic records. For each patient, the operative report from their first definitive surgery and the corresponding
[MI] Rapid Cancer Registration Data
digital.nhs.uk
Updated Nov 27, 2025
(2025). [MI] Rapid Cancer Registration Data [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mi-rapid-cancer-registration-data
Explore at:
Dataset updated
Nov 27, 2025
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Description
Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.
SEER Breast Cancer Data
ieee-dataport.org
data.niaid.nih.gov
Updated Jul 29, 2025
jing teng (2025). SEER Breast Cancer Data [Dataset]. https://ieee-dataport.org/open-access/seer-breast-cancer-data
Explore at:
Dataset updated
Jul 29, 2025
Authors
jing teng
Description
examined regional LNs
Deaths from All Cancers - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jul 28, 2017
ckan.publishing.service.gov.uk (2017). Deaths from All Cancers - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/deaths-from-all-cancers
Explore at:
Dataset updated
Jul 28, 2017
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
This data shows premature deaths (Age under 75) from all Cancers, numbers and rates by gender, as 3-year moving-averages. Cancers are a major cause of premature deaths. Inequalities exist in cancer rates between the most deprived areas and the most affluent areas. Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies Age-specific rates to a Standard (European) population to cancel out possible effects on crude rates due to different age structures among populations, thus enabling direct comparisons of rates. A limitation on using mortalities as a proxy for prevalence of health conditions is that mortalities may give an incomplete view of health conditions in an area, as ill-health might not lead to premature death. Data source: Office for Health Improvement and Disparities (OHID), indicator ID 40501, E05a. This data is updated annually.
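The direct standardisation described above can be sketched as follows. This is a minimal illustration, not the publisher's code: the two age bands, the standard-population weights, and the death and population counts are all made up for the example (they are not the European Standard Population or real area data).

```python
def directly_standardised_rate(deaths, population, standard_weights, per=100_000):
    """Weight each age-specific rate by a standard population, then
    divide by the total standard population (direct standardisation)."""
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs need one entry per age band")
    weighted_sum = sum(
        w * d / p for d, p, w in zip(deaths, population, standard_weights)
    )
    return per * weighted_sum / sum(standard_weights)


def crude_rate(deaths, population, per=100_000):
    """Total deaths over total population, ignoring age structure."""
    return per * sum(deaths) / sum(population)


# Hypothetical two-band example (younger, older); not real data.
STANDARD = [3, 1]  # illustrative standard-population weights
area_a = {"deaths": [9, 10], "population": [9000, 1000]}  # mostly younger
area_b = {"deaths": [1, 90], "population": [1000, 9000]}  # mostly older

for name, area in (("A", area_a), ("B", area_b)):
    crude = crude_rate(**area)
    dasr = directly_standardised_rate(standard_weights=STANDARD, **area)
    print(f"area {name}: crude={crude:.1f}, DASR={dasr:.1f}")
```

Both hypothetical areas share the same age-specific rates (1 and 10 deaths per 1,000), so their standardised rates coincide even though the crude rates differ widely. That is the effect of age structure that the DASR is meant to cancel out.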
Number and rates of new cases of primary cancer, by cancer type, age group...
www150.statcan.gc.ca
datasets.ai
Updated May 19, 2021
Government of Canada, Statistics Canada (2021). Number and rates of new cases of primary cancer, by cancer type, age group and sex [Dataset]. http://doi.org/10.25318/1310011101-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310011101-eng
Dataset updated
May 19, 2021
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
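To make the random-rounding step concrete, here is a small sketch of one common unbiased scheme: a count that is not already a multiple of 5 is rounded up with probability equal to its remainder divided by 5, and rounded down otherwise. The description above does not state the exact probabilities Statistics Canada uses, so this rule is an assumption chosen for illustration.

```python
import random


def random_round(count, base=5, rng=None):
    """Randomly round a non-negative count to an adjacent multiple of `base`.

    Assumed rule (not confirmed by the source): round up with probability
    remainder / base, so the expected value equals the original count.
    """
    rng = rng if rng is not None else random
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower


# Usage with a seeded generator so a run is repeatable.
rng = random.Random(0)
counts = [12, 3, 40, 27]
print([random_round(c, rng=rng) for c in counts])
```

Because each published count may differ from the true count by up to 4, sums of rounded cells will not always equal the rounded total.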
Place of Death from Cancer - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated May 9, 2014
ckan.publishing.service.gov.uk (2014). Place of Death from Cancer - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/place_of_death_from_cancer
Explore at:
Dataset updated
May 9, 2014
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Describes the place of death from cancer in Scotland, by demographic characteristics including deprivation. Locations of death are home, hospice, NHS Acute hospital, other institution; covers the four major cancers of lung, breast, colorectal and prostate. Ten year trends are also presented. As from May 2010 these statistics can be designated as National Statistics products. Source agency: ISD Scotland (part of NHS National Services Scotland) Designation: National Statistics Language: English Alternative title: Place of Death from Cancer

National Lung Screening Trial

cancerimagingarchive.net
stage.cancerimagingarchive.net

dicom, docx, n/a +2

Updated Sep 24, 2021

The Cancer Imaging Archive (2021). National Lung Screening Trial [Dataset]. http://doi.org/10.7937/TCIA.HMQ8-J677

Explore at:

Available download formats: docx, svs, dicom, n/a, sas, zip, and doc

Unique identifier

https://doi.org/10.7937/TCIA.HMQ8-J677

Dataset updated

Sep 24, 2021

Dataset authored and provided by

The Cancer Imaging Archive

License

https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

Time period covered

Sep 24, 2021

Dataset funded by

National Cancer Institute (http://www.cancer.gov/)

Description

Demographic Summary of Available Imaging

Characteristic	Value (N = 26254)
Age (years)	Mean ± SD: 61.4 ± 5; Median (IQR): 60 (57-65); Range: 43-75
Sex	Male: 15512 (59%); Female: 10742 (41%)
Race	White: 23969 (91.3%); Black: 1135 (4.3%); Asian: 547 (2.1%); American Indian/Alaska Native: 88 (0.3%); Native Hawaiian/Other Pacific Islander: 87 (0.3%); Unknown: 428 (1.6%)
Ethnicity	Not Available

Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.

Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.

Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
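The headline ratios in the results above can be re-derived from the rounded per-100,000 rates quoted in the text. The short sketch below does that arithmetic; the function names are illustrative, and because the inputs are already rounded, the recomputed mortality reduction comes out near, rather than exactly at, the published 20.0%.

```python
def rate_ratio(rate_a, rate_b):
    """Ratio of two rates expressed per the same number of person-years."""
    return rate_a / rate_b


def relative_reduction(rate_intervention, rate_control):
    """Relative reduction of the intervention rate versus the control rate."""
    return 1 - rate_intervention / rate_control


# Rates per 100,000 person-years as quoted in the abstract.
incidence_ct, incidence_xray = 645, 572
deaths_ct, deaths_xray = 247, 309

print(f"incidence rate ratio: {rate_ratio(incidence_ct, incidence_xray):.2f}")
print(f"lung-cancer mortality reduction: {relative_reduction(deaths_ct, deaths_xray):.1%}")
```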

Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).

Data Availability: A summary of the National Lung Screening Trial and its available datasets is provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), which is contracted by the National Cancer Institute (NCI) to keep and statistically analyze the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and histopathology images collected during the trial. (These were previously restricted.)

Cancer Outcomes and Services Dataset (COSD)
data.wu.ac.at
data.europa.eu
Updated Aug 17, 2017
Public Health England (2017). Cancer Outcomes and Services Dataset (COSD) [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/ZDIwMDcyYzQtNGRhNS00MjcxLWEzZmMtNDkyOGZhYzA3MjZm
Explore at:
Dataset updated
Aug 17, 2017
Dataset provided by
Public Health England
Description
The Cancer Outcomes and Services Dataset (COSD) has been the national standard for reporting cancer in the NHS in England since January 2013.

In January 2013 the COSD replaced the previous National Cancer Dataset as the new national standard for reporting cancer in the NHS in England. It incorporated a revised generic Cancer Registration Dataset (CRDS) and additional clinical and pathology site specific data items relevant to different tumour types.

The COSD specifies the items to be submitted electronically by service providers to the National Cancer Registration and Analysis Service (NCRAS) on a monthly basis. It replaces the existing monthly NCRAS upload and may include separate files from different hospital systems.

The COSD also identifies the items that the NCRAS will obtain from other sources such as Cancer Waiting Times, Cancer Screening Programmes and ONS. (Some items from these other datasets will be included in COSD provider submissions for patient identification and record matching.)

Data is submitted by NHS providers of cancer services and will be linked with data from other sources by the NCRAS at patient level, using the NHS number, in order to compile the full dataset.
Cancer (in persons of all ages): England
hub.arcgis.com
data.catchmentbasedapproach.org
Updated Apr 6, 2021
The Rivers Trust (2021). Cancer (in persons of all ages): England [Dataset]. https://hub.arcgis.com/datasets/c5c07229db684a65822fdc9a29388b0b
Explore at:
Dataset updated
Apr 6, 2021
Dataset authored and provided by
The Rivers Trust
Area covered

Description
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of cancer (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to cancer (in persons of all ages).

This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.

The percentage of each MSOA’s population (all ages) with cancer was estimated. This was achieved by calculating a weighted average based on:
The percentage of the MSOA area that was covered by each GP practice’s catchment area
Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness

The estimated percentage of each MSOA’s population with cancer was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with cancer, within the relevant age range.

Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have cancer
B) the NUMBER of people within that MSOA who are estimated to have cancer

An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have cancer, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from cancer, and where those people make up a large percentage of the population, indicating there is a real issue with cancer within the population and the investment of resources to address that issue could have the greatest benefits.

LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of cancer, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of cancer.

TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
Levels of obesity, inactivity and associated illnesses (England): Missing data

DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

DATA SOURCES
This dataset was produced using:
Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.
Population data: Mid-2019 (June 30) Population Estimates for Middle Layer Super Output Areas in England and Wales. © Office for National Statistics licensed under the Open Government Licence v3.0. © Crown Copyright 2020.

COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021. © Crown Copyright 2020.

CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
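The weighted-average and scoring steps in the methodology above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Ribble Rivers Trust's code: the function names are hypothetical, the sample numbers are invented, and min-max scaling is assumed for the "relative score between 1 and 0" because the description does not say how the rescaling was done.

```python
def weighted_prevalence(practices):
    """Area-weighted average prevalence for one MSOA.

    `practices` is a list of (area_share, prevalence_pct) pairs, one per GP
    practice whose catchment overlaps the MSOA.
    """
    total_share = sum(share for share, _ in practices)
    if total_share == 0:
        raise ValueError("MSOA is not covered by any GP catchment")
    return sum(share * pct for share, pct in practices) / total_share


def min_max(values):
    """Rescale values to the 0..1 range (assumed scoring method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def msoa_scores(prevalence_pcts, populations):
    """Combine score A (percentage) and score B (number) into one 0..1 score."""
    numbers = [pct / 100 * pop for pct, pop in zip(prevalence_pcts, populations)]
    score_a = min_max(prevalence_pcts)
    score_b = min_max(numbers)
    return min_max([(a + b) / 2 for a, b in zip(score_a, score_b)])


# Hypothetical MSOA covered half-and-half by two practices.
print(weighted_prevalence([(50, 2.0), (50, 4.0)]))
# Hypothetical prevalence (%) and mid-year populations for three MSOAs.
print(msoa_scores([2.0, 3.0, 4.0], [8000, 6000, 9000]))
```

Averaging the two scores means an MSOA ranks highest only when both the share and the absolute number of people affected are large, which matches the interpretation given in the description.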
Synthea lung cancer synthetic patient data series for ML
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Chen, AJ (2023). Synthea lung cancer synthetic patient data series for ML [Dataset]. http://doi.org/10.7910/DVN/Q5LK5A
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/Q5LK5A
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Chen, AJ
Description
These synthetic patient datasets were created for machine learning (ML) studies of lung cancer risk prediction in simulations of ML-enabled learning health systems (LHS). Five populations of 30K patients were generated by the Synthea patient generator. They were combined sequentially to form five populations of different sizes, from 30K to 150K patients. Patients with and without lung cancer were selected at a ratio of roughly 1:3, and their electronic health records (EHR) were processed into data table files ready for machine learning. The ML-ready table files also have the continuous numeric values converted to categorical values. Because Synthea patients closely resemble real patients, these ML-ready datasets can be used to develop and test ML algorithms and to train researchers. Unlike real patient data, these Synthea datasets can be shared with collaborators anywhere without privacy concerns. The first use of these datasets was in an LHS simulation study, which was published in Nature Scientific Reports (see https://www.nature.com/articles/s41598-022-23011-4).
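Two of the preparation steps described above, combining batches sequentially into growing populations and converting continuous values to categories, can be sketched as below. The batch contents, bin edges, and labels are invented for illustration (tiny batches stand in for the 30K-patient populations), and the helper names are not from the dataset's own tooling.

```python
from bisect import bisect_right
from itertools import accumulate


def cumulative_populations(batches):
    """Return batch1, batch1+batch2, ... as successively larger populations."""
    return list(accumulate(batches, lambda acc, batch: acc + batch))


def bin_value(value, edges, labels):
    """Map a continuous value to a category.

    `edges` are ascending cut points; `labels` has len(edges) + 1 entries.
    A value equal to an edge falls into the higher bin.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]


# Hypothetical patient ages, three small batches instead of five 30K ones.
batches = [[44, 67], [71, 58], [80, 49]]
populations = cumulative_populations(batches)
print([len(p) for p in populations])

# Illustrative age bands; the real cut points are not given in the description.
edges, labels = [50, 70], ["<50", "50-69", "70+"]
print([bin_value(age, edges, labels) for age in populations[-1]])
```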
Cancer patient´s care transition database.xlsx
figshare.com
xlsx
Updated Mar 6, 2020
Elisiane Lorenzini; Julia Estela Willrich Boell; Nelly D. Oelke; Caroline Donini Rodrigues; Letícia Flores Trindade; Vanessa Dalsasso Batista Winter; Michelle Mariah Malkiewiez; Gabriela Ceretta Flôres; Pâmella Pluta; Adriane Cristina Bernat Kolankiewicz (2020). Cancer patient´s care transition database.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.11831343.v3
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.11831343.v3
Dataset updated
Mar 6, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Elisiane Lorenzini; Julia Estela Willrich Boell; Nelly D. Oelke; Caroline Donini Rodrigues; Letícia Flores Trindade; Vanessa Dalsasso Batista Winter; Michelle Mariah Malkiewiez; Gabriela Ceretta Flôres; Pâmella Pluta; Adriane Cristina Bernat Kolankiewicz
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset contains information on 213 cancer patients undergoing clinical or surgical treatment, characterized by sociodemographic and clinical data as well as data from the Care Transition Measure (CTM 15-Brazil). Data collection was carried out 7 to 30 days after hospital discharge, from June to August 2019. Understanding these data can contribute to improving the quality of care transitions and avoiding hospital readmissions. To this end, this dataset contains a broad array of variables:

*gender

*age group

*place of residence

*race

*marital status

*schooling

*paid work activity

*type of treatment

*cancer staging

*metastasis

*comorbidities

*main complaint

*continue use medication

*diagnosis

*cancer type

*diagnostic year

*oncology treatment

*first hospitalization

*readmission in the last 30 days

*number of hospitalizations in the last 30 days

*readmission in the last 6 months

*number of hospitalizations in the last 6 months

*readmission in the last year

*number of hospitalizations in the last year

*questions 1-15 from CTM 15-Brazil

The data are presented as a single Excel XLSX file: cancer patient´s care transitions dataset.xlsx.

The analyses of the present dataset have the potential to generate hospital readmission prevention strategies to be implemented by the hospital team. Researchers who are interested in CTs of cancer patients can extensively explore the variables described here.

The project from which these data were extracted was approved by the institution’s research ethics committee (approval n. 3.266.259/2019) at Associação Hospital de Caridade Ijuí, Rio Grande do Sul, Brazil.

Heemali Chaudhari (2022). Cancer Rates by U.S. State [Dataset]. https://www.kaggle.com/datasets/heemalichaudhari/cancer-rates-by-us-state

Cancer Rates by U.S. State

Explore at:

6 scholarly articles cite this dataset (View in Google Scholar)
