100+ datasets found

Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER)...
catalog.data.gov
healthdata.gov
+2 more
Updated Jul 16, 2025
Cite
National Cancer Institute (NCI), National Institutes of Health (NIH) (2025). Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER) Registries Limited-Use [Dataset]. https://catalog.data.gov/dataset/cancer-incidence-surveillance-epidemiology-and-end-results-seer-registries-limited-use
Dataset updated
Jul 16, 2025
Dataset provided by
National Cancer Institute (http://www.cancer.gov/)
Description
SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.
COSMIC
cancer.sanger.ac.uk
grch37-cancer.sanger.ac.uk
Updated Nov 18, 2025
+ more versions
Cite
Wellcome Sanger Institute (2025). COSMIC [Dataset]. http://doi.org/10.1093/nar/gkw1121
Unique identifier
https://doi.org/10.1093/nar/gkw1121
Dataset updated
Nov 18, 2025
Dataset provided by
Wellcome Sanger Institute
Description
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
SEER Cancer Statistics Database
dataverse.harvard.edu
data.niaid.nih.gov
Updated Jul 11, 2011
Cite
Harvard Dataverse (2011). SEER Cancer Statistics Database [Dataset]. http://doi.org/10.7910/DVN/C9KBBC
Unique identifier
https://doi.org/10.7910/DVN/C9KBBC
Dataset updated
Jul 11, 2011
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Users can access data about cancer statistics in the United States, including but not limited to searches by type of cancer and by race, sex, ethnicity, age at diagnosis, and age at death.
Background: The mission of the Surveillance, Epidemiology, and End Results (SEER) database is to provide information on cancer statistics to help reduce the burden of disease in the U.S. population. The SEER database is a project of the National Cancer Institute. It collects information on incidence, prevalence, and survival from specific geographic areas representing 28 percent of the United States population.
User functionality: Users can access a variety of resources. Cancer Stat Fact Sheets allow users to look at summaries of statistics by major cancer type. Cancer Statistic Reviews are available from 1975-2008 in table format. Users are also able to build their own tables and graphs using Fast Stats. The Cancer Query system provides more flexibility and a larger set of cancer statistics than Fast Stats but requires more input from the user. State Cancer Profiles include dynamic maps and graphs enabling the investigation of cancer trends at the county, state, and national levels. SEER research data files and SEER*Stat software are available to download through your Internet connection (SEER*Stat's client-server mode) or via discs shipped directly to you. A signed data agreement form is required to access the SEER data.
Data Notes: Data is available in different formats depending on which type of data is accessed. Some data is available in table, PDF, and HTML formats. Detailed information about the data is available under "Data Documentation and Variable Recodes".
Veterans Affairs Central Cancer Registry (VACCR)
data.va.gov
datahub.va.gov
+3 more
csv, xlsx, xml
Updated Sep 12, 2019
+ more versions
Cite
(2019). Veterans Affairs Central Cancer Registry (VACCR) [Dataset]. https://www.data.va.gov/dataset/Veterans-Affairs-Central-Cancer-Registry-VACCR-/jvmd-8fgj
Available download formats: xlsx, xml, csv
Dataset updated
Sep 12, 2019
Description
The Veterans Affairs Central Cancer Registry (VACCR) receives and stores information on cancer diagnosis and treatment constraints compiled and sent in by the local cancer registry staff at each of the 132 Veterans Affairs Medical Centers that diagnose and/or treat Veterans with cancer. The information sent is encoded to meet the site-specific requirements for registry inclusion as established by several oversight bodies, including the North American Association of Central Cancer Registries, the American College of Surgeons' Commission on Cancer, and the American Joint Committee on Cancer, among others. The information is obtained from a wide variety of medical record documents at the local medical center pertaining to each Veterans Health Administration (VHA) cancer patient. The information is then transmitted to the VACCR. Details collected include extensive demographics, cancer identification, extent of disease and staging, first course of treatment, and outcomes. Data extraction is available to researchers with VA approved Institutional Review Board studies, peer review, and Data Use Agreements.
The Cancer Genome Atlas Breast Invasive Carcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Feb 2, 2014
Cite
The Cancer Imaging Archive (2014). The Cancer Genome Atlas Breast Invasive Carcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Available download formats: n/a, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP
Dataset updated
Feb 2, 2014
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Breast Phenotype Research Group.
CDC WONDER: Cancer Statistics
catalog.data.gov
healthdata.gov
+3 more
Updated Jul 29, 2025
+ more versions
Cite
Centers for Disease Control and Prevention, Department of Health & Human Services (2025). CDC WONDER: Cancer Statistics [Dataset]. https://catalog.data.gov/dataset/cdc-wonder-cancer-statistics
Dataset updated
Jul 29, 2025
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
The United States Cancer Statistics (USCS) online databases in WONDER provide cancer incidence and mortality data for the United States for the years since 1999, by year, state and metropolitan area (MSA), age group, race, ethnicity, sex, childhood cancer classification, and cancer site. The databases report case counts, deaths, crude and age-adjusted incidence and death rates, and 95% confidence intervals for rates. The USCS data are the official federal statistics on cancer incidence from registries having high-quality data and cancer mortality statistics for 50 states and the District of Columbia. USCS are produced by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), in collaboration with the North American Association of Central Cancer Registries (NAACCR). Mortality data are provided by the Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Vital Statistics System (NVSS).
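To make the term "age-adjusted rate" concrete, the following is a minimal sketch of direct standardization, in which age-specific rates are weighted by a standard population. The function name, the three age groups, and every number here are hypothetical and are not taken from WONDER; the listing does not state which exact procedure or standard population USCS uses.

```python
def age_adjusted_rate(cases, populations, std_weights, per=100_000):
    """Directly standardized rate: weighted sum of age-specific rates.

    cases, populations and std_weights are parallel sequences,
    one entry per age group. Weights are normalized to sum to 1.
    """
    if not (len(cases) == len(populations) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    weighted = sum(
        (c / p) * (w / total_weight)
        for c, p, w in zip(cases, populations, std_weights)
    )
    return weighted * per


# Hypothetical inputs for three age groups (illustration only).
example_cases = [10, 50, 200]
example_populations = [100_000, 80_000, 40_000]
example_weights = [0.5, 0.3, 0.2]

example_rate = age_adjusted_rate(example_cases, example_populations, example_weights)
print(f"age-adjusted rate per 100,000: {example_rate:.2f}")
```

Because the weights come from a fixed standard population, two areas with different age structures can be compared on the same footing, which a crude rate does not allow.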
UAE Cancer Patient Dataset
kaggle.com
zip
Updated Mar 20, 2025
Cite
Akshay Kumar (2025). UAE Cancer Patient Dataset [Dataset]. https://www.kaggle.com/datasets/ak0212/uae-cancer-patient-dataset
Available download formats: zip (283039 bytes)
Dataset updated
Mar 20, 2025
Authors
Akshay Kumar
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Area covered
United Arab Emirates
Description
This dataset is designed for research, analysis, and machine learning applications in healthcare. It includes 10,000+ records of synthetic cancer patient data from the United Arab Emirates (UAE) with 20 features, such as:
✔ Patient demographics (Age, Gender, Nationality, Ethnicity)
✔ Diagnosis details (Cancer Type, Stage, Diagnosis Date)
✔ Treatment information (Treatment Type, Hospital, Physician)
✔ Health-related factors (Smoking Status, Comorbidities, Weight, Height)
✔ Outcomes (Recovered, Under Treatment, Deceased)
breast-cancer
openml.org
Updated Apr 6, 2014
Cite
Matjaz Zwitter; Milan Soklic (2014). breast-cancer [Dataset]. https://www.openml.org/d/13
Dataset updated
Apr 6, 2014
Authors
Matjaz Zwitter; Milan Soklic
Description
Author:
Source: Unknown -
Please cite:

Citation Request: This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data. Please include this citation if you plan to use this database.

Title: Breast cancer data (Michalski has used this)

Sources: -- Matjaz Zwitter & Milan Soklic (physicians) Institute of Oncology University Medical Center Ljubljana, Yugoslavia -- Donors: Ming Tan and Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu) -- Date: 11 July 1988

Past Usage: (Several: here are some)
-- Michalski,R.S., Mozetic,I., Hong,J., & Lavrac,N. (1986). The Multi-Purpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains. In Proceedings of the Fifth National Conference on Artificial Intelligence, 1041-1045, Philadelphia, PA: Morgan Kaufmann.
   -- accuracy range: 66%-72%
-- Clark,P. & Niblett,T. (1987). Induction in Noisy Domains. In Progress in Machine Learning (from the Proceedings of the 2nd European Working Session on Learning), 11-30, Bled, Yugoslavia: Sigma Press.
   -- 8 test results given: 65%-72% accuracy range
-- Tan, M., & Eshelman, L. (1988). Using weighted networks to represent classification knowledge in noisy domains. Proceedings of the Fifth International Conference on Machine Learning, 121-134, Ann Arbor, MI.
   -- 4 systems tested: accuracy range was 68%-73.5%
-- Cestnik,G., Kononenko,I, & Bratko,I. (1987). Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In I.Bratko & N.Lavrac (Eds.) Progress in Machine Learning, 31-45, Sigma Press.
   -- Assistant-86: 78% accuracy

Relevant Information: This is one of three domains provided by the Oncology Institute that has repeatedly appeared in the machine learning literature. (See also lymphography and primary-tumor.)

This data set includes 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes, some of which are linear and some are nominal.

Number of Instances: 286

Number of Attributes: 9 + the class attribute

Attribute Information:

Class: no-recurrence-events, recurrence-events

age: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99.

menopause: lt40, ge40, premeno.

tumor-size: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59.

inv-nodes: 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39.

node-caps: yes, no.

deg-malig: 1, 2, 3.

breast: left, right.

breast-quad: left-up, left-low, right-up, right-low, central.

irradiat: yes, no.

Missing Attribute Values: (denoted by "?") Number of instances with missing values per attribute:

node-caps: 8

breast-quad: 1

Class Distribution:

no-recurrence-events: 201 instances

recurrence-events: 85 instances

Num Instances: 286 Num Attributes: 10 Num Continuous: 0 (Int 0 / Real 0) Num Discrete: 10 Missing values: 9 / 0.3%

 #  name           type  enum  ints  real  missing  distinct  (1)
 1  'age'          Enum  100%  0%    0%    0 / 0%   6 / 2%    0%
 2  'menopause'    Enum  100%  0%    0%    0 / 0%   3 / 1%    0%
 3  'tumor-size'   Enum  100%  0%    0%    0 / 0%   11 / 4%   0%
 4  'inv-nodes'    Enum  100%  0%    0%    0 / 0%   7 / 2%    0%
 5  'node-caps'    Enum  97%   0%    0%    8 / 3%   2 / 1%    0%
 6  'deg-malig'    Enum  100%  0%    0%    0 / 0%   3 / 1%    0%
 7  'breast'       Enum  100%  0%    0%    0 / 0%   2 / 1%    0%
 8  'breast-quad'  Enum  100%  0%    0%    1 / 0%   5 / 2%    0%
 9  'irradiat'     Enum  100%  0%    0%    0 / 0%   2 / 1%    0%
10  'Class'        Enum  100%  0%    0%    0 / 0%   2 / 1%    0%
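The class distribution above (201 versus 85 instances) implies a majority-class baseline that is useful context for the accuracy ranges quoted under Past Usage. A minimal sketch follows; the counts come from the description, while the function and variable names are illustrative assumptions.

```python
def majority_baseline(class_counts):
    """Return the most frequent class and the accuracy of always predicting it."""
    total = sum(class_counts.values())
    label = max(class_counts, key=class_counts.get)
    return label, class_counts[label] / total


# Class counts as stated in the dataset description.
breast_cancer_counts = {
    "no-recurrence-events": 201,
    "recurrence-events": 85,
}

baseline_label, baseline_accuracy = majority_baseline(breast_cancer_counts)
print(f"always predicting {baseline_label!r} is right {baseline_accuracy:.1%} of the time")
```

A classifier evaluated on this data is only informative to the extent that it beats this trivial baseline of roughly 70%.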
Cancer Data
kaggle.com
zip
Updated Mar 22, 2023
+ more versions
Cite
Erdem Taha (2023). Cancer Data [Dataset]. https://www.kaggle.com/datasets/erdemtaha/cancer-data/data
Available download formats: zip (49810 bytes)
Dataset updated
Mar 22, 2023
Authors
Erdem Taha
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
🦠 Breast Cancer Data Set

This dataset contains the characteristics of patients diagnosed with cancer. The dataset contains a unique ID for each patient, the type of cancer (diagnosis), the visual characteristics of the cancer and the average values of these characteristics.

📚 The main features of the dataset are as follows:

id: Represents a unique ID of each patient.

diagnosis: Indicates the type of cancer. This property can take the values "M" (malignant) or "B" (benign).

radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave points_mean: Represents the mean values of the cancer's visual characteristics.

There are also several categorical features where patients in the dataset are labeled with numerical values. You can examine them in the Chart area.

Other features contain specific ranges of average values of the features of the cancer image:

radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave points_mean

Each of these features is mapped to a table containing the number of values in a given range. You can examine the Chart Tables

Each sample contains the patient's unique ID, the cancer diagnosis and the average values of the cancer's visual characteristics.

Such a dataset can be used to train or test models and algorithms used to make cancer diagnoses. Understanding and analyzing the dataset can contribute to better use of cancer-related visual features in diagnosis.

✨ Examples of Projects that can be done with the Data Set

Logistic Regression: This algorithm can be used effectively for binary classification problems. In this dataset, logistic regression may be an appropriate choice since there are two classes, "Malignant" and "Benign". It can be used to predict cancer type with the visual features in the dataset.

K-Nearest Neighbors (KNN): KNN classifies an example by looking at the k closest examples around it. This algorithm assumes that patients with similar characteristics tend to have similar types of cancer. KNN can be used for cancer diagnosis by taking into account neighborhood relationships in the data set.

Support Vector Machines (SVM): SVM is effective for classification tasks, especially for two-class problems. Focusing on the clear separation of classes in the dataset, SVM is a powerful algorithm that can be used for cancer diagnosis.
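As an illustration of the K-Nearest Neighbors idea described above, here is a minimal pure-Python sketch. The training points are hypothetical (radius_mean, texture_mean) pairs invented for the example and are not taken from the dataset; the linked notebooks may use a different library and a different setup.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical (radius_mean, texture_mean) values, NOT real dataset rows.
toy_X = [
    (12.0, 15.0), (11.5, 17.0), (13.0, 14.0),   # labeled benign
    (20.0, 25.0), (19.0, 22.0), (21.5, 24.0),   # labeled malignant
]
toy_y = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(toy_X, toy_y, (12.5, 16.0)))
print(knn_predict(toy_X, toy_y, (20.5, 23.0)))
```

With real data the features have very different scales (area_mean is much larger than smoothness_mean, for example), so standardizing each feature before computing distances is usually advisable.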

Data Set Related Training Notebooks 😊 ("I Recommend You Review")

K-NN Project: https://www.kaggle.com/code/erdemtaha/prediction-cancer-data-with-k-nn-95

Logistic Regression: https://www.kaggle.com/code/erdemtaha/cancer-prediction-96-5-with-logistic-regression

💖 Acknowledgements and Information

This is a copy of content that has been elaborated for educational purposes and published to reach more people. You can access the original source from the link below; please do not forget to support that data.

🔗 https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data

This database can also be accessed via the UW CS ftp server: 🔗 ftp.cs.wisc.edu cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found at the UCI Machine Learning Repository: 🔗 https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

📩 Personal Information:

If you have some questions or curiosities about the data or studies, you can contact me as you wish from the links below 😊

LinkedIn: https://www.linkedin.com/in/erdem-taha-sokullu/

Mail: erdemtahasokullu@gmail.com

Github: https://github.com/Prometheussx

Kaggle: https://www.kaggle.com/erdemtaha

📜 License:

This data has a CC BY-NC-SA 4.0 License. You can review the license rules from the link below.

License Link: https://creativecommons.org/licenses/by-nc-sa/4.0/
SEER Breast Cancer Data
ieee-dataport.org
data.niaid.nih.gov
+2 more
Updated Jul 29, 2025
Cite
jing teng (2025). SEER Breast Cancer Data [Dataset]. https://ieee-dataport.org/open-access/seer-breast-cancer-data
Dataset updated
Jul 29, 2025
Authors
jing teng
Description
examined regional LNs
Prostate Cancer Dataset
kaggle.com
zip
Updated Jun 15, 2023
Cite
Soujanya Hassan Prabhakar (2023). Prostate Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/soujanyahp/prostate-cancer-dataset
Available download formats: zip (2573 bytes)
Dataset updated
Jun 15, 2023
Authors
Soujanya Hassan Prabhakar
Description
The data used in this example is sourced from a study conducted by Stamey et al. (1989). The study aimed to investigate the relationship between the level of prostate-specific antigen (PSA) and various clinical measures in a group of 97 men who were scheduled to undergo a radical prostatectomy. PSA is a protein that is produced by the prostate gland, and higher levels of PSA are often associated with a higher likelihood of having prostate cancer. The dataset provides valuable information for examining the correlation between PSA levels and other clinical factors in the context of prostate cancer.

source: https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.data
Massachusetts Cancer Data - Interactive City and Town
mass.gov
Updated Feb 16, 2018
Cite
Department of Public Health (2018). Massachusetts Cancer Data - Interactive City and Town [Dataset]. https://www.mass.gov/info-details/massachusetts-cancer-data-interactive-city-and-town
Dataset updated
Feb 16, 2018
Dataset provided by
Office of Health Data, Strategy, and Innovation
Department of Public Health
Area covered
Massachusetts
Description
This page presents data on cancer incidence (new cases) in Massachusetts cities and towns, provided by the Massachusetts Cancer Registry (MACR).
The IQ-OTHNCCD lung cancer dataset
data.mendeley.com
Updated Oct 19, 2020
+ more versions
Cite
hamdalla alyasriy (2020). The IQ-OTHNCCD lung cancer dataset [Dataset]. http://doi.org/10.17632/bhmdr45bh2.1
Unique identifier
https://doi.org/10.17632/bhmdr45bh2.1
Dataset updated
Oct 19, 2020
Authors
hamdalla alyasriy
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset was collected in the above-mentioned specialist hospitals over a period of three months in fall 2019. It includes CT scans of patients diagnosed with lung cancer in different stages, as well as healthy subjects. IQ-OTH/NCCD slides were marked by oncologists and radiologists in these two centers. The dataset contains a total of 1190 images representing CT scan slices of 110 cases (see Figure 1). These cases are grouped into three classes: normal, benign, and malignant. Of these, 40 cases are diagnosed as malignant, 15 as benign, and 55 are classified as normal. The CT scans were originally collected in DICOM format. The scanner used is a SOMATOM from Siemens. The CT protocol includes 120 kV and a slice thickness of 1 mm, with breath hold at full inspiration; a window width ranging from 350 to 1200 HU and a window center from 50 to 600 were used for reading. All images were de-identified before performing analysis. Written consent was waived by the oversight review board. The study was approved by the institutional review board of participating medical centers. Each scan contains several slices. The number of slices ranges from 80 to 200, and each slice represents an image of the human chest from a different side and angle. The 110 cases vary in gender, age, educational attainment, area of residence and living status. Some of them are employees of the Iraqi ministries of Transport and Oil, others are farmers and gainers. Most of them come from places in the middle region of Iraq, particularly the provinces of Baghdad, Wasit, Diyala, Salahuddin, and Babylon.
Massachusetts Cancer Registry
mass.gov
Updated Mar 8, 2007
Cite
Office of Health Data, Strategy, and Innovation (2007). Massachusetts Cancer Registry [Dataset]. https://www.mass.gov/massachusetts-cancer-registry
Dataset updated
Mar 8, 2007
Dataset provided by
Office of Health Data, Strategy, and Innovation
Department of Public Health
Area covered
Massachusetts
Description
The Massachusetts Cancer Registry (MACR) works to improve and save lives through collection and reporting of cancer data.
Multimodal imaging of ductal carcinoma in situ with microinvasion
cancerimagingarchive.net
n/a +1
Updated Dec 8, 2023
Cite
The Cancer Imaging Archive (2023). Multimodal imaging of ductal carcinoma in situ with microinvasion [Dataset]. http://doi.org/10.7937/3fyc-ac78
Available download formats: svs, tiff, and xml, n/a
Unique identifier
https://doi.org/10.7937/3fyc-ac78
Dataset updated
Dec 8, 2023
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Dec 8, 2023
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
Ductal carcinoma in situ with microinvasion (DCISM) is a challenging subtype of breast cancer with controversial invasiveness and prognosis. Accurate diagnosis of DCISM from ductal carcinoma in situ (DCIS) is crucial for optimal treatment and improved clinical outcomes. This dataset provides histopathology images and paired CK5/6 immunohistochemical staining images from patients with DCISM, as well as multiphoton microscopy images of suspicious regions. It offers multi-modal imaging data from various perspectives for analysis and diagnosis of microinvasive breast cancer by other researchers in the field.
The dataset contains data from 12 breast cancer patients, including 10 cases of ductal carcinoma in situ with microinvasion (DCISM), 1 case of ductal carcinoma in situ (DCIS), and 1 case of invasive breast cancer.
The magnification of the glass slide images is 40x. The pathology slide scanner used was created by the Sunny Optical Technology (group) Co., Ltd., and the pixel aspect ratio of the images is 1. The dataset also includes multiphoton microscopy imaging of suspicious microinvasion areas. The multiphoton imaging system was manufactured by Zeiss, and it also has a pixel aspect ratio of 1.
Our database was specifically collected for the use of imaging methods in diagnosing DCISM. The suffixes in each case number indicate the patient's condition - "DCISM" for ductal carcinoma in situ with microinvasion, "DCIS" for ductal carcinoma in situ, and "IDC" for invasive ductal carcinoma. Apart from these labels, we have not collected any additional clinical information for these cases.
Melanoma Skin Cancer Dataset of 10000 Images
kaggle.com
zip
Updated Mar 29, 2022
+ more versions
Cite
Muhammad Hasnain Javid (2022). Melanoma Skin Cancer Dataset of 10000 Images [Dataset]. https://www.kaggle.com/datasets/hasnainjaved/melanoma-skin-cancer-dataset-of-10000-images
Available download formats: zip (103508268 bytes)
Dataset updated
Mar 29, 2022
Authors
Muhammad Hasnain Javid
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Melanoma Skin Cancer Dataset contains 10000 images. Melanoma skin cancer is a deadly cancer; early detection and cure can save many lives. This dataset will be useful for developing deep learning models for accurate classification of melanoma. The dataset consists of 9600 images for training the model and 1000 images for evaluation of the model.
National Cancer Institute Imaging Data Commons (IDC) Collections
registry.opendata.aws
Updated May 10, 2023
+ more versions
Cite
Imaging Data Commons (IDC) team (https://imaging.datacommons.cancer.gov) (2023). National Cancer Institute Imaging Data Commons (IDC) Collections [Dataset]. https://registry.opendata.aws/nci-imaging-data-commons/
Dataset updated
May 10, 2023
Dataset provided by
Imaging Data Commons (IDC) team (https://imaging.datacommons.cancer.gov)
Description
Imaging Data Commons (IDC) is a repository within the Cancer Research Data Commons (CRDC) that manages imaging data and enables its integration with the other components of CRDC. IDC hosts a growing number of imaging collections that are contributed either by funded US National Cancer Institute (NCI) data collection activities or by individual researchers. Image data hosted by IDC is stored in DICOM format.
Cancer Dataset(Top 50 Populated Countries)
kaggle.com
zip
Updated Jan 17, 2025
Cite
Ankush Panday (2025). Cancer Dataset(Top 50 Populated Countries) [Dataset]. https://www.kaggle.com/datasets/ankushpanday1/cancer-datasettop-50-populated-countries
Available download formats: zip (23228945 bytes)
Dataset updated
Jan 17, 2025
Authors
Ankush Panday
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset provides a detailed view of global cancer trends across the 50 most populated countries. With 160,000 records, it encompasses a wide range of variables including cancer types, risk factors, healthcare expenditure, and environmental factors. The data is designed to assist researchers, healthcare policymakers, and data scientists in identifying patterns, predicting future trends, and crafting effective cancer control strategies.
National Cancer Institute 3D Structure Database
neuinfo.org
dknet.org
+1 more
Updated Feb 1, 2001
+ more versions
Cite
(2001). National Cancer Institute 3D Structure Database [Dataset]. http://identifiers.org/RRID:SCR_008211/resolver?q=&i=rrid
Unique identifier
https://identifiers.org/RRID:SCR_008211
Dataset updated
Feb 1, 2001
Description
The NCI DIS 3D database is a collection of 3D structures for over 400,000 drugs. The database is an extension of the NCI Drug Information System. The structural information stored in the DIS is only the connection table for each drug. The connection table is just a list of which atoms are connected and how they are connected. A searchable database of three-dimensional structures has been developed from the chemistry database of the NCI Drug Information System (DIS), a file of about 450,000 primarily organic compounds which have been tested by NCI for anticancer activity. The DIS database is very similar in size and content to the proprietary databases used in the pharmaceutical industry; its development began in the 1950s, and this history led to a number of problems in the generation of 3D structures. The connection-table information can be searched to find drugs that share similar patterns of connections, which can correlate with similar biological activity. But the cellular targets for drug action, as well as the drugs themselves, are three-dimensional objects, and advances in computer hardware and software have reached the point where they can be represented as such. In many cases the important points of interaction between a drug and its target can be represented by a 3D arrangement of a small number of atoms. Such a group of atoms is called a pharmacophore. The pharmacophore can be used to search 3D databases, and drugs that match the pharmacophore could have similar biological activity but have very different patterns of atomic connections. Having a diverse set of lead compounds increases the chances of finding an active compound with acceptable properties for clinical development. Sponsor: The ICBG are supported by the Cooperative Agreement mechanism, with funds from nine components of the NIH, the National Science Foundation, and the Foreign Agricultural Service of the USDA.
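The notion of a connection table (a list of which atoms are connected and how) can be sketched as a small data structure. This is an illustrative assumption about one possible in-memory layout, not the file format used by the NCI DIS; the molecule (the heavy atoms of ethanol) and all names are chosen only for the example.

```python
from collections import defaultdict

# Heavy atoms of ethanol, indexed by position: C-C-O.
ethanol_atoms = ["C", "C", "O"]

# Each bond is (atom_index_a, atom_index_b, bond_order).
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]


def build_adjacency(atoms, bonds):
    """Turn a bond list into a per-atom neighbor list."""
    adjacency = defaultdict(list)
    for a, b, order in bonds:
        adjacency[a].append((b, atoms[b], order))
        adjacency[b].append((a, atoms[a], order))
    return dict(adjacency)


ethanol_adjacency = build_adjacency(ethanol_atoms, ethanol_bonds)
for index, neighbors in sorted(ethanol_adjacency.items()):
    print(index, ethanol_atoms[index], neighbors)
```

A table like this records connectivity only. It carries no coordinates, which is why a separate 3D structure is needed before pharmacophore-style searches over spatial arrangements become possible.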
Data from: Cancer Rates
catalog.data.gov
data-lakecountyil.opendata.arcgis.com
+2 more
Updated Nov 22, 2024
+ more versions
Cite
Lake County Illinois GIS (2024). Cancer Rates [Dataset]. https://catalog.data.gov/dataset/cancer-rates-5cf0c
Dataset updated
Nov 22, 2024
Dataset provided by
Lake County Illinois GIS
Description
Cancer Rates for Lake County Illinois. Explanation of field attributes:
Colorectal Cancer - Cancer that develops in the colon (the longest part of the large intestine) and/or the rectum (the last several inches of the large intestine). This is a rate per 100,000.
Lung Cancer - Cancer that forms in tissues of the lung, usually in the cells lining air passages. This is a rate per 100,000.
Breast Cancer - Cancer that forms in tissues of the breast. This is a rate per 100,000.
Prostate Cancer - Cancer that forms in tissues of the prostate. This is a rate per 100,000.
Urinary System Cancer - Cancer that forms in the organs of the body that produce and discharge urine. These include the kidneys, ureters, bladder, and urethra. This is a rate per 100,000.
All Cancer - All cancers including, but not limited to: colorectal cancer, lung cancer, breast cancer, prostate cancer, and cancer of the urinary system. This is a rate per 100,000.
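Each field above is expressed as a rate per 100,000. A minimal sketch of that conversion follows; the case count and population are hypothetical numbers chosen for the example and are not Lake County figures, and the listing does not say whether the published rates are crude or age-adjusted.

```python
def rate_per_100k(case_count, population):
    """Convert a raw case count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return case_count / population * 100_000


# Hypothetical area: 42 cases among 70,000 residents.
hypothetical_rate = rate_per_100k(42, 70_000)
print(f"{hypothetical_rate:.1f} per 100,000")
```

Expressing counts this way lets areas with different population sizes be compared directly.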

Cancer Incidence - Surveillance, Epidemiology, and End Results (SEER) Registries Limited-Use
3 scholarly articles cite this dataset (View in Google Scholar)