42 datasets found

Curated Breast Imaging Subset of Digital Database for Screening Mammography
cancerimagingarchive.net
csv, dicom, n/a
Updated Sep 14, 2017
+ more versions
Cite
The Cancer Imaging Archive (2017). Curated Breast Imaging Subset of Digital Database for Screening Mammography [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.7O02S9CY
Available download formats: csv, dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY
Dataset updated
Sep 14, 2017
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Sep 14, 2017
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.

Published research results from work in developing decision support systems in mammography are difficult to replicate due to the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. Few well-curated public datasets have been provided for the mammography community. These include the DDSM, the Mammographic Imaging Analysis Society (MIAS) database, and the Image Retrieval in Medical Applications (IRMA) project. Although these public data sets are useful, they are limited in terms of data set size and accessibility.
For example, most researchers using the DDSM do not leverage all its images for a variety of historical reasons. When the database was released in 1997, computational resources to process hundreds or thousands of images were not widely available. Additionally, the DDSM images are saved in non-standard compression files that require decompression code that has not been updated or maintained for modern computers. Finally, the ROI annotations for the abnormalities in the DDSM were provided to indicate the general position of lesions, not a precise segmentation of them. Therefore, many researchers must implement their own segmentation algorithms for accurate feature extraction, which makes it hard to directly compare the performance of methods or to replicate prior results. The CBIS-DDSM collection addresses that challenge by publicly releasing a curated and standardized version of the DDSM for evaluation of future research on CADx and CADe systems (sometimes referred to generally as CAD) in mammography.
Please note that the image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs which provide information about the scans within the IDs (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). This makes it appear as though there are 6,671 patients according to the DICOM metadata, but there are only 1,566 actual participants in the cohort.
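The patient-ID structure described above can be collapsed back to participants with a small parser. This is a minimal sketch: the pattern is inferred only from the two example IDs in the description, and the `Mass` and `Training` alternatives are assumptions rather than something this page documents.

```python
import re
from collections import defaultdict

# Pattern inferred from the example IDs; "Mass" and "Training" are assumed
# variants and other ID shapes in the collection are not covered.
PATIENT_ID = re.compile(
    r"^(?P<kind>Calc|Mass)-(?P<split>Training|Test)_P_(?P<participant>\d{5})"
    r"_(?P<side>LEFT|RIGHT)_(?P<view>CC|MLO)(?:_(?P<abnormality>\d+))?$"
)

def group_by_participant(patient_ids):
    """Map each participant number to the DICOM patient IDs that refer to it."""
    groups = defaultdict(list)
    for pid in patient_ids:
        match = PATIENT_ID.match(pid)
        if match is None:
            continue  # unrecognized ID shape: skip rather than guess
        groups[match.group("participant")].append(pid)
    return dict(groups)

ids = [
    "Calc-Test_P_00038_LEFT_CC",
    "Calc-Test_P_00038_RIGHT_CC_1",
]
groups = group_by_participant(ids)
print(len(ids), "patient IDs ->", len(groups), "participant(s)")
```

Counting the keys of the returned mapping, rather than the raw DICOM patient IDs, is what gives a participant-level count.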
For scientific and other inquiries about this dataset, please contact TCIA's Helpdesk.
Breast cancer dataset
zenodo.org
zip
Updated Jan 30, 2025
Cite
Saiful Izzuan Hussain (2025). Breast cancer dataset [Dataset]. http://doi.org/10.5281/zenodo.14769221
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14769221
Dataset updated
Jan 30, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Saiful Izzuan Hussain
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset used in this study consists of 7,632 mammogram images categorized into two classes: 2,520 benign and 5,112 malignant images from Huang and Lin (2020). The mammography images in the INbreast database were originally collected from the Centro Hospitalar de S. Joao (CHSJ) Breast Center in Porto. The database contains data collected from August 2008 to July 2010 and includes 115 cases with a total of 410 images (Moreira et al., 2012). Of these, 90 cases concern women with abnormalities in both breasts. Four types of breast disease are recorded in the database: masses, calcifications, asymmetries and distortions. The mammograms are recorded from two standard views: craniocaudal (CC) and mediolateral oblique (MLO). In addition, breast density is classified into four categories based on the BI-RADS standards: Fully Fat (Density 1), Scattered Fibroglandular Density (Density 2), Heterogeneously Dense (Density 3) and Extremely Dense (Density 4). The images are stored in DICOM format at one of two resolutions: 3328 x 4084 pixels or 2560 x 3328 pixels. From the INbreast database, 106 mammograms depicting breast masses were selected. To enlarge the dataset for model training, data augmentation techniques were applied, increasing the total number of breast mammography images to 7,632.
Mammograms-Breast Cancer Images
ieee-dataport.org
data.niaid.nih.gov
+1 more
Updated Dec 27, 2019
Cite
Dr G R Sinha (2019). Mammograms-Breast Cancer Images [Dataset]. https://ieee-dataport.org/documents/mammograms-breast-cancer-images
Dataset updated
Dec 27, 2019
Authors
Dr G R Sinha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a small subset of a much larger dataset of breast cancer images. The images are mammograms.
AISSLab Breast Cancer Dataset: Toward General AI Harmonization with Real...
data.mendeley.com
Updated Jul 15, 2025
Cite
Aymen Al-Hejri (2025). AISSLab Breast Cancer Dataset: Toward General AI Harmonization with Real Mammogram Imaging [Dataset]. http://doi.org/10.17632/zp8yfhvndv.2
Unique identifier
https://doi.org/10.17632/zp8yfhvndv.2
Dataset updated
Jul 15, 2025
Authors
Aymen Al-Hejri
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The AISSLab Breast Cancer Dataset is a collection of mammogram images compiled by experts from the Ma'amon's Diagnostic Centre Mammogram Images for Breast Cancer (MDCMI-BC) in Yemen. It is designed to support advancements in breast cancer research and computer-aided diagnosis (CAD) systems, and to facilitate research in breast cancer detection with a focus on harmonizing AI with diverse imaging data. This dataset emphasizes improving diagnostic accuracy and is available for academic and clinical research applications.

If you use this dataset for research purposes, please cite the following papers:

[1] A. M. Al-Hejri, R. M. Al-Tam, M. Fazea, A. H. Sable, S. Lee, and M. A. Al-antari, “ETECADx: Ensemble Self-Attention Transformer Encoder for Breast Cancer Diagnosis Using Full-Field Digital X-ray Breast Images,” Diagnostics, vol. 13, no. 1, p. 89, Dec. 2022, doi: 10.3390/diagnostics13010089.

[2] R. M. Al-Tam, A. M. Al-Hejri, S. S. Alshamrani, M. A. Al-antari, and S. M. Narangale, “Multimodal breast cancer hybrid explainable computer-aided diagnosis using medical mammograms and ultrasound Images,” Biocybern. Biomed. Eng., vol. 44, no. 3, pp. 731–758, Jul. 2024, doi: 10.1016/j.bbe.2024.08.007.
Digital mammography Dataset for Breast Cancer Diagnosis Research (DMID)
figshare.com
zip
Updated Nov 8, 2023
Cite
Parita Oza; Rajiv Oza; Urvi Oza; Paawan Sharma; Samir Patel; Pankaj Kumar; Bakul Gohel (2023). Digital mammography Dataset for Breast Cancer Diagnosis Research (DMID) [Dataset]. http://doi.org/10.6084/m9.figshare.24522883.v2
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24522883.v2
Dataset updated
Nov 8, 2023
Dataset provided by
figshare
Authors
Parita Oza; Rajiv Oza; Urvi Oza; Paawan Sharma; Samir Patel; Pankaj Kumar; Bakul Gohel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains mammogram images and can be used for research and education purposes only. It contains DCM images, TIFF images, radiology reports, segmentation masks, pixel-level annotations of abnormal regions, and a CSV file with other metadata.
CSAW-CC (mammography) – a dataset for AI research to improve screening,...
researchdata.se
demo.researchdata.se
Updated Jan 7, 2025
Cite
Fredrik Strand (2025). CSAW-CC (mammography) – a dataset for AI research to improve screening, diagnostics and prognostics of breast cancer [Dataset]. http://doi.org/10.5878/45vm-t798
Available download formats: (9211529), (29050)
Unique identifier
https://doi.org/10.5878/45vm-t798
Dataset updated
Jan 7, 2025
Dataset provided by
Karolinska Institutet
Authors
Fredrik Strand
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2008 - 2015
Area covered
Stockholm County
Description
The dataset contains x-ray images, mammography, from breast cancer screening at the Karolinska University Hospital, Stockholm, Sweden, collected by principal investigator Fredrik Strand at Karolinska Institutet. The purpose for compiling the dataset was to perform AI research to improve screening, diagnostics and prognostics of breast cancer.

The dataset is based on a selection of cases with and without a breast cancer diagnosis, taken from a more comprehensive source dataset.

1,103 cases of first-time breast cancer for women in the screening age range (40-74 years) during the included time period (November 2008 to December 2015) were included. Of these, a random selection of 873 cases have been included in the published dataset.

A random selection of 10,000 healthy controls during the same time period were included. Of these, a random selection of 7,850 cases have been included in the published dataset.

For each individual all screening mammograms, also repeated over time, were included; as well as the date of screening and the age. In addition, there are pixel-level annotations of the tumors created by a breast radiologist (small lesions such as micro-calcifications have been annotated as an area). Annotations were also drawn in mammograms prior to diagnosis; if these contain a single pixel it means no cancer was seen but the estimated location of the center of the future cancer was shown by a single pixel annotation.
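The single-pixel convention above can be checked mechanically when reading the annotations. The following is a minimal sketch that assumes the annotation PNG has already been decoded into rows of 0/1 values, with nonzero meaning "annotated"; the function name and the returned labels are hypothetical.

```python
def classify_annotation(mask):
    """Classify a binary annotation mask given as rows of 0/1 values."""
    marked = sum(1 for row in mask for value in row if value)
    if marked == 0:
        return "no annotation"
    if marked == 1:
        # Per the description: no cancer was seen, the pixel only marks the
        # estimated center of the future cancer.
        return "estimated location only"
    return "visible tumor area"

prior_exam = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
diagnostic_exam = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
print(classify_annotation(prior_exam))
print(classify_annotation(diagnostic_exam))
```

Separating the two cases matters for training a segmentation model, since a single-pixel mask is a location hint and not a lesion outline.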

In addition to images, the dataset also contains cancer data created at the Karolinska University Hospital and extracted through the Regional Cancer Center Stockholm-Gotland. This data contains information about the time of diagnosis and cancer characteristics including tumor size, histology and lymph node metastasis.

The precision of non-image data was decreased, through categorisation and jittering, to ensure that no single individual can be identified.

The following types of files are available:
- CSV: The following data is included (if applicable): cancer/no cancer (meaning breast cancer during 2008 to 2015), age group at screening, days from image to diagnosis (if any), cancer histology, cancer size group, ipsilateral axillary lymph node metastasis. There is one CSV file for the entire dataset, with one row per image. Any information about cancer diagnosis is repeated for all rows for an individual who was diagnosed (i.e., it is also included in rows before diagnosis). For each exam date there is the assessment by radiologist 1, radiologist 2 and the consensus decision.
- DICOM: Mammograms. For each screening, four images for the standard views were acquired: left and right, mediolateral oblique and craniocaudal. There should be four files per examination date.
- PNG: Cancer annotations, one for each DICOM image containing a visible tumor.
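Because the CSV has one row per image and each examination date should have four standard views, a quick completeness check is possible. This is a sketch under stated assumptions: the column names (`patient_id`, `exam_date`, `laterality`, `view`) and the sample rows are hypothetical, and the real header may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and rows; the real CSV header may differ.
SAMPLE = """patient_id,exam_date,laterality,view
p1,2010-01-05,L,CC
p1,2010-01-05,L,MLO
p1,2010-01-05,R,CC
p1,2010-01-05,R,MLO
p2,2011-03-09,L,CC
p2,2011-03-09,R,CC
"""

EXPECTED_VIEWS = {("L", "CC"), ("L", "MLO"), ("R", "CC"), ("R", "MLO")}

def incomplete_exams(csv_text):
    """Return {(patient, exam date): missing views} for exams lacking a standard view."""
    views = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        views[(row["patient_id"], row["exam_date"])].add((row["laterality"], row["view"]))
    return {
        exam: sorted(EXPECTED_VIEWS - seen)
        for exam, seen in views.items()
        if EXPECTED_VIEWS - seen
    }

print(incomplete_exams(SAMPLE))
```

In the sample, only the second exam is reported, since it lacks both mediolateral oblique views.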

Access: The dataset is available upon request due to the size of the material. The image files in DICOM and PNG format comprise approximately 2.5 TB. Access to the CSV file including parametric data is possible via download as associated documentation.
CBIS-DDSM One View Mammograms TFRecords
kaggle.com
Updated Dec 29, 2022
Cite
Sergio Fuentes (2022). CBIS-DDSM One View Mammograms TFRecords [Dataset]. http://doi.org/10.34740/kaggle/dsv/4429171
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/4429171
Dataset updated
Dec 29, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sergio Fuentes
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
CBIS-DDSM images in TFRecords format. Each example contains an image (one view of a mammogram) with its corresponding label, which is either positive or negative for breast cancer. CBIS-DDSM images were taken from this Kaggle dataset: https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset
Breast Mammography Image Dataset with Masses
data.mendeley.com
Updated Jan 27, 2023
+ more versions
Cite
David Faramonna (2023). Breast Mammography Image Dataset with Masses [Dataset]. http://doi.org/10.17632/8fztxggjnc.1
Unique identifier
https://doi.org/10.17632/8fztxggjnc.1
Dataset updated
Jan 27, 2023
Authors
David Faramonna
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The mammography dataset includes both benign and malignant tumors. To create the images for this dataset, 106 masses from the INbreast dataset, 53 masses from the MIAS dataset, and 2,188 masses from the DDSM dataset were initially extracted. The images were then preprocessed using contrast-limited adaptive histogram equalization (CLAHE) and data augmentation. After data augmentation, the INbreast dataset has 7,632 images, the MIAS dataset has 3,816 images, and the DDSM dataset has 13,128 images. Additionally, we combine DDSM, MIAS, and INbreast. Each image was resized to 227 x 227 pixels.
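The counts quoted above imply a fixed number of augmented images per extracted mass, which a few lines of arithmetic make explicit. This is only a consistency check on the published numbers. The combined total assumes the merged set is the plain sum of the three augmented sets, which the description does not state.

```python
# Counts quoted in the description: (masses extracted, images after augmentation).
counts = {
    "INbreast": (106, 7632),
    "MIAS": (53, 3816),
    "DDSM": (2188, 13128),
}

factors = {}
for name, (masses, augmented) in counts.items():
    factor, remainder = divmod(augmented, masses)
    factors[name] = (factor, remainder)
    print(f"{name}: {augmented} / {masses} = {factor} (remainder {remainder})")

# Assumption: the combined set is the plain sum of the three augmented sets.
combined = sum(augmented for _, augmented in counts.values())
print("combined:", combined)
```

The division is exact in all three cases: 72 images per mass for INbreast and MIAS, and 6 per mass for DDSM.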
CBIS-DDSM Dataset
datasetninja.com
Updated Sep 14, 2017
+ more versions
Cite
Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi (2017). CBIS-DDSM Dataset [Dataset]. https://datasetninja.com/cbis-ddsm
Dataset updated
Sep 14, 2017
Dataset provided by
Dataset Ninja
Authors
Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The CBIS-DDSM: Curated Breast Imaging Subset of Digital Database for Screening Mammography includes decompressed images, data selection and curation by trained mammographers, updated mass segmentation and bounding boxes, and pathologic diagnosis for training data, formatted similarly to modern computer vision data sets. The data set contains 753 calcification cases and 891 mass cases, providing a data set size capable of analyzing decision support systems in mammography.
King Abdulaziz University Breast Cancer Mammogram Dataset
ieee-dataport.org
Updated Apr 10, 2024
+ more versions
Cite
Snehal Sapkale (2024). King Abdulaziz University Breast Cancer Mammogram Dataset [Dataset]. https://ieee-dataport.org/documents/king-abdulaziz-university-breast-cancer-mammogram-dataset
Dataset updated
Apr 10, 2024
Authors
Snehal Sapkale
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
categorizing
Data from: CSAW-M: An Ordinal Classification Dataset for Benchmarking...
figshare.scilifelab.se
Updated Jan 15, 2025
+ more versions
Cite
Moein Sorkhei; Yue Liu; Hossein Azizpour; Edward Azavedo; Karin Dembrower; Dimitra Ntoula; Athanasios Zouzos; Fredrik Strand; Kevin Smith (2025). CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer [Dataset]. http://doi.org/10.17044/scilifelab.14687271.v2
Unique identifier
https://doi.org/10.17044/scilifelab.14687271.v2
Dataset updated
Jan 15, 2025
Dataset provided by
KTH Royal Institute of Technology
Authors
Moein Sorkhei; Yue Liu; Hossein Azizpour; Edward Azavedo; Karin Dembrower; Dimitra Ntoula; Athanasios Zouzos; Fredrik Strand; Kevin Smith
License
https://www.scilifelab.se/data/restricted-access/
Description
Welcome to the CSAW-M dataset homepage

This page includes the files and metadata related to CSAW-M, a curated dataset of mammograms with expert assessments of the masking of cancer. CSAW-M is collected from over 10,000 individuals and annotated with potential masking. In contrast to previous approaches, which measure breast image density as a proxy, our dataset directly provides annotations of masking potential assessments from five specialists. We trained deep learning models on CSAW-M to estimate the masking level, and showed that the estimated masking is significantly more predictive of screening participants diagnosed with interval and large invasive cancers (without being explicitly trained for these tasks) than its breast density counterparts. Please find the paper corresponding to our work here and the GitHub repo here.

CSAW-M Research Use License
Please read carefully all the terms and conditions of the CSAW-M Research Use License.

How to access the dataset:
If you want to get access to the data, please use the "Request access to files" option above (currently, non-Swedish researchers need to have a general figshare account to be able to request access). We will ask you to agree to our terms and conditions and provide us with some information about what you will use the data for. We will then receive the request and process it, after which you will be able to download all the files.

If you use this Work, please cite our paper:
@article{sorkhei2021csaw,
  title={CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer},
  author={Sorkhei, Moein and Liu, Yue and Azizpour, Hossein and Azavedo, Edward and Dembrower, Karin and Ntoula, Dimitra and Zouzos, Athanasios and Strand, Fredrik and Smith, Kevin},
  year={2021}
}
Cancer In Mammogram Dataset
universe.roboflow.com
zip
Updated Dec 10, 2023
Cite
Thiago (2023). Cancer In Mammogram Dataset [Dataset]. https://universe.roboflow.com/thiago-ffdel/cancer-in-mammogram
Available download formats: zip
Dataset updated
Dec 10, 2023
Dataset authored and provided by
Thiago
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Cancer Bounding Boxes
Description
Cancer In Mammogram

## Overview

Cancer In Mammogram is a dataset for object detection tasks - it contains Cancer annotations for 2,360 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Data from: VinDr-Mammo: A large-scale benchmark dataset for computer-aided...
physionet.org
Updated Mar 21, 2022
Cite
Hieu Huy Pham; Hieu Nguyen Trung; Ha Quy Nguyen (2022). VinDr-Mammo: A large-scale benchmark dataset for computer-aided detection and diagnosis in full-field digital mammography [Dataset]. http://doi.org/10.13026/br2v-7517
Unique identifier
https://doi.org/10.13026/br2v-7517
Dataset updated
Mar 21, 2022
Authors
Hieu Huy Pham; Hieu Nguyen Trung; Ha Quy Nguyen
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Breast cancer is one of the most prevalent types of cancer and the leading cause of cancer death. Mammography is the recommended imaging modality for periodic breast cancer screening. A few datasets have been published to develop computer-aided tools for mammography analysis. However, these datasets either have a limited sample size or consist of screen-film mammography (SFM), which has been replaced by full-field digital mammography (FFDM) in clinical practice. This project introduces a large-scale full-field digital mammography dataset of 5,000 four-view exams, which are double read by experienced mammographers to provide cancer assessment and breast density following the Breast Imaging Reporting and Data System (BI-RADS). Breast abnormalities that require further examination are also marked by bounding rectangles.
Mammography Dataset from INbreast, MIAS, and DDSM
kaggle.com
Updated May 31, 2024
Cite
Emilio A. Venegas Hernández (2024). Mammography Dataset from INbreast, MIAS, and DDSM [Dataset]. https://www.kaggle.com/datasets/emiliovenegas1/mammography-dataset-from-inbreast-mias-and-ddsm/versions/1
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 31, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Emilio A. Venegas Hernández
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Malignant and benign mammograms

Malignant and benign mammograms from the INbreast, MIAS, and DDSM datasets were downloaded directly from: Lin, Ting-Yu, and Huang, Mei-Ling. Dataset of Breast mammography images with Masses. https://doi.org/10.17632/ywsbh3ndr8.2

Normal mammograms

Normal mammograms were sourced from the DDSM webpage: http://www.eng.usf.edu/cvprg/Mammography/Database.html. However, the FTP service is currently not operational. Consequently, using BeautifulSoup (bs4) and PIL, thumbnails of all the normal datasets were extracted, resulting in a total of 2026 files. These files were then augmented and enhanced using CLAHE (Contrast Limited Adaptive Histogram Equalization).
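The thumbnail-collection step described above amounts to listing the image URLs on each page before downloading them. The sketch below shows that idea using the standard library's `html.parser` in place of BeautifulSoup, so it is not the uploader's code. The markup, file names, and base URL are made up for illustration, and the download and CLAHE steps are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ThumbnailCollector(HTMLParser):
    """Collect absolute URLs of every <img> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, src))

# Made-up markup and base URL; the real DDSM pages may be laid out differently.
page = (
    '<html><body><a href="case1.html"><img src="thumbs/A_0001.jpg"></a>'
    '<img src="thumbs/A_0002.jpg"></body></html>'
)
collector = ThumbnailCollector("http://example.org/normals/")
collector.feed(page)
print(collector.urls)
```

Note that thumbnails are much smaller than the original scans, so images gathered this way are not interchangeable with full-resolution mammograms.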

Consult the Jupyter Notebook for more information on the methods used to extract and enhance the images from the DDSM webpage.
Breast Cancer Mammography Dataset with Lymph Node Metastasis Evaluation
mostwiedzy.pl
zip
Updated Jul 31, 2025
Cite
Maciej Bobowicz (2025). Breast Cancer Mammography Dataset with Lymph Node Metastasis Evaluation [Dataset]. http://doi.org/10.34808/5q1a-sp47
Available download formats: zip (34748941673)
Unique identifier
https://doi.org/10.34808/5q1a-sp47
Dataset updated
Jul 31, 2025
Authors
Maciej Bobowicz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes clinical and mammographic imaging data from 1289 breast cancer patients, collected retrospectively between 2010 and 2021 at two clinical centers. It aims to explore the effectiveness of artificial intelligence in assessing prognostic indicators from mammography for detecting lymph node metastases in breast cancer. The dataset consists of digital mammography images (FFDM), radiological assessments, and detailed clinical data including histopathological outcomes. Inclusion criteria: diagnosis of breast cancer between 2010-2021, age ≥18 years, availability of preoperative mammography images for both breasts with radiological description, and availability of postoperative histopathological results. Failure to meet any of these conditions constitutes an exclusion criterion.
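The inclusion criteria listed above combine with a logical AND, which can be written as a single predicate. This is a minimal sketch: the record type and all field names are hypothetical and are not taken from the dataset's files.

```python
from dataclasses import dataclass

@dataclass
class PatientRecord:
    # Hypothetical field names chosen for this sketch.
    diagnosis_year: int
    age_years: int
    has_preop_mammograms_both_breasts: bool
    has_radiological_description: bool
    has_postop_histopathology: bool

def is_included(p: PatientRecord) -> bool:
    """Apply the inclusion criteria; failing any one of them excludes the patient."""
    return (
        2010 <= p.diagnosis_year <= 2021
        and p.age_years >= 18
        and p.has_preop_mammograms_both_breasts
        and p.has_radiological_description
        and p.has_postop_histopathology
    )

eligible = PatientRecord(2015, 54, True, True, True)
no_histology = PatientRecord(2015, 54, True, True, False)
print(is_included(eligible), is_included(no_histology))
```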
DICOM SR of clinical data and measurement for breast cancer collections to...
cancerimagingarchive.net
dicom, n/a
Cite
The Cancer Imaging Archive, DICOM SR of clinical data and measurement for breast cancer collections to TCIA [Dataset]. http://doi.org/10.7937/TCIA.2019.wgllssg1
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/TCIA.2019.wgllssg1
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 26, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Data Integration & Imaging Informatics (DI-Cubed) project explored the issue of lack of standardized data capture at the point of data creation, as reflected in the non-image data accompanying various TCIA breast cancer collections. The work addressed the desire for semantic interoperability between various NCI initiatives by aligning on common clinical metadata elements and supporting use cases that connect clinical, imaging, and genomics data. Accordingly, clinical and measurement data was imported into I2B2 and cross-mapped to industry standard concepts for names and values including those derived from BRIDG, CDISC SDTM, DICOM Structured Reporting models and using NCI Thesaurus, SNOMED CT and LOINC controlled terminology. A subset of the standardized data was then exported from I2B2 to CSV and thence converted to DICOM SR according to the DICOM Breast Imaging Report template [1], which supports description of patient characteristics, histopathology, receptor status and clinical findings including measurements. The purpose was not to advocate DICOM SR as an appropriate format for interchange or storage of such information for query purposes, but rather to demonstrate that use of standard concepts harmonized across multiple collections could be transformed into an existing standard report representation. The DICOM SR can be stored and used together with the images in repositories such as TCIA and in image viewers that support rendering of DICOM SR content. During the project, various deficiencies in the DICOM Breast Imaging Report template were identified with respect to describing breast MR studies, laterality of findings versus procedures, more recently developed receptor types, and patient characteristics and status. These were addressed via DICOM CP 1838, finalized in Jan 2019, and this subset reflects those changes.
[1] DICOM Breast Imaging Report Templates, available from: http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_BreastImagingReportTemplates.html
Radiologist and Deep Neural Network Predictions for Low-pass Filtered...
datacatalog.med.nyu.edu
Updated Jun 20, 2022
Cite
Taro Makino; Stanisław Jastrzębski; Witold Oleszkiewicz; Celin Chacko; Robin Ehrenpreis; Naziya Samreen; Chloe Chhor; Eric Kim; Jiyon Lee; Kristine Pysarenko; Beatriu Reig; Hildegard Toth; Divya Awal; Linda Du; Alice Kim; James Y. Park; Daniel K. Sodickson; Laura Heacock; Linda Moy; Kyunghyun Cho; Krzysztof J. Geras (2022). Radiologist and Deep Neural Network Predictions for Low-pass Filtered Mammograms [Dataset]. https://datacatalog.med.nyu.edu/dataset/10518
Dataset updated
Jun 20, 2022
Dataset provided by
NYU Health Sciences Library
Authors
Taro Makino; Stanisław Jastrzębski; Witold Oleszkiewicz; Celin Chacko; Robin Ehrenpreis; Naziya Samreen; Chloe Chhor; Eric Kim; Jiyon Lee; Kristine Pysarenko; Beatriu Reig; Hildegard Toth; Divya Awal; Linda Du; Alice Kim; James Y. Park; Daniel K. Sodickson; Laura Heacock; Linda Moy; Kyunghyun Cho; Krzysztof J. Geras
Area covered
New York (State) - New York City
Description
Investigators manipulated images from the NYU Breast Cancer Screening Dataset to identify differences in the features of perception used in diagnosis by radiologists versus deep neural networks (DNNs). Two studies were conducted. In the reader study, a set of 720 exams were processed with Gaussian low-pass filtering at varying severity levels and ten radiologists and five DNNs (trained on unperturbed data) provided binary predictions on whether a malignant lesion was present in each breast (yes or no). In the annotation reader study, a subset of 120 exams with malignant images were presented to seven radiologists for their annotation of up to three regions of interest (ROIs) containing suspicious features. Low-pass filtering was applied to the interior and exterior of ROIs and the entire image before the images were presented to DNNs (trained on unperturbed data). The resulting dataset contains radiologist and DNN reader predictions and radiologist annotations from both studies.
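Gaussian low-pass filtering at increasing severity can be illustrated with a small separable blur. This is a sketch only: the description does not give the filter parameters, so the sigma values, the 3-sigma kernel radius, and the border clamping are all assumptions, and the pure-Python implementation is for illustration and not the investigators' pipeline.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (radius defaults to 3 * sigma)."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(k * row[min(max(x + i - radius, 0), width - 1)] for i, k in enumerate(kernel))
            for x in range(width)
        ]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def low_pass(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

# A single bright pixel is spread out more as sigma (the severity) grows.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
peaks = [low_pass(image, sigma)[3][3] for sigma in (0.5, 1.0, 2.0)]
print([round(p, 4) for p in peaks])
```

A larger sigma removes more high-frequency detail, which is the kind of fine texture the study perturbs to compare radiologists with DNNs.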
Data distribution of CBIS-DDSM dataset.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jul 11, 2024
Cite
Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman (2024). Data distribution of CBIS-DDSM dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0304757.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304757.t001
Dataset updated
Jul 11, 2024
Dataset provided by
PLOS ONE
Authors
Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recent advancements in AI, driven by big data technologies, have reshaped various industries, with a strong focus on data-driven approaches. This has resulted in remarkable progress in fields like computer vision, e-commerce, cybersecurity, and healthcare, primarily fueled by the integration of machine learning and deep learning models. Notably, the intersection of oncology and computer science has given rise to Computer-Aided Diagnosis (CAD) systems, offering vital tools to aid medical professionals in tumor detection, classification, recurrence tracking, and prognosis prediction. Breast cancer, a significant global health concern, is particularly prevalent in Asia due to diverse factors like lifestyle, genetics, environmental exposures, and healthcare accessibility. Early detection through mammography screening is critical, but the accuracy of mammograms can vary due to factors like breast composition and tumor characteristics, leading to potential misdiagnoses. To address this, an innovative CAD system leveraging deep learning and computer vision techniques was introduced. This system enhances breast cancer diagnosis by independently identifying and categorizing breast lesions, segmenting mass lesions, and classifying them based on pathology. Thorough validation using the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) demonstrated the CAD system's exceptional performance, with a 99% success rate in detecting and classifying breast masses. Detection accuracy was 98.5%; when segmenting breast masses into separate groups for examination, performance was approximately 95.39%; and the classification phase yielded an overall accuracy of 99.16%. The authors propose that this integrated framework could outperform current deep learning techniques, despite potential challenges related to the high number of trainable parameters. Ultimately, this recommended framework offers valuable support to researchers and physicians in breast cancer diagnosis by harnessing cutting-edge AI and image processing technologies, extending recent advances in deep learning to the medical domain.
RSNA Mammography Breast Cancer TFRecord Dataset
kaggle.com
Updated Dec 17, 2023
muhammed (2023). RSNA Mammography Breast Cancer TFRecord Dataset [Dataset]. https://www.kaggle.com/datasets/clkmuhammed/rsna-mammography-breast-cancer-tfrecord-dataset
Dataset updated
Dec 17, 2023
Dataset provided by
Kaggle
Authors
muhammed
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description

Source: RSNA Screening Mammography Breast Cancer Detection

This dataset converts the competition's 314GB+ of data (54,713 images) into TFRecords for fast data loading during training.

All images are resized to 768x1280 and saved in 100 TFRecords, so each TFRecord contains roughly 548 images. The resulting dataset is 8.6GB+.

TFRecords have the benefit of loading large chunks of data containing many samples instead of loading every image and label separately.
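
The "roughly 548 images" figure follows from the counts given above. A minimal Python sketch of that arithmetic, assuming the images are spread as evenly as possible across the files (how images were actually assigned to TFRecords is not stated here, and `shard_sizes` is a hypothetical helper, not part of the dataset):

```python
import math

def shard_sizes(num_items, num_shards):
    """Split num_items across num_shards as evenly as possible (assumed scheme)."""
    base, extra = divmod(num_items, num_shards)
    # The first `extra` shards each take one additional item.
    return [base + 1 if i < extra else base for i in range(num_shards)]

NUM_IMAGES = 54713  # image count stated in the description
NUM_SHARDS = 100    # number of TFRecord files stated in the description

sizes = shard_sizes(NUM_IMAGES, NUM_SHARDS)
max_per_shard = math.ceil(NUM_IMAGES / NUM_SHARDS)

print("largest shard:", max(sizes))
print("smallest shard:", min(sizes))
print("total images:", sum(sizes))
```

Under this assumption, no file holds more than `max_per_shard` images, which is consistent with the rounded-up per-file figure quoted in the description.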

Dataset Description

Note: The dataset for this challenge contains radiographic breast images of female subjects. The goal of this competition is to identify cases of breast cancer in mammograms from screening exams. It is important to identify cases of cancer for obvious reasons, but false positives also have downsides for patients. As millions of women get mammograms each year, a useful machine learning tool could help a great many people. This competition uses a hidden test. When your submitted notebook is scored the actual test data (including a full length sample submission) will be made available to your notebook.

Files

[train/test]_images/[patient_id]/[image_id].dcm The mammograms, in DICOM format. You can expect roughly 8,000 patients in the hidden test set. There are usually but not always 4 images per patient. Note that many of the images use the JPEG 2000 format, which may require special libraries to load.

sample_submission.csv A valid sample submission. Only the first few rows are available for download.

[train/test].csv Metadata for each patient and image. Only the first few rows of the test set are available for download.

site_id - ID code for the source hospital.
patient_id - ID code for the patient.
image_id - ID code for the image.
laterality - Whether the image is of the left or right breast.
view - The orientation of the image. The default for a screening exam is to capture two views per breast.
age - The patient's age in years.
implant - Whether or not the patient had breast implants. Site 1 only provides breast implant information at the patient level, not at the breast level.
density - A rating for how dense the breast tissue is, with A being the least dense and D being the most dense. Extremely dense tissue can make diagnosis more difficult. Only provided for train.
machine_id - An ID code for the imaging device.
cancer - Whether or not the breast was positive for malignant cancer. The target value. Only provided for train.
biopsy - Whether or not a follow-up biopsy was performed on the breast. Only provided for train.
invasive - If the breast is positive for cancer, whether or not the cancer proved to be invasive. Only provided for train.
BIRADS - 0 if the breast required follow-up, 1 if the breast was rated as negative for cancer, and 2 if the breast was rated as normal. Only provided for train.
prediction_id - The ID for the matching submission row. Multiple images will share the same prediction ID. Test only.
difficult_negative_case - True if the case was unusually difficult. Only provided for train.
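
Because several images share one breast-level label, it can help to group the metadata rows before training. The following is a minimal Python sketch under stated assumptions: the sample rows are made up for illustration (they are not competition data), the `L`/`R` and `CC`/`MLO` codes are assumed encodings, and `image_path` and `group_by_breast` are hypothetical helpers. Only the column names and the `[train/test]_images/[patient_id]/[image_id].dcm` layout come from the description above.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath

# Made-up rows for illustration only; NOT real competition data.
# The "L"/"R" and "CC"/"MLO" codes are assumed encodings.
SAMPLE_CSV = """site_id,patient_id,image_id,laterality,view,age,cancer
1,10001,111,L,CC,61,0
1,10001,112,L,MLO,61,0
1,10001,113,R,CC,61,1
1,10001,114,R,MLO,61,1
"""

def image_path(split, patient_id, image_id):
    """Build the relative DICOM path from the documented directory layout."""
    return PurePosixPath(f"{split}_images") / str(patient_id) / f"{image_id}.dcm"

def group_by_breast(rows):
    """Collect image IDs and one label per (patient_id, laterality) pair."""
    breasts = defaultdict(lambda: {"image_ids": [], "cancer": 0})
    for row in rows:
        key = (row["patient_id"], row["laterality"])
        breasts[key]["image_ids"].append(row["image_id"])
        # The label is per breast, so any positive image marks the breast positive.
        breasts[key]["cancer"] = max(breasts[key]["cancer"], int(row["cancer"]))
    return dict(breasts)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
breasts = group_by_breast(rows)

for (patient_id, side), info in breasts.items():
    paths = [str(image_path("train", patient_id, i)) for i in info["image_ids"]]
    print(patient_id, side, info["cancer"], paths)
```

Grouping on `(patient_id, laterality)` is one plausible choice given that `cancer` is described per breast; for the test set, the `prediction_id` column could serve as the grouping key instead.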
OPTIMAM Mammographic Image Database
find.data.gov.scot
dtechtive.com
Updated Jul 3, 2023
Cancer Research Horizons (2023). OPTIMAM Mammographic Image Database [Dataset]. https://find.data.gov.scot/datasets/25791
Dataset updated
Jul 3, 2023
Dataset provided by
Cancer Research Horizons
Area covered
United Kingdom
Description
The OPTIMAM Mammography Image Database is a sharable resource with processed and unprocessed mammography images from United Kingdom breast screening centers, with annotated cancers and clinical details.


The Cancer Imaging Archive (2017). Curated Breast Imaging Subset of Digital Database for Screening Mammography [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.7O02S9CY

Curated Breast Imaging Subset of Digital Database for Screening Mammography

CBIS-DDSM

89 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: csv, dicom, n/a

Unique identifier

https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY

Dataset updated

Sep 14, 2017

Dataset authored and provided by

The Cancer Imaging Archive

License

https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

Time period covered

Sep 14, 2017

Dataset funded by

National Cancer Institute (http://www.cancer.gov/)

Description

This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.

Published research results from work in developing decision support systems in mammography are difficult to replicate due to the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. Few well-curated public datasets have been provided for the mammography community. These include the DDSM, the Mammographic Imaging Analysis Society (MIAS) database, and the Image Retrieval in Medical Applications (IRMA) project. Although these public data sets are useful, they are limited in terms of data set size and accessibility.

For example, most researchers using the DDSM do not leverage all its images for a variety of historical reasons. When the database was released in 1997, computational resources to process hundreds or thousands of images were not widely available. Additionally, the DDSM images are saved in non-standard compression files that require the use of decompression code that has not been updated or maintained for modern computers. Finally, the ROI annotations for the abnormalities in the DDSM were provided to indicate a general position of lesions, but not a precise segmentation for them. Therefore, many researchers must implement segmentation algorithms for accurate feature extraction. This makes it impossible to directly compare the performance of methods or to replicate prior results. The CBIS-DDSM collection addresses that challenge by publicly releasing a curated and standardized version of the DDSM for evaluation of future CADx and CADe systems (sometimes referred to generally as CAD) research in mammography.

Please note that the image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs which provide information about the scans within the IDs (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). This makes it appear as though there are 6,671 patients according to the DICOM metadata, but there are only 1,566 actual participants in the cohort.
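
To count actual participants rather than DICOM patient IDs, the participant number can be extracted from each ID. The following is a minimal Python sketch, assuming every ID follows the shape of the two examples above; the regular expression is inferred from those examples only (other variants may exist in the collection), and `participant_of` and `count_participants` are hypothetical helpers.

```python
import re

# Pattern inferred from the two example IDs in the text; this is an
# assumption, and IDs that do not match are simply skipped.
PATIENT_ID_RE = re.compile(
    r"^(?P<subset>[A-Za-z]+-[A-Za-z]+)_P_(?P<participant>\d+)"
    r"_(?P<side>LEFT|RIGHT)_(?P<view>[A-Z]+)(?:_(?P<suffix>\d+))?$"
)

def participant_of(dicom_patient_id):
    """Return the participant number embedded in a DICOM patient ID, or None."""
    match = PATIENT_ID_RE.match(dicom_patient_id)
    return match.group("participant") if match else None

def count_participants(dicom_patient_ids):
    """Count distinct participants across a collection of DICOM patient IDs."""
    found = (participant_of(pid) for pid in dicom_patient_ids)
    return len({p for p in found if p is not None})

example_ids = ["Calc-Test_P_00038_LEFT_CC", "Calc-Test_P_00038_RIGHT_CC_1"]
print(count_participants(example_ids))
```

Both example IDs resolve to the same participant, which is the collapsing step that reduces the apparent patient count to the actual cohort size.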

For scientific and other inquiries about this dataset, please contact TCIA's Helpdesk.

Curated Breast Imaging Subset of Digital Database for Screening Mammography

Breast cancer dataset

Mammograms-Breast Cancer Images

AISSLab Breast Cancer Dataset: Toward General AI Harmonization with Real...

Digital mammography Dataset for Breast Cancer Diagnosis Research (DMID)

CSAW-CC (mammography) – a dataset for AI research to improve screening,...

CBIS-DDSM One View Mammograms TFRecords

Breast Mammography Image Dataset with Masses

CBIS-DDSM Dataset

King Abdulaziz University Breast Cancer Mammogram Dataset

Data from: CSAW-M: An Ordinal Classification Dataset for Benchmarking...

Cancer In Mammogram Dataset

Cancer In Mammogram

Data from: VinDr-Mammo: A large-scale benchmark dataset for computer-aided...

Mammography Dataset from INbreast, MIAS, and DDSM

Malign and benign mammograms

Normal mammograms

Breast Cancer Mammography Dataset with Lymph Node Metastasis Evaluation

DICOM SR of clinical data and measurement for breast cancer collections to...

Radiologist and Deep Neural Network Predictions for Low-pass Filtered...

Data distribution of CBIS-DDSM dataset.

RSNA Mammography Breast Cancer TFRecord Dataset

OPTIMAM Mammographic Image Database
