7 datasets found

Curated Breast Imaging Subset of Digital Database for Screening Mammography
cancerimagingarchive.net
csv, dicom, n/a
Updated Sep 14, 2017
Cite
The Cancer Imaging Archive (2017). Curated Breast Imaging Subset of Digital Database for Screening Mammography [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.7O02S9CY
Available download formats
csv, dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY
Dataset updated
Sep 14, 2017
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Sep 14, 2017
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.

Published research results from work in developing decision support systems in mammography are difficult to replicate due to the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. Few well-curated public datasets have been provided for the mammography community. These include the DDSM, the Mammographic Imaging Analysis Society (MIAS) database, and the Image Retrieval in Medical Applications (IRMA) project. Although these public data sets are useful, they are limited in terms of data set size and accessibility.
For example, most researchers using the DDSM do not leverage all of its images, for a variety of historical reasons. When the database was released in 1997, computational resources to process hundreds or thousands of images were not widely available. Additionally, the DDSM images are saved in non-standard compressed files that require decompression code that has not been updated or maintained for modern computers. Finally, the ROI annotations for the abnormalities in the DDSM indicate only the general position of lesions, not a precise segmentation. Many researchers must therefore implement their own segmentation algorithms for accurate feature extraction, which makes it impossible to directly compare the performance of methods or to replicate prior results. The CBIS-DDSM collection addresses this challenge by publicly releasing a curated and standardized version of the DDSM for evaluating future CADx and CADe systems (sometimes referred to generally as CAD) in mammography research.
Please note that the image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs, each of which encodes information about the scans it contains (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). As a result, the DICOM metadata appears to describe 6,671 patients, but there are only 1,566 actual participants in the cohort.
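Because the participant number is embedded in every patient ID, the true cohort size can be recovered by collapsing IDs to that number. A minimal Python sketch, assuming the participant number always follows `_P_` as in the examples above; `count_participants` and the third ID in the example list are hypothetical:

```python
import re

# Assumption: the participant number always follows "_P_", as in
# "Calc-Test_P_00038_LEFT_CC" or "Calc-Test_P_00038_RIGHT_CC_1".
PARTICIPANT_RE = re.compile(r"_P_(\d+)")


def count_participants(patient_ids):
    """Return the number of distinct participants behind a list of DICOM patient IDs."""
    participants = set()
    for pid in patient_ids:
        match = PARTICIPANT_RE.search(pid)
        if match is None:
            raise ValueError(f"unrecognized patient ID: {pid!r}")
        participants.add(match.group(1))
    return len(participants)


ids = [
    "Calc-Test_P_00038_LEFT_CC",
    "Calc-Test_P_00038_RIGHT_CC_1",
    "Mass-Training_P_00001_LEFT_MLO",  # hypothetical ID for illustration
]
print(len(ids), count_participants(ids))  # 3 2
```

Applied to the full set of patient IDs in the DICOM metadata, the same collapse is what turns the apparent patient count into the participant count.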
For scientific and other inquiries about this dataset, please contact TCIA's Helpdesk.
CBIS DDSM Dataset
kaggle.com
Updated Apr 17, 2025
Cite
Orvile (2025). CBIS DDSM Dataset [Dataset]. https://www.kaggle.com/datasets/orvile/cbis-ddsm-dataset
Available download formats
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Apr 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Orvile
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

CBIS-DDSM (Curated Breast Imaging Subset of Digital Database for Screening Mammography) includes decompressed images, data selection and curation by trained mammographers, updated mass segmentation and bounding boxes, and pathologic diagnosis for training data, formatted similarly to modern computer vision data sets. The data set contains 753 calcification cases and 891 mass cases, making it large enough to evaluate decision support systems in mammography.

The authors note that published research results on decision support systems in mammography are difficult to replicate because of the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. This makes it impossible to directly compare the performance of methods or to replicate prior results. The authors seek to resolve this substantial challenge by releasing an updated and standardized version of the Digital Database for Screening Mammography (DDSM) for evaluating future CADx and CADe systems (sometimes referred to generally as CAD) in mammography research.

The DDSM is a collection of mammograms from the following sources: Massachusetts General Hospital, Wake Forest University School of Medicine, Sacred Heart Hospital, and Washington University of St Louis School of Medicine. The DDSM was developed through a grant from the DOD Breast Cancer Research Program, US Army Research and Material Command, and the necessary patient consents were obtained by the original developers of the DDSM. The cases are annotated with ROIs for calcifications and masses, as well as the following information that may be useful for CADe and CADx algorithms: Breast Imaging Reporting and Data System (BI-RADS) descriptors for mass shape, mass margin, calcification type, calcification distribution, and breast density; overall BI-RADS assessment from 0 to 5; rating of the subtlety of the abnormality from 1 to 5; and patient age.
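As a rough illustration, the annotation fields listed above could be held in a small record type that enforces the stated ranges (overall BI-RADS assessment 0 to 5, subtlety 1 to 5). This is a sketch only; the class and field names are hypothetical and do not correspond to the column names of any released file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AbnormalityAnnotation:
    """Hypothetical container for the per-case DDSM annotations described above."""
    breast_density: int                     # BI-RADS breast density category
    assessment: int                         # overall BI-RADS assessment, 0 to 5
    subtlety: int                           # subtlety rating, 1 to 5
    age: Optional[int] = None               # patient age, if recorded
    mass_shape: Optional[str] = None        # BI-RADS mass descriptors (masses only)
    mass_margin: Optional[str] = None
    calc_type: Optional[str] = None         # BI-RADS calcification descriptors (calcifications only)
    calc_distribution: Optional[str] = None

    def __post_init__(self):
        # Reject values outside the ranges stated in the description.
        if not 0 <= self.assessment <= 5:
            raise ValueError(f"BI-RADS assessment must be 0-5, got {self.assessment}")
        if not 1 <= self.subtlety <= 5:
            raise ValueError(f"subtlety must be 1-5, got {self.subtlety}")


# Hypothetical mass annotation; the descriptor value is illustrative only.
example = AbnormalityAnnotation(breast_density=3, assessment=4, subtlety=2, age=57, mass_shape="OVAL")
print(example.assessment, example.subtlety)  # 4 2
```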

Mass segmentation

Mass margin and shape have long been established as important indicators for diagnosis in mammography. Because of this, many methods are based on developing mathematical descriptions of the tumour outline. Because these methods depend on accurate ROI segmentation, and many of the DDSM-provided annotations are imprecise (as seen in Fig. 1), we applied a lesion segmentation algorithm (described below) that is initialized with the coarse original DDSM contours but supplies much more accurate ROIs. Figure 1 contains example ROIs from the DDSM, our mammographer, and the automated segmentation algorithm. As shown, the DDSM outlines provide only a general location, not a precise mass boundary. The segmentation algorithm was designed to provide an exact delineation of the mass from the surrounding tissue. This segmentation was done only for masses, not for calcifications.
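The description does not say how ROI accuracy was measured. One common way to quantify how far a loose outline is from a tight lesion mask is the Dice coefficient; the Python sketch below assumes ROIs represented as sets of pixel coordinates, and the two rectangles are made-up examples rather than DDSM data.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two ROIs given as sets of (row, col) pixels."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def box(r0, c0, r1, c1):
    """All pixels in the half-open rectangle [r0, r1) x [c0, c1)."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


coarse = box(0, 0, 10, 10)  # hypothetical loose outline: 100 pixels
tight = box(2, 2, 8, 8)     # hypothetical tight lesion mask: 36 pixels
print(round(dice(coarse, tight), 3))  # 0.529
```

Even though the tight mask lies entirely inside the loose outline, the overlap score is only about 0.53, which illustrates why a general-location outline is a poor stand-in for a precise boundary when shape features are computed.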

Standardized train/test splits

Separate sets of cases for training and testing algorithms ensure that all researchers use the same cases for these tasks. In particular, the test set should contain cases of varying difficulty so that methods are tested thoroughly. The data were split into a training set and a test set stratified by BI-RADS category, which gives an appropriate stratification for researchers working on CADe as well as CADx. Note that many of the BI-RADS assessments were likely updated after the physician gathered additional information, as it is unconventional to assign BI-RADS 4 and 5 to screening images. The split used 20% of the cases for testing and the rest for training, and was performed separately for the mass cases and the calcification cases. Here 'case' refers to a particular abnormality, seen on the craniocaudal (CC) and/or mediolateral oblique (MLO) views, which are the standard views for screening mammography. Figure 2 displays histograms of BI-RADS assessment and pathology for the training and test sets, for both calcification and mass cases. As shown, the split was performed so that the training and test sets have an equal level of difficulty.
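A stratified 80/20 split of this kind can be sketched as follows, grouping cases by BI-RADS category and holding out about 20% of each group. This is an illustration under assumptions (random shuffling with a fixed seed, rounding per category), not the procedure the CBIS-DDSM authors used; `stratified_split` and the sample cases are hypothetical. Following the text, mass cases and calcification cases would be passed in separate calls.

```python
import random
from collections import defaultdict


def stratified_split(cases, test_fraction=0.2, seed=0):
    """Split (case_id, birads) pairs so each BI-RADS category keeps roughly the same test share."""
    by_category = defaultdict(list)
    for case_id, birads in cases:
        by_category[birads].append(case_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for birads in sorted(by_category):
        ids = sorted(by_category[birads])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# Hypothetical mass cases: 10 with BI-RADS 4 and 5 with BI-RADS 5.
mass_cases = [(f"mass_{i}", 4) for i in range(10)] + [(f"mass_{i}", 5) for i in range(10, 15)]
train, test = stratified_split(mass_cases)
print(len(train), len(test))  # 12 3
```

Splitting within each category, rather than over the pooled cases, keeps the BI-RADS histograms of the two sets close, which is the property the text describes.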

Data Records

The original images are distributed as DICOM files, both as full mammograms and at the abnormality level. Full mammogram images include both the MLO and CC views.

Metadata for each abnormality was transferred from the original CSV files to tag format. For example:

Patient ID: the first 7 characters of images in the case file
Density category
Breast: Left or Right
View: CC or MLO
Mass shape (when applicable)
Mass margin (when applicable)
Calcification type (when applicable)
Calcification d...
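The CBIS-DDSM patient IDs (e.g. Calc-Test_P_00038_RIGHT_CC_1) appear to pack several of these tags into one string. The Python sketch below parses them under an assumed layout inferred only from such examples; the regular expression, `CaseId`, and `parse_case_id` are hypothetical, and real IDs should be checked against the released metadata before relying on it. Note that the extracted patient field, `P_00038`, is exactly 7 characters long.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from IDs such as "Calc-Test_P_00038_RIGHT_CC_1":
# <Calc|Mass>-<Training|Test>_P_<participant>_<LEFT|RIGHT>_<CC|MLO>[_<abnormality index>]
ID_RE = re.compile(
    r"^(?P<kind>Calc|Mass)-(?P<split>Training|Test)_"
    r"(?P<patient>P_\d+)_(?P<breast>LEFT|RIGHT)_(?P<view>CC|MLO)"
    r"(?:_(?P<index>\d+))?$"
)


class CaseId(NamedTuple):
    kind: str              # "Calc" or "Mass"
    split: str             # "Training" or "Test"
    patient: str           # e.g. "P_00038"
    breast: str            # "LEFT" or "RIGHT"
    view: str              # "CC" or "MLO"
    index: Optional[int]   # abnormality number; assumed absent on full-mammogram IDs


def parse_case_id(text):
    """Split one ID string into its tags, or raise ValueError if it does not fit the assumed layout."""
    m = ID_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized ID: {text!r}")
    idx = m.group("index")
    return CaseId(m.group("kind"), m.group("split"), m.group("patient"),
                  m.group("breast"), m.group("view"), int(idx) if idx else None)


print(parse_case_id("Calc-Test_P_00038_RIGHT_CC_1"))
# CaseId(kind='Calc', split='Test', patient='P_00038', breast='RIGHT', view='CC', index=1)
```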
CBIS-DDSM_1024
huggingface.co
Updated May 26, 2025
Cite
Sean Baek (2025). CBIS-DDSM_1024 [Dataset]. https://huggingface.co/datasets/dbaek111/CBIS-DDSM_1024
Dataset updated
May 26, 2025
Authors
Sean Baek
Description
The CBIS-DDSM dataset consists of mammograms for 1,566 patients, provided in DICOM format with metadata in CSV files. Of its 3,120 full mammogram images, 34 were excluded, leaving 3,086 images. These were converted to 8-bit PNG files and organized into 'cancer' and 'not_cancer' folders based on their pathology, for both training and testing. See the full description on the dataset page: https://huggingface.co/datasets/dbaek111/CBIS-DDSM_1024.
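The page does not say how the DICOM intensities were reduced to 8 bits or how pathology labels map to the two folders. A minimal sketch, assuming simple min-max rescaling and assuming pathology strings such as `MALIGNANT`, `BENIGN`, and `BENIGN_WITHOUT_CALLBACK`, where only `MALIGNANT` counts as cancer; the function names are hypothetical, and reading the DICOM pixel data itself would need a library such as pydicom and is not shown.

```python
def to_uint8(pixels):
    """Min-max rescale a flat list of raw (e.g. 16-bit) intensities to integers in 0-255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)  # flat image: avoid division by zero
    scale = 255 / (hi - lo)
    return [round((p - lo) * scale) for p in pixels]


def folder_for(pathology):
    """Map a pathology label to the 'cancer' / 'not_cancer' folder names."""
    # Assumption: only MALIGNANT counts as cancer; every benign label goes to 'not_cancer'.
    return "cancer" if pathology.upper() == "MALIGNANT" else "not_cancer"


print(to_uint8([0, 32768, 65535]))            # [0, 128, 255]
print(folder_for("BENIGN_WITHOUT_CALLBACK"))  # not_cancer
```

Min-max rescaling per image is only one option; a fixed window or percentile clipping would give different contrast, so results from the 8-bit PNGs may not match work done on the original DICOM values.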
Approaches comparison on CBIS DDSM dataset.
plos.figshare.com
xls
Updated Oct 2, 2024
Cite
Mudassar Ali; Tong Wu; Haoji Hu; Tariq Mahmood (2024). Approaches comparison on CBIS DDSM dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0309421.t005
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0309421.t005
Dataset updated
Oct 2, 2024
Dataset provided by
PLOS ONE
Authors
Mudassar Ali; Tong Wu; Haoji Hu; Tariq Mahmood
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose
Using computer-aided diagnosis (CAD) systems, this research endeavors to enhance breast cancer segmentation by addressing data insufficiency and data complexity during model training. As perceived by computer vision models, the inherent symmetry and complexity of mammography images make segmentation difficult. The objective is to optimize the precision and effectiveness of medical imaging.
Methods
The study introduces a hybrid strategy combining shape-guided segmentation (SGS) and M3D-neural cellular automata (M3D-NCA), resulting in improved computational efficiency and performance. Applying SGS during the initialization phase, together with eliminating convolutional layers, enables the model to reduce computation time effectively. The research proposes a novel loss function that combines the segmentation losses of both components for effective training.
Results
The proposed technique aims to improve the accuracy and consistency of breast tumor segmentation, leading to significant improvements in medical imaging and in breast cancer detection and treatment.
Conclusion
This study enhances breast cancer segmentation in medical imaging using CAD systems. The hybrid combination of SGS and M3D-NCA improves performance and computational efficiency by addressing complex data and limited training data. The approach also reduces computing time and improves training efficiency. The study aims to improve breast cancer detection and treatment methods in medical imaging technology.
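The abstract does not give the form of the combined loss. Purely as an illustration of what combining segmentation losses from two components can look like, the sketch below takes a weighted sum of two soft Dice losses; the weight, the choice of Dice, and all function names are assumptions, not the paper's formulation.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """1 - soft Dice between predicted probabilities and a binary target (flat lists)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1 - (2 * inter + eps) / (denom + eps)


def combined_loss(pred_sgs, pred_nca, target, weight=0.5):
    """Hypothetical weighted sum of the two components' segmentation losses."""
    return weight * soft_dice_loss(pred_sgs, target) + (1 - weight) * soft_dice_loss(pred_nca, target)


target = [1, 1, 0, 0]
# A perfect first prediction and an uninformative second one (all 0.5).
print(round(combined_loss([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5], target), 3))  # 0.25
```

Because both terms share one target, gradients from either component's errors flow into training; the `weight` parameter controls how much each component contributes.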
DDSM Dataset
gts.ai
json
Updated Jul 5, 2024
Cite
GTS (2024). DDSM Dataset [Dataset]. https://gts.ai/dataset-download/ddsm-dataset/
Available download formats
json
Dataset updated
Jul 5, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore the comprehensive DDSM and CBIS-DDSM mammogram image dataset, featuring 55,890 pre-processed images resized to 299x299 pixels.
Re-curated Breast Imaging Subset DDSM Dataset (RBIS-DDSM)
ieee-dataport.org
Updated Mar 3, 2022
Cite
RAKSHITH SATHISH (2022). Re-curated Breast Imaging Subset DDSM Dataset (RBIS-DDSM) [Dataset]. https://ieee-dataport.org/documents/re-curated-breast-imaging-subset-ddsm-dataset-rbis-ddsm
Dataset updated
Mar 3, 2022
Authors
RAKSHITH SATHISH
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Re-curated Breast Imaging Subset DDSM Dataset (RBIS-DDSM) is a curated version of 849 images from the CBIS-DDSM dataset, which is available online under a permissive copyright license (CC-BY-SA 3.0). The CBIS-DDSM dataset is an improved version of the DDSM dataset. To segment and annotate masses, the authors of CBIS-DDSM attempted to improve the ground truth by applying simple image-processing methods to enhance the edges, without any manual intervention from medical experts.
The Complete Mini-DDSM
kaggle.com
Updated Mar 24, 2021
Cite
Abbas Cheddad (2021). The Complete Mini-DDSM [Dataset]. https://www.kaggle.com/cheddad/miniddsm2/metadata
Available download formats
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 24, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abbas Cheddad
License
Attribution-NoDerivs 4.0 (CC BY-ND 4.0), https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Description
[2021-03-23] Updated: Enjoy!

Consent

By downloading this complete Mini-DDSM Data Set, you agree to the following:

This page on Kaggle remains the main source of this public data set (i.e., no redistribution of this data set)

In any resulting publication of research that uses the paper or data set, due credit (to recognize the efforts of my team) must be given to: [Ref paper/Mini-DDSM] C.D. Lekamlage, F. Afzal, E. Westerberg and A. Cheddad, “Mini-DDSM: Mammography-based Automatic Age Estimation,” in the 3rd International Conference on Digital Medicine and Image Processing (DMIP 2020), ACM, Kyoto, Japan, November 06-09, 2020, pp: 1-6. And [Ref DDSM] Michael Heath, Kevin Bowyer, Daniel Kopans, Richard Moore and W. Philip Kegelmeyer, “The Digital Database for Screening Mammography,” in Proceedings of the Fifth International Workshop on Digital Mammography, M.J. Yaffe, ed., 212-218, Medical Physics Publishing, 2001. ISBN 1-930524-00-5.

Context & Data Set Characteristics

You can read the paper that describes the initial attempt to collect this free data set and the experiments we conducted. It took a tremendous amount of time, coding, and processing power to get the data set into shape and make it as accessible as possible to the research community. Below are some of the merits of this new Mini-DDSM version:

Large, public, fully annotated healthcare data sets are scarce

The intention here is to provide easy access to the DDSM (albeit at half resolution)

The data set comes with age and density attributes, patient folders (condition: benign, cancer, healthy), original filename identification, and binary masks of the suspicious/tumor contours.

The lesion binary masks are constructed from the original Freeman chain codes, so this data set spares you that inconvenience.
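To show what decoding a chain code involves, the sketch below expands a start pixel plus a sequence of 8-direction Freeman codes into boundary points, then rasterizes the enclosed region with an even-odd ray-casting test. The direction table (0 = north, increasing clockwise) is an assumption to verify against the DDSM documentation, the fill method is a generic choice rather than how Mini-DDSM built its masks, and the function names and the square outline are hypothetical.

```python
# Assumed direction table: code -> (d_col, d_row), 0 = north, increasing clockwise.
# Verify against the DDSM overlay documentation before relying on it.
DIRECTIONS = {
    0: (0, -1), 1: (1, -1), 2: (1, 0), 3: (1, 1),
    4: (0, 1), 5: (-1, 1), 6: (-1, 0), 7: (-1, -1),
}


def decode_chain(start_col, start_row, codes):
    """Turn a start point plus a sequence of chain codes into boundary vertices (col, row)."""
    col, row = start_col, start_row
    points = [(col, row)]
    for code in codes:
        d_col, d_row = DIRECTIONS[code]
        col, row = col + d_col, row + d_row
        points.append((col, row))
    return points


def fill_polygon(points, width, height):
    """Rasterize a closed boundary into a binary mask (list of rows).

    Boundary points are treated as polygon vertices; a pixel is set when its
    centre is inside the polygon by the even-odd ray-casting rule.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        y = row + 0.5  # pixel centres avoid rays passing exactly through vertices
        for col in range(width):
            x = col + 0.5
            inside = False
            for i in range(n):
                x1, y1 = points[i]
                x2, y2 = points[(i + 1) % n]
                if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x < x_cross:
                        inside = not inside
            mask[row][col] = int(inside)
    return mask


codes = [2] * 3 + [4] * 3 + [6] * 3 + [0] * 3  # hypothetical square outline
outline = decode_chain(1, 1, codes)
mask = fill_polygon(outline, width=6, height=6)
print(sum(map(sum, mask)))  # 9
```

The brute-force fill is O(width × height × boundary length), which is fine for illustration but slow on full mammograms; a production pipeline would more likely use an image library's polygon-fill routine.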

The data set can act as a validation platform for machine learning algorithms, whether already developed or under development (an example of such an ML topic, imputation of missing data using DL, was listed in the "Tasks" tab, which has since been removed by Kaggle)

There are still open research questions that this data set, together with deep learning, could help address

No complications from extracting or loading images from TFRecords. You want images, you get images! Whether you are using Python, MATLAB, Java, or C++, the images are stored as ordinary image files.

Free of charge and open access, no lengthy protocols and no forms to fill/sign

This data set comes with an Excel sheet that gives direct access to all image attributes and metadata (see Fig. 1).

Because several people with limited machine resources or internet bandwidth could not download the 47 GB data set (folder -MINI-DDSM-Complete-PNG-16-), we also provide it in JPEG format (~4 GB, folder -MINI-DDSM-Complete-JPEG-8-).


Figure 1. The first few rows of the accompanying excel sheet.

Content

This is a lightweight version of the popular DDSM (Digital Database for Screening Mammography) [Ref] data set, which is now obsolete. To answer the nagging question of why Mini-DDSM is needed: the DDSM database has a website maintained at the University of South Florida to keep it accessible on the web. However, its image files are compressed with lossless JPEG (“.LJPEG”) encoding generated by broken software (or at least an outdated tool, as described on the DDSM website). CBIS-DDSM provides an alternative host of the original DDSM, but unfortunately its images are stripped of their original identifying filenames and of the age attribute. Figure 2 illustrates the age distribution in this complete Mini-DDSM, and Fig. 3 shows the density (amount of fibroglandular tissue) distribution using BI-RADS scoring.

Figure 2. Age distribution in this complete version of the Mini-DDSM data set.

Figure 3. Density distribution in this complete version of the Mini-DDSM data set.

Inspiration

Please give us feedback and suggestions to improve the data set.
85 scholarly articles cite the CBIS-DDSM dataset (per Google Scholar).
