10 datasets found
  1. DDSM Mammography

    Updated Jul 3, 2018
  2.

    Updated Apr 28, 2020
  3. IRMA Version of DDSM LJPEG Data

    Digital Database for Screening Mammography...

    Updated 2010
  4. A Heuristic Approach to Automated Nipple Detection in Digital Mammograms

    Updated Feb 20, 2013
  5. Data from: Hidden Markov tree model applied to the detection of...

    Updated Jul 9, 2007
  6. Bayesian Classifier Performance.

    Updated Dec 2, 2015
  7. cbis-ddsm-lastcopy

    Updated Mar 30, 2020
  8. Datasets’ characteristics used in this study.

    Updated Feb 9, 2018
  9. Bayesian classifier performancea.

    Updated Dec 2, 2015
  10. KNN classifier performance.

    Updated Dec 2, 2015


DDSM Mammography

tfrecords files of scans from the DDSM dataset

17 scholarly articles cite this dataset.
Available download formats: zip (6,377,193,753 bytes)
Dataset updated Jul 3, 2018
Eric A. Scuccimarra

CC0 1.0 Universal Public Domain Dedication
License information was derived automatically



This dataset consists of images from the DDSM [1] and CBIS-DDSM [3] datasets. The images have been pre-processed and converted to 299x299 images by extracting the ROIs. The data is stored as tfrecords files for TensorFlow.

The dataset contains 55,890 training examples, of which 14% are positive and the remaining 86% negative, divided into 5 tfrecords files.
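With roughly 14% positives, a classifier trained on this data without adjustment can score well by mostly predicting "negative". The following is a minimal sketch of one common remedy, "balanced" class weights, assuming only the rounded figures above. The resulting counts are therefore estimates, and the helper names are hypothetical.

```python
from __future__ import annotations

# Figures from the description; 14% is a rounded value, so counts are estimates.
TOTAL_EXAMPLES = 55_890
POSITIVE_FRACTION = 0.14


def class_counts(total: int, positive_fraction: float) -> tuple[int, int]:
    """Estimate (negative, positive) counts from a rounded positive fraction."""
    positive = round(total * positive_fraction)
    return total - positive, positive


def balanced_class_weights(negative: int, positive: int) -> dict[int, float]:
    """Weight each class by total / (2 * class_count) so that, on average,
    negatives and positives contribute equally to the loss."""
    total = negative + positive
    return {0: total / (2 * negative), 1: total / (2 * positive)}


negative, positive = class_counts(TOTAL_EXAMPLES, POSITIVE_FRACTION)
weights = balanced_class_weights(negative, positive)
print(f"~{negative} negative, ~{positive} positive")
print({label: round(w, 3) for label, w in weights.items()})
```

A dictionary of this form can be passed, for example, as the class_weight argument of Keras Model.fit. Counting the label_normal values directly from the tfrecords files would give exact numbers instead of estimates.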

Note: The data has been separated into training and test sets following the division in the CBIS-DDSM dataset. The test files were then divided equally into test and validation data. However, this split was done incorrectly: the test numpy files contain only masses and the validation numpy files contain only calcifications. Combine these files to obtain balanced and complete test data.
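Combining the two splits amounts to concatenating the arrays and shuffling them with one shared permutation, so that each image keeps its label. The following is a minimal NumPy sketch. The function name is hypothetical, and synthetic arrays stand in for the real test and validation .npy files. Their file names are not assumed here, and neither is whether their labels use the binary or the multi-class scheme.

```python
import numpy as np


def combine_test_and_validation(test_data, test_labels, val_data, val_labels, seed=None):
    """Merge the mass-only test split with the calcification-only validation
    split, then shuffle so that any subset contains both lesion types."""
    data = np.concatenate([test_data, val_data], axis=0)
    labels = np.concatenate([test_labels, val_labels], axis=0)
    if len(data) != len(labels):
        raise ValueError("data and labels must contain the same number of examples")
    # One shared permutation keeps every image paired with its label.
    order = np.random.default_rng(seed).permutation(len(data))
    return data[order], labels[order]


# Synthetic stand-ins for the arrays you would load with np.load() from the
# test and validation .npy files.
rng = np.random.default_rng(0)
test_x = rng.integers(0, 256, size=(4, 299, 299, 1), dtype=np.uint8)
test_y = np.array([2, 4, 0, 2])  # hypothetical labels: masses and negatives
val_x = rng.integers(0, 256, size=(3, 299, 299, 1), dtype=np.uint8)
val_y = np.array([1, 3, 0])      # hypothetical labels: calcifications and negatives

x, y = combine_test_and_validation(test_x, test_y, val_x, val_y, seed=42)
print(x.shape, sorted(y.tolist()))
```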


The dataset consists of negative images from the DDSM dataset and positive images from the CBIS-DDSM dataset. The data was pre-processed to convert it into 299x299 images.

The negative (DDSM) images were tiled into 598x598 tiles, which were then resized to 299x299.
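The tiling step can be sketched in NumPy as follows. Two details are not stated in the description and are assumptions here: partial tiles at the right and bottom edges are dropped, and the 598x598 to 299x299 resize is done by averaging 2x2 pixel blocks. The helper names are hypothetical, and a random 8-bit array stands in for a real scan.

```python
from __future__ import annotations

import numpy as np

TILE = 598  # tile size given in the description


def tile_image(scan: np.ndarray, tile: int = TILE) -> list[np.ndarray]:
    """Split a 2-D scan into non-overlapping tile x tile patches.
    Assumption: partial tiles at the right and bottom edges are dropped."""
    n_rows, n_cols = scan.shape[0] // tile, scan.shape[1] // tile
    return [
        scan[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
        for r in range(n_rows)
        for c in range(n_cols)
    ]


def downsample_2x(patch: np.ndarray) -> np.ndarray:
    """Halve both sides (598 -> 299) by averaging each 2x2 block of pixels.
    Assumes even side lengths; the original resize method is not stated."""
    h, w = patch.shape
    blocks = patch.reshape(h // 2, 2, w // 2, 2).astype(np.float64)
    return blocks.mean(axis=(1, 3)).round().astype(patch.dtype)


# Synthetic 8-bit "scan" standing in for a real DDSM image.
scan = np.random.default_rng(0).integers(0, 256, size=(1300, 1800), dtype=np.uint8)
tiles = [downsample_2x(t) for t in tile_image(scan)]
print(len(tiles), tiles[0].shape)
```

A 1300x1800 scan yields 2 x 3 = 6 full tiles. Block averaging was chosen only because it is exact for a factor-of-two reduction. An interpolating resize such as bilinear would be an equally plausible reading of "resized".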

The positive (CBIS-DDSM) images had their ROIs extracted using the masks with a small amount of padding to provide context. Each ROI was then randomly cropped three times into 598x598 images, with random flips and rotations, and then the images were resized down to 299x299.
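The positive-image steps can be sketched in NumPy as follows: take the bounding box of the mask plus padding, then take three random 598x598 crops with random flips and rotations. Several values are assumptions, not the author's settings: the 50-pixel padding, rotations restricted to multiples of 90 degrees, and zero-padding of ROIs smaller than 598 pixels. All helper names are hypothetical. The resulting crops would then be resized to 299x299 as described above.

```python
from __future__ import annotations

import numpy as np

CROP = 598  # crop size given in the description


def roi_from_mask(image: np.ndarray, mask: np.ndarray, pad: int = 50) -> np.ndarray:
    """Crop the bounding box of the nonzero mask pixels, grown by `pad`
    pixels on every side (50 is a hypothetical padding value)."""
    ys, xs = np.nonzero(mask)
    top, bottom = max(0, ys.min() - pad), min(image.shape[0], ys.max() + 1 + pad)
    left, right = max(0, xs.min() - pad), min(image.shape[1], xs.max() + 1 + pad)
    return image[top:bottom, left:right]


def random_crops(roi: np.ndarray, n_crops: int = 3, crop: int = CROP,
                 rng: np.random.Generator | None = None) -> list[np.ndarray]:
    """Take `n_crops` random crop x crop windows, each with random flips and a
    random rotation by a multiple of 90 degrees (assumed rotation scheme)."""
    if rng is None:
        rng = np.random.default_rng()
    # Assumption: ROIs smaller than the crop are zero-padded up to crop size.
    pad_h, pad_w = max(0, crop - roi.shape[0]), max(0, crop - roi.shape[1])
    roi = np.pad(roi, ((0, pad_h), (0, pad_w)))
    crops = []
    for _ in range(n_crops):
        top = int(rng.integers(0, roi.shape[0] - crop + 1))
        left = int(rng.integers(0, roi.shape[1] - crop + 1))
        patch = roi[top:top + crop, left:left + crop]
        if rng.random() < 0.5:
            patch = np.fliplr(patch)
        if rng.random() < 0.5:
            patch = np.flipud(patch)
        patch = np.rot90(patch, k=int(rng.integers(0, 4)))
        crops.append(np.ascontiguousarray(patch))
    return crops


# Synthetic image with a rectangular "lesion" mask at rows 200-799, columns 150-849.
image = np.random.default_rng(1).integers(0, 256, size=(1000, 1000), dtype=np.uint8)
mask = np.zeros_like(image)
mask[200:800, 150:850] = 1
roi = roi_from_mask(image, mask)
crops = random_crops(roi, rng=np.random.default_rng(2))
print(roi.shape, [c.shape for c in crops])
```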

The images are labeled with two labels:

  1. label_normal - 0 for negative and 1 for positive
  2. label - full multi-class labels, 0 is negative, 1 is benign calcification, 2 is benign mass, 3 is malignant calcification, 4 is malignant mass
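The two label fields are related: the multi-class label determines label_normal. The following is a small sketch, assuming, as the two lists imply, that every non-zero label (benign or malignant) corresponds to label_normal = 1. The lesion_type helper is hypothetical and is included because it makes the mass/calcification split easy to check.

```python
from __future__ import annotations

# Multi-class values of the label feature, as listed above.
CLASS_NAMES = {
    0: "negative",
    1: "benign calcification",
    2: "benign mass",
    3: "malignant calcification",
    4: "malignant mass",
}


def to_label_normal(label: int) -> int:
    """Collapse the multi-class label to the binary label_normal scheme:
    0 for negative, 1 for any abnormality (benign or malignant)."""
    if label not in CLASS_NAMES:
        raise ValueError(f"unknown label: {label}")
    return 0 if label == 0 else 1


def lesion_type(label: int) -> str | None:
    """Hypothetical helper: 'calcification', 'mass', or None for negatives."""
    if label in (1, 3):
        return "calcification"
    if label in (2, 4):
        return "mass"
    return None


print([(CLASS_NAMES[lbl], to_label_normal(lbl), lesion_type(lbl)) for lbl in range(5)])
```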

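The following Python code decodes a single serialized training example. It can be mapped over a tf.data.TFRecordDataset built from the tfrecords files:

    import tensorflow as tf

    def parse_example(serialized_example):
        """Decode one serialized example from the tfrecords files."""
        features = tf.io.parse_single_example(
            serialized_example,
            features={
                'label': tf.io.FixedLenFeature([], tf.int64),
                'label_normal': tf.io.FixedLenFeature([], tf.int64),
                'image': tf.io.FixedLenFeature([], tf.string),
            })

        # extract the data: binary label and raw uint8 pixel bytes
        label = features['label_normal']
        image = tf.io.decode_raw(features['image'], tf.uint8)

        # reshape and scale the image (scaling to [0, 1] is an assumed choice)
        image = tf.reshape(image, [299, 299, 1])
        image = tf.cast(image, tf.float32) / 255.0
        return image, label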
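The image feature stores raw 8-bit pixel bytes, so each image is exactly 299 x 299 x 1 = 89,401 bytes. To inspect those bytes outside a TensorFlow graph, you can apply the same decoding in NumPy. The following sketch is illustrative: the helper name is hypothetical, and synthetic pixels are used for a round trip.

```python
import numpy as np

IMAGE_SHAPE = (299, 299, 1)  # single-channel 299x299, matching the reshape in the snippet


def decode_image_bytes(raw: bytes) -> np.ndarray:
    """Interpret the raw image feature bytes as uint8 pixels and reshape,
    mirroring decode_raw(..., uint8) followed by reshape([299, 299, 1])."""
    expected = IMAGE_SHAPE[0] * IMAGE_SHAPE[1] * IMAGE_SHAPE[2]  # 89,401 bytes
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return np.frombuffer(raw, dtype=np.uint8).reshape(IMAGE_SHAPE)


# Round trip with synthetic pixels: tobytes() produces the same flat layout.
pixels = (np.arange(299 * 299) % 256).astype(np.uint8).reshape(IMAGE_SHAPE)
decoded = decode_image_bytes(pixels.tobytes())
print(decoded.shape, decoded.dtype, bool((decoded == pixels).all()))
```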

The training examples do include images which contain content other than breast tissue, such as black background and occasionally overlay text.
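The description does not say that these images were removed. To find or skip tiles that are mostly black background, one simple heuristic is the fraction of near-black pixels. In the following sketch, both the pixel threshold and the cut-off fraction are hypothetical values to tune on your own data, and the helper names are illustrative.

```python
import numpy as np


def background_fraction(image: np.ndarray, threshold: int = 10) -> float:
    """Fraction of pixels at or below `threshold` in an 8-bit image.
    The threshold of 10 is a hypothetical cut-off for 'black background'."""
    return float(np.mean(image <= threshold))


def is_mostly_background(image: np.ndarray, max_fraction: float = 0.9,
                         threshold: int = 10) -> bool:
    """Flag images whose dark-pixel fraction exceeds `max_fraction`
    (also a hypothetical value)."""
    return background_fraction(image, threshold) > max_fraction


black = np.zeros((299, 299, 1), dtype=np.uint8)
half = black.copy()
half[:, 150:] = 200  # right part filled with "tissue"
print(background_fraction(black), round(background_fraction(half), 3))
print(is_mostly_background(black), is_mostly_background(half))
```

This check does not detect overlay text. Because the background tiles come from the tiled negative scans, dropping them also shifts the 14%/86% class balance.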


Previous work [5] has already dealt with classifying pre-identified lesions; this dataset was created with the intention of classifying raw scans as positive or negative by detecting abnormalities. The ability to detect lesions automatically could save many lives through earlier detection.


[1] M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. P. Kegelmeyer. The Digital Database for Screening Mammography. In Proceedings of the Fifth International Workshop on Digital Mammography, M. J. Yaffe, ed., pp. 212-218. Medical Physics Publishing, 2001. ISBN 1-930524-00-5.

[2] M. Heath, K. Bowyer, D. Kopans, W. P. Kegelmeyer, R. Moore, K. Chang, and S. Munish Kumaran. Current status of the Digital Database for Screening Mammography. In Digital Mammography (Proceedings of the Fourth International Workshop on Digital Mammography), pp. 457-460. Kluwer Academic Publishers, 1998.

[3] R. Sawyer Lee, F. Gimenez, A. Hoogi, and D. Rubin. Curated Breast Imaging Subset of DDSM. The Cancer Imaging Archive, 2016.

[4] K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, and F. Prior. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging, 26(6), pp. 1045-1057, December 2013.

[5] D. Levy and A. Jain. Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks. arXiv:1612.00542v1, 2016.
