19 datasets found

D
CBIS-DDSM Dataset
datasetninja.com
Updated Sep 14, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi (2017). CBIS-DDSM Dataset [Dataset]. https://datasetninja.com/cbis-ddsm
Explore at:
Dataset updated
Sep 14, 2017
Dataset provided by
Dataset Ninja
Authors
Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The CBIS-DDSM: Curated Breast Imaging Subset of Digital Database for Screening Mammography includes decompressed images, data selection and curation by trained mammographers, updated mass segmentation and bounding boxes, and pathologic diagnosis for training data, formatted similarly to modern computer vision data sets. The data set contains 753 calcification cases and 891 mass cases, providing a data set size capable of analyzing decision support systems in mammography.
h
DDSM-mammography-dataset
huggingface.co
Updated Jul 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Unidata Medical (2025). DDSM-mammography-dataset [Dataset]. https://huggingface.co/datasets/ud-medical/DDSM-mammography-dataset
Explore at:
Dataset updated
Jul 15, 2025
Authors
Unidata Medical
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Mammogram Photos of breast cancer - 600,000+ Studies

Dataset comprises 100,000+ studies with protocol and 500,000+ studies without protocol, totaling over 600,000 digital mammography exams curated for cancer detection and diagnosis research.It is designed for advancing breast cancer research, providing a comprehensive resource for studying screening mammography, malignant and benign cases, and computer-aided detection systems. - Get the data

Dataset characteristics:… See the full description on the dataset page: https://huggingface.co/datasets/ud-medical/DDSM-mammography-dataset.
T
curated_breast_imaging_ddsm
tensorflow.org
Updated Jun 1, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). curated_breast_imaging_ddsm [Dataset]. https://www.tensorflow.org/datasets/catalog/curated_breast_imaging_ddsm
Explore at:
Dataset updated
Jun 1, 2024
Description
The CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information.

The default config is made of patches extracted from the original mammograms, following the description from (http://arxiv.org/abs/1708.09427), in order to frame the task to solve in a traditional image classification setting.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('curated_breast_imaging_ddsm', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.

https://storage.googleapis.com/tfds-data/visualization/fig/curated_breast_imaging_ddsm-patches-3.0.0.png" alt="Visualization" width="500px">
CBIS-DDSM: Mass Case Mammograms PNG Dataset
kaggle.com
Updated Dec 5, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Duru Alaylı (2023). CBIS-DDSM: Mass Case Mammograms PNG Dataset [Dataset]. https://www.kaggle.com/datasets/durualayl/cbis-ddsm-mass-case-mammograms-png-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 5, 2023
Dataset provided by
Kaggle
Authors
Duru Alaylı
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
CBIS-DDSM is a publicly available dataset provided by The Cancer Imaging Archive. It is an updated and standardized version of the DDSM dataset that is provided by the University of South Florida. It consists of 2,620 mammograms that are benign with no callback, benign or malignant cases with mass and/or calcification abnormalities. The cases also have both the MLO and CC views or only one of them. Each case has a full image, a cropped image, and a region of interest (ROI) mask image. The dataset is also split into train and test sets for both the mass and calcification cases.

Description

For simpler usage CBIS-DDSM: Mass Case Mammograms PNG Dataset is only a subset of the mass cases from the dataset and it only contains the full image views of the mammograms. The images are in .png format and they are grouped in the following directories.

MLO_full: MLO full views of mammograms for training (661 images) - MLO-ben-full-images: MLO benign full views of mammograms for training (291 images) - MLO-ben-wout-full-images: MLO benign with no callback full views of mammograms for training (52 images) - MLO-mal-full-images: MLO malignant full views of mammograms for training (318 images)

MLO_full_test: MLO full views of mammograms for test (192 images) - MLO-ben-full-images: MLO benign full views of mammograms for test (94 images) - MLO-ben-wout-full-images: MLO benign with no callback full views of mammograms for test (19 images) - MLO-mal-full-images: MLO malignant full views of mammograms for test (79 images)

CC_full: CC full views of mammograms for training (576 images) - CC-ben-full-images: CC benign full views of mammograms for training (259 images) - CC-ben-wout-full-images: CC benign with no callback full views of mammograms for training (33 images) - CC-mal-full-images: CC malignant full views of mammograms for training (284 images)

CC_full_test: CC full views of mammograms for test (171 images) - CC-ben-full-images: CC benign full views of mammograms for test (90 images) - CC-ben-wout-full-images: CC benign with no callback full views of mammograms for test (15 images) - CC-mal-full-images: CC malignant full views of mammograms for test (66 images)

Citation for Databases

TCIAl. "Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM)." The Cancer Imaging Archive, (2023). https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=22516629#22516629accaef0469834754b89af9e007760b10.

University of South Florida. DDSM: Digital Database for Screening Mammography." University of South Florida Digital Mammography, http://www.eng.usf.edu/cvprg/mammography/database.html.
CBIS-DDSM: Breast Cancer Image Dataset
kaggle.com
Updated Feb 7, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Awsaf (2021). CBIS-DDSM: Breast Cancer Image Dataset [Dataset]. https://www.kaggle.com/awsaf49/cbis-ddsm-breast-cancer-image-dataset/tasks
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 7, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Awsaf
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
https://www.researchgate.net/publication/338558131/figure/fig3/AS:962412517793792@1606468433025/CBIS-DDSM-example-images-used-for-detection.jpg" alt="">

Descripton

This dataset is jpeg format of the original dataset(163GB). The resolution was kept to the original dataset.

Number of Studies: 6775

Number of Series: 6775

Number of Participants: 1,566(NB)

Number of Images: 10239

Modalities: MG

Image Size (GB): 6(.jpg)

NB: The image data for this collection is structured such that each participant has multiple patient IDs. For example, pat_id 00038 has 10 separate patient IDs which provide information about the scans within the IDs (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1) This makes it appear as though there are 6,671 participants according to the DICOM metadata, but there are only 1,566 actual participants in the cohort.

Summary

This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.

Published research results from work in developing decision support systems in mammography are difficult to replicate due to the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. Few well-curated public datasets have been provided for the mammography community. These include the DDSM, the Mammographic Imaging Analysis Society (MIAS) database, and the Image Retrieval in Medical Applications (IRMA) project. Although these public data sets are useful, they are limited in terms of data set size and accessibility.

For example, most researchers using the DDSM do not leverage all its images for a variety of historical reasons. When the database was released in 1997, computational resources to process hundreds or thousands of images were not widely available. Additionally, the DDSM images are saved in non-standard compression files that require the use of decompression code that has not been updated or maintained for modern computers. Finally, the ROI annotations for the abnormalities in the DDSM were provided to indicate a general position of lesions, but not a precise segmentation for them. Therefore, many researchers must implement segmentation algorithms for accurate feature extraction. This causes an inability to directly compare the performance of methods or to replicate prior results. The CBIS-DDSM collection addresses that challenge by publicly releasing a curated and standardized version of the DDSM for evaluation of future CADx and CADe systems (sometimes referred to generally as CAD) research in mammography.

Please note that the image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs which provide information about the scans within the IDs (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). This makes it appear as though there are 6,671 patients according to the DICOM metadata, but there are only 1,566 actual participants in the cohort.

For scientific inquiries about this dataset, please contact Dr. Daniel Rubin, Department of Biomedical Data Science, Radiology, and Medicine, Stanford University School of Medicine (dlrubin@stanford.edu).

Citations & Data Usage Policy

Users of this data must abide by the TCIA Data Usage Policy and the Creative Commons Attribution 3.0 Unported License under which it has been published. Attribution should include references to the following citations:

CBIS-DDSM Citation

Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi , Daniel Rubin (2016). **Curated Breast Imaging Subset of DDSM [Dataset]**. The Cancer Imaging Archive. **DOI:** https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY

Publication Citation

Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi, Kanae Kawai Miyake, Mia Gorovoy & Danie...
DDSM Mammography
kaggle.com
zip
Updated Jul 3, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eric A. Scuccimarra (2018). DDSM Mammography [Dataset]. https://www.kaggle.com/skooch/ddsm-mammography
Explore at:
zip(3093452937 bytes)Available download formats
Dataset updated
Jul 3, 2018
Authors
Eric A. Scuccimarra
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Summary

This dataset consists of images from the DDSM [1] and CBIS-DDSM [3] datasets. The images have been pre-processed and converted to 299x299 images by extracting the ROIs. The data is stored as tfrecords files for TensorFlow.

The dataset contains 55,890 training examples, of which 14% are positive and the remaining 86% negative, divided into 5 tfrecords files.

Note - The data has been separated into training and test as per the division in the CBIS-DDSM dataset. The test files have been divided equally into test and validation data. However the split between test and validation data was done incorrectly, resulted in the test numpy files containing only masses and the validation files containing only calcifications. These files should be combined in order to have balanced and complete test data.

Pre-processing

The dataset consists of negative images from the DDSM dataset and positive images from the CBIS-DDSM dataset. The data was pre-processed to convert it into 299x299 images.

The negative (DDSM) images were tiled into 598x598 tiles, which were then resized to 299x299.

The positive (CBIS-DDSM) images had their ROIs extracted using the masks with a small amount of padding to provide context. Each ROI was then randomly cropped three times into 598x598 images, with random flips and rotations, and then the images were resized down to 299x299.

The images are labeled with two labels:

label_normal - 0 for negative and 1 for positive

label - full multi-class labels, 0 is negative, 1 is benign calcification, 2 is benign mass, 3 is malignant calcification, 4 is malignant mass

The following Python code will decode the training examples:

features = tf.parse_single_example( serialized_example, features={ 'label': tf.FixedLenFeature([], tf.int64), 'label_normal': tf.FixedLenFeature([], tf.int64), 'image': tf.FixedLenFeature([], tf.string) }) # extract the data label = features['label_normal'] image = tf.decode_raw(features['image'], tf.uint8) # reshape and scale the image image = tf.reshape(image, [299, 299, 1])

The training examples do include images which contain content other than breast tissue, such as black background and occasionally overlay text.

Inspiration

Previous work [5] has already dealt with classifying pre-identified lesions, this dataset was created with the intention of classifying raw scans as positive or negative by detecting abnormalities. The ability to automatically detect lesions could save many lives.

Acknowledgements

[1] The Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, Richard Moore and W. Philip Kegelmeyer, in Proceedings of the Fifth International Workshop on Digital Mammography, M.J. Yaffe, ed., 212-218, Medical Physics Publishing, 2001. ISBN 1-930524-00-5.

[2] Current status of the Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, W. Philip Kegelmeyer, Richard Moore, Kyong Chang, and S. Munish Kumaran, in Digital Mammography, 457-460, Kluwer Academic Publishers, 1998; Proceedings of the Fourth International Workshop on Digital Mammography.

[3] Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi , Daniel Rubin (2016). Curated Breast Imaging Subset of DDSM. The Cancer Imaging Archive.

[4] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057.

[5] D. Levy, A. Jain, Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks, arXiv:1612.00542v1, 2016
t
Digital Database for Screening Mammography (DDSM) dataset - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Digital Database for Screening Mammography (DDSM) dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/digital-database-for-screening-mammography--ddsm--dataset
Explore at:
Dataset updated
Dec 2, 2024
Description
The DDSM dataset is a public mammogram dataset used for training and testing the proposed method.
f
Data distribution of CBIS-DDSM dataset.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jul 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ahmad, Awais; Ahmad, Jawad; Ali, Zulfiqar; Jaffar, Arfan; Akram, Sheeraz; Rehman, Shafiq Ur; Bhatti, Sohail Masood (2024). Data distribution of CBIS-DDSM dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001306741
Explore at:
Dataset updated
Jul 11, 2024
Authors
Ahmad, Awais; Ahmad, Jawad; Ali, Zulfiqar; Jaffar, Arfan; Akram, Sheeraz; Rehman, Shafiq Ur; Bhatti, Sohail Masood
Description
Recent advancements in AI, driven by big data technologies, have reshaped various industries, with a strong focus on data-driven approaches. This has resulted in remarkable progress in fields like computer vision, e-commerce, cybersecurity, and healthcare, primarily fueled by the integration of machine learning and deep learning models. Notably, the intersection of oncology and computer science has given rise to Computer-Aided Diagnosis (CAD) systems, offering vital tools to aid medical professionals in tumor detection, classification, recurrence tracking, and prognosis prediction. Breast cancer, a significant global health concern, is particularly prevalent in Asia due to diverse factors like lifestyle, genetics, environmental exposures, and healthcare accessibility. Early detection through mammography screening is critical, but the accuracy of mammograms can vary due to factors like breast composition and tumor characteristics, leading to potential misdiagnoses. To address this, an innovative CAD system leveraging deep learning and computer vision techniques was introduced. This system enhances breast cancer diagnosis by independently identifying and categorizing breast lesions, segmenting mass lesions, and classifying them based on pathology. Thorough validation using the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) demonstrated the CAD system’s exceptional performance, with a 99% success rate in detecting and classifying breast masses. While the accuracy of detection is 98.5%, when segmenting breast masses into separate groups for examination, the method’s performance was approximately 95.39%. Upon completing all the analysis, the system’s classification phase yielded an overall accuracy of 99.16% for classification. The potential for this integrated framework to outperform current deep learning techniques is proposed, despite potential challenges related to the high number of trainable parameters. Ultimately, this recommended framework offers valuable support to researchers and physicians in breast cancer diagnosis by harnessing cutting-edge AI and image processing technologies, extending recent advances in deep learning to the medical domain.
DDSM-mammography-positive-case
kaggle.com
zip
Updated Jan 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Laurent Pourchot (2023). DDSM-mammography-positive-case [Dataset]. https://www.kaggle.com/datasets/pourchot/ddsm-mammography-positive-case/data
Explore at:
zip(19549837664 bytes)Available download formats
Dataset updated
Jan 13, 2023
Authors
Laurent Pourchot
Description
Dataset

This dataset was created by Laurent Pourchot

Contents
m
Breast Mammography Image Dataset with Masses
data.mendeley.com
Updated Jan 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Faramonna (2023). Breast Mammography Image Dataset with Masses [Dataset]. http://doi.org/10.17632/8fztxggjnc.1
Explore at:
Unique identifier
https://doi.org/10.17632/8fztxggjnc.1
Dataset updated
Jan 27, 2023
Authors
David Faramonna
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The mammography dataset includes both benign and malignant tumors. In order to create the pictures for this dataset, 106 masses from the INbreast dataset, 53 masses from the MIAS dataset, and 2188 masses from the DDSM dataset were initially extracted. Then, we preprocess our photos using contrast-limited adaptive histogram equalization and data augmentation. Inbreast dataset has 7632 photos, MIAS dataset has 3816 images, and DDSM dataset includes 13128 images after data augmentation. Additionally, we combine DDSM, MIAS, and INbreast. The size of each image was changed to 227*227 pixels.
t
Essam Rashed, M. Samir Abou El Seoud (2024). Dataset: Curated Breast Imaging...
service.tib.eu
Updated Dec 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Essam Rashed, M. Samir Abou El Seoud (2024). Dataset: Curated Breast Imaging Subset of Digital Database of Screening Mammography (CBIS-DDSM). https://doi.org/10.57702/sjkug8pe [Dataset]. https://service.tib.eu/ldmservice/dataset/curated-breast-imaging-subset-of-digital-database-of-screening-mammography--cbis-ddsm-
Explore at:
Dataset updated
Dec 2, 2024
Description
The Curated Breast Imaging Subset of Digital Database of Screening Mammography (CBIS-DDSM) dataset is used for both training and testing of the developed deep learning approach.
DDSM CBIS Patch
kaggle.com
zip
Updated Jan 2, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luka Ihn (2021). DDSM CBIS Patch [Dataset]. https://www.kaggle.com/datasets/llkihn/ddsm-cbis-patch
Explore at:
zip(1276311012 bytes)Available download formats
Dataset updated
Jan 2, 2021
Authors
Luka Ihn
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Context

Smaller version of the CBIS-DDSM with patient information. DDSM is a database of scanned film mammography studies. Additional information is also available for each anomaly. Regions around tumors were saved as separate images.

To load all data using python install numpy, pandas and tables and call load_all_files function from simple_load_data.py

Inspiration

The goal of this data set was to fix flaws of https://www.kaggle.com/skooch/ddsm-mammography .

This version of data set allows to divide train set into several folds in such a way that a patient won't be present in more than one fold. Information about patient id is stored in metadata file. divide_into_k_folds() function from simple_load_data.py implements such division into folds.

Also this version contains only images from CBIS-DDSM as it seems that images from DDSM can be discriminated from those of CBIS-DDSM by artifacts.

Images now have shape 900x900 Instead of 250x250 so that users can downscale images by their needs.

File description

Code files: | file name | description | | --- | --- | | simple_load_data.py | This file contains the simplest implementation of loading the data | | pytorch_lightning_data_module.py | This file contains implementation of pytorch_lightning's LightningDataModule. | | requirements.txt | This file contains versions of packages on which this code was tested. |

Data files: | file name | description | | --- | --- | | train_data.npy and test_data.npy | These files contain images of tumors stored as numpy memmap objects. | | train_labels.npy and test_labels.npy | These files contain true values for each image of tumor. |

Possible values of labels are: - 0 for 'BENIGN MASS' - 1 for 'BENIGN CALCIFICATION' - 2 for 'MALIGNANT MASS' - 3 for 'MALIGNANT CALCIFICATION'

Metadata: train_meta.h5 and test_meta.h5 These files contain additional information for each tumor. Such as type of tumour, its malignancy, patiend_ID and others.

Source dataset:

https://wiki.cancerimagingarchive.net/display/Public/CBIS-DDSM

Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi , Daniel Rubin (2016). Curated Breast Imaging Subset of DDSM [Dataset]. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY

Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: https://doi.org/10.1007/s10278-013-9622-7
Mammography Dataset from INbreast, MIAS, and DDSM
kaggle.com
zip
Updated May 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Emilio A. Venegas Hernández (2024). Mammography Dataset from INbreast, MIAS, and DDSM [Dataset]. https://www.kaggle.com/datasets/emiliovenegas1/mammography-dataset-from-inbreast-mias-and-ddsm
Explore at:
zip(621357189 bytes)Available download formats
Dataset updated
May 31, 2024
Authors
Emilio A. Venegas Hernández
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Malign and benign mammograms

Malignant and benign mammograms from INbreast, MIAS, and DDSM datasets, were downloaded directly from Lin, Ting-Yu, and Huang, Mei-Ling. Dataset of Breast mammography images with Masses https://doi.org/10.17632/ywsbh3ndr8.2

Normal mammograms

Normal mammograms were sourced from the DDSM webpage: http://www.eng.usf.edu/cvprg/Mammography/Database.html. However, the FTP service is currently not operational. Consequently, using BeautifulSoup (bs4) and PIL, thumbnails of all the normal datasets were extracted, resulting in a total of 2026 files. These files were then augmented and enhanced using CLAHE (Contrast Limited Adaptive Histogram Equalization).

Consult Jupyter Notebook for more information on the methods used for extraction and enhancing from webpage of DDSM
The Complete Mini-DDSM
kaggle.com
zip
Updated Mar 24, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abbas Cheddad (2021). The Complete Mini-DDSM [Dataset]. https://www.kaggle.com/cheddad/miniddsm2
Explore at:
zip(52715632967 bytes)Available download formats
Dataset updated
Mar 24, 2021
Authors
Abbas Cheddad
License
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Description
[2021-03-23] Updated: Enjoy!

Consent

By downloading this complete Mini-DDSM Dataset, you agree to the following:

This page on Kaggle remains the main source of this public dataset (i.e., no redistribution of this dataset)

In any resultant publications of research that uses the paper / dataset, due credits (to recognize the efforts of my team) must be provided to: [Ref paper/Mini-DDSM] C.D. Lekamlage, F. Afzal, E. Westerberg and A. Cheddad, “Mini-DDSM: Mammography-based Automatic Age Estimation,” in the 3rd International Conference on Digital Medicine and Image Processing (DMIP 2020), ACM, Kyoto, Japan, November 06-09, 2020, pp: 1-6. https://doi.org/10.1145/3441369.3441370 And [Ref DDSM] Michael Heath, Kevin Bowyer, Daniel Kopans, Richard Moore and W. Philip Kegelmeyer, in Proceedings of the Fifth International Workshop on Digital Mammography, M.J. Yaffe, ed., 212-218, Medical Physics Publishing, 2001. ISBN 1-930524-00-5.

Context & Dataset Characteristics

You can read the [Paper][1] that describes the initial attempt to collect this free dataset and the experiments we conducted. It required a tremendous time, coding and machine processing power to get it in shape to make it as much as possible accessible for the research community. Below, are some of the merits of this new Mini-DDSM version:

There is a scarcity in the availability of large public and fully annotated healthcare datasets

The intention here is to make an easy access to the DDSM (half resolution though)

The dataset comes along with the age/density attributes, patient folders (condition: benign, cancer, healthy), original filename identification, and suspicious/tumor contour binary mask.

The lesion binary mask is constructed based on the original freeman chain-coding, so this dataset prevents you that inconvenience.

The dataset can act as a validation platform for machine learning developed/under development algorithms **(see an example (imputation of missing data using DL) of such interesting ML topics in the "Tasks" tab above)- Tasks tab has been removed by Kaggle- **

There are still open research questions that this dataset along with deep learning may need to address

No complication of extracting/loading images from tfrecords. You want images, you get images! Thus, whether you are using Python, MATLAB, JAVA, C++, you have the images stored as images.

Free of charge and open access, no lengthy protocols and no forms to fill/sign

This dataset comes with an excel sheet that gives you a direct access to all image attributes and metadata (see Fig. 1) ==> Get it here**

Due to several requests from people having machine/internet bandwidth limitations that do not allow them to download the 47 GB dataset, Folder: -MINI-DDSM-Complete-PNG-16-, we also provide this dataset in JPEG format (~4 GB), Folder: -MINI-DDSM-Complete-JPEG-8-.**

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1822946%2F3905483ba6e03b7142a9121a03824558%2FRaws.png?generation=1609421580586145&alt=media" alt="">

Figure 1. The first few rows of the accompanying excel sheet.

Content

This is the light-weight version of the popular DDSM (Digital Database for Screening Mammography) [Ref] dataset which currently is obsolete. To answer the nagging question why Mini-DDSM, it is important to know that the DDSM database has a website maintained at the University of South Florida for purposes of keeping it accessible on the web. However, image files are compressed with lossless JPEG (i.e., “.LJPEG”) encoding that are generated using a broken software (or at least an outdated tool as described on the DDSM website). CBIS-DDSM provides an alternative host of the original DDSM, but unfortunately, images are stripped from their original identification filename and from the age attribute. Figure 2 illustrates the age distribution in this complete Mini-DDSM and Fig.3 exhibits the density (amount of Fibroglandular tissue) distribution using Bi-Rads scoring.

https://raw.githubusercontent.com/ARDISDataset/MiniDDSM/master/AgeDistributionW.png" alt="Age Distr"> Figure 2. Age distribution in this complete version of the Mini-DDSM dataset.

https://raw.githubusercontent.com/ARDISDataset/MiniDDSM/master/BIRADS.png" alt="Density"> Figure 3. Density distribution in this complete version of the Mini-DDSM dataset.

Inspiration

Please give us feedback/suggestions to improve the dataset to: ![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1822946%2F756766cadde4657770f...
Performance of fused model approach.
plos.figshare.com
xls
Updated Jul 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman (2024). Performance of fused model approach. [Dataset]. http://doi.org/10.1371/journal.pone.0304757.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0304757.t003
Dataset updated
Jul 11, 2024
Dataset provided by
PLOShttp://plos.org/
Authors
Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recent advancements in AI, driven by big data technologies, have reshaped various industries, with a strong focus on data-driven approaches. This has resulted in remarkable progress in fields like computer vision, e-commerce, cybersecurity, and healthcare, primarily fueled by the integration of machine learning and deep learning models. Notably, the intersection of oncology and computer science has given rise to Computer-Aided Diagnosis (CAD) systems, offering vital tools to aid medical professionals in tumor detection, classification, recurrence tracking, and prognosis prediction. Breast cancer, a significant global health concern, is particularly prevalent in Asia due to diverse factors like lifestyle, genetics, environmental exposures, and healthcare accessibility. Early detection through mammography screening is critical, but the accuracy of mammograms can vary due to factors like breast composition and tumor characteristics, leading to potential misdiagnoses. To address this, an innovative CAD system leveraging deep learning and computer vision techniques was introduced. This system enhances breast cancer diagnosis by independently identifying and categorizing breast lesions, segmenting mass lesions, and classifying them based on pathology. Thorough validation using the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) demonstrated the CAD system’s exceptional performance, with a 99% success rate in detecting and classifying breast masses. While the accuracy of detection is 98.5%, when segmenting breast masses into separate groups for examination, the method’s performance was approximately 95.39%. Upon completing all the analysis, the system’s classification phase yielded an overall accuracy of 99.16% for classification. The potential for this integrated framework to outperform current deep learning techniques is proposed, despite potential challenges related to the high number of trainable parameters. Ultimately, this recommended framework offers valuable support to researchers and physicians in breast cancer diagnosis by harnessing cutting-edge AI and image processing technologies, extending recent advances in deep learning to the medical domain.
MINI-DDSM-ROI-Mammography
kaggle.com
zip
Updated May 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qx Nam (2023). MINI-DDSM-ROI-Mammography [Dataset]. https://www.kaggle.com/datasets/quachnam/mini-ddsm-roi-mammography
Explore at:
zip(7134143129 bytes)Available download formats
Dataset updated
May 7, 2023
Authors
Qx Nam
Description
Dataset used is https://www.kaggle.com/datasets/cheddad/miniddsm2 to extract ROI.

Model used is https://www.kaggle.com/datasets/quachnam/checkpoint-yolov8l
i
Raw and Uncalibrated Data of Runners
india-data.org
Updated Jun 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IIIT Hyderabad, IHUB (2025). Raw and Uncalibrated Data of Runners [Dataset]. https://india-data.org/googleSEO-list-dataset-search
Explore at:
.csv files which have imu sensor data.Available download formats
Dataset updated
Jun 12, 2025
Dataset authored and provided by
IIIT Hyderabad, IHUB
License
https://india-data.org/terms-conditionshttps://india-data.org/terms-conditions
Area covered
India
Description
Recreational Runners aged 26 to 50 are instrumented with IMU sensors. They started at a point A on road. They returned to the same point after one round. We have used Movella Dot Sensors. There are seven sensors on both legs and the waist.

The data can be used to extract Spatio Temporal Parameters and with good Inverse Kinematic Algorithms, one can extract Joint Range of motion too.
Evaluating the identification of mass lesions.
plos.figshare.com
xls
Updated Jul 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman (2024). Evaluating the identification of mass lesions. [Dataset]. http://doi.org/10.1371/journal.pone.0304757.t010
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0304757.t010
Dataset updated
Jul 11, 2024
Dataset provided by
PLOShttp://plos.org/
Authors
Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recent advancements in AI, driven by big data technologies, have reshaped various industries, with a strong focus on data-driven approaches. This has resulted in remarkable progress in fields like computer vision, e-commerce, cybersecurity, and healthcare, primarily fueled by the integration of machine learning and deep learning models. Notably, the intersection of oncology and computer science has given rise to Computer-Aided Diagnosis (CAD) systems, offering vital tools to aid medical professionals in tumor detection, classification, recurrence tracking, and prognosis prediction. Breast cancer, a significant global health concern, is particularly prevalent in Asia due to diverse factors like lifestyle, genetics, environmental exposures, and healthcare accessibility. Early detection through mammography screening is critical, but the accuracy of mammograms can vary due to factors like breast composition and tumor characteristics, leading to potential misdiagnoses. To address this, an innovative CAD system leveraging deep learning and computer vision techniques was introduced. This system enhances breast cancer diagnosis by independently identifying and categorizing breast lesions, segmenting mass lesions, and classifying them based on pathology. Thorough validation using the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) demonstrated the CAD system’s exceptional performance, with a 99% success rate in detecting and classifying breast masses. While the accuracy of detection is 98.5%, when segmenting breast masses into separate groups for examination, the method’s performance was approximately 95.39%. Upon completing all the analysis, the system’s classification phase yielded an overall accuracy of 99.16% for classification. The potential for this integrated framework to outperform current deep learning techniques is proposed, despite potential challenges related to the high number of trainable parameters. Ultimately, this recommended framework offers valuable support to researchers and physicians in breast cancer diagnosis by harnessing cutting-edge AI and image processing technologies, extending recent advances in deep learning to the medical domain.
The detection effects of the two modules based on baselines Faster RCNN.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jingzhen He; Jing Wang; Zeyu Han; Baojun Li; Mei Lv; Yunfeng Shi (2023). The detection effects of the two modules based on baselines Faster RCNN. [Dataset]. http://doi.org/10.1371/journal.pone.0275194.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0275194.t003
Dataset updated
Jun 21, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Jingzhen He; Jing Wang; Zeyu Han; Baojun Li; Mei Lv; Yunfeng Shi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The detection effects of the two modules based on baselines Faster RCNN.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi (2017). CBIS-DDSM Dataset [Dataset]. https://datasetninja.com/cbis-ddsm

CBIS-DDSM Dataset

Explore at:

Dataset updated

Sep 14, 2017

Dataset provided by

Dataset Ninja

Authors

Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi

License

Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically

Description

The CBIS-DDSM: Curated Breast Imaging Subset of Digital Database for Screening Mammography includes decompressed images, data selection and curation by trained mammographers, updated mass segmentation and bounding boxes, and pathologic diagnosis for training data, formatted similarly to modern computer vision data sets. The data set contains 753 calcification cases and 891 mass cases, providing a data set size capable of analyzing decision support systems in mammography.

Clear search

Close search

Google apps

Main menu

CBIS-DDSM Dataset

DDSM-mammography-dataset

curated_breast_imaging_ddsm

CBIS-DDSM: Mass Case Mammograms PNG Dataset

Description

Citation for Databases

CBIS-DDSM: Breast Cancer Image Dataset

Descripton

Summary

Citations & Data Usage Policy

CBIS-DDSM Citation

Publication Citation

DDSM Mammography

Summary

Pre-processing

Inspiration

Acknowledgements

Digital Database for Screening Mammography (DDSM) dataset - Dataset - LDM

Data distribution of CBIS-DDSM dataset.

DDSM-mammography-positive-case

Dataset

Contents

Breast Mammography Image Dataset with Masses

Essam Rashed, M. Samir Abou El Seoud (2024). Dataset: Curated Breast Imaging...

DDSM CBIS Patch

Context

Inspiration

File description

Source dataset:

Mammography Dataset from INbreast, MIAS, and DDSM

Malign and benign mammograms

Normal mammograms

The Complete Mini-DDSM

[2021-03-23] Updated: Enjoy!

Consent

Context & Dataset Characteristics

Content

Inspiration

Performance of fused model approach.

MINI-DDSM-ROI-Mammography

Raw and Uncalibrated Data of Runners

Evaluating the identification of mass lesions.

The detection effects of the two modules based on baselines Faster RCNN.

CBIS-DDSM Dataset