42 datasets found
  1. Curated Breast Imaging Subset of Digital Database for Screening Mammography

    • cancerimagingarchive.net
    csv, dicom, n/a
    Updated Sep 14, 2017
    Cite
    The Cancer Imaging Archive (2017). Curated Breast Imaging Subset of Digital Database for Screening Mammography [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.7O02S9CY
    Available download formats
    csv, dicom, n/a
    Dataset updated
    Sep 14, 2017
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 14, 2017
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.

    Published research results from work in developing decision support systems in mammography are difficult to replicate due to the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. Few well-curated public datasets have been provided for the mammography community. These include the DDSM, the Mammographic Imaging Analysis Society (MIAS) database, and the Image Retrieval in Medical Applications (IRMA) project. Although these public data sets are useful, they are limited in terms of data set size and accessibility.

    For example, most researchers using the DDSM do not leverage all its images, for a variety of historical reasons. When the database was released in 1997, computational resources to process hundreds or thousands of images were not widely available. Additionally, the DDSM images are saved in non-standard compression files that require the use of decompression code that has not been updated or maintained for modern computers. Finally, the ROI annotations for the abnormalities in the DDSM were provided to indicate a general position of lesions, but not a precise segmentation for them. Therefore, many researchers must implement segmentation algorithms for accurate feature extraction, which makes it impossible to directly compare the performance of methods or to replicate prior results. The CBIS-DDSM collection addresses that challenge by publicly releasing a curated and standardized version of the DDSM for evaluating future research on CADx and CADe systems (sometimes referred to generally as CAD) in mammography.

    Please note that the image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs which provide information about the scans within the IDs (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). This makes it appear as though there are 6,671 patients according to the DICOM metadata, but there are only 1,566 actual participants in the cohort.
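
    The relationship between patient IDs and participants can be sketched in a few lines of Python. This is a minimal sketch, not TCIA tooling: the regular expression is inferred only from the two example IDs above, so real metadata may need a looser pattern, and the names ID_PATTERN, participant_of and group_by_participant are hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the two example IDs in the description (an assumption);
# other ID shapes in the real DICOM metadata will simply not match.
ID_PATTERN = re.compile(
    r"^(?P<kind>[A-Za-z]+)-(?P<split>[A-Za-z]+)_P_(?P<participant>\d+)"
    r"_(?P<side>LEFT|RIGHT)_(?P<view>CC|MLO)(?:_(?P<abnormality>\d+))?$"
)


def participant_of(patient_id):
    """Return the participant number embedded in a patient ID, or None."""
    match = ID_PATTERN.match(patient_id)
    return match.group("participant") if match else None


def group_by_participant(patient_ids):
    """Map each participant number to the patient IDs that contain it."""
    groups = defaultdict(list)
    for pid in patient_ids:
        participant = participant_of(pid)
        if participant is not None:
            groups[participant].append(pid)
    return dict(groups)


ids = ["Calc-Test_P_00038_LEFT_CC", "Calc-Test_P_00038_RIGHT_CC_1"]
groups = group_by_participant(ids)
print(len(ids), "patient IDs ->", len(groups), "participant(s)")
```

    Counting the keys of the grouped result, rather than the raw patient IDs, is what separates the participant count from the much larger apparent patient count.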

    For scientific and other inquiries about this dataset, please contact TCIA's Helpdesk.

  2. Breast cancer dataset

    • zenodo.org
    zip
    Updated Jan 30, 2025
    Cite
    Saiful Izzuan Hussain (2025). Breast cancer dataset [Dataset]. http://doi.org/10.5281/zenodo.14769221
    Available download formats
    zip
    Dataset updated
    Jan 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Saiful Izzuan Hussain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset used in this study consists of 7,632 mammogram images categorized into two classes: 2,520 benign and 5,112 malignant images from Huang and Lin (2020). The mammography images in the INbreast database were originally collected from the Centro Hospitalar de S. Joao (CHSJ) Breast Center in Porto. The database contains data collected from August 2008 to July 2010 and includes 115 cases with a total of 410 images (Moreira et al., 2012). Of these, 90 cases concern women with abnormalities in both breasts. Four different types of breast disease are recorded in the database: mass, calcification, asymmetries and distortions. The mammograms are recorded from two standard views: craniocaudal (CC) and mediolateral oblique (MLO). In addition, breast density is classified into four categories based on the BI-RADS standards: Fully Fat (Density 1), Scattered Fibroglandular Density (Density 2), Heterogeneously Dense (Density 3) and Extremely Dense (Density 4). The images are stored at one of two resolutions, 3328 x 4084 pixels or 2560 x 3328 pixels, in DICOM format. 106 mammograms depicting breast masses were selected from the INbreast database. To enhance the dataset for model training, data augmentation techniques were applied, increasing the total number of breast mammography images to 7,632.
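
    The class counts quoted above are imbalanced, with roughly twice as many malignant as benign images. The sketch below checks that the two counts add up to the stated total and derives inverse-frequency class weights. Such weighting is one common way to handle imbalance and is an assumption here; the description does not say the authors used it.

```python
# Class counts as stated in the dataset description.
counts = {"benign": 2520, "malignant": 5112}
total = sum(counts.values())

# Share of each class in the dataset.
fractions = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights: total / (number of classes * class count).
# This is an illustrative choice, not something the description specifies.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print("total images:", total)
for label in counts:
    print(f"{label}: fraction={fractions[label]:.3f} weight={weights[label]:.3f}")
```

    With these weights, each class contributes equally to a weighted loss despite the different image counts.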

  3. Mammograms-Breast Cancer Images

    • ieee-dataport.org
    • data.niaid.nih.gov
    Updated Dec 27, 2019
    Cite
    Dr G R Sinha (2019). Mammograms-Breast Cancer Images [Dataset]. https://ieee-dataport.org/documents/mammograms-breast-cancer-images
    Dataset updated
    Dec 27, 2019
    Authors
    Dr G R Sinha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This small dataset is part of a huge dataset of breast cancer images. The images are mammograms.

  4. AISSLab Breast Cancer Dataset: Toward General AI Harmonization with Real...

    • data.mendeley.com
    Updated Jul 15, 2025
    Cite
    Aymen Al-Hejri (2025). AISSLab Breast Cancer Dataset: Toward General AI Harmonization with Real Mammogram Imaging [Dataset]. http://doi.org/10.17632/zp8yfhvndv.2
    Dataset updated
    Jul 15, 2025
    Authors
    Aymen Al-Hejri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The AISSLab Breast Cancer Dataset is a collection of mammogram images gathered by experts from the Ma'amon's Diagnostic Centre Mammogram Images for Breast Cancer (MDCMI-BC) in Yemen. It is designed to support advancements in breast cancer research and computer-aided diagnosis (CAD) systems, and to facilitate research in breast cancer detection with a focus on harmonizing AI with diverse imaging data. This dataset emphasizes improving diagnostic accuracy and is available for academic and clinical research applications.

    If you use this dataset for research purposes, please cite the following papers:

    [1] A. M. Al-Hejri, R. M. Al-Tam, M. Fazea, A. H. Sable, S. Lee, and M. A. Al-antari, “ETECADx: Ensemble Self-Attention Transformer Encoder for Breast Cancer Diagnosis Using Full-Field Digital X-ray Breast Images,” Diagnostics, vol. 13, no. 1, p. 89, Dec. 2022, doi: 10.3390/diagnostics13010089.

    [2] R. M. Al-Tam, A. M. Al-Hejri, S. S. Alshamrani, M. A. Al-antari, and S. M. Narangale, “Multimodal breast cancer hybrid explainable computer-aided diagnosis using medical mammograms and ultrasound Images,” Biocybern. Biomed. Eng., vol. 44, no. 3, pp. 731–758, Jul. 2024, doi: 10.1016/j.bbe.2024.08.007.

  5. Digital mammography Dataset for Breast Cancer Diagnosis Research (DMID)

    • figshare.com
    zip
    Updated Nov 8, 2023
    Cite
    Parita Oza; Rajiv Oza; Urvi Oza; Paawan Sharma; Samir Patel; Pankaj Kumar; Bakul Gohel (2023). Digital mammography Dataset for Breast Cancer Diagnosis Research (DMID) [Dataset]. http://doi.org/10.6084/m9.figshare.24522883.v2
    Available download formats
    zip
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    figshare
    Authors
    Parita Oza; Rajiv Oza; Urvi Oza; Paawan Sharma; Samir Patel; Pankaj Kumar; Bakul Gohel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains mammogram images and can be used for research and education purposes only. It includes DCM images, TIFF images, a radiology report, a segmented mask, pixel-level annotations of abnormal regions, and a CSV file that contains other metadata.

  6. CSAW-CC (mammography) – a dataset for AI research to improve screening,...

    • researchdata.se
    • demo.researchdata.se
    Updated Jan 7, 2025
    Cite
    Fredrik Strand (2025). CSAW-CC (mammography) – a dataset for AI research to improve screening, diagnostics and prognostics of breast cancer [Dataset]. http://doi.org/10.5878/45vm-t798
    Available download formats
    (9211529), (29050)
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Karolinska Institutet
    Authors
    Fredrik Strand
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2008 - 2015
    Area covered
    Stockholm County
    Description

    The dataset contains x-ray images, mammography, from breast cancer screening at the Karolinska University Hospital, Stockholm, Sweden, collected by principal investigator Fredrik Strand at Karolinska Institutet. The purpose for compiling the dataset was to perform AI research to improve screening, diagnostics and prognostics of breast cancer.

    The dataset is based on a selection of cases with and without a breast cancer diagnosis, taken from a more comprehensive source dataset.

    1,103 cases of first-time breast cancer for women in the screening age range (40-74 years) during the included time period (November 2008 to December 2015) were included. Of these, a random selection of 873 cases has been included in the published dataset.

    A random selection of 10,000 healthy controls during the same time period was included. Of these, a random selection of 7,850 cases has been included in the published dataset.

    For each individual all screening mammograms, also repeated over time, were included; as well as the date of screening and the age. In addition, there are pixel-level annotations of the tumors created by a breast radiologist (small lesions such as micro-calcifications have been annotated as an area). Annotations were also drawn in mammograms prior to diagnosis; if these contain a single pixel it means no cancer was seen but the estimated location of the center of the future cancer was shown by a single pixel annotation.
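
    The single-pixel convention described above can be made explicit when reading annotation masks. The following is a minimal sketch that assumes a mask has already been loaded as rows of 0/1 values; the function name classify_annotation and the returned labels are hypothetical and are not part of the dataset's documentation.

```python
def classify_annotation(mask):
    """Classify a binary annotation mask by how many pixels are marked.

    `mask` is a list of rows, each a list of 0/1 values (an assumed
    in-memory form of a decoded PNG annotation).
    """
    marked = sum(1 for row in mask for value in row if value)
    if marked == 0:
        return "no annotation"
    if marked == 1:
        # Per the description: no cancer was seen on this prior image; the
        # pixel marks the estimated center of the future cancer.
        return "future cancer location"
    return "visible tumor area"


prior_exam = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
diagnostic_exam = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
print(classify_annotation(prior_exam))
print(classify_annotation(diagnostic_exam))
```

    Separating the two cases matters when training a detector, since a single marked pixel is a location hint rather than a segmentation.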

    In addition to images, the dataset also contains cancer data created at the Karolinska University Hospital and extracted through the Regional Cancer Center Stockholm-Gotland. This data contains information about the time of diagnosis and cancer characteristics including tumor size, histology and lymph node metastasis.

    The precision of non-image data was decreased, through categorisation and jittering, to ensure that no single individual can be identified.

    The following types of files are available:

    - CSV: The following data is included (if applicable): cancer/no cancer (meaning breast cancer during 2008 to 2015), age group at screening, days from image to diagnosis (if any), cancer histology, cancer size group, ipsilateral axillary lymph node metastasis. There is one CSV file for the entire dataset, with one row per image. Any information about cancer diagnosis is repeated for all rows for an individual who was diagnosed (i.e., it is also included in rows before diagnosis). For each exam date there is the assessment by radiologist 1, radiologist 2 and the consensus decision.
    - DICOM: Mammograms. For each screening, four images for the standard views were acquired: left and right, mediolateral oblique and craniocaudal. There should be four files per examination date.
    - PNG: Cancer annotations, one for each DICOM image containing a visible tumor.
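
    Because the CSV has one row per image and each examination date should have four standard views, a quick completeness check is possible. The sketch below assumes hypothetical column names (person_id, exam_date, laterality, view) and made-up sample rows, since the real header of the CSV file is not given in the description.

```python
import csv
import io
from collections import defaultdict

# The four standard views named in the description; the L/R and CC/MLO
# codes are an assumed encoding.
EXPECTED_VIEWS = {("L", "CC"), ("L", "MLO"), ("R", "CC"), ("R", "MLO")}


def incomplete_exams(rows):
    """Return {(person, exam_date): missing_views} for exams lacking a view."""
    seen = defaultdict(set)
    for row in rows:
        key = (row["person_id"], row["exam_date"])
        seen[key].add((row["laterality"], row["view"]))
    return {
        exam: EXPECTED_VIEWS - views
        for exam, views in seen.items()
        if EXPECTED_VIEWS - views
    }


# Made-up sample: person A has all four views, person B lacks the right MLO.
sample = io.StringIO(
    "person_id,exam_date,laterality,view\n"
    "A,2010-01-01,L,CC\n"
    "A,2010-01-01,L,MLO\n"
    "A,2010-01-01,R,CC\n"
    "A,2010-01-01,R,MLO\n"
    "B,2011-05-05,L,CC\n"
    "B,2011-05-05,L,MLO\n"
    "B,2011-05-05,R,CC\n"
)
rows = list(csv.DictReader(sample))
missing = incomplete_exams(rows)
print(missing)
```

    An empty result would indicate that every examination date has all four files.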

    Access: The dataset is available upon request due to the size of the material. The image files in DICOM and PNG format comprise approximately 2.5 TB. Access to the CSV file including parametric data is possible via download as associated documentation.

  7. CBIS-DDSM One View Mammograms TFRecords

    • kaggle.com
    Updated Dec 29, 2022
    Cite
    Sergio Fuentes (2022). CBIS-DDSM One View Mammograms TFRecords [Dataset]. http://doi.org/10.34740/kaggle/dsv/4429171
    Explore at:
    Croissant: a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Dec 29, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sergio Fuentes
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    CBIS-DDSM images in TFRecords format. Each example contains an image (one view of a mammogram) with its corresponding label, which can be positive or negative for breast cancer. The CBIS-DDSM images were taken from this Kaggle dataset: https://www.kaggle.com/datasets/awsaf49/cbis-ddsm-breast-cancer-image-dataset

  8. Breast Mammography Image Dataset with Masses

    • data.mendeley.com
    Updated Jan 27, 2023
    Cite
    David Faramonna (2023). Breast Mammography Image Dataset with Masses [Dataset]. http://doi.org/10.17632/8fztxggjnc.1
    Dataset updated
    Jan 27, 2023
    Authors
    David Faramonna
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The mammography dataset includes both benign and malignant tumors. To create the images for this dataset, 106 masses from the INbreast dataset, 53 masses from the MIAS dataset, and 2,188 masses from the DDSM dataset were first extracted. The images were then preprocessed using contrast-limited adaptive histogram equalization and data augmentation. After data augmentation, the INbreast dataset has 7,632 images, the MIAS dataset has 3,816 images, and the DDSM dataset has 13,128 images. Additionally, we combine DDSM, MIAS, and INbreast. Each image was resized to 227 x 227 pixels.
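
    The counts above imply a fixed augmentation factor for each source. The short sketch below derives those factors and the combined size from the quoted numbers only; the description does not state the factors directly, so treating each as an exact per-mass multiplier is an inference.

```python
# (extracted masses, images after augmentation), as quoted in the description.
sources = {
    "INbreast": (106, 7632),
    "MIAS": (53, 3816),
    "DDSM": (2188, 13128),
}

factors = {}
for name, (masses, images) in sources.items():
    factor, remainder = divmod(images, masses)
    # A zero remainder means every mass yields the same number of images.
    assert remainder == 0, f"{name}: counts are not an exact multiple"
    factors[name] = factor

combined = sum(images for _, images in sources.values())
print(factors)
print("combined images:", combined)
```

    INbreast and MIAS come out at 72 images per mass and DDSM at 6, which is worth keeping in mind when splitting data, since augmented copies of one mass should not end up in both training and test sets.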

  9. CBIS-DDSM Dataset

    • datasetninja.com
    Updated Sep 14, 2017
    Cite
    Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi (2017). CBIS-DDSM Dataset [Dataset]. https://datasetninja.com/cbis-ddsm
    Dataset updated
    Sep 14, 2017
    Dataset provided by
    Dataset Ninja
    Authors
    Rebecca Sawyer Lee; Francisco Gimenez; Assaf Hoogi
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    The CBIS-DDSM: Curated Breast Imaging Subset of Digital Database for Screening Mammography includes decompressed images, data selection and curation by trained mammographers, updated mass segmentation and bounding boxes, and pathologic diagnosis for training data, formatted similarly to modern computer vision data sets. The data set contains 753 calcification cases and 891 mass cases, providing a data set size capable of analyzing decision support systems in mammography.

  10. King Abdulaziz University Breast Cancer Mammogram Dataset

    • ieee-dataport.org
    Updated Apr 10, 2024
    Cite
    Snehal Sapkale (2024). King Abdulaziz University Breast Cancer Mammogram Dataset [Dataset]. https://ieee-dataport.org/documents/king-abdulaziz-university-breast-cancer-mammogram-dataset
    Dataset updated
    Apr 10, 2024
    Authors
    Snehal Sapkale
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    categorizing

  11. Data from: CSAW-M: An Ordinal Classification Dataset for Benchmarking...

    • figshare.scilifelab.se
    Updated Jan 15, 2025
    Cite
    Moein Sorkhei; Yue Liu; Hossein Azizpour; Edward Azavedo; Karin Dembrower; Dimitra Ntoula; Athanasios Zouzos; Fredrik Strand; Kevin Smith (2025). CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer [Dataset]. http://doi.org/10.17044/scilifelab.14687271.v2
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    KTH Royal Institute of Technology
    Authors
    Moein Sorkhei; Yue Liu; Hossein Azizpour; Edward Azavedo; Karin Dembrower; Dimitra Ntoula; Athanasios Zouzos; Fredrik Strand; Kevin Smith
    License

    https://www.scilifelab.se/data/restricted-access/

    Description

    Welcome to the CSAW-M dataset homepage

    This page includes the files and metadata related to CSAW-M, a curated dataset of mammograms with expert assessments of the masking of cancer. CSAW-M is collected from over 10,000 individuals and annotated with potential masking. In contrast to previous approaches, which measure breast image density as a proxy, our dataset directly provides annotations of masking potential assessments from five specialists. We trained deep learning models on CSAW-M to estimate the masking level, and showed that the estimated masking is significantly more predictive of screening participants diagnosed with interval and large invasive cancers (without being explicitly trained for these tasks) than its breast density counterparts. Please find the paper corresponding to our work here and the GitHub repo here.

    CSAW-M Research Use License

    Please read carefully all the terms and conditions of the CSAW-M Research Use License.

    How to access the dataset:

    If you want to get access to the data, please use the "Request access to files" option above (currently, non-Swedish researchers need to have a general figshare account to be able to request access). We will ask you to agree to our terms and conditions and provide us with some information about what you will use the data for. We will then receive the request and process it, after which you will be able to download all the files.

    If you use this Work, please cite our paper:

    @article{sorkhei2021csaw,
      title={CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer},
      author={Sorkhei, Moein and Liu, Yue and Azizpour, Hossein and Azavedo, Edward and Dembrower, Karin and Ntoula, Dimitra and Zouzos, Athanasios and Strand, Fredrik and Smith, Kevin},
      year={2021}
    }

  12. Cancer In Mammogram Dataset

    • universe.roboflow.com
    zip
    Updated Dec 10, 2023
    Cite
    Thiago (2023). Cancer In Mammogram Dataset [Dataset]. https://universe.roboflow.com/thiago-ffdel/cancer-in-mammogram
    Available download formats
    zip
    Dataset updated
    Dec 10, 2023
    Dataset authored and provided by
    Thiago
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cancer Bounding Boxes
    Description

    Cancer In Mammogram

    ## Overview
    
    Cancer In Mammogram is a dataset for object detection tasks - it contains Cancer annotations for 2,360 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. Data from: VinDr-Mammo: A large-scale benchmark dataset for computer-aided...

    • physionet.org
    Updated Mar 21, 2022
    Cite
    Hieu Huy Pham; Hieu Nguyen Trung; Ha Quy Nguyen (2022). VinDr-Mammo: A large-scale benchmark dataset for computer-aided detection and diagnosis in full-field digital mammography [Dataset]. http://doi.org/10.13026/br2v-7517
    Dataset updated
    Mar 21, 2022
    Authors
    Hieu Huy Pham; Hieu Nguyen Trung; Ha Quy Nguyen
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Breast cancer is one of the most prevalent types of cancer and the leading type of cancer death. Mammography is the recommended imaging modality for periodic breast cancer screening. A few datasets have been published to develop computer-aided tools for mammography analysis. However, these datasets either have a limited sample size or consist of screen-film mammography (SFM), which has been replaced by full-field digital mammography (FFDM) in clinical practice. This project introduces a large-scale full-field digital mammography dataset of 5,000 four-view exams, which are double read by experienced mammographers to provide cancer assessment and breast density following the Breast Imaging Reporting and Data System (BI-RADS). Breast abnormalities that require further examination are also marked by bounding rectangles.

  14. Mammography Dataset from INbreast, MIAS, and DDSM

    • kaggle.com
    Updated May 31, 2024
    Cite
    Emilio A. Venegas Hernández (2024). Mammography Dataset from INbreast, MIAS, and DDSM [Dataset]. https://www.kaggle.com/datasets/emiliovenegas1/mammography-dataset-from-inbreast-mias-and-ddsm/versions/1
    Explore at:
    Croissant: a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    May 31, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Emilio A. Venegas Hernández
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Malignant and benign mammograms

    Malignant and benign mammograms from the INbreast, MIAS, and DDSM datasets were downloaded directly from Lin, Ting-Yu, and Huang, Mei-Ling, Dataset of Breast mammography images with Masses, https://doi.org/10.17632/ywsbh3ndr8.2

    Normal mammograms

    Normal mammograms were sourced from the DDSM webpage: http://www.eng.usf.edu/cvprg/Mammography/Database.html. However, the FTP service is currently not operational. Consequently, using BeautifulSoup (bs4) and PIL, thumbnails of all the normal datasets were extracted, resulting in a total of 2026 files. These files were then augmented and enhanced using CLAHE (Contrast Limited Adaptive Histogram Equalization).

    Consult the Jupyter notebook for more information on the methods used to extract and enhance the images from the DDSM webpage.
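
    The thumbnail extraction step described above amounts to collecting image links from an HTML page. The sketch below shows the idea using only the Python standard library (html.parser) in place of BeautifulSoup, which is a deliberate substitution for illustration. The HTML snippet, the base URL and the class name ImageLinkParser are hypothetical, and the sketch does not reproduce the author's notebook or the download and CLAHE steps.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs of all <img> tags that have a src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


# Made-up page content standing in for a thumbnail index page.
page = (
    '<html><body>'
    '<a href="case1.html"><img src="thumbs/case1.jpg"></a>'
    '<img src="/thumbs/case2.jpg">'
    '<img alt="no source">'
    '</body></html>'
)
parser = ImageLinkParser("http://example.org/normals/index.html")
parser.feed(page)
print(parser.image_urls)
```

    Each collected URL could then be fetched and opened with an imaging library before any enhancement is applied.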

  15. Breast Cancer Mammography Dataset with Lymph Node Metastasis Evaluation

    • mostwiedzy.pl
    zip
    Updated Jul 31, 2025
    Cite
    Maciej Bobowicz (2025). Breast Cancer Mammography Dataset with Lymph Node Metastasis Evaluation [Dataset]. http://doi.org/10.34808/5q1a-sp47
    Available download formats
    zip (34748941673)
    Dataset updated
    Jul 31, 2025
    Authors
    Maciej Bobowicz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes clinical and mammographic imaging data from 1289 breast cancer patients, collected retrospectively between 2010 and 2021 at two clinical centers. It aims to explore the effectiveness of artificial intelligence in assessing prognostic indicators from mammography for detecting lymph node metastases in breast cancer. The dataset consists of digital mammography images (FFDM), radiological assessments, and detailed clinical data including histopathological outcomes. Inclusion criteria: diagnosis of breast cancer between 2010-2021, age ≥18 years, availability of preoperative mammography images for both breasts with radiological description, and availability of postoperative histopathological results. Failure to meet any of these conditions constitutes an exclusion criterion.
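
    The inclusion criteria listed above combine with a logical AND, so failing any one of them excludes a patient. A minimal sketch of that rule follows; the record type PatientRecord and all of its field names are hypothetical, since the description does not document the dataset's actual column names.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # All field names are hypothetical; the real dataset may encode these differently.
    diagnosis_year: int
    age_at_diagnosis: int
    has_preop_mammograms_both_breasts: bool
    has_radiological_description: bool
    has_postop_histopathology: bool


def meets_inclusion_criteria(record):
    """True only if every inclusion criterion from the description holds."""
    return (
        2010 <= record.diagnosis_year <= 2021
        and record.age_at_diagnosis >= 18
        and record.has_preop_mammograms_both_breasts
        and record.has_radiological_description
        and record.has_postop_histopathology
    )


eligible = PatientRecord(2015, 54, True, True, True)
too_early = PatientRecord(2009, 54, True, True, True)
no_histology = PatientRecord(2015, 54, True, True, False)
print([meets_inclusion_criteria(r) for r in (eligible, too_early, no_histology)])
# prints [True, False, False]
```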

  16. DICOM SR of clinical data and measurement for breast cancer collections to...

    • cancerimagingarchive.net
    dicom, n/a
    Cite
    The Cancer Imaging Archive, DICOM SR of clinical data and measurement for breast cancer collections to TCIA [Dataset]. http://doi.org/10.7937/TCIA.2019.wgllssg1
    Available download formats
    dicom, n/a
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 26, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Data Integration & Imaging Informatics (DI-Cubed) project explored the issue of lack of standardized data capture at the point of data creation, as reflected in the non-image data accompanying various TCIA breast cancer collections. The work addressed the desire for semantic interoperability between various NCI initiatives by aligning on common clinical metadata elements and supporting use cases that connect clinical, imaging, and genomics data. Accordingly, clinical and measurement data was imported into I2B2 and cross-mapped to industry standard concepts for names and values including those derived from BRIDG, CDISC SDTM, DICOM Structured Reporting models and using NCI Thesaurus, SNOMED CT and LOINC controlled terminology. A subset of the standardized data was then exported from I2B2 to CSV and thence converted to DICOM SR according to the DICOM Breast Imaging Report template [1], which supports description of patient characteristics, histopathology, receptor status and clinical findings including measurements. The purpose was not to advocate DICOM SR as an appropriate format for interchange or storage of such information for query purposes, but rather to demonstrate that use of standard concepts harmonized across multiple collections could be transformed into an existing standard report representation. The DICOM SR can be stored and used together with the images in repositories such as TCIA and in image viewers that support rendering of DICOM SR content. During the project, various deficiencies in the DICOM Breast Imaging Report template were identified with respect to describing breast MR studies, laterality of findings versus procedures, more recently developed receptor types, and patient characteristics and status. These were addressed via DICOM CP 1838, finalized in Jan 2019, and this subset reflects those changes.

    [1] DICOM Breast Imaging Report Templates, available from: http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_BreastImagingReportTemplates.html

  17. Radiologist and Deep Neural Network Predictions for Low-pass Filtered...

    • datacatalog.med.nyu.edu
    Updated Jun 20, 2022
    Cite
    Taro Makino; Stanisław Jastrzębski; Witold Oleszkiewicz; Celin Chacko; Robin Ehrenpreis; Naziya Samreen; Chloe Chhor; Eric Kim; Jiyon Lee; Kristine Pysarenko; Beatriu Reig; Hildegard Toth; Divya Awal; Linda Du; Alice Kim; James Y. Park; Daniel K. Sodickson; Laura Heacock; Linda Moy; Kyunghyun Cho; Krzysztof J. Geras (2022). Radiologist and Deep Neural Network Predictions for Low-pass Filtered Mammograms [Dataset]. https://datacatalog.med.nyu.edu/dataset/10518
    Dataset updated
    Jun 20, 2022
    Dataset provided by
    NYU Health Sciences Library
    Authors
    Taro Makino; Stanisław Jastrzębski; Witold Oleszkiewicz; Celin Chacko; Robin Ehrenpreis; Naziya Samreen; Chloe Chhor; Eric Kim; Jiyon Lee; Kristine Pysarenko; Beatriu Reig; Hildegard Toth; Divya Awal; Linda Du; Alice Kim; James Y. Park; Daniel K. Sodickson; Laura Heacock; Linda Moy; Kyunghyun Cho; Krzysztof J. Geras
    Area covered
    New York (State) - New York City
    Description

    Investigators manipulated images from the NYU Breast Cancer Screening Dataset to identify differences in the features of perception used in diagnosis by radiologists versus deep neural networks (DNNs). Two studies were conducted. In the reader study, a set of 720 exams were processed with Gaussian low-pass filtering at varying severity levels and ten radiologists and five DNNs (trained on unperturbed data) provided binary predictions on whether a malignant lesion was present in each breast (yes or no). In the annotation reader study, a subset of 120 exams with malignant images were presented to seven radiologists for their annotation of up to three regions of interest (ROIs) containing suspicious features. Low-pass filtering was applied to the interior and exterior of ROIs and the entire image before the images were presented to DNNs (trained on unperturbed data). The resulting dataset contains radiologist and DNN reader predictions and radiologist annotations from both studies.
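
    Gaussian low-pass filtering, as mentioned above, blurs an image by averaging each pixel with its neighbors under Gaussian weights, and a larger standard deviation gives a more severe blur. The sketch below is a generic, pure-Python illustration on a tiny grid. It is not the investigators' pipeline: the kernel radius of three standard deviations, the border clamping and all function names are assumptions made for the example.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about three standard deviations."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += weight * values[j]
        out.append(acc)
    return out


def gaussian_low_pass(image, sigma):
    """Separable Gaussian blur: filter the rows first, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A single bright pixel on a dark 9 x 9 grid spreads out more as sigma grows.
impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0
mild = gaussian_low_pass(impulse, sigma=1.0)
strong = gaussian_low_pass(impulse, sigma=2.0)
print(round(mild[4][4], 3), round(strong[4][4], 3))
```

    The peak value drops as sigma increases, which is the sense in which a higher filtering severity removes more fine detail.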

  18. Data distribution of CBIS-DDSM dataset.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jul 11, 2024
    Cite
    Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman (2024). Data distribution of CBIS-DDSM dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0304757.t001
    Available download formats: xls
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jawad Ahmad; Sheeraz Akram; Arfan Jaffar; Zulfiqar Ali; Sohail Masood Bhatti; Awais Ahmad; Shafiq Ur Rehman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recent advancements in AI, driven by big data technologies, have reshaped various industries, with a strong focus on data-driven approaches. This has resulted in remarkable progress in fields like computer vision, e-commerce, cybersecurity, and healthcare, primarily fueled by the integration of machine learning and deep learning models. Notably, the intersection of oncology and computer science has given rise to Computer-Aided Diagnosis (CAD) systems, offering vital tools to aid medical professionals in tumor detection, classification, recurrence tracking, and prognosis prediction. Breast cancer, a significant global health concern, is particularly prevalent in Asia due to diverse factors like lifestyle, genetics, environmental exposures, and healthcare accessibility. Early detection through mammography screening is critical, but the accuracy of mammograms can vary due to factors like breast composition and tumor characteristics, leading to potential misdiagnoses. To address this, an innovative CAD system leveraging deep learning and computer vision techniques was introduced. This system enhances breast cancer diagnosis by independently identifying and categorizing breast lesions, segmenting mass lesions, and classifying them based on pathology. Thorough validation using the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) demonstrated the CAD system’s exceptional performance, with a 99% success rate in detecting and classifying breast masses. Detection accuracy was 98.5%, segmentation of breast masses performed at approximately 95.39%, and the classification phase yielded an overall accuracy of 99.16%. The potential for this integrated framework to outperform current deep learning techniques is proposed, despite potential challenges related to the high number of trainable parameters. Ultimately, this recommended framework offers valuable support to researchers and physicians in breast cancer diagnosis by harnessing cutting-edge AI and image processing technologies, extending recent advances in deep learning to the medical domain.

  19. RSNA Mammography Breast Cancer TFRecord Dataset

    • kaggle.com
    Updated Dec 17, 2023
    Cite
    muhammed (2023). RSNA Mammography Breast Cancer TFRecord Dataset [Dataset]. https://www.kaggle.com/datasets/clkmuhammed/rsna-mammography-breast-cancer-tfrecord-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 17, 2023
    Dataset provided by
    Kaggle
    Authors
    muhammed
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Source: RSNA Screening Mammography Breast Cancer Detection

    This dataset converts the competition's 314 GB+ dataset (54,713 images) into TFRecords for fast data loading during training.

    All images are resized to 768x1280 and saved in 100 TFRecords, so each TFRecord contains roughly 548 images. The resulting dataset is 8.6 GB+.

    TFRecords have the benefit of loading large chunks of data containing many samples instead of loading every image and label separately.
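
The "roughly 548 images" per TFRecord figure can be checked with a little arithmetic. The sketch below assumes the images were split into contiguous shards of `ceil(54713 / 100)` images each; the uploader's actual sharding scheme is not stated in the description, so treat `shard_sizes` as a hypothetical helper.

```python
import math

NUM_IMAGES = 54713  # image count quoted in the description
NUM_SHARDS = 100    # number of TFRecord files quoted in the description


def shard_sizes(num_items, num_shards):
    """Split num_items into num_shards contiguous shards of ceil-sized chunks.

    Assumption: every shard except possibly the last holds
    ceil(num_items / num_shards) items.
    """
    per_shard = math.ceil(num_items / num_shards)
    sizes = []
    remaining = num_items
    for _ in range(num_shards):
        take = min(per_shard, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


if __name__ == "__main__":
    sizes = shard_sizes(NUM_IMAGES, NUM_SHARDS)
    print("images per full shard:", sizes[0])
    print("images in last shard:", sizes[-1])
    print("total:", sum(sizes))
```

Under this assumption, 99 shards hold 548 images and the last one holds the remainder, which is consistent with the "roughly 548" wording.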

    Dataset Description

    Note: The dataset for this challenge contains radiographic breast images of female subjects. The goal of this competition is to identify cases of breast cancer in mammograms from screening exams. It is important to identify cases of cancer for obvious reasons, but false positives also have downsides for patients. As millions of women get mammograms each year, a useful machine learning tool could help a great many people. This competition uses a hidden test set. When your submitted notebook is scored, the actual test data (including a full-length sample submission) will be made available to your notebook.

    Files

    [train/test]_images/[patient_id]/[image_id].dcm The mammograms, in DICOM format. You can expect roughly 8,000 patients in the hidden test set. There are usually but not always 4 images per patient. Note that many of the images use the JPEG 2000 format, which may require special libraries to load.
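
The path layout above can be split back into its components with a regular expression. This is a minimal sketch: `ImagePath` and `parse_image_path` are names invented for the example, and the patient and image IDs in the usage lines are made up.

```python
import re
from typing import NamedTuple, Optional


class ImagePath(NamedTuple):
    split: str        # "train" or "test"
    patient_id: str
    image_id: str


# Mirrors the documented layout: [train/test]_images/[patient_id]/[image_id].dcm
_PATH_PATTERN = re.compile(r"^(train|test)_images/([^/]+)/([^/]+)\.dcm$")


def parse_image_path(path: str) -> Optional[ImagePath]:
    """Return the path components, or None if the path does not match."""
    match = _PATH_PATTERN.match(path)
    return ImagePath(*match.groups()) if match else None


if __name__ == "__main__":
    # Hypothetical IDs, used only to show the shape of the result.
    print(parse_image_path("train_images/10006/462822612.dcm"))
    print(parse_image_path("sample_submission.csv"))
```

Keeping the IDs as strings avoids losing leading zeros if any identifier happens to have them.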

    sample_submission.csv A valid sample submission. Only the first few rows are available for download.

    [train/test].csv Metadata for each patient and image. Only the first few rows of the test set are available for download.

    site_id - ID code for the source hospital.
    patient_id - ID code for the patient.
    image_id - ID code for the image.
    laterality - Whether the image is of the left or right breast.
    view - The orientation of the image. The default for a screening exam is to capture two views per breast.
    age - The patient's age in years.
    implant - Whether or not the patient had breast implants. Site 1 only provides breast implant information at the patient level, not at the breast level.
    density - A rating for how dense the breast tissue is, with A being the least dense and D being the most dense. Extremely dense tissue can make diagnosis more difficult. Only provided for train.
    machine_id - An ID code for the imaging device.
    cancer - Whether or not the breast was positive for malignant cancer. The target value. Only provided for train.
    biopsy - Whether or not a follow-up biopsy was performed on the breast. Only provided for train.
    invasive - If the breast is positive for cancer, whether or not the cancer proved to be invasive. Only provided for train.
    BIRADS - 0 if the breast required follow-up, 1 if the breast was rated as negative for cancer, and 2 if the breast was rated as normal. Only provided for train.
    prediction_id - The ID for the matching submission row. Multiple images will share the same prediction ID. Test only.
    difficult_negative_case - True if the case was unusually difficult. Only provided for train.
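
Because the `cancer` target is defined per breast while the metadata has one row per image, a common first step is to group images by patient and laterality. The sketch below does this with the standard `csv` module. The rows in `SAMPLE_CSV` are made up for illustration, the `L`/`R` laterality codes are an assumption, and the real train.csv has more columns than shown here.

```python
import csv
import io
from collections import defaultdict

# Made-up rows that only mimic a few of the documented columns.
SAMPLE_CSV = """site_id,patient_id,image_id,laterality,view,age,cancer
1,10006,462822612,L,CC,61,0
1,10006,1459541791,L,MLO,61,0
1,10006,1864590858,R,MLO,61,1
1,10006,1874946579,R,CC,61,1
"""


def group_images_by_breast(csv_text):
    """Map (patient_id, laterality) to the list of image_ids for that breast."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[(row["patient_id"], row["laterality"])].append(row["image_id"])
    return dict(groups)


def breast_labels(csv_text):
    """Map (patient_id, laterality) to 1 if any of its images is labeled cancer."""
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["patient_id"], row["laterality"])
        labels[key] = max(labels.get(key, 0), int(row["cancer"]))
    return labels


if __name__ == "__main__":
    for key, image_ids in group_images_by_breast(SAMPLE_CSV).items():
        print(key, image_ids, "cancer =", breast_labels(SAMPLE_CSV)[key])
```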

  20. OPTIMAM Mammographic Image Database

    • find.data.gov.scot
    • dtechtive.com
    Updated Jul 3, 2023
    Cite
    Cancer Research Horizons (2023). OPTIMAM Mammographic Image Database [Dataset]. https://find.data.gov.scot/datasets/25791
    Dataset updated
    Jul 3, 2023
    Dataset provided by
    Cancer Research Horizons
    Area covered
    United Kingdom
    Description

    The OPTIMAM Mammography Image Database is a sharable resource with processed and unprocessed mammography images from United Kingdom breast screening centers, with annotated cancers and clinical details.

Cite
The Cancer Imaging Archive (2017). Curated Breast Imaging Subset of Digital Database for Screening Mammography [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.7O02S9CY

Curated Breast Imaging Subset of Digital Database for Screening Mammography

CBIS-DDSM

89 scholarly articles cite this dataset
Available download formats: csv, dicom, n/a
Dataset updated
Sep 14, 2017
Dataset authored and provided by
The Cancer Imaging Archive
License

https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

Time period covered
Sep 14, 2017
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description

This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.

Published research results from work in developing decision support systems in mammography are difficult to replicate due to the lack of a standard evaluation data set; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. Few well-curated public datasets have been provided for the mammography community. These include the DDSM, the Mammographic Imaging Analysis Society (MIAS) database, and the Image Retrieval in Medical Applications (IRMA) project. Although these public data sets are useful, they are limited in terms of data set size and accessibility.

For example, most researchers using the DDSM do not leverage all its images for a variety of historical reasons. When the database was released in 1997, computational resources to process hundreds or thousands of images were not widely available. Additionally, the DDSM images are saved in non-standard compression files that require the use of decompression code that has not been updated or maintained for modern computers. Finally, the ROI annotations for the abnormalities in the DDSM were provided to indicate a general position of lesions, but not a precise segmentation for them. Therefore, many researchers must implement segmentation algorithms for accurate feature extraction. This makes it difficult to directly compare the performance of methods or to replicate prior results. The CBIS-DDSM collection addresses that challenge by publicly releasing a curated and standardized version of the DDSM for evaluating future research on CADx and CADe systems (sometimes referred to generally as CAD) in mammography.

Please note that the image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs which provide information about the scans within the IDs (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). This makes it appear as though there are 6,671 patients according to the DICOM metadata, but there are only 1,566 actual participants in the cohort.
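
To count actual participants rather than DICOM patient IDs, the participant number can be extracted from each ID. The sketch below assumes every patient ID follows the shape of the two examples quoted above (a subset prefix, `_P_`, the participant number, the side, the view, and an optional trailing index); IDs that do not match are skipped. `participant_of` and `count_participants` are names invented for this example.

```python
import re
from typing import Iterable, Optional

# Assumed shape, inferred only from the two example IDs in the description:
#   <subset>_P_<participant>_<LEFT|RIGHT>_<view>[_<index>]
_PATIENT_ID_PATTERN = re.compile(
    r"^(?P<subset>[A-Za-z]+-[A-Za-z]+)_P_(?P<participant>\d+)"
    r"_(?P<side>LEFT|RIGHT)_(?P<view>[A-Z]+)(?:_(?P<index>\d+))?$"
)


def participant_of(patient_id: str) -> Optional[str]:
    """Return the participant number embedded in a DICOM patient ID, if any."""
    match = _PATIENT_ID_PATTERN.match(patient_id)
    return match.group("participant") if match else None


def count_participants(patient_ids: Iterable[str]) -> int:
    """Count distinct participants across a collection of DICOM patient IDs."""
    participants = set()
    for patient_id in patient_ids:
        participant = participant_of(patient_id)
        if participant is not None:
            participants.add(participant)
    return len(participants)


if __name__ == "__main__":
    ids = [
        "Calc-Test_P_00038_LEFT_CC",     # example from the description
        "Calc-Test_P_00038_RIGHT_CC_1",  # example from the description
    ]
    print(count_participants(ids))  # both IDs belong to participant 00038
```

Applied to the full list of DICOM patient IDs, this kind of deduplication is what reduces the apparent 6,671 patients to the 1,566 participants stated above, provided the ID pattern assumption holds for the whole collection.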

For scientific and other inquiries about this dataset, please contact TCIA's Helpdesk.
