62 datasets found
  1. Tissue images

    • kaggle.com
    zip
    Updated Aug 26, 2022
    Cite
    Md Waquar Azam (2022). Tissue images [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/tissue-images
    Available download formats: zip (65522357 bytes)
    Dataset updated
    Aug 26, 2022
    Authors
    Md Waquar Azam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A tissue is a group of structurally and functionally similar cells.

    Task: Classify all types of tissue images with better accuracy.

    Inspiration: The question to be answered is how to classify crops in each type.

    Acknowledgements: All images belong to the original authors.

  2. Cancer Instance Segmentation and Classification 1

    • kaggle.com
    zip
    Updated Apr 26, 2020
    + more versions
    Cite
    Larxel (2020). Cancer Instance Segmentation and Classification 1 [Dataset]. https://www.kaggle.com/andrewmvd/cancer-inst-segmentation-and-classification
    Available download formats: zip (776717144 bytes)
    Dataset updated
    Apr 26, 2020
    Authors
    Larxel
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Preview Images

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F793761%2F960867d5f8c0004c8f1507e50f31fcb7%2FUntitled.png?generation=1587934121564242&alt=media

    About this Dataset

    This dataset, also known as PanNuke, contains semi-automatically generated nuclei instance segmentation and classification images with exhaustive nuclei labels across 19 different tissue types. The dataset consists of 481 visual fields, of which 312 are randomly sampled from more than 20K whole slide images at different magnifications, from multiple data sources. In total, the dataset contains 205,343 labeled nuclei, each with an instance segmentation mask. Models trained on PanNuke can aid in whole slide image tissue type segmentation and generalise to new tissues.

    How to Cite the Authors

    If you use this dataset in your research, please credit the authors:

    Original Publications

    @article{gamper2020pannuke, title={PanNuke Dataset Extension, Insights and Baselines}, author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Graham, Simon and Jahanifar, Mostafa and Benet, Ksenija and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir}, journal={arXiv preprint arXiv:2003.10778}, year={2020} }

    @inproceedings{gamper2019pannuke, title={Pannuke: An open pan-cancer histology dataset for nuclei instance segmentation and classification}, author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benet, Ksenija and Khuram, Ali and Rajpoot, Nasir}, booktitle={European Congress on Digital Pathology}, pages={11--19}, year={2019}, organization={Springer} }

    License

    CC BY NC SA 4.0

    Splash Image

    Image by Otis Brawley released as public domain by National Cancer Institute, available here.

  3. PanNuke Dataset (Experimental Data)

    • kaggle.com
    zip
    Updated Jul 26, 2024
    Cite
    Anjir Ahmed Chowdhury (2024). PanNuke Dataset (Experimental Data) [Dataset]. https://www.kaggle.com/datasets/theredlad/pannuke-dataset-experimental-data
    Available download formats: zip (1225340151 bytes)
    Dataset updated
    Jul 26, 2024
    Authors
    Anjir Ahmed Chowdhury
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0), https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Semi-automatically generated nuclei instance segmentation and classification dataset with exhaustive nuclei labels across 19 different tissue types. The dataset consists of 481 visual fields, of which 312 are randomly sampled from more than 20K whole slide images at different magnifications, from multiple data sources. In total, the dataset contains 205,343 labeled nuclei, each with an instance segmentation mask. Models trained on PanNuke can aid in whole slide image tissue type segmentation and generalize to new tissues. PanNuke demonstrates one of the first successfully semi-automatically generated datasets.

    Citation

    @inproceedings{gamper2019pannuke, title={PanNuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification}, author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benet, Ksenija and Khuram, Ali and Rajpoot, Nasir}, booktitle={European Congress on Digital Pathology}, pages={11--19}, year={2019}, organization={Springer} }

    @article{gamper2020pannuke, title={PanNuke Dataset Extension, Insights and Baselines}, author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Graham, Simon and Jahanifar, Mostafa and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir}, journal={arXiv preprint arXiv:2003.10778}, year={2020} }

  4. Gastric Cancer Histopathology Tissue Image Dataset

    • kaggle.com
    zip
    Updated Apr 9, 2025
    Cite
    Orvile (2025). Gastric Cancer Histopathology Tissue Image Dataset [Dataset]. https://www.kaggle.com/datasets/orvile/gastric-cancer-histopathology-tissue-image-dataset
    Available download formats: zip (3253445716 bytes)
    Dataset updated
    Apr 9, 2025
    Authors
    Orvile
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🔬 Gastric Cancer Histopathology Tissue Image Dataset (GCHTID) 🩺

    This dataset, the Gastric Cancer Histopathology Tissue Image Dataset (GCHTID), is a valuable resource for researchers working on artificial intelligence in medical imaging, specifically for gastric cancer. It contains a large collection of expertly curated histopathology images. 🖼️

    💡 Abstract

    Gastric cancer (GC) remains a leading cause of cancer-related deaths globally. The diverse clinical outcomes are significantly influenced by the complex tumor microenvironment (TME). Analyzing the TME from histological images is crucial for understanding disease progression and improving treatment strategies. This dataset addresses the scarcity of well-annotated histological images of GC by providing a large collection of nearly 31,000 histological images from 300 whole slide images, meticulously categorized into 8 tissue classes relevant to the TME. The authors also validated the dataset by demonstrating its utility in training two deep learning models. This dataset serves as a valuable resource for advancing AI-driven research in gastric cancer histopathology. 🧠

    https://www.nature.com/articles/s41597-025-04489-9#Sec6

    📄 Description

    This dataset comprises 31,096 non-overlapping images, each with a size of 224x224 pixels. These images were extracted from H&E-stained pathological slides of human gastric cancer obtained from Harbin Medical University Cancer Hospital. 🏥

    The dataset focuses on the tumor microenvironment (TME) and includes images categorized into eight distinct tissue types:

    • ADI: Adipose (fat tissue) 🧈
    • BACK: Background (non-tissue areas) 🌫️
    • DEB: Debris (cellular waste) 🗑️
    • LYM: Lymphocytes (immune cells) 🛡️
    • MUC: Mucus (protective secretion) 🧴
    • MUS: Smooth Muscle (muscle tissue) 💪
    • NORM: Normal Colon Mucosa (healthy tissue for reference) 🌱
    • STR: Cancer-associated Stroma (connective tissue around the tumor) 🕸️
    • TUM: Tumor (cancerous tissue) 🦠

    The tissue categories were initially predicted using annotations from a publicly available colorectal cancer dataset to create tissue heatmaps. Subsequently, experienced pathologists selected 300 whole slide images with high prediction accuracy. Finally, the individual images belonging to the eight categories were extracted from these slides. 🔍
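    The class abbreviations listed above can be turned into a small lookup table for indexing the images. The sketch below is illustrative only: it assumes (the description does not say so) that the archive unpacks into one sub-folder per class abbreviation, and the names `CLASS_NAMES`, `IMAGE_SUFFIXES` and `count_images_per_class` are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Class abbreviations and names exactly as listed in the dataset description.
CLASS_NAMES = {
    "ADI": "Adipose",
    "BACK": "Background",
    "DEB": "Debris",
    "LYM": "Lymphocytes",
    "MUC": "Mucus",
    "MUS": "Smooth Muscle",
    "NORM": "Normal Colon Mucosa",
    "STR": "Cancer-associated Stroma",
    "TUM": "Tumor",
}

# Assumed image file extensions; adjust to whatever the archive contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_images_per_class(root):
    """Count image files in each <root>/<ABBREVIATION>/ folder (assumed layout)."""
    counts = Counter()
    for class_dir in Path(root).iterdir():
        # Ignore files and any folder that is not a known class abbreviation.
        if class_dir.is_dir() and class_dir.name in CLASS_NAMES:
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return dict(counts)
```

    Comparing the returned counts against the 31,096 images stated in the description is a quick sanity check after download.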

    🔑 Keywords

    Gastric Cancer, Histopathology, Medical Imaging, Pathological Tissue Components, Digital Pathology, Whole Slide Images, Tissue Classification

    📄 Licence

    This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This means you are free to share and adapt the material for any purpose, even commercially, as long as you give appropriate credit, provide a link to the license, and indicate if changes were made. 📜

    🔗 Citation

    Shenghan Lou, Jianxin Ji, Xuan Zhang, Huiying Li, Yang Jiang, Menglei Hua, Kexin Chen, Xiaohan Zheng, Qi Zhang, Peng Han, Lei Cao, & Liuying Wang. (2024). Gastric Cancer Histopathology Tissue Image Dataset (GCHTID) [Data set]. figshare. https://doi.org/10.6084/m9.figshare.26014469.v

  5. Cancer Instance Segmentation and Classification 2

    • kaggle.com
    zip
    Updated Apr 26, 2020
    Cite
    Larxel (2020). Cancer Instance Segmentation and Classification 2 [Dataset]. https://www.kaggle.com/andrewmvd/cancer-instance-segmentation-and-classification-2
    Available download formats: zip (730699303 bytes)
    Dataset updated
    Apr 26, 2020
    Authors
    Larxel
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Preview Images

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F793761%2F960867d5f8c0004c8f1507e50f31fcb7%2FUntitled.png?generation=1587934121564242&alt=media

    About this Dataset

    This dataset, also known as PanNuke, contains semi-automatically generated nuclei instance segmentation and classification images with exhaustive nuclei labels across 19 different tissue types. The dataset consists of 481 visual fields, of which 312 are randomly sampled from more than 20K whole slide images at different magnifications, from multiple data sources. In total, the dataset contains 205,343 labeled nuclei, each with an instance segmentation mask. Models trained on PanNuke can aid in whole slide image tissue type segmentation and generalise to new tissues.

    How to Cite the Authors

    If you use this dataset in your research, please credit the authors:

    Original Publications

    @article{gamper2020pannuke, title={PanNuke Dataset Extension, Insights and Baselines}, author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Graham, Simon and Jahanifar, Mostafa and Benet, Ksenija and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir}, journal={arXiv preprint arXiv:2003.10778}, year={2020} }

    @inproceedings{gamper2019pannuke, title={Pannuke: An open pan-cancer histology dataset for nuclei instance segmentation and classification}, author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benet, Ksenija and Khuram, Ali and Rajpoot, Nasir}, booktitle={European Congress on Digital Pathology}, pages={11--19}, year={2019}, organization={Springer} }

    License

    CC BY NC SA 4.0

    Splash Image

    Image by Otis Brawley released as public domain by National Cancer Institute, available here.

  6. Corona Virus capillary and liver tumor samples

    • kaggle.com
    zip
    Updated Feb 8, 2020
    Cite
    Janis (2020). Corona Virus capillary and liver tumor samples [Dataset]. https://www.kaggle.com/janiscorona/corona-virus-capillary-and-liver-tumor-samples
    Available download formats: zip (7259423 bytes)
    Dataset updated
    Feb 8, 2020
    Authors
    Janis
    Description

    Dataset

    This dataset was created by Janis

    Contents

  7. Bone marrow cell classification colorized

    • kaggle.com
    zip
    Updated May 27, 2025
    + more versions
    Cite
    Medi Hunter - 4004 (2025). Bone marrow cell classification colorized [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasakbd/bone-marrow-cell-classification-colorized
    Available download formats: zip (131130504 bytes)
    Dataset updated
    May 27, 2025
    Authors
    Medi Hunter - 4004
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Raw Data, Source, More Information ::

    https://www.kaggle.com/datasets/donajui/bone-marrow-cell-classification

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F25409507%2Fce13396f0ea3ed74b395f94359fe160e%2FEOS_00038_2.jpg?generation=1748346121487102&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F25409507%2Fd128e31b780c909af746a8d618e703a1%2FEOS_00099_8.jpg?generation=1748346135465046&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F25409507%2F86438d0f47fa1bf6e3e4da531abd4194%2FEOS_00111_11.jpg?generation=1748346145789689&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F25409507%2F473b9edf8b06f369ff7aaf827bf77259%2FEOS_00175_2.jpg?generation=1748346157391212&alt=media

    This dataset contains 7,000 labeled images of white blood cells, divided into 7 distinct classes based on cell type. It is intended for use in machine learning and deep learning tasks such as image classification, biomedical analysis, and hematological diagnostics.

    📂 Classes Included:

    • BLA - Blast: Immature white blood cells, often seen in leukemia.
    • EOS - Eosinophil: Involved in allergic responses and parasitic infections.
    • LYT - Lymphocyte: Key cells in the immune system, includes B and T cells.
    • MON - Monocyte: Large cells that differentiate into macrophages and dendritic cells.
    • NGS - Segmented Neutrophil: Mature neutrophils essential for fighting bacterial infections.
    • NIF - Not Identifiable: Cells that couldn't be confidently categorized.
    • PMO - Promyelocyte: Early precursor in the granulocyte development pathway.

    Each class includes approximately 1,000 images, offering balanced data for training robust classification models.

    Citation Matek, C., Krappe, S., Münzenmayer, C., Haferlach, T., & Marr, C. (2021). An Expert-Annotated Dataset of Bone Marrow Cytology in Hematologic Malignancies [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.AXH3-T579

    Matek, C., Krappe, S., Münzenmayer, C., Haferlach, T., and Marr, C. (2021). Highly accurate differentiation of bone marrow cell morphologies using deep neural networks on a large image dataset. https://doi.org/10.1182/blood.2020010568

    License CC BY 4.0

    Colorized Data Processing Techniques for Medical Imaging

    Medical images like CT scans and X-rays are typically grayscale, making subtle anatomical or pathological differences harder to distinguish. The following image processing and enhancement techniques are used to colorize and improve visual interpretation for diagnostics, training, or AI preprocessing.

    🔷 1. 3D_Rendering Renders medical image volumes into three-dimensional visualizations. Though often grayscale, color can be applied to different tissue types or densities to enhance spatial understanding. Useful in surgical planning or tumor visualization.

    🔷 2. 3D_Volume_Rendering An advanced visualization technique that projects 3D image volumes with transparency and color blending, simulating how light passes through tissue. Color helps distinguish internal structures like organs, vessels, or tumors.

    🔷 3. Adaptive Histogram Equalization (AHE) Enhances contrast locally within the image, especially in low-contrast regions. When colorized, different intensities are mapped to distinct hues, improving visibility of fine-grained details like soft tissues or lesions.

    🔷 4. Alpha Blending A layering technique that combines multiple images (e.g., CT + annotation masks) with transparency. Colors represent different modalities or regions of interest, providing composite visual cues for diagnosis.

    🔷 5. Basic Color Map Applies a standard color palette (like Jet or Viridis) to grayscale data. Different intensities are mapped to different colors, enhancing the visual discrimination of anatomical or pathological regions in the image.

    🔷 6. Contrast Stretching Expands the grayscale range to improve brightness and contrast. When combined with color mapping, tissues with similar intensities become visually distinct, aiding in tasks like bone vs. soft tissue separation.

    🔷 7. Edge Detection Extracts and overlays object boundaries (e.g., organ or lesion outlines) on the original scan. Edge maps are typically colorized (e.g., green or red) to highlight anatomical structures or abnormalities clearly.

    🔷 8. Gamma Correction Adjusts image brightness non-linearly. Color can be used to highlight underexposed or overexposed regions, often revealing soft tissue structures otherwise hidden in raw grayscale CT/X-ray images.

    🔷 9. Gaussian Blur Smooths image noise and details. When visualized with color overlays (e.g., before vs. after), it helps assess denoising effectiveness. It is also used in segmentation preprocessing to reduce edge artifacts.

    🔷 10. Heatmap Visualization Encodes intensity or pre...
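    Two of the techniques named above, contrast stretching and gamma correction, reduce to simple per-pixel formulas. The sketch below uses the standard textbook forms on a flat list of 8-bit intensities. It is not the dataset author's processing code, which the description does not show, and the function names are hypothetical.

```python
def contrast_stretch(pixels, out_min=0, out_max=255):
    """Linearly rescale intensities so the darkest input maps to out_min
    and the brightest input maps to out_max."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # Flat image: there is no range to stretch.
        return [out_min for _ in pixels]
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]


def gamma_correct(pixels, gamma, max_value=255):
    """Apply out = max_value * (p / max_value) ** gamma to every pixel.

    gamma < 1 brightens mid-tones, gamma > 1 darkens them; the end points
    0 and max_value are left unchanged.
    """
    return [round(max_value * (p / max_value) ** gamma) for p in pixels]
```

    A color map would then be applied to the adjusted intensities; that step depends on the chosen palette and is left out here.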

  8. TCGA-WSI-Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2024
    Cite
    Mahmood Yousaf 2018 (2024). TCGA-WSI-Dataset [Dataset]. https://www.kaggle.com/datasets/mahmoodyousaf2018/tcga-wsi-svs
    Available download formats: zip (0 bytes)
    Dataset updated
    Jun 25, 2024
    Authors
    Mahmood Yousaf 2018
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore the TCGA Whole Slide Image (WSI) SVS files available on Kaggle, offering detailed visual representations of tissue samples from various cancer types. These high-resolution images provide valuable insights into tumor morphology and tissue architecture, facilitating cancer diagnosis, prognosis, and treatment research. Delve into the rich landscape of cancer biology, leveraging the wealth of information contained within these SVS files to drive innovative advancements in oncology. This is a dataset of WSI images downloaded from the TCGA portal.

  9. Melanoma Histopathology Dataset

    • kaggle.com
    zip
    Updated Jan 23, 2025
    Cite
    Haasha Bin Atif (2025). Melanoma Histopathology Dataset [Dataset]. https://www.kaggle.com/datasets/haashaatif/melanoma-histopathology-dataset
    Available download formats: zip (14848620062 bytes)
    Dataset updated
    Jan 23, 2025
    Authors
    Haasha Bin Atif
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is designed for development of deep learning models for segmentation of nuclei and tissue in melanoma H&E stained histopathology. Existing nuclei segmentation models that are trained on non-melanoma specific datasets have low performance due to the ability of melanocytes to mimic other cell types, whereas existing melanoma specific models utilize older, sub-optimal techniques. Moreover, these models do not provide tissue annotations necessary for determining the localization of tumor-infiltrating lymphocytes, which may hold value for predictive and prognostic tasks. To address this, we created a melanoma specific dataset with nuclei and tissue annotations.

    Data Downloaded From: https://zenodo.org/records/14213079

    Competition Link: https://puma.grand-challenge.org

  10. Breast Tissue Impedance Measurements

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Tarık Tuna Taşaltı (2024). Breast Tissue Impedance Measurements [Dataset]. https://www.kaggle.com/datasets/tarktunataalt/breast-tissue-impedance-measurements
    Available download formats: zip (8480 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Tarık Tuna Taşaltı
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Breast Tissue Dataset

    This dataset contains electrical impedance measurements of freshly excised tissue samples from the breast. The data is sourced from the UCI Machine Learning Repository.

    Dataset Characteristics

    • Type: Multivariate
    • Subject Area: Health and Medicine
    • Associated Tasks: Classification
    • Feature Type: Real
    • Instances: 106
    • Features: Various impedance measurements

    Dataset Information

    Impedance measurements were taken at the following frequencies: 15.625, 31.25, 62.5, 125, 250, 500, and 1000 kHz. These measurements, when plotted in the (real, -imaginary) plane, constitute the impedance spectrum from which the breast tissue features are computed. The dataset can be used to predict either the original 6 classes or 4 classes, obtained by merging the fibro-adenoma, mastopathy, and glandular classes, which are hard to discriminate.

    Features

    • I0: Impedivity (ohm) at zero frequency
    • PA500: Phase angle at 500 kHz
    • HFS: High-frequency slope of phase angle
    • DA: Impedance distance between spectral ends
    • AREA: Area under spectrum
    • A/DA: Area normalized by DA
    • MAX IP: Maximum of the spectrum
    • DR: Distance between I0 and real part of the maximum frequency point
    • P: Length of the spectral curve
    • Class: Tissue type (carcinoma, fibro-adenoma, mastopathy, glandular, connective, adipose)

    Classes

    • car: Carcinoma
    • fad: Fibro-adenoma
    • mas: Mastopathy
    • gla: Glandular
    • con: Connective
    • adi: Adipose

    Usage

    This dataset is suitable for classification tasks. The impedance measurements can be used to predict the type of breast tissue.
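    The four-class variant mentioned under Dataset Information can be derived from the six class codes with a simple mapping. This is a minimal sketch: the label `fad+mas+gla` for the merged group is a hypothetical name (the source does not name it), and `to_four_classes` is an illustrative helper. The frequency list only restates the values given above, each being double the previous one.

```python
# Measurement frequencies in kHz as listed in the description:
# 15.625, 31.25, 62.5, 125, 250, 500, 1000 (each double the previous).
FREQUENCIES_KHZ = [15.625 * 2 ** i for i in range(7)]

# Classes the description calls hard to discriminate.
HARD_TO_DISCRIMINATE = {"fad", "mas", "gla"}

# Hypothetical label for the merged group; the source does not name it.
MERGED_LABEL = "fad+mas+gla"


def to_four_classes(label):
    """Map one of the six original class codes to the four-class scheme."""
    return MERGED_LABEL if label in HARD_TO_DISCRIMINATE else label
```

    Applying `to_four_classes` to the Class column leaves car, con and adi unchanged and collapses the other three codes into one label.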

    If you use this dataset, please cite it as follows:

    S, JP and Jossinet, J. (2010). Breast Tissue. UCI Machine Learning Repository. https://doi.org/10.24432/C5P31H.

    @misc{misc_breast_tissue_192,
    author = "S, JP and Jossinet, J",
    title = "Breast Tissue",
    year = 2010,
    howpublished = "UCI Machine Learning Repository",
    note = "DOI: https://doi.org/10.24432/C5P31H"
    }

  11. BRACS-WSI-Group-BT-Type-N

    • kaggle.com
    zip
    Updated Nov 28, 2025
    Cite
    Saadin (2025). BRACS-WSI-Group-BT-Type-N [Dataset]. https://www.kaggle.com/datasets/saadinn/bracs-wsi-group-bt-type-n
    Available download formats: zip (62612280122 bytes)
    Dataset updated
    Nov 28, 2025
    Authors
    Saadin
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains 44 H&E-stained whole-slide images (WSIs) labeled as Normal (N) from the BRACS (BReAst Carcinoma Subtyping) breast carcinoma subtyping dataset. Normal tissue belongs to the Benign (BT) clinical category and represents 8.0% of total WSIs (~66.1 GB).

    Dataset Details:

    • Class: Normal (N)
    • Category: Benign (BT)
    • Number of WSIs: 44
    • Percentage of Total: 8.0%
    • Approximate Size: 66.1 GB

    Acquisition Details:

    • Scanner: Aperio AT2
    • Magnification: 40×
    • Resolution: 0.25 μm/pixel
    • Staining: Hematoxylin and Eosin (H&E)
    • Source: National Cancer Institute - IRCCS 'Fondazione G. Pascale', Naples, Italy
    • Acquisition Period: 2019-2020

    Data Splits: Includes train, validation, and test splits with patient-level separation to prevent data leakage.

    Clinical Significance: Normal breast tissue samples serve as baseline reference for comparative analysis in breast cancer classification tasks.

    Citation: Brancati, N., et al. (2022). BRACS: A dataset for breast carcinoma subtyping in H&E histology images. Database, 2022, baac093. https://doi.org/10.1093/database/baac093

  12. Blood–Liver–Kidney Tri-Organ Imaging Dataset

    • kaggle.com
    zip
    Updated Dec 30, 2025
    Cite
    Ujjwal Sinha01 (2025). Blood–Liver–Kidney Tri-Organ Imaging Dataset [Dataset]. https://www.kaggle.com/datasets/ujjwalsinha01/blood-liver-and-kidney-imaging-dataset
    Available download formats: zip (12420868 bytes)
    Dataset updated
    Dec 30, 2025
    Authors
    Ujjwal Sinha01
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A multi-modal medical imaging dataset for object detection and tissue classification. It includes three datasets: blood cell detection (object detection), liver tissue analysis (classification), and kidney tissue analysis (classification), designed for multi-sample correlation analysis.

    1. Blood Cell Detection Dataset

    • Dataset Type: Object Detection
    • Total Images: 874 images
    • Total Annotations: 4,888 labeled objects
    • Classes: 3 classes: RBC (Red Blood Cells), Class ID 1; WBC (White Blood Cells), Class ID 2; Platelets, Class ID 0
    • Data Split: Training: 70% (765 images); Validation: 10% (73 images); Testing: 20% (36 images)
    • Annotation Format: Normalized bounding box coordinates, in the format class_id x_center y_center width height. All coordinates are normalized to the 0-1 range. Each image contains at least one annotated object (no null examples).
    • Image Specifications: Format: JPG/JPEG; Minimum Resolution: 640x640 pixels; Recommended Resolution: 1024x1024+ pixels; Supported Formats: JPG, PNG
    • Use Cases: Automated blood cell counting, hematology analysis, diagnostic assistance, medical image analysis research

    2. Liver Tissue Analysis Dataset

    • Dataset Type: Tissue Classification
    • Total Images: 1,000 images
    • Classes: 3 classes
    • Hepatocytes (Class ID: 0): Main functional cells of the liver responsible for protein synthesis, metabolism, and detoxification
    • Sinusoids (Class ID: 1): Small blood vessels in the liver that allow blood to flow through the liver tissue
    • Portal Triads (Class ID: 2): Structural units containing portal vein, hepatic artery, and bile duct branches
    • Class Distribution: Hepatocytes: 461 samples; Sinusoids: 296 samples; Portal Triads: 243 samples
    • Data Split: Training: 70% (700 images); Validation: 15% (150 images); Testing: 15% (150 images)
    • Image Specifications: Minimum Resolution: 512x512 pixels; Recommended Resolution: 1024x1024 pixels; Supported Formats: JPG, PNG, TIFF; Minimum Images per Class: 300; Recommended Images per Class: 1,000
    • Annotation Format: Class-based classification or segmentation masks; expert-validated annotations; label extraction from directory structure or filename convention
    • Use Cases: Liver histopathology analysis, disease diagnosis, tissue structure identification, computational pathology research

    3. Kidney Tissue Analysis Dataset

    • Dataset Type: Tissue Classification
    • Total Images: 1,000 images
    • Classes: 2 classes
    • Glomeruli (Class ID: 0): Network of capillaries in the kidney that filter blood to form urine
    • Tubules (Class ID: 1): Small tubes in the kidney that reabsorb water and nutrients from filtered blood
    • Class Distribution: Glomeruli: 503 samples; Tubules: 497 samples
    • Data Split: Training: 70% (700 images); Validation: 15% (150 images); Testing: 15% (150 images)
    • Image Specifications: Minimum Resolution: 512x512 pixels; Recommended Resolution: 1024x1024 pixels; Supported Formats: JPG, PNG, TIFF; Minimum Images per Class: 300; Recommended Images per Class: 1,000
    • Annotation Format: Class-based classification or segmentation masks; expert-validated annotations; label extraction from directory structure or filename convention
    • Use Cases: Kidney histopathology analysis, renal disease diagnosis, nephrology research, tissue structure identification

    Multi-Sample Correlation Features

    • Correlation Capabilities: Blood cell samples can be paired with liver tissue samples or with kidney tissue samples for comprehensive medical analysis
    • Supports multiple pairing modes: Random, Correlated (matching identifiers), and Paired (sequential pairing)
    • Use Cases: Multi-modal medical analysis, comprehensive patient diagnosis, cross-tissue correlation studies, integrated medical imaging research

    Dataset Characteristics

    • License: MIT License (Public Domain - free to use for any purpose)
    • Version: 1.0
    • Creation Date: 2024-01-01
    • Data Source: Medical institutions partnership, research collaboration
    • Privacy Compliance: Required
    • Expert Validation: All annotations expert-validated
    • Quality Assurance: Annotation quality checks enabled

    Preprocessing

    • Normalization: Pixel values scaled to [0, 1]
    • Resizing: Target resolution based on dataset type
    • Augmentation: Horizontal/vertical flips, rotation, brightness/contrast/saturation adjustment

    Data Organization

    • Blood Cells: Images and labels folders with normalized bounding box coordinates
    • Liver/Kidney Tissue: Class-based folder structure or flat structure with metadata

    Total Dataset Statistics

    • Combined Total Images: 2,874 images
    • Combined Total Annotations: 4,888 object detection annotations + 2,000 classification samples
    • Total Classes Across All Datasets: 8 unique classes (3 blood cell types + 3 liver tissue structures + 2 kidney tissue structures)

    Application Domains

    • Medical Imaging & Diagnostics
    • Computational Pathology
    • Hematology & Blood Analysis
    • Histopathology Research
    • Automated Medical Image Analysis
    • Multi-Modal Medical AI Systems
    • Deep...
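    The blood cell annotations in this entry are described as lines of the form class_id x_center y_center width height, with all coordinates normalized to the 0-1 range. A minimal sketch of reading one such line and converting it to pixel corners follows. The function names are hypothetical, and the exact label file layout is an assumption, since the description gives only the field order.

```python
# Class IDs as stated in the dataset description.
BLOOD_CLASSES = {0: "Platelets", 1: "RBC", 2: "WBC"}


def parse_annotation_line(line):
    """Parse 'class_id x_center y_center width height' (values normalized to 0-1)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_c, y_c, w, h = (float(v) for v in parts[1:])
    return class_id, x_c, y_c, w, h


def to_pixel_box(x_c, y_c, w, h, img_w, img_h):
    """Convert a normalized centre/size box to pixel corners
    (x_min, y_min, x_max, y_max) for an image of img_w x img_h pixels."""
    x_min = (x_c - w / 2) * img_w
    y_min = (y_c - h / 2) * img_h
    x_max = (x_c + w / 2) * img_w
    y_max = (y_c + h / 2) * img_h
    return x_min, y_min, x_max, y_max
```

    For a 640x640 image, the line `1 0.5 0.5 0.25 0.5` describes an RBC box spanning x from 240 to 400 and y from 160 to 480.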

  13. CoNIC Challenge Dataset

    • kaggle.com
    zip
    Updated Jan 6, 2022
    Cite
    Aadam (2022). CoNIC Challenge Dataset [Dataset]. https://www.kaggle.com/datasets/aadimator/conic-challenge-dataset
    Available download formats: zip (985929496 bytes)
    Dataset updated
    Jan 6, 2022
    Authors
    Aadam
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The dataset consists of Haematoxylin and Eosin stained histology images at 20x objective magnification (~0.5 microns/pixel) from 6 different data sources. For each image, an instance segmentation and a classification mask is provided. Within the dataset, each nucleus is assigned to one of the following categories:

    • Epithelial
    • Lymphocyte
    • Plasma
    • Eosinophil
    • Neutrophil
    • Connective tissue

    For more information on the dataset and the associated categories, we encourage participants to read the original dataset paper.

    Data Format

    Our provided patch-level dataset contains 4,981 non-overlapping images of size 256x256 provided in the following format:

    • RGB images
    • Segmentation & classification maps
    • Nuclei counts

    The RGB images and segmentation/classification maps are each stored as a single NumPy array. The RGB image array has dimensions 4981x256x256x3, whereas the segmentation & classification map array has dimensions 4981x256x256x2. Here, the first channel is the instance segmentation map and the second channel is the classification map. For the nuclei counts, we provide a single csv file, where each row corresponds to a given patch and the columns determine the counts for each type of nucleus. The row ordering is in line with the order of patches within the numpy files.

    A given nucleus is considered present in the image if any part of it is within the central 224x224 region within the patch. This ensures that a nucleus is only considered for counting if it lies completely within the original 256x256 image.
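
The counting rule above can be sketched in a few lines of NumPy. This is only an illustration, not the organizers' code: the function name `count_nuclei`, the assumption that the classification map uses 0 for background and 1 to 6 for the six categories, and the synthetic patch are all hypothetical. In practice `label_patch` would be one 256x256x2 slice of the segmentation & classification array.

```python
import numpy as np

def count_nuclei(label_patch, num_classes=6, margin=16):
    """Count nuclei per class that touch the central region of one patch.

    label_patch: (H, W, 2) array; channel 0 = instance map, channel 1 = class map.
    margin=16 leaves a central 224x224 window in a 256x256 patch.
    Assumption: class ids run from 1 to num_classes, 0 is background.
    """
    inst_map = label_patch[..., 0]
    class_map = label_patch[..., 1]
    h, w = inst_map.shape
    centre = inst_map[margin:h - margin, margin:w - margin]
    counts = np.zeros(num_classes, dtype=int)
    for nucleus_id in np.unique(centre):
        if nucleus_id == 0:  # background, not a nucleus
            continue
        # Read the class of this nucleus from any one of its pixels.
        class_id = int(class_map[inst_map == nucleus_id][0])
        if 1 <= class_id <= num_classes:
            counts[class_id - 1] += 1
    return counts

# Synthetic stand-in for one patch (not real data).
labels = np.zeros((256, 256, 2), dtype=np.int32)
labels[100:110, 100:110] = (1, 2)  # nucleus 1, class 2, well inside the centre
labels[0:8, 0:8] = (2, 1)          # nucleus 2, class 1, entirely in the border
labels[10:20, 10:20] = (3, 5)      # nucleus 3, class 5, straddles the border
counts = count_nuclei(labels)
```

Nucleus 2 is ignored because none of its pixels fall inside the central window, while nucleus 3 is counted because part of it does.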


    Acknowledgements

    This dataset was provided by the Organizers of the CoNIC Challenge:

    • Simon Graham (TIA, PathLAKE)
    • Mostafa Jahanifar (TIA, PathLAKE)
    • Dang Vu (TIA)
    • Giorgos Hadjigeorghiou (TIA, PathLAKE)
    • Thomas Leech (TIA, PathLAKE)
    • David Snead (UHCW, PathLAKE)
    • Shan Raza (TIA, PathLAKE)
    • Fayyaz Minhas (TIA, PathLAKE)
    • Nasir Rajpoot (TIA, PathLAKE)

    TIA: Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, United Kingdom

    UHCW: Department of Pathology, University Hospitals Coventry and Warwickshire, United Kingdom

    PathLAKE: Pathology Image Data Lake for Analytics Knowledge & Education, University Hospitals Coventry and Warwickshire, United Kingdom

  14. Human Epithelial Cell Colorized Chromosomes

    • kaggle.com
    zip
    Updated May 27, 2025
    Cite
    Medi Hunter - 4004 (2025). Human Epithelial Cell Colorized Chromosomes [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasakbd/human-epithelial-cell-colorized-chromosomes
    Explore at:
    zip(1535222787 bytes)Available download formats
    Dataset updated
    May 27, 2025
    Authors
    Medi Hunter - 4004
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    #Raw Data, Source, More Information :: https://www.kaggle.com/datasets/yashdogra/cell-system https://www.cellimagelibrary.org/home

    This high-resolution image captures the intricate chromosomal structures within a human epithelial cell (Homo sapiens). Epithelial cells form the lining of various tissues and organs, playing crucial roles in protection, secretion, and absorption. The image provides a detailed view of the chromosomes, essential for understanding cellular differentiation and genetic organization.

    Colorized Data Processing Techniques for Medical Imaging

    Medical images like CT scans and X-rays are typically grayscale, making subtle anatomical or pathological differences harder to distinguish. The following image processing and enhancement techniques are used to colorize and improve visual interpretation for diagnostics, training, or AI preprocessing.

    🔷 1. 3D_Rendering Renders medical image volumes into three-dimensional visualizations. Though often grayscale, color can be applied to different tissue types or densities to enhance spatial understanding. Useful in surgical planning or tumor visualization.

    🔷 2. 3D_Volume_Rendering An advanced visualization technique that projects 3D image volumes with transparency and color blending, simulating how light passes through tissue. Color helps distinguish internal structures like organs, vessels, or tumors.

    🔷 3. Adaptive Histogram Equalization (AHE) Enhances contrast locally within the image, especially in low-contrast regions. When colorized, different intensities are mapped to distinct hues, improving visibility of fine-grained details like soft tissues or lesions.

    🔷 4. Alpha Blending A layering technique that combines multiple images (e.g., CT + annotation masks) with transparency. Colors represent different modalities or regions of interest, providing composite visual cues for diagnosis.

    🔷 5. Basic Color Map Applies a standard color palette (like Jet or Viridis) to grayscale data. Different intensities are mapped to different colors, enhancing the visual discrimination of anatomical or pathological regions in the image.

    🔷 6. Contrast Stretching Expands the grayscale range to improve brightness and contrast. When combined with color mapping, tissues with similar intensities become visually distinct, aiding in tasks like bone vs. soft tissue separation.

    🔷 7. Edge Detection Extracts and overlays object boundaries (e.g., organ or lesion outlines) on the original scan. Edge maps are typically colorized (e.g., green or red) to highlight anatomical structures or abnormalities clearly.

    🔷 8. Gamma Correction Adjusts image brightness non-linearly. Color can be used to highlight underexposed or overexposed regions, often revealing soft tissue structures otherwise hidden in raw grayscale CT/X-ray images.

    🔷 9. Gaussian Blur Smooths image noise and details. When visualized with color overlays (e.g., before vs. after), it helps assess denoising effectiveness. It is also used in segmentation preprocessing to reduce edge artifacts.

    🔷 10. Heatmap Visualization Encodes intensity or prediction confidence into a heatmap overlay (e.g., red for high activity). Common in AI-assisted diagnosis to localize tumors, fractures, or infections, layered over the original grayscale image.

    🔷 11. Interactive Segmentation A semi-automated method to extract regions of interest with user input. Segmented areas are color-coded (e.g., tumor = red, background = blue) for immediate visual confirmation and further analysis.

    🔷 12. LUT (Lookup Table) Color Map Maps grayscale values to custom color palettes using a lookup table. This enhances contrast and emphasizes certain intensity ranges (e.g., blood vessels vs. bone), improving interpretability for radiologists.

    🔷 13. Random Color Palette Applies random but consistent colors to segmented regions or labels. Common in datasets with multiple classes (e.g., liver, spleen, kidneys), it helps in visual verification of label diversity.

    🧬 Conclusion These colorization methods do not change the underlying medical data, but they significantly enhance its interpretability for radiologists, researchers, and machine learning algorithms. Color adds clarity, contrast, and context to features that may be missed in grayscale, making it a powerful tool in modern medical imaging workflows.
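
As a rough illustration of two of the techniques listed above (contrast stretching followed by a lookup-table color map), here is a NumPy-only sketch. It is not the dataset author's pipeline: the function names, the percentile limits, and the blue-green-red color stops are hypothetical choices made for the example.

```python
import numpy as np

def contrast_stretch(gray, low_pct=2.0, high_pct=98.0):
    """Linearly rescale intensities between two percentiles to the range 0..255."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros(gray.shape, dtype=np.uint8)
    stretched = (gray.astype(np.float64) - lo) / (hi - lo)
    return (np.clip(stretched, 0.0, 1.0) * 255).round().astype(np.uint8)

def make_lut(stops):
    """Build a 256x3 uint8 lookup table by interpolating between RGB stops."""
    stops = np.asarray(stops, dtype=np.float64)
    positions = np.linspace(0, 255, len(stops))
    levels = np.arange(256)
    channels = [np.interp(levels, positions, stops[:, c]) for c in range(3)]
    return np.stack(channels, axis=1).round().astype(np.uint8)

def apply_lut(gray_u8, lut):
    """Map every grayscale value to its RGB entry in the lookup table."""
    return lut[gray_u8]

# Synthetic low-contrast ramp standing in for a grayscale scan.
gray = np.tile(np.linspace(50, 150, 64), (64, 1))
stretched = contrast_stretch(gray)
lut = make_lut([(0, 0, 128), (0, 255, 0), (255, 0, 0)])  # illustrative palette
colour = apply_lut(stretched, lut)
```

The original grayscale array is left untouched; only the derived `colour` array carries the palette, which matches the point that colorization changes presentation rather than the underlying data.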


    RRA_Think Differently, Create history’s ne...

  15. Knee X-ray Digital Colorized Images

    • kaggle.com
    zip
    Updated May 26, 2025
    Cite
    Medi Hunter - 4004 (2025). Knee X-ray Digital Colorized Images [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasakbd/knee-x-ray-digital-colorized-images/code
    Explore at:
    zip(426848247 bytes)Available download formats
    Dataset updated
    May 26, 2025
    Authors
    Medi Hunter - 4004
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    #Raw Data, Source, More Information :: Gornale, Shivanand; Patravali, Pooja (2020), “Digital Knee X-ray Images”, Mendeley Data, V1, doi: 10.17632/t9ndx37v5h.1

    Medical images like CT scans and X-rays are typically grayscale, making subtle anatomical or pathological differences harder to distinguish. The following image processing and enhancement techniques are used to colorize and improve visual interpretation for diagnostics, training, or AI preprocessing.

    🔷 1. 3D_Rendering Renders medical image volumes into three-dimensional visualizations. Though often grayscale, color can be applied to different tissue types or densities to enhance spatial understanding. Useful in surgical planning or tumor visualization.

    🔷 2. 3D_Volume_Rendering An advanced visualization technique that projects 3D image volumes with transparency and color blending, simulating how light passes through tissue. Color helps distinguish internal structures like organs, vessels, or tumors.

    🔷 3. Adaptive Histogram Equalization (AHE) Enhances contrast locally within the image, especially in low-contrast regions. When colorized, different intensities are mapped to distinct hues, improving visibility of fine-grained details like soft tissues or lesions.

    🔷 4. Alpha Blending A layering technique that combines multiple images (e.g., CT + annotation masks) with transparency. Colors represent different modalities or regions of interest, providing composite visual cues for diagnosis.

    🔷 5. Basic Color Map Applies a standard color palette (like Jet or Viridis) to grayscale data. Different intensities are mapped to different colors, enhancing the visual discrimination of anatomical or pathological regions in the image.

    🔷 6. Contrast Stretching Expands the grayscale range to improve brightness and contrast. When combined with color mapping, tissues with similar intensities become visually distinct, aiding in tasks like bone vs. soft tissue separation.

    🔷 7. Edge Detection Extracts and overlays object boundaries (e.g., organ or lesion outlines) on the original scan. Edge maps are typically colorized (e.g., green or red) to highlight anatomical structures or abnormalities clearly.

    🔷 8. Gamma Correction Adjusts image brightness non-linearly. Color can be used to highlight underexposed or overexposed regions, often revealing soft tissue structures otherwise hidden in raw grayscale CT/X-ray images.

    🔷 9. Gaussian Blur Smooths image noise and details. When visualized with color overlays (e.g., before vs. after), it helps assess denoising effectiveness. It is also used in segmentation preprocessing to reduce edge artifacts.

    🔷 10. Heatmap Visualization Encodes intensity or prediction confidence into a heatmap overlay (e.g., red for high activity). Common in AI-assisted diagnosis to localize tumors, fractures, or infections, layered over the original grayscale image.

    🔷 11. Interactive Segmentation A semi-automated method to extract regions of interest with user input. Segmented areas are color-coded (e.g., tumor = red, background = blue) for immediate visual confirmation and further analysis.

    🔷 12. LUT (Lookup Table) Color Map Maps grayscale values to custom color palettes using a lookup table. This enhances contrast and emphasizes certain intensity ranges (e.g., blood vessels vs. bone), improving interpretability for radiologists.

    🔷 13. Random Color Palette Applies random but consistent colors to segmented regions or labels. Common in datasets with multiple classes (e.g., liver, spleen, kidneys), it helps in visual verification of label diversity.

    🧬 Conclusion These colorization methods do not change the underlying medical data, but they significantly enhance its interpretability for radiologists, researchers, and machine learning algorithms. Color adds clarity, contrast, and context to features that may be missed in grayscale, making it a powerful tool in modern medical imaging workflows.
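
Gamma correction and alpha blending from the list above can be sketched with plain NumPy. This is a hedged illustration only: the function names, the gamma value of 0.5, the 40% overlay opacity, and the synthetic image are assumptions for the example, not details taken from this dataset.

```python
import numpy as np

def gamma_correct(gray_u8, gamma):
    """Non-linear brightness adjustment: out = in ** gamma on a 0..1 scale."""
    normalised = gray_u8.astype(np.float64) / 255.0
    return (np.power(normalised, gamma) * 255).round().astype(np.uint8)

def alpha_blend(base_rgb, overlay_rgb, mask, alpha=0.4):
    """Blend overlay_rgb onto base_rgb with opacity alpha where mask is True."""
    weight = alpha * mask.astype(np.float64)[..., None]
    blended = (1.0 - weight) * base_rgb + weight * overlay_rgb
    return blended.round().astype(np.uint8)

# Synthetic dark image standing in for an X-ray crop.
gray = np.full((4, 4), 64, dtype=np.uint8)
brighter = gamma_correct(gray, 0.5)           # gamma < 1 brightens dark regions
base = np.stack([brighter] * 3, axis=-1)      # grayscale replicated to RGB
red = np.zeros_like(base)
red[..., 0] = 255                             # solid red overlay
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                         # hypothetical region of interest
blended = alpha_blend(base, red, mask)
```

Pixels outside the mask keep their gamma-corrected gray value, while pixels inside it are tinted red in proportion to `alpha`.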


    RRA_Think Differently, Create history’s next line.

    Hello Data Hunters! Hope you're doing well. Here, you'll find medical datasets collected from various platforms — raw data that I’ve colorized and enhanced, making them ready for ML. If you use any of these datasets, please be sure to cite both my work and the original data source, and don't forget to check out the raw data on the original platforms. https://www.kaggle.com/shuvokumarbasak4004 (More Dataset)

  16. Malaria Colorized Cell Images Dataset

    • kaggle.com
    zip
    Updated Oct 12, 2025
    Cite
    Medi Hunter - 4004 (2025). Malaria Colorized Cell Images Dataset [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasakbd/malaria-colorized-cell-images-dataset
    Explore at:
    zip(1517135387 bytes)Available download formats
    Dataset updated
    Oct 12, 2025
    Authors
    Medi Hunter - 4004
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    #Raw Data, Source, More Information :: https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria

    Acknowledgements

    This dataset is taken from the official NIH website: https://ceb.nlm.nih.gov/repositories/malaria-datasets/ It was uploaded here so that anybody who wants to work with this dataset can get started immediately, since downloading it from the NIH website is quite slow.

    Photo by Егор Камелев on Unsplash https://unsplash.com/@ekamelev

    Colorized Data Processing Techniques for Medical Imaging

    Medical images like CT scans and X-rays are typically grayscale, making subtle anatomical or pathological differences harder to distinguish. The following image processing and enhancement techniques are used to colorize and improve visual interpretation for diagnostics, training, or AI preprocessing.

    🔷 1. 3D_Rendering Renders medical image volumes into three-dimensional visualizations. Though often grayscale, color can be applied to different tissue types or densities to enhance spatial understanding. Useful in surgical planning or tumor visualization.

    🔷 2. 3D_Volume_Rendering An advanced visualization technique that projects 3D image volumes with transparency and color blending, simulating how light passes through tissue. Color helps distinguish internal structures like organs, vessels, or tumors.

    🔷 3. Adaptive Histogram Equalization (AHE) Enhances contrast locally within the image, especially in low-contrast regions. When colorized, different intensities are mapped to distinct hues, improving visibility of fine-grained details like soft tissues or lesions.

    🔷 4. Alpha Blending A layering technique that combines multiple images (e.g., CT + annotation masks) with transparency. Colors represent different modalities or regions of interest, providing composite visual cues for diagnosis.

    🔷 5. Basic Color Map Applies a standard color palette (like Jet or Viridis) to grayscale data. Different intensities are mapped to different colors, enhancing the visual discrimination of anatomical or pathological regions in the image.

    🔷 6. Contrast Stretching Expands the grayscale range to improve brightness and contrast. When combined with color mapping, tissues with similar intensities become visually distinct, aiding in tasks like bone vs. soft tissue separation.

    🔷 7. Edge Detection Extracts and overlays object boundaries (e.g., organ or lesion outlines) on the original scan. Edge maps are typically colorized (e.g., green or red) to highlight anatomical structures or abnormalities clearly.

    🔷 8. Gamma Correction Adjusts image brightness non-linearly. Color can be used to highlight underexposed or overexposed regions, often revealing soft tissue structures otherwise hidden in raw grayscale CT/X-ray images.

    🔷 9. Gaussian Blur Smooths image noise and details. When visualized with color overlays (e.g., before vs. after), it helps assess denoising effectiveness. It is also used in segmentation preprocessing to reduce edge artifacts.

    🔷 10. Heatmap Visualization Encodes intensity or prediction confidence into a heatmap overlay (e.g., red for high activity). Common in AI-assisted diagnosis to localize tumors, fractures, or infections, layered over the original grayscale image.

    🔷 11. Interactive Segmentation A semi-automated method to extract regions of interest with user input. Segmented areas are color-coded (e.g., tumor = red, background = blue) for immediate visual confirmation and further analysis.

    🔷 12. LUT (Lookup Table) Color Map Maps grayscale values to custom color palettes using a lookup table. This enhances contrast and emphasizes certain intensity ranges (e.g., blood vessels vs. bone), improving interpretability for radiologists.

    🔷 13. Random Color Palette Applies random but consistent colors to segmented regions or labels. Common in datasets with multiple classes (e.g., liver, spleen, kidneys), it helps in visual verification of label diversity.

    🧬 Conclusion These colorization methods do not change the underlying medical data, but they significantly enhance its interpretability for radiologists, researchers, and machine learning algorithms. Color adds clarity, contrast, and context to features that may be missed in grayscale, making it a powerful tool in modern medical imaging workflows.


    RRA_Think Differently, Create history’s next line.

    Hello Data Hunters! Hope you're doing well. Here, you'll find medical data...

  17. MIAS Mammography Dataset

    • kaggle.com
    zip
    Updated Nov 4, 2024
    Cite
    Orvile (2024). MIAS Mammography Dataset [Dataset]. https://www.kaggle.com/datasets/orvile/mias-dataset
    Explore at:
    zip(73927674 bytes)Available download formats
    Dataset updated
    Nov 4, 2024
    Authors
    Orvile
    Description

    Concise Column Descriptions:

    1. MIAS No: MIAS database reference number.
    2. BG (Background Tissue): Type of background tissue:
       - F: Fatty
       - G: Fatty-glandular
       - D: Dense-glandular
    3. CLASS: Type of abnormality present:
       - CALC: Calcification
       - CIRC: Well-defined/circumscribed masses
       - SPIC: Spiculated masses
       - MISC: Other, ill-defined masses
       - ARCH: Architectural distortion
       - ASYM: Asymmetry
       - NORM: Normal
    4. SEVERITY: Severity of abnormality:
       - B: Benign
       - M: Malignant
    5. (5-6) x, y Coordinates: Coordinates of the center of the abnormality.
    6. Radius (pixels): Approximate radius of the circle enclosing the abnormality.

    Descriptions for Additional Columns:

    • DENSITY: Tissue density classification, indicated in mammogram images as A (low density), B, C, or D (high density).
    • BI-RADS: BI-RADS classification used for assessing mammographic abnormalities, e.g., BI-RADS 1 for normal, BI-RADS 5 for highly suspicious of malignancy.
    • Group: General classification category (e.g., Normal, Masses, Calcification).
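
To make the column codes above concrete, here is a minimal parsing sketch. It assumes each record is a whitespace-separated line in the column order listed (reference number, background tissue, class, severity, x, y, radius) and that normal cases carry only the first three fields. The `MiasRecord` class, the `parse_record` function, and the sample lines are illustrative, and the additional DENSITY, BI-RADS, and Group columns are not handled.

```python
from dataclasses import dataclass
from typing import Optional

BACKGROUND = {"F": "Fatty", "G": "Fatty-glandular", "D": "Dense-glandular"}
ABNORMALITY = {
    "CALC": "Calcification",
    "CIRC": "Well-defined/circumscribed masses",
    "SPIC": "Spiculated masses",
    "MISC": "Other, ill-defined masses",
    "ARCH": "Architectural distortion",
    "ASYM": "Asymmetry",
    "NORM": "Normal",
}
SEVERITY = {"B": "Benign", "M": "Malignant"}

@dataclass
class MiasRecord:
    ref: str
    background: str
    abnormality: str
    severity: Optional[str] = None
    x: Optional[int] = None
    y: Optional[int] = None
    radius: Optional[int] = None

def parse_record(line: str) -> MiasRecord:
    """Decode one whitespace-separated record (assumed layout, see lead-in)."""
    parts = line.split()
    record = MiasRecord(parts[0], BACKGROUND[parts[1]], ABNORMALITY[parts[2]])
    if len(parts) > 3:
        record.severity = SEVERITY[parts[3]]
    if len(parts) >= 7:
        record.x, record.y, record.radius = (int(v) for v in parts[4:7])
    return record

# Illustrative lines in the assumed layout.
example = parse_record("mdb001 G CIRC B 535 425 197")
normal = parse_record("mdb003 D NORM")
```

Unknown codes raise a `KeyError`, which is a deliberate choice here so that malformed rows surface early rather than being silently mislabeled.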

    Acknowledgements/LICENCE

    MAMMOGRAPHIC IMAGE ANALYSIS SOCIETY
    MiniMammographic Database
    LICENCE AGREEMENT

    This is a legal agreement between you, the end user and the Mammographic Image Analysis Society ("MIAS"). Upon installing the MiniMammographic database (the "DATABASE") on your system you are agreeing to be bound by the terms of this Agreement.

    GRANT OF LICENCE
    MIAS grants you the right to use the DATABASE, for research purposes
    ONLY. For this purpose, you may edit, format, or otherwise modify the
    DATABASE provided that the unmodified portions of the DATABASE included
    in a modified work shall remain subject to the terms of this Agreement.
    COPYRIGHT
    The DATABASE is owned by MIAS and is protected by United Kingdom
    copyright laws, international treaty provisions and all other
    applicable national laws. Therefore you must treat the DATABASE
    like any other copyrighted material. If the DATABASE is used in any
    publications then reference must be made to the DATABASE within that
    publication.
    OTHER RESTRICTIONS
    You may not rent, lease or sell the DATABASE.
    LIABILITY
    To the maximum extent permitted by applicable law, MIAS shall not
    be liable for damages, other than death or personal injury,
    whatsoever (including without limitation, damages for negligence,
    loss of business, profits, business interruption, loss of
    business information, or other pecuniary loss) arising out of the
    use of or inability to use this DATABASE, even if MIAS has been
    advised of the possibility of such damages. In any case, MIAS's
    entire liability under this Agreement shall be limited to the
    amount actually paid by you or your assignor, as the case may be,
    for the DATABASE.
    

    Credits

    Reference: J Suckling et al (1994): The Mammographic Image Analysis Society Digital Mammogram Database. Excerpta Medica, International Congress Series 1069, pp. 375-378.

  18. Romanian Dense Breast Mammography Collection

    • kaggle.com
    zip
    Updated Nov 21, 2025
    Cite
    Adél Bajcsi (2025). Romanian Dense Breast Mammography Collection [Dataset]. https://www.kaggle.com/datasets/bajcsiadel96/romanian-dense-breast-mammography-collection/code
    Explore at:
    zip(31941282485 bytes)Available download formats
    Dataset updated
    Nov 21, 2025
    Authors
    Adél Bajcsi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Romanian Dense Breast Mammography Collection (RDBMC)

    The Romanian Dense Breast Mammography Collection (RDBMC) contains diagnostic mammograms exclusively from dense breasts (BI-RADS categories C and D), a population with increased breast cancer risk and reduced mammographic visibility. The dataset includes digital 2D mammography, tomosynthesis projections, 2D and 3D tomosynthesis reconstructions, and extensive expert-validated clinical metadata.

    1. Overview

    | Attribute | Value |
    |---|---|
    | Patients | 251 |
    | Breasts | 450 |
    | Images | 2934 |
    | Age range | 29-82 |
    | Imaging modalities | Mammography, Tomosynthesis |
    | Views | MLO, CC |
    | Clinical status distribution | Healthy 28.89%, Benign 44.44%, Malignant 26.67% |
    | Image format | DICOM |

    Lesion prevalence among abnormal cases: Mass (79.38%), Calcifications (37.50%), Asymmetry (7.81%), Architectural distortion (1.56%).

    2. Files

    | File | Description |
    |---|---|
    | /img/ | DICOM images (2D + 3D tomosynthesis) |
    | /pngs/ | NPZ and PNG images (2D) |
    | image_counts.csv | Image counts per patient |
    | image_types.csv | Image types per patient |
    | metadata.csv | Per-breast structured annotations |

    All identifiers are anonymized.

    3. Dataset Structure

    RDBMC/ 
     ├── img/ 
     │  ├── 0029 
     │  ├──── 1 
     │  ├────── 0029_1_M_R_MLO_TP.dcm 
     │  └────── ... 
     ├── pngs/  
     │  ├── 0029_1_M_R_MLO_TP.npz  
     │  ├── 0029_1_M_R_MLO_TP.png  
     │  └── ... 
     ├── image_counts.csv 
     ├── image_types.csv 
     ├── metadata.csv 
     └── README.md
    

    File naming pattern: {patientID}_{session}_{class}_{side}_{view}_{type}.dcm
    Example: 0123_1_M_R_MLO_MAM.dcm
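
The naming pattern can be unpacked with a regular expression. The sketch below is an assumption-laden illustration rather than tooling shipped with the dataset: the `RdbmcName` tuple, the `parse_name` function, and the exact character classes in the pattern are hypothetical, and the meaning of individual codes (such as `M` or `MAM`) is left uninterpreted.

```python
import re
from typing import NamedTuple

class RdbmcName(NamedTuple):
    patient_id: str
    session: str
    cls: str
    side: str
    view: str
    img_type: str

    def metadata_code(self) -> str:
        """Patient, session, class, and side joined with underscores."""
        return f"{self.patient_id}_{self.session}_{self.cls}_{self.side}"

# {patientID}_{session}_{class}_{side}_{view}_{type} plus a known extension.
_PATTERN = re.compile(
    r"^(?P<patient_id>\d+)_(?P<session>\d+)_(?P<cls>[A-Za-z]+)"
    r"_(?P<side>[A-Za-z]+)_(?P<view>[A-Za-z]+)_(?P<img_type>[A-Za-z0-9]+)"
    r"\.(?:dcm|npz|png)$"
)

def parse_name(filename: str) -> RdbmcName:
    """Split a file name into its six underscore-separated fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return RdbmcName(**match.groupdict())

parsed = parse_name("0123_1_M_R_MLO_MAM.dcm")
code = parsed.metadata_code()
```

The `metadata_code` helper joins the first four fields in the same patient, session, class, side order, which may be convenient for looking up per-breast annotations.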

    4. Data Dictionary (metadata.csv)

    Information included for all cases

    | Column | Description | Type | Values |
    |---|---|---|---|
    | Code | Anonymized identifier encoding patient, session, class, side | string | “xxxx_e_c_s” (e.g. 0001_1_M_L) |
    | Age | Age of the patient when the mammography was taken | int | [29–82] |
    | Breast | Anatomical side | categorical | left / right |
    | Tissue type | Whether breast tissue is heterogeneous or homogeneous | categorical | heterogeneous / homogeneous |
    | Breast composition | ACR BI-RADS composition (breast density) | categorical | C / D |
    | Classification | Breast-level classification | categorical | malignant / benign / healthy |
    | BI-RADS | BI-RADS score assigned to the breast (ACR BI-RADS Atlas) | categorical | [1-6] |
    | Masses_Present | Presence of any mass in the breast | bool | yes / no |
    | Masses_Malignant | If mass present: whether mass is malignant (histopathology) | bool | yes / no |
    | Masses_Shape | Shape of the mass (if present) | categorical | oval/round / lobulated / irregular |
    | Masses_Margin | Margin appearance of the mass (if present) | categorical | circumscribed / obscured / microlobulated / indistinct / spiculated |
    | Masses_Localization | Anatomical localization of the mass | categorical | quadrant-based / retromammary / dispersed / entire breast |
    | Masses_Associated features | Associated imaging features (if any) | multi-categorical | skin retraction, nipple retraction, axillary adenopathy, architectural distortion, calcifications |
    | Asymmetries_Present | Presence of any asymmetry | bool | yes / no |
    | Asymmetries_Localization | Localization of asymmetry (if present) | categorical | same as Mass_Localization |
    | Architectural distortions_Present | Presence of architectural distortion (except postoperative scars) | bool | yes / no |
    | Architectural distortions_Localization | Localization of distortion (if present) | categorical | same as Mass_Localization |
    | Calcifications_Present | Presence of calcifications (excluding calcifications associated with masses) | bool | yes / no |
    | Calcifications_Type | Type of calcifications (if present) | categorical | micro / macro |
    | Calcifications_Distribution | Distribution pattern of calcifications | categorical | Diffuse / Regional / Grouped/Linear / Segmental |
    | Calcifications_Localization | Localization of calcifications (if present) | categorical | same as Mass_Localization |
    | Postoperative cicatrices_Present | Presence of postoperative scars (may appear as distortions) | bool | yes / no |
    | Postoperative cicatrices_Localization | Localization of postoperative scars (if present) | categorical | same as Mass_Localization |
    | Diagnosis | Final diagnosis of each lesion (from histopathology or confirmed with ultrasound for benign) | categorical | standardized labels (e.g. ductal carcinoma, UDH, fibroadenoma, etc.) |
    | Original mammography report | Original mammography report text (in Romanian) | text | free text (Romanian) |
    | Mammography report | Mammography report translated into English | text | free text (English) |

    Information included for biopsy-confirmed malign...

  19. Histopathology WSI

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    Md.Nazmus Sakib2025 (2025). Histopathology WSI [Dataset]. https://www.kaggle.com/datasets/mdnazmussakib2025/histopathology-wsi
    Explore at:
    zip(7197418474 bytes)Available download formats
    Dataset updated
    Nov 12, 2025
    Authors
    Md.Nazmus Sakib2025
    Description

    Computational histopathology has made significant strides in the past few years, slowly getting closer to clinical adoption. One area of benefit would be the automatic generation of diagnostic reports from H&E-stained whole slide images, which would further increase the efficiency of the pathologists' routine diagnostic workflows.

    In this study, we compiled a dataset (PatchGastricADC22) of histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens, which we extracted from diagnostic reports and paired with patches extracted from the associated whole slide images. The dataset contains a variety of gastric adenocarcinoma subtypes.

    We trained a baseline attention-based model to predict the captions from features extracted from the patches and obtained promising results. We make the captioned dataset of 262K patches publicly available.

    Purpose

    The dataset was created to support research in medical image captioning — specifically, to automatically generate diagnostic text descriptions from histopathological image patches. It helps train and evaluate models that can interpret tissue morphology and produce human-like pathology reports.

    Domain & Source

    • Medical domain: Histopathology
    • Disease focus: Gastric adenocarcinoma (a common type of stomach cancer)
    • Image type: H&E-stained tissue sections
    • Source images: Whole Slide Images (WSIs) split into small patches
    • Magnification: 20×

    Dataset Structure (PatchGastricADC22)

    📁 Folder: patches_captions/patches_captions/ Contains all patch-level histopathology image files (in .jpg format). Each patch represents a cropped region (300×300 pixels) from a Whole Slide Image (WSI).

    🧾 File: captions.csv Provides the mapping between image IDs and their corresponding diagnostic captions. Each row represents one unique image patch and its textual description.

    🧩 CSV Columns:

    • id – Base ID identifying the parent WSI or case from which the patch was extracted.
    • subtype – Indicates the histological subtype (e.g., tubular adenocarcinoma, poorly differentiated).
    • text – Expert-written caption describing the morphological and diagnostic features visible in the patch.

    Dataset Statistics

    🧩 Total images (patches): ~262,777
    🧪 Total WSIs (slides): 1305
    🖼️ Patch size: 300 × 300 pixels
    🔬 Magnification: 20×
    ✍️ Captions: One per patch
    🔠 Vocabulary size: 344 unique words
    📏 Max caption length: 47 words
    ⚖️ Split: 70% train / 10% validation / 20% test

    Creation Process

    1. Whole Slide Images (WSIs) were collected from gastric cancer pathology archives.
    2. Each slide was divided into 300×300 patches (non-overlapping).
    3. Expert pathologists annotated each patch with a short caption describing diagnostic features (cellular and structural morphology).
    4. Data were consolidated into image files + a master captions.csv.
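
A minimal sketch of working with the captions file, assuming the three columns described above (`id`, `subtype`, `text`): it groups rows by parent slide and then makes a 70/10/20 split of the slide ids. The description does not say whether the published split is made per slide or per patch, so splitting at the slide level is an assumption made here to keep patches from one slide together. The helper names and the in-memory sample rows are hypothetical.

```python
import csv
import io
import random
from collections import defaultdict

def group_by_slide(csv_text):
    """Map each slide id to the list of caption rows extracted from it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["id"]].append(row)
    return groups

def split_slides(slide_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle slide ids reproducibly and cut them into train/val/test lists."""
    ids = sorted(slide_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Hypothetical stand-in for captions.csv: 10 slides with 2 patches each.
sample = "id,subtype,text\n" + "".join(
    f"slide{i:02d},hypothetical subtype,hypothetical caption {j}\n"
    for i in range(10)
    for j in range(2)
)
groups = group_by_slide(sample)
train_ids, val_ids, test_ids = split_slides(groups)
```

With real data, `sample` would be replaced by the text of `captions.csv`; the fixed `seed` keeps the split reproducible between runs.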

  20. UBC-OCEAN: Tiles🖽 w/ masks🔬 2048px | scale 0.25

    • kaggle.com
    zip
    Updated Nov 22, 2023
    + more versions
    Cite
    Jirka (2023). UBC-OCEAN: Tiles🖽 w/ masks🔬 2048px | scale 0.25 [Dataset]. https://www.kaggle.com/datasets/jirkaborovec/ubc-ocean-tiles-w-masks-2048px-scale-0-25/data
    Explore at:
    zip(19058059551 bytes)Available download formats
    Dataset updated
    Nov 22, 2023
    Authors
    Jirka
    Description

    This is a pre-processed version of the dataset provided by the UBC Ovarian Cancer Subtype Classification and Outlier Detection competition.

    The original Whole Slide Images were split into tiles of 2048px and scaled down by 0.25. Also, empty tiles (more than 90% background) were skipped.
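
The tiling step described above can be sketched as follows. This is not the author's preprocessing script: the `iter_tiles` name, the rule that a pixel counts as background when all of its channels are at least 240 (near white), and the use of strided subsampling instead of proper resampling are assumptions made to keep the example short and dependency-free.

```python
import numpy as np

def iter_tiles(image, tile=2048, scale=0.25, max_background=0.9, bg_threshold=240):
    """Yield ((top, left), downscaled_tile) for tiles that are not mostly background.

    image: (H, W, 3) uint8 array holding a slide or a region of one.
    """
    step = int(round(1 / scale))  # scale 0.25 keeps every 4th pixel
    height, width = image.shape[:2]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            patch = image[top:top + tile, left:left + tile]
            # Assumed background rule: every channel is near white.
            background = (patch.min(axis=-1) >= bg_threshold).mean()
            if background > max_background:
                continue  # skip empty tiles
            yield (top, left), patch[::step, ::step]

# Tiny synthetic "slide": white everywhere except one tile of tissue.
slide = np.full((8, 8, 3), 255, dtype=np.uint8)
slide[:4, :4] = (180, 90, 150)
tiles = list(iter_tiles(slide, tile=4, scale=0.5))
```

With the default arguments a 2048px tile becomes 512px. Real whole slide images rarely fit in memory as a single array, so in practice the same loop would read one region at a time from a slide reader.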

    Supplement with segmentation masks

    adjusted version for segmentations, see:

    Mask info - The masks use the following color codes:

    label  color  meaning
    0      Black  Background
    1      Red    Tumor
    2      Green  Stroma (healthy tissue)
    3      Blue   Necrosis (dead or dying non-cancerous tissue)
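For visualization, a label mask can be turned into an RGB image with a lookup table. This is a sketch under the assumption that the four colors are pure black, red, green, and blue; the exact RGB values used in the dataset are not stated. `PALETTE` and `colorize` are hypothetical names.

```python
import numpy as np

# Row i holds the assumed RGB color for label i, in the order of the table above.
PALETTE = np.array(
    [
        [0, 0, 0],      # 0: background (black)
        [255, 0, 0],    # 1: tumor (red)
        [0, 255, 0],    # 2: stroma (green)
        [0, 0, 255],    # 3: necrosis (blue)
    ],
    dtype=np.uint8,
)

def colorize(mask):
    """Map an integer label mask of shape (H, W) to an RGB array of shape (H, W, 3)."""
    return PALETTE[mask]

# Toy 2 x 2 mask containing each label once.
toy_mask = np.array([[0, 1], [2, 3]], dtype=np.uint8)
rgb = colorize(toy_mask)
```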

    Organizer's notes

    Pathologist Annotations:

    The annotations made by pathologists represent examples of tumor, stroma, and necrosis tissues. They are not exhaustive but illustrative of each tissue type.

    Important Considerations:

    • Representative Nature: Annotated regions serve as examples. For instance, an area marked as a tumor doesn’t imply it’s the sole tumor area in the slide. Other unannotated tumor regions may exist.
    • Exclusivity: In addition to tumor, stroma, and necrosis, other tissue types may be present in a slide but are not annotated as they are not relevant to this specific problem.

    Visualizations

    To load a mask correctly with its labels, use PIL:

    import numpy as np
    from PIL import Image
    mask = np.array(Image.open(path_to_mask))  # path_to_mask: path to one mask file
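Once a mask is available as an integer array, the share of each label can be summarized as below. This is a sketch that assumes the array holds the label indices 0 to 3 from the color-code table; `LABELS` and `label_fractions` are hypothetical names. Because the annotations are illustrative rather than exhaustive, a large background share does not mean the tile lacks tissue.

```python
import numpy as np

# Label meanings taken from the mask color-code table.
LABELS = {0: "background", 1: "tumor", 2: "stroma", 3: "necrosis"}

def label_fractions(mask):
    """Return the fraction of pixels carrying each label in an integer mask."""
    counts = np.bincount(mask.ravel(), minlength=len(LABELS))
    return {name: counts[label] / mask.size for label, name in LABELS.items()}

# Toy mask: two background pixels, one tumor pixel, one stroma pixel.
toy_mask = np.array([[0, 0], [1, 2]], dtype=np.uint8)
fractions = label_fractions(toy_mask)
```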
    
Cite
Md Waquar Azam (2022). Tissue images [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/tissue-images

Tissue images

11 types of human tissue images

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
zip(65522357 bytes)Available download formats
Dataset updated
Aug 26, 2022
Authors
Md Waquar Azam
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Tissue: a group of structurally and functionally similar cells.

Task

Classify all types of tissue images with better accuracy.

Inspiration

The question to be answered is how to classify the crops of each type.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research. ALL IMAGES BELONG TO THE ORIGINAL AUTHORS.
