Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Explore the TCGA Whole Slide Image (WSI) SVS files available on Kaggle, offering detailed visual representations of tissue samples from various cancer types. These high-resolution images provide valuable insights into tumor morphology and tissue architecture, facilitating cancer diagnosis, prognosis, and treatment research. Delve into the rich landscape of cancer biology, leveraging the wealth of information contained within these SVS files to drive innovative advancements in oncology. This is a dataset of WSI images downloaded from the TCGA portal.
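For readers who want to work with these files programmatically, here is a minimal sketch (not part of the dataset itself) of reading an SVS slide with the openslide-python package; the file name and tile coordinates are placeholders.

```python
# Minimal sketch: inspect a TCGA .svs whole slide image with openslide-python.
# The file name and the tile coordinates below are placeholders.
import openslide

slide = openslide.OpenSlide("TCGA-XX-XXXX.svs")  # placeholder file name

# Pyramid metadata: number of levels and the pixel dimensions of each level.
print("levels:", slide.level_count)
print("level dimensions:", slide.level_dimensions)

# Whole-slide thumbnail for quick visual inspection.
thumb = slide.get_thumbnail((1024, 1024))
thumb.save("thumbnail.png")

# Read a 512x512 tile at full resolution (level 0) starting at (10000, 10000).
tile = slide.read_region((10000, 10000), 0, (512, 512)).convert("RGB")
tile.save("tile.png")

slide.close()
```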
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Raw data, source, and more information: https://www.kaggle.com/datasets/huseyingunduz/diatom-dataset?select=images
Citation: @article{gunduz2022, title={Segmentation of diatoms using edge detection and deep learning}, volume={30}, DOI={10.55730/1300-0632.3938}, number={6}, journal={Turkish Journal of Electrical Engineering & Computer Sciences}, author={Gunduz, Huseyin and Solak, Cuneyt Nadir and Gunal, Serkan}, year={2022}, pages={2268–2285}}
Diatoms are a group of algae found in oceans, freshwater, moist soils, and on surfaces. They are among the most common phytoplankton species found in nature. There are more than 200 genera of diatoms and about 200,000 species, and they produce approximately 20-25% of the oxygen on the planet.
Accurate detection, segmentation, and classification of diatoms are very important, especially for determining water quality and ecological change.
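As a starting point, a minimal sketch (not the authors' pipeline from the cited paper) of classical edge detection on a diatom micrograph with OpenCV is shown below; the image path is a placeholder.

```python
# Minimal sketch: Canny edge detection on a diatom image with OpenCV.
# The input path is a placeholder; thresholds would need tuning per image set.
import cv2

img = cv2.imread("diatom_image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Light Gaussian smoothing before edge detection to suppress sensor noise.
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Canny edge map of the smoothed image.
edges = cv2.Canny(blurred, 50, 150)

# Extract external contours as candidate diatom boundaries.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print("candidate boundaries:", len(contours))

cv2.imwrite("edges.png", edges)
```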
Colorized Data Processing Techniques for Medical Imaging
Medical images like CT scans and X-rays are typically grayscale, making subtle anatomical or pathological differences harder to distinguish. The following image processing and enhancement techniques are used to colorize and improve visual interpretation for diagnostics, training, or AI preprocessing.
🔷 1. 3D_Rendering Renders medical image volumes into three-dimensional visualizations. Though often grayscale, color can be applied to different tissue types or densities to enhance spatial understanding. Useful in surgical planning or tumor visualization.
🔷 2. 3D_Volume_Rendering An advanced visualization technique that projects 3D image volumes with transparency and color blending, simulating how light passes through tissue. Color helps distinguish internal structures like organs, vessels, or tumors.
🔷 3. Adaptive Histogram Equalization (AHE) Enhances contrast locally within the image, especially in low-contrast regions. When colorized, different intensities are mapped to distinct hues, improving visibility of fine-grained details like soft tissues or lesions.
🔷 4. Alpha Blending A layering technique that combines multiple images (e.g., CT + annotation masks) with transparency. Colors represent different modalities or regions of interest, providing composite visual cues for diagnosis.
🔷 5. Basic Color Map Applies a standard color palette (like Jet or Viridis) to grayscale data. Different intensities are mapped to different colors, enhancing the visual discrimination of anatomical or pathological regions in the image.
🔷 6. Contrast Stretching Expands the grayscale range to improve brightness and contrast. When combined with color mapping, tissues with similar intensities become visually distinct, aiding in tasks like bone vs. soft tissue separation.
🔷 7. Edge Detection Extracts and overlays object boundaries (e.g., organ or lesion outlines) on the original scan. Edge maps are typically colorized (e.g., green or red) to highlight anatomical structures or abnormalities clearly.
🔷 8. Gamma Correction Adjusts image brightness non-linearly. Color can be used to highlight underexposed or overexposed regions, often revealing soft tissue structures otherwise hidden in raw grayscale CT/X-ray images.
🔷 9. Gaussian Blur Smooths image noise and details. When visualized with color overlays (e.g., before vs. after), it helps assess denoising effectiveness. It is also used in segmentation preprocessing to reduce edge artifacts.
🔷 10. Heatmap Visualization Encodes intensity or prediction confidence into a heatmap overlay (e.g., red for high activity). Common in AI-assisted diagnosis to localize tumors, fractures, or infections, layered over the original grayscale image.
🔷 11. Interactive Segmentation A semi-automated method to extract regions of interest with user input. Segmented areas are color-coded (e.g., tumor = red, background = blue) for immediate visual confirmation and further analysis.
🔷 12. LUT (Lookup Table) Color Map Maps grayscale values to custom color palettes using a lookup table. This enhances contrast and emphasizes certain intensity ranges (e.g., blood vessels vs. bone), improving interpretability for radiologists.
🔷 13. Random Color Palette Applies random but consistent colors to segmented regions or labels. Common in datasets with multiple classes (e.g., liver, spleen, kidneys), it helps in visually distinguishing neighboring labeled structures at a glance.
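To make a few of the techniques above concrete, here is a minimal Python sketch (assuming OpenCV and a grayscale slice saved as slice.png, a placeholder file name) combining adaptive histogram equalization (3), gamma correction (8), and a lookup-table color map (5/12).

```python
# Minimal sketch: colorize a grayscale CT/X-ray slice with OpenCV.
# slice.png is a placeholder file name.
import cv2
import numpy as np

gray = cv2.imread("slice.png", cv2.IMREAD_GRAYSCALE)

# (3) Adaptive histogram equalization (CLAHE variant) for local contrast.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
equalized = clahe.apply(gray)

# (8) Gamma correction: gamma < 1 brightens dark soft-tissue regions.
gamma = 0.7
corrected = np.uint8(255 * (equalized / 255.0) ** gamma)

# (5/12) Map the enhanced grayscale values through a color lookup table.
colorized = cv2.applyColorMap(corrected, cv2.COLORMAP_JET)

cv2.imwrite("slice_colorized.png", colorized)
```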
Remark: for cell cycle analysis, see the paper "Computational challenges of cell cycle analysis using single cell transcriptomics" by Alexander Chervov and Andrei Zinovyev, https://arxiv.org/abs/2208.05229
Data: results of single-cell RNA sequencing, i.e. rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is "expressed" in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
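A minimal sketch (assuming the scanpy/anndata stack; the file name is a placeholder, not part of Tabula Muris) of inspecting and normalizing such an expression matrix:

```python
# Minimal sketch: load a cells x genes expression matrix with scanpy.
# tabula_muris_subset.h5ad is a placeholder file name.
import scanpy as sc

adata = sc.read_h5ad("tabula_muris_subset.h5ad")
print(adata)            # AnnData object: observations = cells, variables = genes
print(adata.shape)      # (n_cells, n_genes)
print(adata.X[:5, :5])  # expression values for the first cells and genes

# Standard preprocessing: library-size normalization and log transform.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
```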
Particular data: "Tabula Muris" project https://tabula-muris.ds.czbiohub.org/ Tabula Muris is a compendium of single cell transcriptome data from the model organism Mus musculus, containing nearly 100,000 cells from 20 organs and tissues. The data allow for direct and controlled comparison of gene expression in cell types shared between tissues, such as immune cells from distinct anatomical locations. They also allow for a comparison of two distinct technical approaches:
- microfluidic droplet-based 3'-end counting: provides a survey of thousands of cells per organ at relatively low coverage
- FACS-based full-length transcript analysis: provides higher sensitivity and coverage
We hope this rich collection of annotated cells will be a useful resource for:
- Defining gene expression in previously poorly characterized cell populations.
- Validating findings in future targeted single-cell studies.
- Developing methods for integrating datasets (e.g. between the FACS and droplet experiments), characterizing batch effects, and quantifying the variation of gene expression across many cell types, organs, and animals.
The peer-reviewed article describing the analysis and findings is available in Nature: https://www.nature.com/articles/s41586-018-0590-4 (Nature, volume 562, pages 367-372, 2018).
GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE109774
Course at the Sanger Institute: https://scrnaseq-course.cog.sanger.ac.uk/website/tabula-muris.html
Course at CZ-hub: https://chanzuckerberg.github.io/scRNA-python-workshop/intro/about
On Kaggle, copies of the notebooks and data from the course above: https://www.kaggle.com/aayush9753/singlecell-rnaseq-data-from-mouse-brain
Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science": https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see the review by P. Kharchenko in Nature Methods, "The triumphs and limitations of computational methods for scRNA-seq": https://www.nature.com/articles/s41592-021-01171-x
Four pathologists from Longhua Hospital, Shanghai University of Traditional Chinese Medicine, provided 600 gastric cancer pathology images at a size of 2048×2048 pixels. These images were scanned using a NewUsbCamera and digitized at ×20 magnification, and tissue-level labels were given by the four experienced pathologists. Based on that, five biomedical researchers from Northeastern University cropped them into 245,196 sub-sized gastric cancer pathology images, and two experienced pathologists from Liaoning Cancer Hospital and Institute performed the calibration. The 245,196 images were split into three sizes (160×160, 120×120, 80×80) for two categories: abnormal and normal.
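For illustration, here is a minimal sketch (not the original researchers' code) of tiling a 2048×2048 pathology image into non-overlapping 160×160 sub-images with Pillow; the file name is a placeholder.

```python
# Minimal sketch: crop a large pathology image into fixed-size patches.
# gastric_slide.png is a placeholder file name.
from PIL import Image

img = Image.open("gastric_slide.png")
patch = 160
width, height = img.size

patches = []
for top in range(0, height - patch + 1, patch):
    for left in range(0, width - patch + 1, patch):
        patches.append(img.crop((left, top, left + patch, top + patch)))

print(f"{len(patches)} patches of {patch}x{patch} pixels")
```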
http://opendatacommons.org/licenses/dbcl/1.0/
Description The BreakHis - Breast Cancer Histopathological Dataset is a valuable resource for medical image analysis, particularly in the classification of breast cancer. This dataset contains high-resolution histopathological images of breast tissue, divided into both binary and multi-class labels, to support the development and evaluation of machine learning models in cancer classification.
Context and Sources
Source: The dataset was initially developed by the P&D Laboratory - Pathological Anatomy and Cytopathology, in collaboration with the Federal University of Paraná (UFPR), to support research on breast cancer diagnosis through digital pathology. The dataset is freely accessible for non-commercial research and educational purposes.
Dataset Structure: The dataset contains images organized by:
- Classification Type: Binary classification (Benign vs. Malignant) and Multi-Class classification (8 different tumor types).
- Magnification Levels: Images are available at 40X, 100X, 200X, and 400X magnifications, allowing models to learn across varying levels of tissue detail.
- Classes: Binary: Benign and Malignant categories. Multi-Class: Adenosis, Ductal Carcinoma, Fibroadenoma, Lobular Carcinoma, Mucinous Carcinoma, Papillary Carcinoma, Phyllodes Tumor, and Tubular Adenoma.
Inspiration
This dataset is ideal for various research and practical applications, including:
- Binary Classification: Distinguish between benign and malignant breast tissue, a crucial step in cancer diagnosis.
- Multi-Class Classification: Identify specific tumor types, aiding in the development of models that can support pathologists in detailed cancer analysis.
- Transfer Learning: Fine-tune pre-trained CNN models for improved accuracy in medical image classification (see the sketch after the paper citation below).
- Magnification Level Analysis: Explore how models perform across different levels of zoom to understand which magnification provides optimal classification accuracy.
This dataset provides an extensive, diverse collection of images that challenges machine learning models with real-world data variability and offers the potential for groundbreaking advancements in digital pathology and breast cancer diagnostics.
The Breast Cancer Histopathological Image Classification (BreakHis) is composed of 9,109 microscopic images of breast tumor tissue collected from 82 patients using different magnifying factors (40X, 100X, 200X, and 400X). It contains 2,480 benign and 5,429 malignant samples (700X460 pixels, 3-channel RGB, 8-bit depth in each channel, PNG format). This database has been built in collaboration with the P&D Laboratory - Pathological Anatomy and Cytopathology, Parana, Brazil.
Paper: F. A. Spanhol, L. S. Oliveira, C. Petitjean and L. Heutte, "A Dataset for Breast Cancer Histopathological Image Classification," in IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1455-1462, July 2016, doi: 10.1109/TBME.2015.2496264
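As a sketch of the transfer-learning use case mentioned above (assuming torchvision and an ImageFolder-style layout with benign/malignant subfolders; the path and hyperparameters are illustrative, not part of the dataset):

```python
# Minimal sketch: fine-tune a pretrained ResNet-18 on BreakHis-style folders.
# "breakhis/train" is a placeholder path with benign/ and malignant/ subfolders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_ds = datasets.ImageFolder("breakhis/train", transform=tfm)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

# Pretrained backbone with a new 2-class head (benign vs. malignant).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training pass over the data.
model.train()
for images, labels in train_dl:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```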
https://creativecommons.org/publicdomain/zero/1.0/
The dataset supports various deep learning applications, including facial anomaly detection, tissue segmentation, and 3D modeling of facial anatomy. With high-resolution sagittal and axial slices, it is ideal for training AI models aimed at accurate facial analysis.
The dataset includes data that showcases the diversity and complexity of facial MRI imaging, suitable for machine learning models and medical analysis. It includes:
All data is anonymized to ensure privacy and complies with publication consent regulations.
The dataset provides a sample from one patient, showcasing the diversity of the full dataset. It contains the following files for exploration:
- DICOM slices with 100 frames
- 3D representation of the facial structure
- CSV file listing the scan characteristics
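A minimal sketch (assuming the pydicom and numpy packages; the folder name is a placeholder) of stacking the DICOM slices into a 3D volume:

```python
# Minimal sketch: read a folder of DICOM slices and stack them into a volume.
# "facial_mri_sample" is a placeholder folder name.
import glob
import numpy as np
import pydicom

files = sorted(glob.glob("facial_mri_sample/*.dcm"))
slices = [pydicom.dcmread(f) for f in files]

# Order slices along the scan axis before stacking.
slices.sort(key=lambda s: float(s.InstanceNumber))
volume = np.stack([s.pixel_array for s in slices])

print("volume shape (slices, rows, cols):", volume.shape)
```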
The LC25000 dataset contains 25,000 color images with 5 classes of 5,000 images each. All images are 768 x 768 pixels in size and are in jpeg file format. The 5 classes are: colon adenocarcinomas, benign colonic tissues, lung adenocarcinomas, lung squamous cell carcinomas, and benign lung tissues.
This dataset was generated to provide a benchmark set of images against which colorimetric algorithms can be tested. We hope that it will be useful in the future to progress the ability of the scientific community to develop algorithms/methodologies that can reduce the density of colour data contained in complex images, like fruit, in an unbiased, reliable, and rapid manner.
Contained within is a set of standard images that can be used in future studies to benchmark algorithms that undertake colorimetric analysis. These images consist of cross-sectioned fruit and tubers that contain complex colour gradients across the visible spectrum. These images were analyzed in our publication "A data driven approach to assess complex colour profiles in plant tissues".
A selection of 28 species of fruit and tubers was purchased from a local supermarket in Auckland, New Zealand. These fruit and tubers represented different families including Anacardiaceae (mango), Ebenaceae (persimmon), Actinidiaceae (kiwifruit), Lauraceae (avocado), Musaceae (banana), Rosaceae (apple, peach, pear, plum, and strawberry), Rutaceae (grapefruit, lemon, mandarin, and orange), and Solanaceae (potato, tamarillo, and tomato). Each fruit was cross sectioned along its most symmetrical side. Up to three cross sections of the same fruit type were placed face down on the scanner on a predefined 3x1 grid with defined positions to allow image capture of the individual fruit.
The images contained here were captured using a Canon LIDE 220 flatbed scanner (scanning element sensor: CIS; light source: 3-colour RGB LED) that was placed in a 2 mm black perspex box with a retractable lid that completely blocked ambient light. Parent images with dimensions of 4960 pixels (W) by 7015 pixels (H) were acquired at a resolution of 600 dots-per-inch/pixels-per-inch (DPI/PPI) and output in TIFF format. Each sibling image was calibrated using the white tile standard on the X-Rite mini-colour checker card that was included in each scanned parent image (McCamy et al., 1976) and then segmented.
Mccamy, C.S., Marcus, H., and Davidson, J.G. (1976). A Color Rendition Chart. Journal of Applied Photographic Engineering 11, 95-99.
Authors: Peter A. McAtee, Simona Nardozza, Annette Richardson, Mark Wohlers, and Robert J. Schaffer.
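A minimal sketch (not the published pipeline) of the white-tile calibration step described above: each RGB channel is rescaled so that a user-supplied white-patch region maps to pure white. The image path and patch coordinates are placeholders.

```python
# Minimal sketch: white-tile calibration of a scanned image.
# parent_scan.tif and the white-tile window coordinates are placeholders.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("parent_scan.tif")).astype(np.float32)
y0, y1, x0, x1 = 100, 200, 100, 200  # pixel window over the white tile

# Mean colour of the white tile, one value per channel.
white = img[y0:y1, x0:x1].reshape(-1, img.shape[2]).mean(axis=0)

# Scale each channel so the white tile maps to 255, then clip to valid range.
calibrated = np.clip(img * (255.0 / white), 0, 255).astype(np.uint8)

Image.fromarray(calibrated).save("parent_scan_calibrated.png")
```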
The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD