100+ datasets found
  1. Myocardial perfusion scintigraphy image database

    • physionet.org
    Updated Sep 9, 2025
    Cite
    Wesley Calixto; Solange Nogueira; Fernanda Luz; Thiago Fellipe Ortiz de Camargo (2025). Myocardial perfusion scintigraphy image database [Dataset]. http://doi.org/10.13026/ce2z-dw74
    Explore at:
    Dataset updated
    Sep 9, 2025
    Authors
    Wesley Calixto; Solange Nogueira; Fernanda Luz; Thiago Fellipe Ortiz de Camargo
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    This database provides a collection of myocardial perfusion scintigraphy images in DICOM format with all metadata and segmentations (masks) in NIfTI format. The images were obtained from patients undergoing scintigraphy examinations to investigate cardiac conditions such as ischemia and myocardial infarction. The dataset encompasses a diversity of clinical cases, including various perfusion patterns and underlying cardiac conditions. All images have been properly anonymized, and the age range of the patients is from 20 to 90 years. This database represents a valuable source of information for researchers and healthcare professionals interested in the analysis and diagnosis of cardiac diseases. Moreover, it serves as a foundation for the development and validation of image processing algorithms and artificial intelligence techniques applied to cardiovascular medicine. Available for free on the PhysioNet platform, its aim is to promote collaboration and advance research in nuclear cardiology and cardiovascular medicine, while ensuring the replicability of studies.

  2. Metadata record for: PIC, a paediatric-specific intensive care database

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Scientific Data Curation Team (2023). Metadata record for: PIC, a paediatric-specific intensive care database [Dataset]. http://doi.org/10.6084/m9.figshare.11481810.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Scientific Data Curation Team
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains key characteristics about the data described in the Data Descriptor PIC, a paediatric-specific intensive care database. Contents:

        1. human readable metadata summary table in CSV format
        2. machine readable metadata file in JSON format

    Versioning Note: Version 2 was generated when the metadata format was updated from JSON to JSON-LD. This was an automatic process that changed only the format, not the contents, of the metadata.
    
  3. Image database

    • figshare.com
    png
    Updated Jan 19, 2016
    Cite
    Cristian Dambros (2016). Image database [Dataset]. http://doi.org/10.6084/m9.figshare.1379982.v3
    Explore at:
    Available download formats: png
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cristian Dambros
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Images, Maps, GIS images

  4. PMcardio ECG Image Database (PM-ECG-ID): A Diverse ECG Database for...

    • zenodo.org
    zip
    Updated Aug 31, 2024
    Cite
    Andrej Iring; Viera Krešňáková; Michal Hojcka; Vladimir Boza; Adam Rafajdus; Boris Vavrik (2024). PMcardio ECG Image Database (PM-ECG-ID): A Diverse ECG Database for Evaluating Digitization Solutions [Dataset]. http://doi.org/10.5281/zenodo.13617673
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrej Iring; Viera Krešňáková; Michal Hojcka; Vladimir Boza; Adam Rafajdus; Boris Vavrik
    License

    GNU General Public License v3.0 (https://www.gnu.org/licenses/gpl-3.0-standalone.html)

    Description

    The dataset presents the collection of a diverse electrocardiogram (ECG) database for testing and evaluating ECG digitization solutions. The Powerful Medical ECG image database was curated using 100 ECG waveforms selected from the PTB-XL Digital Waveform Database and various images generated from the base waveforms with varying lead visibility and real-world paper deformations, including the use of different mobile phones, bends, crumbles, scans, and photos of computer screens with ECGs. The ECG waveforms were augmented using various techniques, including changes in contrast, brightness, perspective transformation, rotation, image blur, JPEG compression, and resolution change. This extensive approach yielded 6,000 unique entries, providing a wide range of data variance and extreme cases that expose the limitations of ECG digitization solutions, help improve their performance, and serve as a benchmark for evaluating them.

    PM-ECG-ID database contains electrocardiogram (ECG) images and their corresponding ECG information. The data records are organized in a hierarchical folder structure, which includes metadata, waveform data, and visual data folders. The contents of each folder are described below:

    • metadata.csv:
      This file serves as a key-to-key bridge between the image data and the corresponding ECG information. It contains the following columns:
      • Image name: image name with extension,
      • ECG ID: this ID corresponds to the specific ECG identifier from the original PTB-XL dataset. Under this ID you can find a cutout array in the leads.npz and rhythms.npz,
      • Image relative path: relative path to the image in question,
      • Image page: page number of the particular image (starting from 0),
      • ECG number of pages: number of pages in the whole ECG,
      • ECG number of columns per page: number of columns per page in the ECG,
      • ECG number of rows per page: number of rows in the ECG,
      • ECG number of rhythm leads: number of rhythms in the ECG,
      • ECG format: short version of the ECG format.
    • data folder:
      • leads.npz: NPZ file containing all underlying cutout lead signals; each signal is stored under its ECG ID.
      • rhythms.npz: NPZ file containing all underlying rhythm signals; each signal is stored under its ECG ID. If an ECG has no rhythm lead, the NPZ contains an empty array for that ID.
    • visual_data folder:
      This folder contains subfolders for various image data, including augmented photos and visualization and different types of photos of ECG printouts. The subfolders are organized based on the specific augmentation or type of photograph. These folders contain images with various augmentation settings, such as different levels of blur, brightness, contrast, padding, perspective transformation, resolution scaling, and rotation. The database is organized in a way that allows for easy navigation and understanding of the different augmentations applied to the image data. Each of these subfolders contains images relevant to the specific augmentation or type of photograph. The metadata.csv file provides a direct link to each image and its associated ECG information.
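    The metadata-to-signal bridge described above can be sketched in a few lines. This is a hypothetical walk-through, not code shipped with the dataset: the column names follow the description, but the row values and the 12×1000 lead array are synthetic stand-ins built in memory.

```python
import csv
import io

import numpy as np

# Synthetic stand-in for metadata.csv (column names per the dataset description).
metadata_csv = io.StringIO(
    "Image name,ECG ID,Image relative path,Image page\n"
    "ecg_scan.png,ecg_00001,visual_data/blur/ecg_scan.png,0\n"
)

# Synthetic stand-in for leads.npz: one cutout lead-signal array per ECG ID.
buf = io.BytesIO()
np.savez(buf, ecg_00001=np.zeros((12, 1000)))
buf.seek(0)
leads = np.load(buf)

# metadata.csv links each image back to its underlying signal via the ECG ID.
row = next(csv.DictReader(metadata_csv))
signal = leads[row["ECG ID"]]
```

    With the real files, the same pattern applies: open metadata.csv, take the `ECG ID` column of the row for the image of interest, and index leads.npz (or rhythms.npz) with it.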
  5. Sea turtle photo-identification database

    • catalog.data.gov
    • fisheries.noaa.gov
    Updated Apr 1, 2024
    Cite
    (Point of Contact) (2024). Sea turtle photo-identification database [Dataset]. https://catalog.data.gov/dataset/sea-turtle-photo-identification-database1
    Explore at:
    Dataset updated
    Apr 1, 2024
    Dataset provided by
    (Point of Contact)
    Description

    The ability to correctly and consistently identify sea turtles over time was evaluated using digital imagery of the turtles' heads (dorsal and side views) and carapaces (dorsal views).

  6. Data_Sheet_1_art.pics Database: An Open Access Database for Art Stimuli for...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Dec 16, 2020
    Cite
    Medawar, Evelyn; Witte, A. Veronica; Disch, Leonie; Thieleking, Ronja (2020). Data_Sheet_1_art.pics Database: An Open Access Database for Art Stimuli for Experimental Research.CSV [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000550484
    Explore at:
    Dataset updated
    Dec 16, 2020
    Authors
    Medawar, Evelyn; Witte, A. Veronica; Disch, Leonie; Thieleking, Ronja
    Description

    While art is omnipresent in human history, the neural mechanisms of how we perceive, value and differentiate art have only begun to be explored. Functional magnetic resonance imaging (fMRI) studies suggested that art acts as a secondary reward, involving brain activity in the ventral striatum and prefrontal cortices similar to primary rewards such as food. However, potential similarities or unique characteristics of art-related neuroscience (or neuroesthetics) remain elusive, also because of a lack of adequate experimental tools: the available collections of art stimuli often lack standard image definitions and normative ratings. Therefore, we here provide a large set of well-characterized, novel art images for use as visual stimuli in psychological and neuroimaging research. The stimuli were created using a deep learning algorithm that applied different styles of popular paintings (based on artists such as Klimt or Hundertwasser) on ordinary animal, plant and object images which were drawn from established visual stimuli databases. The novel stimuli represent mundane items with artistic properties with proposed reduced dimensionality and complexity compared to paintings. In total, 2,332 novel stimuli are available open access as “art.pics” database at https://osf.io/BTWNQ/ with standard image characteristics that are comparable to other common visual stimuli material in terms of size, variable color distribution, complexity, intensity and valence, measured by image software analysis and by ratings derived from a human experimental validation study [n = 1,296 (684f), age 30.2 ± 8.8 y.o.]. The experimental validation study further showed that the art.pics elicit a broad and significantly different variation in subjective value ratings (i.e., liking and wanting) as well as in recognizability, arousal and valence across different art styles and categories.
    Researchers are encouraged to study the perception, processing and valuation of art images based on the art.pics database, which also enables real reward remuneration of the rated stimuli (as art prints) and a direct comparison to other rewards such as food or money. Key Messages: We provide an open access, validated and large set of novel stimuli (n = 2,332) of standardized art images including normative rating data to be used for experimental research. Reward remuneration in experimental settings can be easily implemented for the art.pics by, e.g., handing out the stimuli to the participants (as print on premium paper or in a digital format), as done in the presented validation task. Experimental validation showed that the art.pics’ images elicit a broad and significantly different variation in subjective value ratings (i.e., liking, wanting) across different art styles and categories, while size, color and complexity characteristics remained comparable to other visual stimuli databases.

  7. Brain FDG-PET/MR Image Database - Dataset - Taiwan Medical AI and Data...

    • data.dmc.nycu.edu.tw
    Updated Jul 30, 2025
    Cite
    (2025). Brain FDG-PET/MR Image Database - Dataset - Taiwan Medical AI and Data Portal [Dataset]. https://data.dmc.nycu.edu.tw/dataset/petmri
    Explore at:
    Dataset updated
    Jul 30, 2025
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Fluorodeoxyglucose Positron Emission Tomography (FDG-PET) is currently one of the most powerful tools for the clinical diagnosis of dementia such as Alzheimer's Disease (AD). Meanwhile, MR imaging, being non-radioactive and having high contrast resolution, is highly accessible in clinical settings. Therefore, this dataset uses FDG-PET images as the ground truth for evaluating AD, to support the development of models that predict AD from MR images. The dataset includes an AD group and a control group (Healthy Group). The diagnostic group assignment is made by neurology specialists based on comprehensive judgment of clinically relevant information. Each set of data contains one set of MRI T1 images and one set of FDG-PET images. The image format is DICOM, and all images have been anonymized. To obtain the clinical information and related documentation, please contact the administrator.

  8. NCEI/WDS Natural Hazards Image Database

    • ncei.noaa.gov
    • catalog.data.gov
    Updated Feb 1, 2012
    Cite
    National Geophysical Data Center / World Data Service (NGDC/WDS) (2012). NCEI/WDS Natural Hazards Image Database [Dataset]. http://doi.org/10.7289/v5154f01
    Explore at:
    Dataset updated
    Feb 1, 2012
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    National Centers for Environmental Information (https://www.ncei.noaa.gov/)
    Authors
    National Geophysical Data Center / World Data Service (NGDC/WDS)
    Time period covered
    1867 - Present
    Area covered
    Description

    Photographs and other visual media provide valuable pre- and post-event data for natural hazards. Research, mitigation, and forecasting rely on visual data for post-analysis, inundation mapping and historic records. Instrumental data only reveal a portion of the whole story; photographs explicitly illustrate the physical and societal impacts from an event. This resource provides high-resolution geologic and damage photographs from natural hazards events, including earthquakes, tsunamis, slides, volcanic eruptions and geologic movement (faults, creep, subsidence and flows). The earliest images date back to 1867. Each event also links to NCEI's Global Historical hazards databases, which provide details for these events.

  9. Recurrent Breast Cancer: Histopathological and Hyperspectral Images Database...

    • cancerimagingarchive.net
    Updated Sep 30, 2025
    Cite
    The Cancer Imaging Archive (2025). Recurrent Breast Cancer: Histopathological and Hyperspectral Images Database [Dataset]. http://doi.org/10.7937/6kpy-yt49
    Explore at:
    Available download formats: envi, mrxs, geojson, xlsx, n/a
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    TCIA Data Usage Policies and Restrictions (https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/)

    Time period covered
    Sep 30, 2025
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Abstract

    Multimodal data has emerged as a promising tool to integrate diverse information, offering a more comprehensive perspective. This study introduces the HistologyHSI-BC-Recurrence Database, the first publicly accessible multimodal dataset designed to advance distant recurrence prediction in breast cancer (BC). The dataset comprises 47 histopathological whole-slide images (WSIs), 677 hyperspectral (HS) images, and demographic and clinical data from 47 BC patients, of whom 22 (47%) experienced distant recurrence over a 12-year follow-up. Histopathological slides were digitized using a WSI scanner and annotated by expert pathologists, while HS images were acquired with a bright-field microscope and a HS camera. This dataset provides a promising resource for BC recurrence prediction and personalized treatment strategies by integrating histopathological WSIs, HS images, and demographic and clinical data.

    Introduction

    Breast cancer (BC) is the most common cancer in women and a leading cause of cancer-related deaths, with metastasis being the main cause of death. About one-third of BC patients develop metastasis, which can be regional or distant, and survival rates drop dramatically with distant metastasis. Despite progress in identifying biomarkers associated with metastasis, there is no consensus for their clinical use. Imaging methods, such as X-ray, ultrasound, and magnetic resonance imaging, play a key role in detection, but histopathological diagnosis is crucial for treatment decisions. Digital pathology, utilizing whole-slide images (WSIs) and machine learning, is transforming BC diagnostics, integrating clinical data to improve prognostic accuracy. Hyperspectral imaging (HSI), which combines spatial and spectral information, is emerging as a promising tool for BC detection and prognosis. However, high-quality datasets integrating WSIs, HS images, and clinical data are scarce. This study introduces the HistologyHSI-BC-Recurrence Database, which includes WSIs, HS images, and clinical data from 47 BC patients, aiming to predict recurrence due to distant metastasis. This multimodal dataset will help develop predictive models, enhance diagnostic accuracy, and support research in computational pathology, ultimately improving personalized treatment strategies for BC.

    Methods

    Subject Inclusion and Exclusion Criteria

    This dataset includes data from 47 patients diagnosed with invasive ductal carcinoma (IDC) between 2006 and 2015. Of these, 22 patients experienced recurrence due to distant metastasis within 12 years, while 25 patients did not. Inclusion criteria required a diagnosis of IDC, representative surgical biopsy, complete clinical and pathological data, and patient consent. Exclusion criteria involved receiving neoadjuvant treatment, regional recurrence rather than in distant organs, presence of distant metastases at diagnosis, or failure to meet inclusion criteria.

    Data Acquisition

    Histopathology WSIs

    Paraffin blocks of primary tumor biopsies with sufficient representative IDC tissue were obtained from the Biobank IISPV-Node Tortosa, Tarragona, Spain. The samples were processed in the Pathology Department, where 2 µm-thick sections were prepared from each paraffin block and stained according to the standard H&E staining protocol. The slides were sealed with coverslips using dibutylphthalate polystyrene xylene (DPX) mounting medium for subsequent digitization and HS microscopic image acquisition. The H&E-stained slides were digitized with a WSI scanner (Pannoramic 250 Flash III, 3DHISTECH Ltd., Budapest, Hungary) at 20× magnification (0.2433 µm/pixel) using MRXS image format.

    Demographic and clinical data

    The data process involved extracting information from clinical records, including demographic and clinical information (please refer to the HistologyHSI-BC-Recurrence-Clinical-Standardized-DataDictionary.xlsx).

    HS images

    The HS images were captured using a Hyperspec® VNIR A-Series pushbroom camera, which scans samples spatially and captures spectral data across 400-1,000 nm. The camera is paired with an Olympus BX-53 microscope and a scanning stage that ensures precise sample alignment. Calibration of the HS images is crucial to adjust for sensor response, light transmission, and source variation, achieved by normalizing pixel values using white and dark references. The system also generates synthetic RGB images for easier visualization of the data. In-house software facilitates sample navigation, synchronizes camera and microscope stage, and processes the data by removing noisy bands and generating calibrated cubes.

    Data Analysis

    WSIs were visualized using QuPath and anonymized with SlideMaster software. The quality of the histopathological slides was verified by pathologists, ensuring no artifacts were present due to tissue preparation or digitization. Pathologists manually annotated the images to differentiate between IDC, healthy tissue, and ductal carcinoma in situ (DCIS) using a color scheme (blue for IDC, green for healthy tissue, and red for DCIS). Annotations were initially made by one pathologist and then validated through a pairwise review with a second pathologist to ensure consistency and minimize inter-observer variability. Furthermore, regions of interest (ROIs) within these tissue types were identified and marked by yellow lines, for further HS imaging analysis.

    Usage Notes

    Data organization and naming conventions

    The database is divided into three main components:

    1. Clinical and demographic data (HistologyHSI-BC-Recurrence-Clinical-Standardized.xlsx)
    2. WSIs and corresponding tissue and ROI annotations (01_01_Histological_Images, 01_02_Tissue_Annotations, and 01_03_HSI_ROI_Annotations)
    3. HSI images (02_01_HSI_Images). The HSI data is stored in folders named according to the regular expression HSI_VNIR_{P}_{TT}_x10_C{CN}, where {P} represents the patient ID, {TT} indicates the tissue type (IDC, healthy, or DCIS), and {CN} is the capture number.
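    The folder naming convention above can be parsed mechanically. A minimal sketch (the function name and the assumption that patient IDs contain no underscores are mine, not part of the dataset documentation):

```python
import re

# Pattern for HSI_VNIR_{P}_{TT}_x10_C{CN}: P = patient ID, TT = tissue type
# (IDC, healthy, or DCIS), CN = capture number.
FOLDER_RE = re.compile(
    r"^HSI_VNIR_(?P<patient>[^_]+)_(?P<tissue>IDC|healthy|DCIS)_x10_C(?P<capture>\d+)$"
)

def parse_hsi_folder(name: str):
    """Return (patient_id, tissue_type, capture_number), or None if unmatched."""
    m = FOLDER_RE.match(name)
    if m is None:
        return None
    return m.group("patient"), m.group("tissue"), int(m.group("capture"))
```

    For example, a folder named HSI_VNIR_12_IDC_x10_C3 would parse as patient "12", tissue type "IDC", capture 3.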

    Working with HSIs

    HSI data is typically stored in specialized formats like .hdr files paired with .dat or .raw files, representing a multidimensional data cube. Python and MATLAB are usually employed for processing these data. See the External Resources section below for example code. First, calibration is essential, followed by optional processing like spectral dimensionality reduction to reduce noise and computational costs (e.g., reducing 826 spectral bands to 275 by averaging neighboring bands). Normalization can also be performed when needed, scaling data to a range or adjusting to have a mean of 0 and standard deviation of 1. Additionally, removing the sample background, typically the white areas, is recommended for more accurate analysis.
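    The reduction and normalization steps above can be sketched with NumPy. The exact grouping used by the authors is not specified beyond "averaging neighboring bands"; a group size of 3 reproduces the stated 826-to-275 reduction if the one trailing band is dropped, so that is assumed here.

```python
import numpy as np

def reduce_bands(cube: np.ndarray, group: int = 3) -> np.ndarray:
    """Average each run of `group` neighboring spectral bands.
    Trailing bands that do not fill a complete group are dropped."""
    h, w, b = cube.shape
    usable = b - (b % group)
    return cube[:, :, :usable].reshape(h, w, usable // group, group).mean(axis=3)

def standardize(cube: np.ndarray) -> np.ndarray:
    """Scale the cube to zero mean and unit standard deviation."""
    return (cube - cube.mean()) / cube.std()

# Synthetic example: an 826-band cube reduced to 275 bands, then standardized.
cube = np.random.rand(4, 4, 826)
reduced = standardize(reduce_bands(cube))
```

    Background removal (masking the white areas) would follow as a separate step, e.g. by thresholding reflectance, before any spectral analysis.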

    Recommendations for software that can be used to open the data

    Visualizing histopathology WSIs

    The authors suggest using QuPath software to open and analyze WSIs (MRXS format) and annotations (GeoJSON format). WSIs can be loaded via drag and drop or through the "File/Open" option. Annotations for tissue compartments (IDC, healthy, DCIS) and ROIs (yellow rectangles for HS capture) should be imported as GeoJSON files.
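    Since the annotations are GeoJSON, they can also be read outside QuPath with any JSON parser. A minimal sketch on a synthetic feature; the nested "classification" property layout mirrors QuPath's usual GeoJSON export and is an assumption here, not something documented by this dataset:

```python
import json

# Synthetic single-feature annotation in a QuPath-style GeoJSON layout.
annotation_geojson = """{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "geometry": {"type": "Polygon",
                 "coordinates": [[[0, 0], [100, 0], [100, 100], [0, 100], [0, 0]]]},
    "properties": {"classification": {"name": "IDC"}}
  }]
}"""

features = json.loads(annotation_geojson)["features"]
# Collect the tissue class assigned to each annotated polygon.
labels = [f["properties"]["classification"]["name"] for f in features]
```

    With the real annotation files, the same loop yields the IDC / healthy / DCIS label of every polygon, plus the ROI rectangles marked for HS capture.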

  10. OPTIMAM Mammographic Image Database

    • dtechtive.com
    • find.data.gov.scot
    Updated Jul 3, 2023
    Cite
    Cancer Research Horizons (2023). OPTIMAM Mammographic Image Database [Dataset]. https://dtechtive.com/datasets/25791
    Explore at:
    Dataset updated
    Jul 3, 2023
    Dataset provided by
    Cancer Research Horizons
    Area covered
    United Kingdom
    Description

    The OPTIMAM Mammography Image Database is a sharable resource with processed and unprocessed mammography images from United Kingdom breast screening centers, with annotated cancers and clinical details.

  11. Optical Coherence Tomography Image Retinal Database

    • openicpsr.org
    • search.gesis.org
    Updated Feb 15, 2019
    Cite
    Peyman Gholami; Vasudevan Lakshminarayanan (2019). Optical Coherence Tomography Image Retinal Database [Dataset]. http://doi.org/10.3886/E108503V1
    Explore at:
    Dataset updated
    Feb 15, 2019
    Dataset provided by
    University of Waterloo
    Authors
    Peyman Gholami; Vasudevan Lakshminarayanan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    An open source Optical Coherence Tomography Image Database containing different retinal OCT images with different pathological conditions. Please use the following citation if you use the database: Peyman Gholami, Priyanka Roy, Mohana Kuppuswamy Parthasarathy, Vasudevan Lakshminarayanan, "OCTID: Optical Coherence Tomography Image Database", arXiv preprint arXiv:1812.07056, (2018). For more information and details about the database see: https://arxiv.org/abs/1812.07056

  12. Zenodo Code Images

    • kaggle.com
    zip
    Updated Jun 18, 2018
    Cite
    Stanford Research Computing Center (2018). Zenodo Code Images [Dataset]. https://www.kaggle.com/datasets/stanfordcompute/code-images
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Jun 18, 2018
    Dataset authored and provided by
    Stanford Research Computing Center
    Description

    Code Images

    DOI

    Context

    This is a subset of the Zenodo-ML Dinosaur Dataset [Github] that has been converted to small png files and organized in folders by language, so you can jump right into using machine learning methods that assume image input.

    Content

    Included are .tar.gz files, each named after a file extension; when extracted, each produces a folder of the same name.

    $ tree -L 1
    .
    ├── c
    ├── cc
    ├── cpp
    ├── cs
    ├── css
    ├── csv
    ├── cxx
    ├── data
    ├── f90
    ├── go
    ├── html
    ├── java
    ├── js
    ├── json
    ├── m
    ├── map
    ├── md
    ├── txt
    └── xml
    

    And we can peep inside a (somewhat smaller) folder of the set to see that the subfolders are zenodo identifiers. A zenodo identifier corresponds to a single Github repository, so the png files produced are chunks of code of that extension type from a particular repository.

    $ tree map -L 1
    map
    ├── 1001104
    ├── 1001659
    ├── 1001793
    ├── 1008839
    ├── 1009700
    ├── 1033697
    ├── 1034342
    ...
    ├── 836482
    ├── 838329
    ├── 838961
    ├── 840877
    ├── 840881
    ├── 844050
    ├── 845960
    ├── 848163
    ├── 888395
    ├── 891478
    └── 893858
    
    154 directories, 0 files
    

    Within each folder (zenodo id) the files are prefixed by the zenodo id, followed by the index into the original image set array that is provided with the full dinosaur dataset archive.

    $ tree m/891531/ -L 1
    m/891531/
    ├── 891531_0.png
    ├── 891531_10.png
    ├── 891531_11.png
    ├── 891531_12.png
    ├── 891531_13.png
    ├── 891531_14.png
    ├── 891531_15.png
    ├── 891531_16.png
    ├── 891531_17.png
    ├── 891531_18.png
    ├── 891531_19.png
    ├── 891531_1.png
    ├── 891531_20.png
    ├── 891531_21.png
    ├── 891531_22.png
    ├── 891531_23.png
    ├── 891531_24.png
    ├── 891531_25.png
    ├── 891531_26.png
    ├── 891531_27.png
    ├── 891531_28.png
    ├── 891531_29.png
    ├── 891531_2.png
    ├── 891531_30.png
    ├── 891531_3.png
    ├── 891531_4.png
    ├── 891531_5.png
    ├── 891531_6.png
    ├── 891531_7.png
    ├── 891531_8.png
    └── 891531_9.png
    
    0 directories, 31 files
    

    So what's the difference?

    The difference is that these files are organized by extension type, and provided as actual png images. The original data is provided as numpy data frames, and is organized by zenodo ID. Both are useful for different things - this particular version is cool because we can actually see what a code image looks like.

    How many images total?

    We can count the number of total images:

    $ find . -type f -name "*.png" | wc -l
    3026993
    

    Dataset Curation

    The script to create the dataset is provided here. Essentially, we start with the top extensions as identified by this work (excluding actual images files) and then write each 80x80 image to an actual png image, organizing by extension then zenodo id (as shown above).

    Saving the Image

    I tested a few methods to write the single channel 80x80 data frames as png images, and wound up liking cv2's imwrite function because it would save and then load the exact same content.

    import cv2
    cv2.imwrite(image_path, image)
    

    Loading the Image

    Given the above, it's pretty easy to load an image! Here is an example using scipy, and then for newer Python (if you get a deprecation message) using imageio.

    image_path = '/tmp/data1/data/csv/1009185/1009185_0.png'
    from imageio import imread
    
    image = imread(image_path)
    array([[116, 105, 109, ..., 32, 32, 32],
        [ 48, 44, 48, ..., 32, 32, 32],
        [ 48, 46, 49, ..., 32, 32, 32],
        ..., 
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
    
    
    image.shape
    (80,80)
    
    
    # Deprecated
    from scipy import misc
    misc.imread(image_path)
    
    Image([[116, 105, 109, ..., 32, 32, 32],
        [ 48, 44, 48, ..., 32, 32, 32],
        [ 48, 46, 49, ..., 32, 32, 32],
        ..., 
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32],
        [ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
    

    Remember that the values in the data are characters that have been converted to ordinal. Can you guess what 32 is?

    ord(' ')
    32
    
    # And thus if you wanted to convert it back...
    chr(32)
    

    So how t...

  13. Data from: Image and biometric data for fish from Great Lakes tributaries...

    • catalog.data.gov
    • data.usgs.gov
    • +3more
    Updated Nov 19, 2025
    Cite
    U.S. Geological Survey (2025). Image and biometric data for fish from Great Lakes tributaries collected during spring 2019 [Dataset]. https://catalog.data.gov/dataset/image-and-biometric-data-for-fish-from-great-lakes-tributaries-collected-during-spring-201
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    The Great Lakes
    Description

    Image and biometric data were collected for 22 species of fish from Great Lakes tributaries in Michigan and Ohio, and the Illinois River, for the purpose of developing a fish identification classifier. Data consist of a comma-delimited spreadsheet that identifies image file names and the associated fish identification number, common name, species code, family name, genus and species, date collected, river from which each fish was collected, location of sampling, fish fork length in millimeters, girth in millimeters, weight in kilograms, and personnel involved with image collection. Biometric data are saved in comma-delimited .csv format and image files are saved as .png files.
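    A record in the biometric spreadsheet can be read with the standard csv module. This sketch uses hypothetical column names inferred from the description above; the actual headers in the USGS release may differ, and the row is synthetic.

```python
import csv
import io

# Synthetic one-row stand-in for the biometric spreadsheet; column names
# are assumptions based on the dataset description, not the real headers.
sample = io.StringIO(
    "image_file,common_name,fork_length_mm,girth_mm,weight_kg\n"
    "fish_0001.png,walleye,412,180,0.82\n"
)
rows = list(csv.DictReader(sample))
# Each row links an image file name to the fish's biometric measurements.
lengths = [float(r["fork_length_mm"]) for r in rows]
```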

  14. Data from: Feed the Future Grain Legumes Project Database

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +1more
    Updated Apr 21, 2025
    Agricultural Research Service (2025). Feed the Future Grain Legumes Project Database [Dataset]. https://catalog.data.gov/dataset/feed-the-future-grain-legumes-project-database-72bfa
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Data from this project focuses on the evaluation of breeding lines. Significant progress was made in advancing breeding populations directed towards the release of improved varieties in Tanzania. Thirty promising F4:7, 1st generation 2014 PIC (Phaseolus Improvement Cooperative) and ~100 F4:6, 2nd generation 2015 PIC breeding lines were selected. In addition, ~300 F4:5, 3rd generation 2016 PIC single plant selections were completed in Arusha and Mbeya. These breeding lines, derived from 109 PIC populations specifically developed to combine abiotic and biotic stress tolerance, showed superior agronomic potential compared with checks and local landraces. The diversity, scale, and potential of the material in the PIC breeding pipeline is invaluable and requires continued support to ensure the release of varieties that promise to increase the productivity of common bean in the East African region. Data available includes databases, spreadsheets, and images related to the project.

    Resources in this dataset:

    • Resource Title: Data Dictionary. File Name: ADP-1_DD.pdf
    • Resource Title: ADP-1 Database. File Name: ADP1-DB.zip. Description: A link to a draft version of the development and characterization of the Andean Diversity Panel (ADP) database in Microsoft Access. Preliminary information is provided in this database while the full version is being prepared. To use the database, download the complete file, extract it, and open the MS Access file; you must allow active content when opening the database for it to work properly. Downloaded on November 17, 2017.
    • Resource Title: Anthracnose Screening of Andean Diversity Panel (ADP). File Name: Anthracnose-screening-of-ADP.pdf. Description: Approximately 230 ADP lines were screened with 8 races of anthracnose under controlled conditions at Michigan State University. Dr. James Kelly has provided this valuable dataset for sharing in light of the Open Data policy of the US government. This dataset represents the first comprehensive screening of the ADP with a broad set of races of a specific pathogen.
    • Resource Title: ARS - Feed the Future Shared Data. File Name: ARS-FtF-Data-Sharing.zip. Description: An early draft version of the data generated by the ARS Feed-the-Future Grain Legumes Project, which is focused on common bean research.
    • Resource Title: PIC (Phaseolus Improvement Cooperative) Populations. File Name: PIC-breeding-populations.xlsx. Description: The complete list of PIC breeding populations (Excel format). PIC populations are bulked populations for improvement of common bean in Feed the Future countries, with a principal focus on sub-Saharan Africa. These populations are for distribution to collaborators, are segregating for key biotic and abiotic stress constraints, and can be used for selection and release of improved cultivars/germplasm. Many are derived from crosses between ADP landraces and cultivars from sub-Saharan Africa and other improved genotypes with key biotic or abiotic stress tolerance. Phenotypic and genotypic information related to the parents of the crosses can be found in the ADP Database.

  15. Dresden Image Database

    • kaggle.com
    zip
    Updated Jun 27, 2024
    MicsCodes (2024). Dresden Image Database [Dataset]. https://www.kaggle.com/datasets/micscodes/dresden-image-database/code
    Explore at:
    zip (53247469834 bytes). Available download formats
    Dataset updated
    Jun 27, 2024
    Authors
    MicsCodes
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Dresden
    Description

    The 'Dresden Image Database' comprises original JPEG images from 73 camera devices across 25 camera models. This dataset is primarily used for Source Camera Device and Model Identification, offering over 14,000 images captured under controlled conditions.

    Copyright: "Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee."

    Original source (not working as of 28 June 2024): http://forensics.inf.tu-dresden.de/dresden_image_database/

    Please cite the corresponding paper: Gloe, T., & Böhme, R. (2010, March). The 'Dresden Image Database' for benchmarking digital image forensics. In Proceedings of the 2010 ACM Symposium on Applied Computing (pp. 1584-1590).

  16. SNF Satellite Image Data Inventory - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    nasa.gov (2025). SNF Satellite Image Data Inventory - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/snf-satellite-image-data-inventory-5b57e
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The purpose of the SNF Study was to develop the techniques to make the link from biophysical measurements made on the ground to aircraft radiometric measurements and then to scale up to satellite observations. Therefore, satellite image data were acquired for the Superior National Forest study site. These data were selected from all the scenes available from Landsat 1 through 5 and SPOT platforms. Image data substantially contaminated by cloud cover or of poor radiometric quality was not acquired. Of the Landsat scenes, only one Thematic Mapper (TM) scene was acquired, the remainder were Multispectral Scanner (MSS) images. Some of the acquired image data had cloud cover in portions of the scene or other problems with the data. These problems and other comments about the images are summarized in the data set. This data set contains a listing of the scenes that passed inspection and were acquired and archived by Goddard Space Flight Center. Though these image data are no longer available from either the Goddard Space Flight Center or the ORNL DAAC, this data set has been included in the Superior National Forest data collection in order to document which satellite images were used during the project.

  17. Table1_Development and validation of a nomogram for predicting hospitalization longer than 14 days in pediatric patients with ventricular septal defect—a study based on the PIC database

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jul 4, 2023
    Jia-Liang Zhu; Xiao-Mei Xu; Hai-Yan Yin; Jian-Rui Wei; Jun Lyu (2023). Table1_Development and validation of a nomogram for predicting hospitalization longer than 14 days in pediatric patients with ventricular septal defect—a study based on the PIC database.docx [Dataset]. http://doi.org/10.3389/fphys.2023.1182719.s001
    Explore at:
    docx. Available download formats
    Dataset updated
    Jul 4, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Jia-Liang Zhu; Xiao-Mei Xu; Hai-Yan Yin; Jian-Rui Wei; Jun Lyu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Ventricular septal defect is a common congenital heart disease. As the disease progresses, the likelihood of lung infection and heart failure increases, leading to prolonged hospital stays and a higher risk of complications such as nosocomial infections. We aimed to develop a nomogram for predicting hospital stays longer than 14 days in pediatric patients with ventricular septal defect and to evaluate its predictive power. We hope the nomogram can give clinicians more information to identify high-risk groups early and begin treatment sooner, reducing hospital stays and complications.

    Methods: The study population was pediatric patients with ventricular septal defect, with data obtained from the Pediatric Intensive Care (PIC) database. The outcome event was a hospital stay longer than 14 days. Variables with a variance inflation factor (VIF) greater than 5 were excluded. Variables were then selected using the least absolute shrinkage and selection operator (Lasso), and the selected variables were incorporated into a logistic regression to construct the nomogram. Its performance was assessed using the area under the receiver operating characteristic curve (AUC), decision curve analysis (DCA), and calibration curves. Finally, variable importance was calculated with the XGBoost method.

    Results: A total of 705 patients with ventricular septal defect were included in the study. After screening with VIF and Lasso, the variables included in the statistical analysis were brain natriuretic peptide (BNP), bicarbonate, fibrinogen, urea, alanine aminotransferase, blood oxygen saturation, systolic blood pressure, respiratory rate, and heart rate. The AUC values of the nomogram in the training and validation cohorts were 0.812 and 0.736, respectively. The calibration curve and DCA results also indicated good performance and clinical applicability.

    Conclusion: The nomogram built from BNP, bicarbonate, fibrinogen, urea, alanine aminotransferase, blood oxygen saturation, systolic blood pressure, respiratory rate, and heart rate has good predictive performance and clinical applicability, and can effectively identify specific populations at risk for adverse outcomes.
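
    The described pipeline (collinearity screening, Lasso variable selection, then logistic regression evaluated by AUC) can be sketched with scikit-learn. This is an illustrative sketch, not the authors' code: the PIC variables are replaced by synthetic random features, and the VIF screen is noted but omitted.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 705 patients x 9 candidate predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(705, 9))
y = (X[:, 0] + X[:, 1] + rng.normal(size=705) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Lasso-based selection: keep predictors with nonzero coefficients.
# (In the paper, a VIF > 5 screen precedes this step.)
lasso = LassoCV(cv=5).fit(X_tr, y_tr)
keep = np.flatnonzero(lasso.coef_)

# Fit the logistic model on the selected predictors and score by AUC.
clf = LogisticRegression().fit(X_tr[:, keep], y_tr)
auc = roc_auc_score(y_val, clf.predict_proba(X_val[:, keep])[:, 1])
print(f"validation AUC = {auc:.3f}")
```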

  18. Data from: Old Photos

    • kaggle.com
    zip
    Updated Jan 18, 2024
    Marcin Rutecki (2024). Old Photos [Dataset]. https://www.kaggle.com/datasets/marcinrutecki/old-photos
    Explore at:
    zip (2564646 bytes). Available download formats
    Dataset updated
    Jan 18, 2024
    Authors
    Marcin Rutecki
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Welcome to the Vintage Photo Restoration Collection, a unique dataset curated for enthusiasts and professionals in the field of image restoration and enhancement. This dataset comprises a diverse range of old photographs, offering a glimpse into the past while serving as a valuable resource for modern image processing techniques.

    Content This collection contains [number of images] high-quality scans of vintage photographs. The images feature a variety of subjects, including portraits, landscapes, urban scenes, and everyday life from different eras. Each photo has been carefully digitized to preserve its original character while ensuring clarity for restoration work.

    Potential Uses The primary aim of this dataset is to facilitate research and projects in areas such as:

    • Photo Restoration: Recovering and repairing damaged or degraded aspects of old photos.
    • Upscaling: Enhancing the resolution of low-quality images without losing their vintage essence.
    • Colorization: Adding color to black and white photos using AI-driven techniques.
    • Enhancement: Improving image quality through brightness, contrast, and detail adjustments.

    Challenges

    This dataset offers a range of challenges for practitioners:

    • Dealing with various types of image degradation, such as fading, scratches, and tears.
    • Maintaining the integrity and authenticity of historical images during the enhancement process.
    • Balancing modern image processing techniques with the preservation of vintage aesthetics.

    All images are provided in JPG format.

  19. Data from The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans

    • cancerimagingarchive.net
    dicom, n/a, xls, xlsx +1
    + more versions
    The Cancer Imaging Archive, Data from The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans [Dataset]. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
    Explore at:
    xlsx, xls, n/a, xml and zip, dicom. Available download formats
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 21, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.

    Seven academic centers and eight medical imaging companies collaborated to create this data set which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule > or =3 mm," "nodule <3 mm," and "non-nodule > or =3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.

    Note: The TCIA team strongly encourages users to review pylidc and the standardized DICOM representation of the TCIA LIDC-IDRI annotations (DICOM-LIDC-IDRI-Nodules) before developing custom tools to analyze the XML version of the annotations/segmentations included in this dataset.

  20. OCR image data of Japanese documents

    • kaggle.com
    zip
    Updated Jun 25, 2025
    Appen Limited (2025). OCR image data of Japanese documents [Dataset]. https://www.kaggle.com/datasets/appenlimited/ocr-image-data-of-japanese-documents
    Explore at:
    zip (20730318 bytes). Available download formats
    Dataset updated
    Jun 25, 2025
    Authors
    Appen Limited
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    For the complete dataset or more information, please email commercialproduct@appen.com

    The dataset product can be used in many AI pilot projects and to supplement production models with other data. It can improve model performance and is cost-effective; a dataset is an excellent solution when time and budget are limited. The Appen database team can provide a large number of database products, such as ASR, TTS, video, text, and image, and is constantly building new datasets to expand its resources. The team strives to deliver as quickly as possible to meet the needs of global customers. This OCR database consists of document image data in Korean, Vietnamese, Spanish, French, Thai, Japanese, Indonesian, Tamil, and Burmese, as well as handwritten images in both Chinese and English (including annotations). On average, each image contains 30 to 40 frames, including texts in various languages, special characters, and numbers. The accuracy requirement is over 99% (both position and content correct). The images include the following categories: RECEIPT, IDCARD, TRADE, TABLE, WHITEBOARD, NEWSPAPER, THESIS, CARD, NOTE, CONTRACT, BOOKCONTENT, and HANDWRITING.

    1. Data Specification

    • Usage cases: image label recognition training
    • Collecting device: mobile phone / camera
    • Collecting environment: multiple lighting environments

    Document OCR image counts by language and category:

    Category      Korean  Vietnamese  Spanish  French  Thai  Japanese  Indonesian  Tamil  Burmese
    RECEIPT         1500         337     1500     300  1500      1586        1500    356      300
    IDCARD           500         100      500     100   500       500         500     98      100
    TRADE           1012         227     1000     200  1000      1000        1003    475      200
    TABLE            512         100      500     100   537       552         500    532      117
    WHITEBOARD       500         111      500     100   500       500         501    501      110
    NEWSPAPER        500         100      500     100   500       500         502    500      108
    THESIS           500         100      500     103   500       509         500    500      102
    CARD             500         100      500     100   500       500         500    500      100
    NOTE             499         100      500     100   500       500         500    501      120
    CONTRACT         501         105      500     100   500       500         500    500      100
    BOOKCONTENT      500         700      500     700   500       500         500    500      761
    TOTAL           7024        2080     7000    2003  7037      7147        7006   4963     2118

    Handwritten datasets: English HANDWRITING 2278; Chinese HANDWRITING 11118.

    2. Information provided by database
    3. Data format: JPG