Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This database provides a collection of myocardial perfusion scintigraphy images in DICOM format with all metadata and segmentations (masks) in NIfTI format. The images were obtained from patients undergoing scintigraphy examinations to investigate cardiac conditions such as ischemia and myocardial infarction. The dataset encompasses a diversity of clinical cases, including various perfusion patterns and underlying cardiac conditions. All images have been properly anonymized, and the age range of the patients is from 20 to 90 years. This database represents a valuable source of information for researchers and healthcare professionals interested in the analysis and diagnosis of cardiac diseases. Moreover, it serves as a foundation for the development and validation of image processing algorithms and artificial intelligence techniques applied to cardiovascular medicine. Available for free on the PhysioNet platform, its aim is to promote collaboration and advance research in nuclear cardiology and cardiovascular medicine, while ensuring the replicability of studies.
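For orientation only, a minimal Python sketch of how one might pair a DICOM image with its NIfTI mask using pydicom and nibabel; the file names are hypothetical placeholders, not paths from this database.
# Minimal sketch: load one scintigraphy DICOM image and its NIfTI segmentation mask.
# Requires pydicom, nibabel and numpy; file names are hypothetical placeholders.
import numpy as np
import pydicom
import nibabel as nib

dcm = pydicom.dcmread("patient_001.dcm")          # DICOM image plus metadata
img = dcm.pixel_array.astype(np.float32)          # pixel data as a numpy array

mask_nii = nib.load("patient_001_mask.nii.gz")    # segmentation mask in NIfTI format
mask = np.asarray(mask_nii.dataobj)

print("image shape:", img.shape)
print("mask shape:", mask.shape)
print("segmented pixels:", int((mask > 0).sum()))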
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor PIC, a paediatric-specific intensive care database. Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
Versioning Note: Version 2 was generated when the metadata format was updated from JSON to JSON-LD. This was an automatic process that changed only the format, not the contents, of the metadata.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Images, Maps, GIS images
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html
The dataset presents a diverse electrocardiogram (ECG) image database for testing and evaluating ECG digitization solutions. The Powerful Medical ECG image database was curated from 100 ECG waveforms selected from the PTB-XL Digital Waveform Database, together with images generated from these base waveforms with varying lead visibility and real-world paper deformations, including the use of different mobile phones, bends, crumples, scans, and photos of computer screens showing ECGs. The ECG images were further augmented using various techniques, including changes in contrast and brightness, perspective transformation, rotation, image blur, JPEG compression, and resolution change. This extensive approach yielded 6,000 unique entries, providing a wide range of data variance and extreme cases that expose the limitations of ECG digitization solutions, help improve their performance, and serve as a benchmark for their evaluation.
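As a rough illustration of several of the augmentation types listed above (brightness/contrast changes, rotation, blur, JPEG compression, resolution change), here is a hedged Pillow sketch; it is not the pipeline used to build the database, and the input file name is a placeholder.
# Hedged sketch of augmentations similar to those described above; not the
# actual pipeline used for the PM ECG image database. Requires Pillow.
from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("ecg_print.png").convert("RGB")               # placeholder input
bright = ImageEnhance.Brightness(img).enhance(1.3)             # brightness change
contrast = ImageEnhance.Contrast(bright).enhance(0.8)          # contrast change
rotated = contrast.rotate(3, expand=True, fillcolor="white")   # small rotation
blurred = rotated.filter(ImageFilter.GaussianBlur(radius=1.5)) # image blur
lowres = blurred.resize((blurred.width // 2, blurred.height // 2))  # resolution change
lowres.save("ecg_augmented.jpg", quality=40)                   # JPEG compression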
The PM-ECG-ID database contains electrocardiogram (ECG) images and their corresponding ECG information. The data records are organized in a hierarchical folder structure, which includes metadata, waveform data, and visual data folders. The contents of each folder are described below:
The ability to correctly and consistently identify sea turtles over time was evaluated using digital imagery of dorsal and side views of the turtles' heads and dorsal views of their carapaces.
While art is omnipresent in human history, the neural mechanisms of how we perceive, value and differentiate art have only begun to be explored. Functional magnetic resonance imaging (fMRI) studies suggest that art acts as a secondary reward, involving brain activity in the ventral striatum and prefrontal cortices similar to primary rewards such as food. However, potential similarities or unique characteristics of art-related neuroscience (neuroesthetics) remain elusive, partly because of a lack of adequate experimental tools: the available collections of art stimuli often lack standard image definitions and normative ratings. We therefore provide a large set of well-characterized, novel art images for use as visual stimuli in psychological and neuroimaging research. The stimuli were created using a deep learning algorithm that applied the styles of popular paintings (based on artists such as Klimt or Hundertwasser) to ordinary animal, plant and object images drawn from established visual stimuli databases. The novel stimuli represent mundane items with artistic properties and proposed reduced dimensionality and complexity compared to paintings. In total, 2,332 novel stimuli are available open access as the "art.pics" database at https://osf.io/BTWNQ/, with standard image characteristics that are comparable to other common visual stimulus material in terms of size, variable color distribution, complexity, intensity and valence, measured by image software analysis and by ratings derived from a human experimental validation study [n = 1,296 (684f), age 30.2 ± 8.8 y.o.]. The validation study further showed that the art.pics elicit a broad and significantly different variation in subjective value ratings (i.e., liking and wanting) as well as in recognizability, arousal and valence across different art styles and categories. Researchers are encouraged to study the perception, processing and valuation of art images based on the art.pics database, which also enables real reward remuneration of the rated stimuli (as art prints) and a direct comparison to other rewards such as food or money.
Key Messages:
- We provide an open access, validated and large set of novel stimuli (n = 2,332) of standardized art images, including normative rating data, to be used for experimental research.
- Reward remuneration in experimental settings can be easily implemented for the art.pics, e.g., by handing out the stimuli to participants (as prints on premium paper or in digital format), as done in the presented validation task.
- Experimental validation showed that the art.pics images elicit a broad and significantly different variation in subjective value ratings (i.e., liking, wanting) across different art styles and categories, while size, color and complexity characteristics remained comparable to other visual stimuli databases.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fluorodeoxyglucose positron emission tomography (FDG-PET) is currently one of the most powerful tools for the clinical diagnosis of dementia such as Alzheimer's disease (AD). MR imaging, being non-radioactive and offering high contrast resolution, is highly accessible in clinical settings. This dataset therefore uses FDG-PET images as the ground truth for evaluating AD, to support the development of methods that predict AD from MR images. The dataset includes an AD group and a control group (healthy group). The diagnostic group assignment is made by neurology specialists based on comprehensive judgment using clinically relevant information. Each set of data contains one set of MRI T1 images and one set of FDG-PET images. The image format is DICOM, and all images have been anonymized. To obtain the clinical information and related documentation, please contact the administrator.
Photographs and other visual media provide valuable pre- and post-event data for natural hazards. Research, mitigation, and forecasting rely on visual data for post-analysis, inundation mapping and historic records. Instrumental data only reveal a portion of the whole story; photographs explicitly illustrate the physical and societal impacts from an event. This resource provides high-resolution geologic and damage photographs from natural hazards events, including earthquakes, tsunamis, slides, volcanic eruptions and geologic movement (faults, creep, subsidence and flows). The earliest images date back to 1867. Each event also links to NCEI's Global Historical hazards databases, which provide details for these events.
The Cancer Imaging Archive data usage policies and restrictions: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Multimodal data has emerged as a promising tool to integrate diverse information, offering a more comprehensive perspective. This study introduces the HistologyHSI-BC-Recurrence Database, the first publicly accessible multimodal dataset designed to advance distant recurrence prediction in breast cancer (BC). The dataset comprises 47 histopathological whole-slide images (WSIs), 677 hyperspectral (HS) images, and demographic and clinical data from 47 BC patients, of whom 22 (47%) experienced distant recurrence over a 12-year follow-up. Histopathological slides were digitized using a WSI scanner and annotated by expert pathologists, while HS images were acquired with a bright-field microscope and a HS camera. This dataset provides a promising resource for BC recurrence prediction and personalized treatment strategies by integrating histopathological WSIs, HS images, and demographic and clinical data.
Breast cancer (BC) is the most common cancer in women and a leading cause of cancer-related deaths, with metastasis being the main cause of death. About one-third of BC patients develop metastasis, which can be regional or distant, and survival rates drop dramatically with distant metastasis. Despite progress in identifying biomarkers associated with metastasis, there is no consensus for their clinical use. Imaging methods, such as X-ray, ultrasound, and magnetic resonance imaging, play a key role in detection, but histopathological diagnosis is crucial for treatment decisions. Digital pathology, utilizing whole-slide images (WSIs) and machine learning, is transforming BC diagnostics, integrating clinical data to improve prognostic accuracy. Hyperspectral imaging (HSI), which combines spatial and spectral information, is emerging as a promising tool for BC detection and prognosis. However, high-quality datasets integrating WSIs, HS images, and clinical data are scarce. This study introduces the HistologyHSI-BC-Recurrence Database, which includes WSIs, HS images, and clinical data from 47 BC patients, aiming to predict recurrence due to distant metastasis. This multimodal dataset will help develop predictive models, enhance diagnostic accuracy, and support research in computational pathology, ultimately improving personalized treatment strategies for BC.
This dataset includes data from 47 patients diagnosed with invasive ductal carcinoma (IDC) between 2006 and 2015. Of these, 22 patients experienced recurrence due to distant metastasis within 12 years, while 25 patients did not. Inclusion criteria required a diagnosis of IDC, representative surgical biopsy, complete clinical and pathological data, and patient consent. Exclusion criteria involved receiving neoadjuvant treatment, regional recurrence rather than in distant organs, presence of distant metastases at diagnosis, or failure to meet inclusion criteria.
Paraffin blocks of primary tumor biopsies with sufficient representative IDC tissue were obtained from the Biobank IISPV-Node Tortosa, Tarragona, Spain. The samples were processed in the Pathology Department, where 2 µm-thick sections were prepared from each paraffin block and stained according to the standard H&E staining protocol. The slides were sealed with coverslips using dibutylphthalate polystyrene xylene (DPX) mounting medium for subsequent digitization and HS microscopic image acquisition. The H&E-stained slides were digitized with a WSI scanner (Pannoramic 250 Flash III, 3DHISTECH Ltd., Budapest, Hungary) at 20× magnification (0.2433 µm/pixel) using MRXS image format.
Data processing involved extracting demographic and clinical information from clinical records (please refer to HistologyHSI-BC-Recurrence-Clinical-Standardized-DataDictionary.xlsx).
The HS images were captured using a Hyperspec® VNIR A-Series pushbroom camera, which scans samples spatially and captures spectral data across 400-1,000 nm. The camera is paired with an Olympus BX-53 microscope and a scanning stage that ensures precise sample alignment. Calibration of the HS images is crucial to adjust for sensor response, light transmission, and source variation, achieved by normalizing pixel values using white and dark references. The system also generates synthetic RGB images for easier visualization of the data. In-house software facilitates sample navigation, synchronizes camera and microscope stage, and processes the data by removing noisy bands and generating calibrated cubes.
WSIs were visualized using QuPath and anonymized with SlideMaster software. The quality of the histopathological slides was verified by pathologists, ensuring no artifacts were present due to tissue preparation or digitization. Pathologists manually annotated the images to differentiate between IDC, healthy tissue, and ductal carcinoma in situ (DCIS) using a color scheme (blue for IDC, green for healthy tissue, and red for DCIS). Annotations were initially made by one pathologist and then validated through a pairwise review with a second pathologist to ensure consistency and minimize inter-observer variability. Furthermore, regions of interest (ROIs) within these tissue types were identified and marked by yellow lines, for further HS imaging analysis.
The database is divided into three main components:
HSI data is typically stored in specialized formats like .hdr files paired with .dat or .raw files, representing a multidimensional data cube. Python and MATLAB are usually employed for processing these data. See the External Resources section below for example code. First, calibration is essential, followed by optional processing like spectral dimensionality reduction to reduce noise and computational costs (e.g., reducing 826 spectral bands to 275 by averaging neighboring bands). Normalization can also be performed when needed, scaling data to a range or adjusting to have a mean of 0 and standard deviation of 1. Additionally, removing the sample background, typically the white areas, is recommended for more accurate analysis.
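As a rough sketch of the steps just described (white/dark calibration, optional band averaging from 826 to roughly 275 bands, and standardization), assuming the raw cube and references are already loaded as numpy arrays; see the External Resources section for the authors' actual code.
# Hedged sketch of the HSI processing steps described above; cubes are assumed to be
# numpy arrays of shape (rows, cols, bands). Synthetic data stands in for real captures.
import numpy as np

def calibrate(raw, white, dark):
    # Reflectance calibration with white and dark references.
    return (raw - dark) / np.clip(white - dark, 1e-6, None)

def average_bands(cube, target_bands=275):
    # Reduce spectral dimensionality by averaging neighboring bands (e.g., 826 -> 275).
    rows, cols, bands = cube.shape
    group = max(1, bands // target_bands)
    trimmed = cube[:, :, : (bands // group) * group]
    return trimmed.reshape(rows, cols, -1, group).mean(axis=3)

def standardize(cube):
    # Scale each band to zero mean and unit standard deviation.
    mean = cube.mean(axis=(0, 1), keepdims=True)
    std = cube.std(axis=(0, 1), keepdims=True) + 1e-8
    return (cube - mean) / std

raw = np.random.rand(64, 64, 826).astype(np.float32)   # synthetic stand-in cube
white, dark = np.ones_like(raw), np.zeros_like(raw)
cube = standardize(average_bands(calibrate(raw, white, dark)))
print(cube.shape)   # (64, 64, 275)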
The authors suggest using QuPath software to open and analyze WSIs (MRXS format) and annotations (GeoJSON format). WSIs can be loaded via drag and drop or through the "File/Open" option. Annotations for tissue compartments (IDC, healthy, DCIS) and ROIs (yellow rectangles for HS capture) should be imported as GeoJSON files.
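For a quick look at the annotations outside QuPath, here is a minimal Python sketch that reads a GeoJSON export with the standard json module and counts annotations per class; the property layout (properties -> classification -> name) is an assumption about QuPath's export format, and the file name is a placeholder.
# Hedged sketch: count annotation classes in a QuPath GeoJSON export.
# The "classification" property layout is assumed; the file name is a placeholder.
import json
from collections import Counter

with open("slide_001_annotations.geojson") as fh:
    geo = json.load(fh)

features = geo.get("features", []) if isinstance(geo, dict) else geo
classes = Counter(
    (f.get("properties", {}).get("classification") or {}).get("name", "unclassified")
    for f in features
)
print(classes)   # e.g. counts for IDC, healthy tissue, DCIS and ROI annotations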
The OPTIMAM Mammography Image Database is a sharable resource with processed and unprocessed mammography images from United Kingdom breast screening centers, with annotated cancers and clinical details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An open source Optical Coherence Tomography Image Database containing retinal OCT images with different pathological conditions. Please use the following citation if you use the database: Peyman Gholami, Priyanka Roy, Mohana Kuppuswamy Parthasarathy, Vasudevan Lakshminarayanan, "OCTID: Optical Coherence Tomography Image Database", arXiv preprint arXiv:1812.07056, (2018). For more information and details about the database, see: https://arxiv.org/abs/1812.07056
This is a subset of the Zenodo-ML Dinosaur Dataset [Github] that has been converted to small png files and organized in folders by language, so you can jump right into using machine learning methods that assume image input.
Included are .tar.gz files, each named after a file extension; when extracted, each produces a folder of the same name.
tree -L 1
.
├── c
├── cc
├── cpp
├── cs
├── css
├── csv
├── cxx
├── data
├── f90
├── go
├── html
├── java
├── js
├── json
├── m
├── map
├── md
├── txt
└── xml
And we can peep inside one of the (somewhat smaller) folders of the set to see that the subfolders are zenodo identifiers. A zenodo identifier corresponds to a single Github repository, so the png files it contains are chunks of code of that extension type from a particular repository.
$ tree map -L 1
map
├── 1001104
├── 1001659
├── 1001793
├── 1008839
├── 1009700
├── 1033697
├── 1034342
...
├── 836482
├── 838329
├── 838961
├── 840877
├── 840881
├── 844050
├── 845960
├── 848163
├── 888395
├── 891478
└── 893858
154 directories, 0 files
Within each folder (zenodo id) the files are prefixed by the zenodo id, followed by the index into the original image set array that is provided with the full dinosaur dataset archive.
$ tree m/891531/ -L 1
m/891531/
├── 891531_0.png
├── 891531_10.png
├── 891531_11.png
├── 891531_12.png
├── 891531_13.png
├── 891531_14.png
├── 891531_15.png
├── 891531_16.png
├── 891531_17.png
├── 891531_18.png
├── 891531_19.png
├── 891531_1.png
├── 891531_20.png
├── 891531_21.png
├── 891531_22.png
├── 891531_23.png
├── 891531_24.png
├── 891531_25.png
├── 891531_26.png
├── 891531_27.png
├── 891531_28.png
├── 891531_29.png
├── 891531_2.png
├── 891531_30.png
├── 891531_3.png
├── 891531_4.png
├── 891531_5.png
├── 891531_6.png
├── 891531_7.png
├── 891531_8.png
└── 891531_9.png
0 directories, 31 files
So what's the difference?
The difference is that these files are organized by extension type and provided as actual png images. The original data is provided as numpy arrays and is organized by zenodo ID. Both are useful for different things - this particular version is cool because we can actually see what a code image looks like.
How many images total?
We can count the number of total images:
find "." -type f -name *.png | wc -l
3,026,993
The script to create the dataset is provided here. Essentially, we start with the top extensions as identified by this work (excluding actual image files) and then write each 80x80 image to an actual png image, organizing by extension then zenodo id (as shown above).
I tested a few methods to write the single-channel 80x80 arrays as png images, and wound up liking cv2's imwrite function because it would save and then load the exact same content.
import cv2
cv2.imwrite(image_path, image)
Given the above, it's pretty easy to load an image! Here is an example using imageio (for newer Python, if you get a deprecation message from scipy), followed by the older, deprecated scipy approach.
image_path = '/tmp/data1/data/csv/1009185/1009185_0.png'
from imageio import imread
image = imread(image_path)
array([[116, 105, 109, ..., 32, 32, 32],
[ 48, 44, 48, ..., 32, 32, 32],
[ 48, 46, 49, ..., 32, 32, 32],
...,
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
image.shape
(80,80)
# Deprecated
from scipy import misc
misc.imread(image_path)
Image([[116, 105, 109, ..., 32, 32, 32],
[ 48, 44, 48, ..., 32, 32, 32],
[ 48, 46, 49, ..., 32, 32, 32],
...,
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
Remember that the values in the data are characters that have been converted to ordinal. Can you guess what 32 is?
ord(' ')
32
# And thus if you wanted to convert it back...
chr(32)
' '
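And a quick sketch (reusing the image path from above) to map all of the ordinal values back to characters and recover the original chunk of code:
from imageio import imread

image = imread('/tmp/data1/data/csv/1009185/1009185_0.png')
text = '\n'.join(''.join(chr(v) for v in row) for row in image)   # ordinals -> characters
print(text)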
So how t...
Image and biometric data were collected for 22 species of fish from Great Lakes Tributaries in Michigan and Ohio, and the Illinois River for the purpose of developing a fish identification classifier. Data consists of a comma delimited spreadsheet that identifies image file names and associated fish identification number, common name, species code, family name, genus, and species, date collected, river from which each fish was collected, location of sampling, fish fork length in millimeters, girth in millimeters, weight in kilograms, and personnel involved with image collection. Biometric data are saved as .csv comma delimited format and image files are saved as .png file type.
Data from this project focuses on the evaluation of breeding lines. Significant progress was made in advancing breeding populations directed towards the release of improved varieties in Tanzania. Thirty promising F4:7, 1st generation 2014 PIC (Phaseolus Improvement Cooperative) and ~100 F4:6, 2nd generation 2015 PIC breeding lines were selected. In addition, ~300 F4:5, 3rd generation 2016 PIC single plant selections were completed in Arusha and Mbeya. These breeding lines, derived from 109 PIC populations specifically developed to combine abiotic and biotic stress tolerance, showed superior agronomic potential compared with checks and local landraces. The diversity, scale, and potential of the material in the PIC breeding pipeline is invaluable and requires continued support to ensure the release of varieties that promise to increase the productivity of common bean in the East African region. Data available includes databases, spreadsheets, and images related to the project.
Resources in this dataset:
Resource Title: Data Dictionary. File Name: ADP-1_DD.pdf
Resource Title: ADP-1 Database. File Name: ADP1-DB.zip. Resource Description: This file is a link to a draft version of the development and characterization of the common bean diversity panel (ADP) database in Microsoft Access. Preliminary information is provided in this database, while the full version is being prepared. To use the database, download the complete file, extract it, and open the MS Access file. You must allow active content when opening the database for it to work properly. Downloaded on November 17, 2017.
Resource Title: Anthracnose Screening of Andean Diversity Panel (ADP). File Name: Anthracnose-screening-of-ADP.pdf. Resource Description: Approximately 230 lines of the ADP were screened with 8 races of anthracnose under controlled conditions at Michigan State University. Dr. James Kelly has provided this valuable dataset for sharing in light of the Open Data policy of the US government. This dataset represents the first comprehensive screening of the ADP with a broad set of races of a specific pathogen.
Resource Title: ARS - Feed the Future Shared Data. File Name: ARS-FtF-Data-Sharing.zip. Resource Description: The data provided herein is an early draft version of the data generated by the ARS Feed-the-Future Grain Legumes Project, which is focused on common bean research.
Resource Title: PIC (Phaseolus Improvement Cooperative) Populations. File Name: PIC-breeding-populations.xlsx. Resource Description: The complete list of PIC breeding populations (Excel format). PIC populations are bulked populations for improvement of common bean in Feed the Future countries, with a principal focus on sub-Saharan Africa. These populations are for distribution to collaborators, are segregating for key biotic and abiotic stress constraints, and can be used for selection and release of improved cultivars/germplasm. Many of these populations are derived from crosses between ADP landraces and cultivars from sub-Saharan Africa and other improved genotypes with key biotic or abiotic stress tolerance. Phenotypic and genotypic information related to the parents of the crosses can be found in the ADP Database.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The 'Dresden Image Database' comprises original JPEG images from 73 camera devices across 25 camera models. This dataset is primarily used for Source Camera Device and Model Identification, offering over 14,000 images captured under controlled conditions.
Copyright: "Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee."
Original source (not working as of 28 June 2024): http://forensics.inf.tu-dresden.de/dresden_image_database/
Please cite the corresponding paper: "Gloe, T., & Böhme, R. (2010, March). The 'Dresden Image Database' for benchmarking digital image forensics. In Proceedings of the 2010 ACM Symposium on Applied Computing (pp. 1584-1590)."
The purpose of the SNF Study was to develop the techniques to make the link from biophysical measurements made on the ground to aircraft radiometric measurements and then to scale up to satellite observations. Therefore, satellite image data were acquired for the Superior National Forest study site. These data were selected from all the scenes available from Landsat 1 through 5 and SPOT platforms. Image data substantially contaminated by cloud cover or of poor radiometric quality were not acquired. Of the Landsat scenes, only one Thematic Mapper (TM) scene was acquired; the remainder were Multispectral Scanner (MSS) images. Some of the acquired image data had cloud cover in portions of the scene or other problems with the data. These problems and other comments about the images are summarized in the data set. This data set contains a listing of the scenes that passed inspection and were acquired and archived by Goddard Space Flight Center. Though these image data are no longer available from either the Goddard Space Flight Center or the ORNL DAAC, this data set has been included in the Superior National Forest data collection in order to document which satellite images were used during the project.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Ventricular septal defect is a common congenital heart disease. As the disease progresses, the likelihood of lung infection and heart failure increases, leading to prolonged hospital stays and an increased likelihood of complications such as nosocomial infections. We aimed to develop a nomogram for predicting hospital stays over 14 days in pediatric patients with ventricular septal defect and to evaluate its predictive power. We hope that the nomogram can provide clinicians with more information to identify high-risk groups as soon as possible and give early treatment to reduce hospital stay and complications.
Methods: The population of this study was pediatric patients with ventricular septal defect, and data were obtained from the Pediatric Intensive Care Database. The resulting event was a hospital stay longer than 14 days. Variables with a variance inflation factor (VIF) greater than 5 were excluded. Variables were selected using the least absolute shrinkage and selection operator (Lasso), and the selected variables were incorporated into logistic regression to construct a nomogram. The performance of the nomogram was assessed using the area under the receiver operating characteristic curve (AUC), decision curve analysis (DCA) and the calibration curve. Finally, the importance of variables in the model was calculated based on the XGBoost method.
Results: A total of 705 patients with ventricular septal defect were included in the study. After screening with VIF and Lasso, the variables finally included in the statistical analysis were: brain natriuretic peptide (BNP), bicarbonate, fibrinogen, urea, alanine aminotransferase, blood oxygen saturation, systolic blood pressure, respiratory rate, and heart rate. The AUC values of the nomogram in the training cohort and validation cohort were 0.812 and 0.736, respectively. The results of the calibration curve and DCA also indicated that the nomogram had good performance and good clinical application value.
Conclusion: The nomogram established from BNP, bicarbonate, fibrinogen, urea, alanine aminotransferase, blood oxygen saturation, systolic blood pressure, respiratory rate, and heart rate has good predictive performance and clinical applicability. The nomogram can effectively identify specific populations at risk for adverse outcomes.
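To make the described pipeline concrete, here is a hedged scikit-learn sketch (not the authors' code) of Lasso-based variable selection followed by logistic regression and AUC evaluation; the file and column names are placeholders.
# Hedged sketch of the described pipeline (Lasso selection -> logistic regression -> AUC).
# Not the authors' code; the CSV file and column names are placeholders.
import pandas as pd
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("vsd_cohort.csv")                      # placeholder cohort extract
y = (df["length_of_stay_days"] > 14).astype(int)        # outcome: stay longer than 14 days
X = df[["bnp", "bicarbonate", "fibrinogen", "urea", "alt",
        "spo2", "sbp", "resp_rate", "heart_rate"]]      # placeholder predictor names

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Lasso for variable selection: keep predictors with non-zero coefficients.
scaler = StandardScaler().fit(X_train)
lasso = LassoCV(cv=5, random_state=0).fit(scaler.transform(X_train), y_train)
kept = [c for c, w in zip(X.columns, lasso.coef_) if w != 0]

# Logistic regression on the selected variables; a nomogram is a graphical
# rendering of such a model's coefficients.
logit = LogisticRegression(max_iter=1000).fit(X_train[kept], y_train)
auc = roc_auc_score(y_val, logit.predict_proba(X_val[kept])[:, 1])
print("selected:", kept, "validation AUC:", round(auc, 3))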
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Welcome to the Vintage Photo Restoration Collection, a unique dataset curated for enthusiasts and professionals in the field of image restoration and enhancement. This dataset comprises a diverse range of old photographs, offering a glimpse into the past while serving as a valuable resource for modern image processing techniques.
Content: This collection contains [number of images] high-quality scans of vintage photographs. The images feature a variety of subjects, including portraits, landscapes, urban scenes, and everyday life from different eras. Each photo has been carefully digitized to preserve its original character while ensuring clarity for restoration work.
Potential Uses: The primary aim of this dataset is to facilitate research and projects in areas such as:
This dataset offers a range of challenges for practitioners:
All images are provided in JPG format.
The Cancer Imaging Archive data usage policies and restrictions: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.
Seven academic centers and eight medical imaging companies collaborated to create this data set which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule > or =3 mm," "nodule <3 mm," and "non-nodule > or =3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.
Note: The TCIA team strongly encourages users to review the pylidc and Standardized representation of the TCIA LIDC-IDRI annotations using DICOM (DICOM-LIDC-IDRI-Nodules) versions of the annotations/segmentations included in this dataset before developing custom tools to analyze the XML version.
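For example, a minimal pylidc sketch along those lines; it assumes the LIDC-IDRI DICOM data have been downloaded and pylidc configured to find them, and the method and attribute names used here (query, cluster_annotations, diameter) should be checked against the pylidc documentation.
# Hedged pylidc sketch; verify names against the pylidc documentation.
import pylidc as pl

scan = pl.query(pl.Scan).filter(pl.Scan.patient_id == "LIDC-IDRI-0001").first()
if scan is not None:
    nodules = scan.cluster_annotations()   # group the four readers' marks per nodule
    print(scan.patient_id, "has", len(nodules), "clustered nodules")
    for i, anns in enumerate(nodules):
        diameters = [round(a.diameter, 1) for a in anns]
        print("nodule", i, "-", len(anns), "reader annotations, diameters (mm):", diameters)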
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
For the complete dataset or more information, please email commercialproduct@appen.com.
The dataset product can be used in many AI pilot projects and to supplement production models with other data. It can improve model performance and is cost-effective. A dataset like this is an excellent solution when time and budget are limited. The Appen database team can provide a large number of database products, such as ASR, TTS, video, text, and image data. At the same time, we are constantly building new datasets to expand resources. The database team always strives to deliver as soon as possible to meet the needs of global customers. This OCR database consists of image data in Korean, Vietnamese, Spanish, French, Thai, Japanese, Indonesian, Tamil, and Burmese, as well as handwritten images in both Chinese and English (including annotations). On average, each image contains 30 to 40 frames, including text in various languages, special characters, and numbers. The accuracy requirement is over 99% (both position and content correct). The images include the following categories: RECEIPT, IDCARD, TRADE, TABLE, WHITEBOARD, NEWSPAPER, THESIS, CARD, NOTE, CONTRACT, BOOKCONTENT, HANDWRITING.
Database Name / Category / Quantity (category counts per database):
RECEIPT 1500, IDCARD 500, TRADE 1012, TABLE 512, WHITEBOARD 500, NEWSPAPER 500, THESIS 500, CARD 500, NOTE 499, CONTRACT 501, BOOKCONTENT 500, TOTAL 7,024
RECEIPT 337, IDCARD 100, TRADE 227, TABLE 100, WHITEBOARD 111, NEWSPAPER 100, THESIS 100, CARD 100, NOTE 100, CONTRACT 105, BOOKCONTENT 700, TOTAL 2,080
RECEIPT 1500, IDCARD 500, TRADE 1000, TABLE 500, WHITEBOARD 500, NEWSPAPER 500, THESIS 500, CARD 500, NOTE 500, CONTRACT 500, BOOKCONTENT 500, TOTAL 7,000
RECEIPT 300, IDCARD 100, TRADE 200, TABLE 100, WHITEBOARD 100, NEWSPAPER 100, THESIS 103, CARD 100, NOTE 100, CONTRACT 100, BOOKCONTENT 700, TOTAL 2,003
RECEIPT 1500, IDCARD 500, TRADE 1000, TABLE 537, WHITEBOARD 500, NEWSPAPER 500, THESIS 500, CARD 500, NOTE 500, CONTRACT 500, BOOKCONTENT 500, TOTAL 7,037
RECEIPT 1586, IDCARD 500, TRADE 1000, TABLE 552, WHITEBOARD 500, NEWSPAPER 500, THESIS 509, CARD 500, NOTE 500, CONTRACT 500, BOOKCONTENT 500, TOTAL 7,147
RECEIPT 1500, IDCARD 500, TRADE 1003, TABLE 500, WHITEBOARD 501, NEWSPAPER 502, THESIS 500, CARD 500, NOTE 500, CONTRACT 500, BOOKCONTENT 500, TOTAL 7,006
RECEIPT 356, IDCARD 98, TRADE 475, TABLE 532, WHITEBOARD 501, NEWSPAPER 500, THESIS 500, CARD 500, NOTE 501, CONTRACT 500, BOOKCONTENT 500, TOTAL 4,963
RECEIPT 300, IDCARD 100, TRADE 200, TABLE 117, WHITEBOARD 110, NEWSPAPER 108, THESIS 102, CARD 100, NOTE 120, CONTRACT 100, BOOKCONTENT 761, TOTAL 2,118
English Handwritten Datasets: HANDWRITING 2,278
Chinese Handwritten Datasets: HANDWRITING 11,118