The table Test image list is part of the dataset Chest X-ray8, available at https://redivis.com/datasets/612s-dx4rxexky. It contains 25596 rows across 1 variable.
https://nihcc.app.box.com/v/ChestXray-NIHCC/file/249502714403
ChestX-ray8 is a medical imaging dataset comprising 108,948 frontal-view X-ray images of 32,717 unique patients, collected from 1992 to 2015, with eight common disease labels text-mined from the radiological reports via NLP techniques.
The table Training image list is part of the dataset Chest X-ray8, available at https://redivis.com/datasets/612s-dx4rxexky. It contains 86524 rows across 1 variable.
This dataset contains a modified DataFrame of the NIH Chest X-Ray dataset that makes it easier to load the data and manage the class labels. Each diagnostic class is converted to its own column and encoded as 1 or 0 for a positive or negative example of that category.
The original dataset is somewhat messy because the image files are spread across multiple directories, which makes it harder to use ImageDataGenerator from 'keras.preprocessing'. So I also added a column named FilePath, which contains the absolute path of the corresponding image file. This made loading the images much faster.
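The label conversion described above can be sketched roughly as follows. This is an illustration only, not the author's actual code: it assumes that multiple findings in the `Finding Labels` field are joined with `|`, and the sample rows, the directory layout in `PATH_INDEX`, and the helper name `one_hot_rows` are all hypothetical.

```python
import csv
import io

# Hypothetical excerpt of the metadata; real rows come from the NIH CSV.
SAMPLE_CSV = """Image Index,Finding Labels
00000001_000.png,Cardiomegaly
00000001_001.png,Cardiomegaly|Emphysema
00000002_000.png,No Finding
"""

# Hypothetical directory layout: images spread over several folders.
PATH_INDEX = {
    "00000001_000.png": "/data/images_001/images/00000001_000.png",
    "00000001_001.png": "/data/images_001/images/00000001_001.png",
    "00000002_000.png": "/data/images_002/images/00000002_000.png",
}


def one_hot_rows(csv_text, path_index):
    """Expand the label string into one 0/1 column per class and add FilePath."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Collect every class name that occurs, assuming '|' separates findings.
    classes = sorted({lab for r in rows for lab in r["Finding Labels"].split("|")})
    out = []
    for r in rows:
        present = set(r["Finding Labels"].split("|"))
        rec = {"Image Index": r["Image Index"]}
        for c in classes:
            rec[c] = 1 if c in present else 0
        # path_index maps a file name to wherever that file lives on disk.
        rec["FilePath"] = path_index.get(r["Image Index"], "")
        out.append(rec)
    return classes, out


classes, table = one_hot_rows(SAMPLE_CSV, PATH_INDEX)
```

Keeping the path in the same row as the labels means a loader can read one table instead of searching several directories for each image.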
The original dataset can be found at NIH Chest X-Ray.
This dataset cannot be used on its own; it should be used together with the NIH Chest X-Ray dataset.
https://choosealicense.com/licenses/unknown/
The NIH Chest X-ray dataset consists of 100,000 de-identified images of chest x-rays. The images are in PNG format.
The data is provided by the NIH Clinical Center and is available through the NIH download site: https://nihcc.app.box.com/v/ChestXray-NIHCC
The table Image metadata and classifications is part of the dataset Chest X-ray8, available at https://redivis.com/datasets/612s-dx4rxexky. It contains 112120 rows across 11 variables.
https://creativecommons.org/publicdomain/zero/1.0/
Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations available. However, clinical diagnosis of a chest X-ray can be challenging and sometimes more difficult than diagnosis via chest CT imaging. The lack of large publicly available datasets with annotations means it is still very difficult, if not impossible, to achieve clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites with chest X-rays. One major hurdle in creating large X-ray image datasets is the lack of resources for labeling so many images. Prior to the release of this dataset, Openi was the largest publicly available source of chest X-ray images with 4,143 images available.
This NIH Chest X-ray Dataset is comprised of 112,120 X-ray images with disease labels from 30,805 unique patients. To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning. The original radiology reports are not publicly available but you can find more details on the labeling process in this Open Access paper: "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases." (Wang et al.)
The image labels are NLP extracted so there could be some erroneous labels but the NLP labeling accuracy is estimated to be >90%.
Very limited numbers of disease region bounding boxes (See BBoxlist2017.csv)
Chest x-ray radiology reports are not anticipated to be publicly shared. Parties who use this public dataset are encouraged to share their “updated” image labels and/or new bounding boxes in their own studies later, perhaps through manual annotation.
Image format: 112,120 total images with size 1024 x 1024
images_001.zip: Contains 4,999 images
images_002.zip: Contains 10,000 images
images_003.zip: Contains 10,000 images
images_004.zip: Contains 10,000 images
images_005.zip: Contains 10,000 images
images_006.zip: Contains 10,000 images
images_007.zip: Contains 10,000 images
images_008.zip: Contains 10,000 images
images_009.zip: Contains 10,000 images
images_010.zip: Contains 10,000 images
images_011.zip: Contains 10,000 images
images_012.zip: Contains 7,121 images
README_ChestXray.pdf: Original README file
BBoxlist2017.csv: Bounding box coordinates. Note: Start at x,y, extend horizontally w pixels, and vertically h pixels
Image Index: File name
Finding Label: Disease type (Class label)
Bbox x
Bbox y
Bbox w
Bbox h
Dataentry2017.csv: Class labels and patient data for the entire dataset
Image Index: File name
Finding Labels: Disease type (Class label)
Follow-up #
Patient ID
Patient Age
Patient Gender
View Position: X-ray orientation
OriginalImageWidth
OriginalImageHeight
OriginalImagePixelSpacing_x
OriginalImagePixelSpacing_y
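The bounding-box convention noted for BBoxlist2017.csv (start at x,y, extend w pixels horizontally and h pixels vertically) can be turned into corner coordinates as in the sketch below. The origin is assumed to be the top-left corner of the image, the 1024-pixel image size is taken from the image format line above, and the clipping step, the `BBox` type, and the example values are illustrative additions rather than part of the dataset.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels (assumes a top-left origin)
    w: float  # horizontal extent, in pixels
    h: float  # vertical extent, in pixels


def to_corners(box, image_size=1024):
    """Convert (x, y, w, h) to (x_min, y_min, x_max, y_max), clipped to the image."""
    x_min = max(box.x, 0.0)
    y_min = max(box.y, 0.0)
    x_max = min(box.x + box.w, image_size)
    y_max = min(box.y + box.h, image_size)
    return (x_min, y_min, x_max, y_max)


# Made-up values, not a row from the actual CSV.
example = BBox(x=225.0, y=547.0, w=86.0, h=79.0)
corners = to_corners(example)
```

Corner form is often more convenient for cropping or for computing overlap between a predicted and an annotated region.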
There are 15 classes (14 diseases, and one for "No findings"). Images can be classified as "No findings" or one or more disease classes:
Atelectasis
Consolidation
Infiltration
Pneumothorax
Edema
Emphysema
Fibrosis
Effusion
Pneumonia
Pleural_thickening
Cardiomegaly
Nodule
Mass
Hernia
There are 12 zip files in total, ranging from ~2 GB to 4 GB in size. Additionally, we randomly sampled 5% of these images and created a smaller dataset for use in Kernels. The random sample contains 5606 X-ray images and class labels.
Sample: sample.zip
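The counts above are consistent with one another, which a few lines of arithmetic can confirm. The sketch below sums the per-archive image counts and takes 5% of the total; the sampling function is only an illustration of drawing such a subset without replacement, and its seed and name are assumptions, since the source does not describe how the sample was drawn.

```python
import random

# Image counts per archive as listed for images_001.zip ... images_012.zip.
archive_counts = [4999] + [10_000] * 10 + [7121]
total_images = sum(archive_counts)

# 5% of the total, using integer arithmetic to avoid floating-point rounding.
sample_size = total_images * 5 // 100


def sample_indices(n_total, n_sample, seed=0):
    """Draw n_sample distinct image indices uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_sample))


picked = sample_indices(total_images, sample_size)
```

Here `total_images` comes out to 112,120 and `sample_size` to 5,606, matching the figures quoted in the description.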
Original TAR archives were converted to ZIP archives to be compatible with the Kaggle platform
CSV headers slightly modified to be more explicit in comma separation and also to allow fields to be self-explanatory
Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017, ChestX-ray8Hospital-ScaleChestCVPR2017_paper.pdf
NIH News release: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community
Original source files and documents: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345
The table BBox List is part of the dataset Chest X-ray8, available at https://redivis.com/datasets/612s-dx4rxexky. It contains 984 rows across 6 variables.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CheXmask Database presents a comprehensive, uniformly annotated collection of chest radiographs, constructed from five public databases: ChestX-ray8, Chexpert, MIMIC-CXR-JPG, Padchest and VinDr-CXR. The database aggregates 657,566 anatomical segmentation masks derived from images which have been processed using the HybridGNet model to ensure consistent, high-quality segmentation. To confirm the quality of the segmentations, we include in this database individual Reverse Classification Accuracy (RCA) scores for each of the segmentation masks. This dataset is intended to catalyze further innovation and refinement in the field of semantic chest X-ray analysis, offering a significant resource for researchers in the medical imaging domain.
The ChestX-ray8 dataset contains 108,948 frontal-view X-ray images of 32,717 unique patients.
Each image in the data set contains multiple text-mined labels identifying 14 different pathological conditions. These in turn can be used by physicians to diagnose 8 different diseases. We will use this data to develop a single model that will provide binary classification predictions for each of the 14 labeled pathologies. In other words it will predict 'positive' or 'negative' for each of the pathologies. You can download the entire dataset for free here. (https://nihcc.app.box.com/v/ChestXray-NIHCC)
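A single model that outputs a 'positive' or 'negative' call for each pathology is commonly realized as a multi-label classifier with one independent sigmoid output per label. The sketch below shows only that final decision step, assuming the model has already produced one raw score (logit) per pathology; the label order, the 0.5 threshold, and the example scores are illustrative assumptions, not details taken from the source.

```python
import math

# The 14 pathology labels as commonly listed for this dataset.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(logits, threshold=0.5):
    """Turn one logit per pathology into independent positive/negative calls."""
    if len(logits) != len(PATHOLOGIES):
        raise ValueError("expected one logit per pathology")
    return {
        name: "positive" if sigmoid(z) >= threshold else "negative"
        for name, z in zip(PATHOLOGIES, logits)
    }


# Made-up scores standing in for a model's output on one image.
logits = [2.0, -1.5] + [-3.0] * 12
result = predict(logits)
```

Because each label is thresholded on its own, an image can be positive for several pathologies at once or for none of them.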
I have provided a ~1000 image subset of the images here. The dataset includes a CSV file that provides the labels for each X-ray.
To make your job a bit easier, I have processed the labels for our small sample and generated three new files to get you started. These three files are:
train-small-new.csv: 875 images from our dataset to be used for training.
valid-small-new.csv: 109 images from our dataset to be used for validation.
test-small-new.csv: 420 images from our dataset to be used for testing. This dataset has been annotated by consensus among four different radiologists for 5 of our 14 pathologies:
Consolidation
Edema
Effusion
Cardiomegaly
Atelectasis
https://academictorrents.com/nolicensespecified
(1, Atelectasis; 2, Cardiomegaly; 3, Effusion; 4, Infiltration; 5, Mass; 6, Nodule; 7, Pneumonia; 8, Pneumothorax; 9, Consolidation; 10, Edema; 11, Emphysema; 12, Fibrosis; 13, Pleural_Thickening; 14, Hernia)
### Background & Motivation
The chest X-ray exam is one of the most frequent and cost-effective medical imaging examinations. However, clinical diagnosis of chest X-rays can be challenging, and is sometimes believed to be harder than diagnosis via chest CT imaging. Some promising work has been reported in the past, especially in recent deep learning work on Tuberculosis (TB) classification. Still, achieving clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites on all data settings of chest X-rays is very difficult, if not impossible, when only several thousands of images are employed for study. This is evident from [2], where the performance of deep neural networks for thorax disease recognition is severely
This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The NIH Chest X-ray Dataset provides 112,120 X-ray images from 30,805 unique patients, annotated for thoracic disease detection. Labels were derived using Natural Language Processing on radiology reports, achieving over 90% accuracy, making the dataset ideal for weakly-supervised learning, medical AI, and advanced chest imaging research.
Hospital-scale Chest Xray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset is provided for NODE21 public challenge. Node21 dataset consists of frontal chest radiographs with annotated bounding boxes around nodules. It consists of 4882 frontal chest radiographs, where 1134 CXR images (1476 nodules) are annotated with bounding boxes around nodules and the remaining 3748 images are free of nodules hence representing the negative class. The images in this set are from public datasets that allow us to remix and redistribute. They come from the following sources:
The annotations were provided by our chest radiologists. We provide both original and preprocessed versions of the dataset.
Further, for the generation track, we provide a public set of NODE21 CT patches. These are patches of nodules from CT scans that originate from the LUNA16 dataset [5][6].
For more detailed descriptions of the data, please refer to the challenge website: NODE21
[1] Shiraishi, J., Katsuragawa, S., Ikezoe, J., Matsumoto, T., Kobayashi, T., Komatsu, K.i., Matsui, M., Fujita, H., Kodera, Y., Doi, K., 2000. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. American Journal of Roentgenology 174, 71–74. doi:10.2214/ajr.174.1.1740071.
[2] Bustos, A., Pertusa, A., Salinas, J.M., de la Iglesia-Vaya, M., 2020. PadChest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis 66, 101797. doi:10.1016/j.media.2020.101797.
[3] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M., 2017b. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106. doi:10.1109/cvpr.2017.369.
[4] Demner-Fushman, D., Antani, S., Simpson, M., Thoma, G.R., 2012. Design and Development of a Multimodal Biomedical Information Retrieval System. Journal of Computing Science and Engineering 6, 168–177. doi:10.5626/JCSE.2012.6.2.168.
[5] Andrey Fedorov, Matthew Hancock, David Clunie, Mathias Brochhausen, Jonathan Bona, Justin Kirby, John Freymann, Steve Pieper, Hugo Aerts, Ron Kikinis, Fred Prior, 2019. Standardized representation of the LIDC annotations using DICOM. The Cancer Imaging Archive. doi:10.7937/TCIA.2018.H7UMFURQ
[6] Setio et al., Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge, Medical Image Analysis 42, doi:10.1016/j.media.2017.06.015
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains lung X-ray images, including:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15315323%2F8041ddd2485bfe9cdf2ba1f9d96bd7e5%2F6_Class_Img.jpg?generation=1741951756137022&alt=media
The dataset we use is compiled from many reputable sources including:
Dataset 1 [1]: This dataset includes four classes of diseases: COVID-19, viral pneumonia, bacterial pneumonia, and normal. It has multiple versions, and we are currently using the latest version (version 4). Previous studies, such as those by Hariri et al. [18] and Ahmad et al. [20], have also utilized earlier versions of this dataset.
Dataset 2 [2]: This dataset is from the National Institutes of Health (NIH) Chest X-Ray Dataset, which contains over 100,000 chest X-ray images from over 30,000 patients. It includes 14 disease classes, including conditions like atelectasis, consolidation, and infiltration. For this study, we have selected 2,550 chest X-ray images specifically from the Emphysema class.
Dataset 3 [3]: This is the COVQU dataset, which we have extended to include two additional classes: COVID-19 and viral pneumonia. This dataset has been widely used in previous studies by M.E.H. Chowdhury et al. [4] and Rahman T et al. [5], establishing its reputation as a reliable resource.
In addition, we also publish a modified dataset that aims to remove image regions that do not contain lungs (abdomen, arms, etc.).
References:
[1] U. Sait, K. G. Lal, S. P. Prajapati, R. Bhaumik, T. Kumar, S. Shivakumar, K. Bhalla, Curated dataset for covid-19 posterior-anterior chest radiography images (x-rays), Mendeley Data V4 (2022). doi:10.17632/9xkhgts2s6.4.
[2] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R. M. Summers, Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases (2017) 3462–3471. doi:10.1109/CVPR.2017.369.
[3] A. M. Tahir, M. E. Chowdhury, A. Khandakar, T. Rahman, Y. Qiblawey, U. Khurshid, S. Kiranyaz, N. Ibtehaz, M. S. Rahman, S. Al-Maadeed, S. Mahmud, M. Ezeddin, K. Hameed, T. Hamid, Covid-19 infection localization and severity grading from chest x-ray images, Computers in Biology and Medicine 139 (2021) 105002. URL: https://www.sciencedirect.com/science/article/pii/S0010482521007964. doi:10.1016/j.compbiomed.2021.105002.
[4] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir, Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. A. Emadi, M. B. I. Reaz, M. T. Islam, Can ai help in screening viral and covid-19 pneumonia?, IEEE Access 8 (2020) 132665–132676. doi:10.1109/ACCESS.2020.3010287.
[5] T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S. B. A. Kashem, M. T. Islam, S. A. Maadeed, S. M. Zughaier, M. S. Khan, M. E. Chowdhury, Exploring the effect of image enhancement techniques on covid-19 detection using chest x-ray images, Computers in Biology and Medicine 132 (2021). doi:10.1016/j.compbiomed.2021.104319.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Radiologists and algorithm AUC with CIs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data repository for MedMNIST v1 is out of date! Please check the latest version of MedMNIST v2.
Abstract
We present MedMNIST, a collection of 10 pre-processed medical open datasets. MedMNIST is standardized to perform classification tasks on lightweight 28x28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source and commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.
Please note that this dataset is NOT intended for clinical use.
We recommend our official code to download, parse and use the MedMNIST dataset:
pip install medmnist
Citation and Licenses
If you find this project useful, please cite our ISBI'21 paper as: Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," arXiv preprint arXiv:2010.14925, 2020.
or using bibtex:
@article{medmnist,
  title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
  author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
  journal={arXiv preprint arXiv:2010.14925},
  year={2020}
}
Besides, please cite the corresponding paper if you use any subset of MedMNIST. Each subset uses the same license as that of the source dataset.
PathMNIST
Jakob Nikolas Kather, Johannes Krisam, et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLOS Medicine, vol. 16, no. 1, pp. 1–22, 01 2019.
License: CC BY 4.0
ChestMNIST
Xiaosong Wang, Yifan Peng, et al., "Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in CVPR, 2017, pp. 3462–3471.
License: CC0 1.0
DermaMNIST
Philipp Tschandl, Cliff Rosendahl, and Harald Kittler, "The ham10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions," Scientific data, vol. 5, pp. 180161, 2018.
Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, and Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; arXiv:1902.03368.
License: CC BY-NC 4.0
OCTMNIST/PneumoniaMNIST
Daniel S. Kermany, Michael Goldbaum, et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122 – 1131.e9, 2018.
License: CC BY 4.0
RetinaMNIST
DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD), "The 2nd diabetic retinopathy – grading and image quality estimation challenge," https://isbi.deepdr.org/data.html, 2020.
License: CC BY 4.0
BreastMNIST
Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy, "Dataset of breast ultrasound images," Data in Brief, vol. 28, pp. 104863, 2020.
License: CC BY 4.0
OrganMNIST_{Axial,Coronal,Sagittal}
Patrick Bilic, Patrick Ferdinand Christ, et al., "The liver tumor segmentation benchmark (lits)," arXiv preprint arXiv:1901.04056, 2019.
Xuanang Xu, Fugen Zhou, et al., "Efficient multiple organ localization in ct image using 3d region proposal network," IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.
License: CC BY 4.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
ChestXray8_activeLearning is a dataset for object detection tasks - it contains ChestXray8_activeLearning annotations for 1,852 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This project develops a data pipeline in which we build a CNN image classifier (in a Python notebook) to identify occurrences of cardiomegaly in radiology scans. It is built on the Chest X-ray8 dataset, a publicly available database of ~112,000 labeled radiology images.
Our approach is to divide the labeled data into test and validation tables, and then pull these collections of images into our notebook. We then run our image classification model against these labeled images. Finally, we test our model against a new collection of test images to understand the performance of the model.
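The table-splitting step can be sketched as below. This is a hedged illustration, not the project's notebook code: it assumes the partitions are defined by lists of image file names, that findings are joined with `|` in the metadata, and that a binary cardiomegaly target is derived from them. The column names `image_index` and `finding_labels`, the function names, and the sample rows are hypothetical.

```python
def cardiomegaly_label(finding_labels):
    """Return 1 if Cardiomegaly is among the '|'-separated findings, else 0."""
    return 1 if "Cardiomegaly" in finding_labels.split("|") else 0


def partition(metadata, list_a, list_b):
    """Assign each metadata row to one of two image lists and attach the label."""
    names_a, names_b = set(list_a), set(list_b)
    part_a, part_b = [], []
    for row in metadata:
        example = (row["image_index"], cardiomegaly_label(row["finding_labels"]))
        if row["image_index"] in names_a:
            part_a.append(example)
        elif row["image_index"] in names_b:
            part_b.append(example)
    return part_a, part_b


# Made-up rows standing in for the image metadata table.
metadata = [
    {"image_index": "a.png", "finding_labels": "Cardiomegaly|Effusion"},
    {"image_index": "b.png", "finding_labels": "No Finding"},
    {"image_index": "c.png", "finding_labels": "Cardiomegaly"},
]
fit_rows, held_out_rows = partition(metadata, ["a.png", "b.png"], ["c.png"])
```

Each resulting row pairs a file name with a 0/1 target, which is the shape a binary image classifier typically expects for its labels.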
This project is provided only as a proof of concept. The derived model is not intended in any way for clinical use. The model has substantial limitations in its accuracy, and additional tuning and training data should be provided in developing it further.