13 datasets found
  1. NIH-Chest-X-ray-dataset

    • huggingface.co
    • opendatalab.com
    Updated Nov 4, 2022
    Cite
    Cristóbal Alcázar (2022). NIH-Chest-X-ray-dataset [Dataset]. https://huggingface.co/datasets/alkzar90/NIH-Chest-X-ray-dataset
    Dataset updated
    Nov 4, 2022
    Authors
    Cristóbal Alcázar
    License

    https://choosealicense.com/licenses/unknown/

    Description

    The NIH Chest X-ray dataset consists of 100,000 de-identified images of chest x-rays. The images are in PNG format.

    The data is provided by the NIH Clinical Center and is available through the NIH download site: https://nihcc.app.box.com/v/ChestXray-NIHCC

  2. NIH Chest X ray 14 (224x224 resized)

    • kaggle.com
    zip
    Updated Jul 8, 2020
    Cite
    Khan Fashee Monowar (Sawrup) (2020). NIH Chest X ray 14 (224x224 resized) [Dataset]. https://www.kaggle.com/khanfashee/nih-chest-x-ray-14-224x224-resized
    Explore at:
    Available download formats: zip (2468882507 bytes)
    Dataset updated
    Jul 8, 2020
    Authors
    Khan Fashee Monowar (Sawrup)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    National Institutes of Health Chest X-Ray Dataset

    Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations available. However, clinical diagnosis of a chest X-ray can be challenging and sometimes more difficult than diagnosis via chest CT imaging. The lack of large publicly available datasets with annotations means it is still very difficult, if not impossible, to achieve clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites with chest X-rays. One major hurdle in creating large X-ray image datasets is the lack of resources for labeling so many images. Prior to the release of this dataset, Openi was the largest publicly available source of chest X-ray images with 4,143 images available.

    This NIH Chest X-ray Dataset is comprised of 112,120 X-ray images with disease labels from 30,805 unique patients. To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning. The original radiology reports are not publicly available but you can find more details on the labeling process in this Open Access paper: "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases." (Wang et al.)

    Data limitations:

    The image labels are NLP extracted so there could be some erroneous labels but the NLP labeling accuracy is estimated to be >90%.
    Very limited numbers of disease region bounding boxes (See BBoxlist2017.csv)
    Chest x-ray radiology reports are not anticipated to be publicly shared. Parties who use this public dataset are encouraged to share their “updated” image labels and/or new bounding boxes in their own later studies, for example through manual annotation
    

    File contents

    Image format: 112,120 total images with size 1024 x 1024
    
    images_001.zip: Contains 4999 images
    
    images_002.zip: Contains 10,000 images
    
    images_003.zip: Contains 10,000 images
    
    images_004.zip: Contains 10,000 images
    
    images_005.zip: Contains 10,000 images
    
    images_006.zip: Contains 10,000 images
    
    images_007.zip: Contains 10,000 images
    
    images_008.zip: Contains 10,000 images
    
    images_009.zip: Contains 10,000 images
    
    images_010.zip: Contains 10,000 images
    
    images_011.zip: Contains 10,000 images
    
    images_012.zip: Contains 7,121 images
    
    README_ChestXray.pdf: Original README file
    
    BBoxlist2017.csv: Bounding box coordinates. Note: Start at x,y, extend horizontally w pixels, and vertically h pixels
      Image Index: File name
      Finding Label: Disease type (Class label)
      Bbox x
      Bbox y
      Bbox w
      Bbox h
    
    Dataentry2017.csv: Class labels and patient data for the entire dataset
      Image Index: File name
      Finding Labels: Disease type (Class label)
      Follow-up #
      Patient ID
      Patient Age
      Patient Gender
      View Position: X-ray orientation
      OriginalImageWidth
      OriginalImageHeight
      OriginalImagePixelSpacing_x
      OriginalImagePixelSpacing_y
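
The bounding-box convention described for BBoxlist2017.csv (start at x,y, extend w pixels horizontally and h pixels vertically) can be sketched as below. This is a minimal illustration rather than code shipped with the dataset: the `BBox` type, the helper names, and the sample values are hypothetical, and the rescaling helper assumes a uniform resize from the 1024 x 1024 originals to a 224 x 224 copy.

```python
from typing import NamedTuple, Tuple


class BBox(NamedTuple):
    """One bounding box: top-left corner (x, y) plus width and height."""
    x: float
    y: float
    w: float
    h: float


def to_corners(box: BBox) -> Tuple[float, float, float, float]:
    """Convert (x, y, w, h) to (x_min, y_min, x_max, y_max)."""
    return (box.x, box.y, box.x + box.w, box.y + box.h)


def rescale(box: BBox, src_size: int = 1024, dst_size: int = 224) -> BBox:
    """Scale a box from a src_size-square image to a dst_size-square image.

    Assumes the image was resized uniformly in both directions.
    """
    factor = dst_size / src_size
    return BBox(box.x * factor, box.y * factor, box.w * factor, box.h * factor)


# Hypothetical box values, not taken from the CSV.
box = BBox(x=512, y=256, w=128, h=64)
print(to_corners(box))
print(rescale(box))
```

Boxes need rescaling only when they are paired with resized images; with the original 1024 x 1024 files the coordinates can be used directly.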
    

    Class descriptions

    There are 15 classes (14 diseases, and one for "No findings"). Images can be classified as "No findings" or one or more disease classes:

    Atelectasis
    Consolidation
    Infiltration
    Pneumothorax
    Edema
    Emphysema
    Fibrosis
    Effusion
    Pneumonia
    Pleural_thickening
    Cardiomegaly
    Nodule
    Mass
    Hernia
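
Because an image can carry several disease labels at once, a common way to prepare the class labels for training is a multi-hot vector with one slot per disease. The sketch below is illustrative only. It treats Nodule and Mass as separate classes (which the count of 14 diseases implies), and it assumes that the "Finding Labels" field joins multiple labels with `|` and spells the negative class `No Finding`; both are parameters, so adjust them to match the CSV you actually have.

```python
from typing import List

# The 14 disease classes from the list above (Nodule and Mass kept separate).
DISEASES = [
    "Atelectasis", "Consolidation", "Infiltration", "Pneumothorax",
    "Edema", "Emphysema", "Fibrosis", "Effusion", "Pneumonia",
    "Pleural_thickening", "Cardiomegaly", "Nodule", "Mass", "Hernia",
]


def encode_findings(
    finding_labels: str,
    sep: str = "|",                  # assumed separator between labels
    no_finding: str = "No Finding",  # assumed spelling of the negative class
) -> List[int]:
    """Return a 14-element multi-hot vector; all zeros means no findings."""
    names = {part.strip().lower() for part in finding_labels.split(sep) if part.strip()}
    if names == {no_finding.lower()}:
        return [0] * len(DISEASES)
    known = {d.lower() for d in DISEASES}
    unknown = names - known
    if unknown:
        raise ValueError(f"unrecognized labels: {sorted(unknown)}")
    # Matching is case-insensitive because label casing varies between copies
    # of the dataset (for example Pleural_thickening vs Pleural_Thickening).
    return [1 if d.lower() in names else 0 for d in DISEASES]


print(encode_findings("Atelectasis|Effusion"))
print(encode_findings("No Finding"))
```

Representing "no findings" as the all-zero vector keeps the output at 14 columns; a 15th explicit column is an equally reasonable choice.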
    

    Full Dataset Content

    There are 12 zip files in total, ranging from ~2 GB to 4 GB in size. Additionally, we randomly sampled 5% of these images and created a smaller dataset for use in Kernels. The random sample contains 5606 X-ray images and class labels.

    Sample: sample.zip
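
As a quick consistency check on the figures quoted in this description, the per-archive counts listed under "File contents" can be summed and compared with the stated total, and the 5% sample size can be derived from it. The variable names below are illustrative.

```python
# Image counts per archive, images_001.zip through images_012.zip,
# as listed under "File contents".
archive_counts = [4_999] + [10_000] * 10 + [7_121]

total_images = sum(archive_counts)
sample_size = total_images * 5 // 100  # 5% sample, using integer arithmetic

print(len(archive_counts), total_images, sample_size)
```

The sum matches the stated 112,120 images across 12 archives, and 5% of that total is the 5606 images reported for the sample.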
    

    Modifications to original data

    Original TAR archives were converted to ZIP archives to be compatible with the Kaggle platform
    
    CSV headers slightly modified to be more explicit in comma separation and also to allow fields to be self-explanatory
    

    Citations

    Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017, ChestX-ray8Hospital-ScaleChestCVPR2017_paper.pdf
    
    NIH News release: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community
    
    Original source files and documents: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345
    
  3. NIH Chest X-ray Dataset of 14 Common Thorax Disease Categories

    • academictorrents.com
    bittorrent
    Updated Oct 9, 2017
    Cite
    National Institutes of Health - Clinical Center (2017). NIH Chest X-ray Dataset of 14 Common Thorax Disease Categories [Dataset]. https://academictorrents.com/details/557481faacd824c83fbf57dcf7b6da9383b3235a
    Explore at:
    Available download formats: bittorrent (45089461497)
    Dataset updated
    Oct 9, 2017
    Dataset authored and provided by
    National Institutes of Health - Clinical Center
    License

    https://academictorrents.com/nolicensespecified

    Description

    (1, Atelectasis; 2, Cardiomegaly; 3, Effusion; 4, Infiltration; 5, Mass; 6, Nodule; 7, Pneumonia; 8, Pneumothorax; 9, Consolidation; 10, Edema; 11, Emphysema; 12, Fibrosis; 13, Pleural_Thickening; 14, Hernia)

    Background & Motivation:

    Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations. However, clinical diagnosis of a chest X-ray can be challenging, and is sometimes believed to be harder than diagnosis via chest CT imaging. Some promising work has been reported in the past, especially in recent deep learning work on Tuberculosis (TB) classification. Even so, achieving clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites on all data settings of chest X-rays is still very difficult, if not impossible, when only several thousand images are employed for study. This is evident from [2] where the performance of deep neural networks for thorax disease recognition is severely

  4. NIH-Chest-Xray-14

    • huggingface.co
    Updated Jun 21, 2024
    Cite
    Bahaa Eldin Moustafa (2024). NIH-Chest-Xray-14 [Dataset]. https://huggingface.co/datasets/BahaaEldin0/NIH-Chest-Xray-14
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 21, 2024
    Authors
    Bahaa Eldin Moustafa
    Description

    BahaaEldin0/NIH-Chest-Xray-14 dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Performance of the models on the NIH ChestX-ray14 external dataset.

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Andrew G. Taylor; Clinton Mielke; John Mongan (2023). Performance of the models on the NIH ChestX-ray14 external dataset. [Dataset]. http://doi.org/10.1371/journal.pmed.1002697.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS Medicine
    Authors
    Andrew G. Taylor; Clinton Mielke; John Mongan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of the models on the NIH ChestX-ray14 external dataset.

  6. NIH Chest X ray 14 (224x224 resized) (Updated)

    • kaggle.com
    Updated Jun 25, 2023
    Cite
    Hani MO (2023). NIH Chest X ray 14 (224x224 resized) (Updated) [Dataset]. https://www.kaggle.com/datasets/hanimohamed/nih-chest-x-ray-14-224x224-resized-updated
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hani MO
    Description

    Dataset

    This dataset was created by Hani MO

    Contents

  7. NIH Chest X-ray Dataset (Resized to 224x224)

    • academictorrents.com
    bittorrent
    Updated Nov 30, 2019
    Cite
    National Institutes of Health - Clinical Center (2019). NIH Chest X-ray Dataset (Resized to 224x224) [Dataset]. https://academictorrents.com/details/e615d3aebce373f1dc8bd9d11064da55bdadede0
    Explore at:
    Available download formats: bittorrent (2513363817)
    Dataset updated
    Nov 30, 2019
    Dataset authored and provided by
    National Institutes of Health - Clinical Center
    License

    https://academictorrents.com/nolicensespecified

    Description

    This dataset contains versions of the images resized to 224x224.

    (1, Atelectasis; 2, Cardiomegaly; 3, Effusion; 4, Infiltration; 5, Mass; 6, Nodule; 7, Pneumonia; 8, Pneumothorax; 9, Consolidation; 10, Edema; 11, Emphysema; 12, Fibrosis; 13, Pleural_Thickening; 14, Hernia)

    Background & Motivation:

    Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations. However, clinical diagnosis of a chest X-ray can be challenging, and is sometimes believed to be harder than diagnosis via chest CT imaging. Some promising work has been reported in the past, especially in recent deep learning work on Tuberculosis (TB) classification. Even so, achieving clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites on all data settings of chest X-rays is still very difficult, if not impossible, when only several thousand images are employed for study. This is evident from [2] where the performance of deep neu

  8. NIH Chest X-ray Dataset - Dataset - 國網中心Dataset平台

    • scidm.nchc.org.tw
    Updated Oct 10, 2020
    Cite
    (2020). NIH Chest X-ray Dataset - Dataset - 國網中心Dataset平台 [Dataset]. https://scidm.nchc.org.tw/dataset/nih-chest-x-ray-dataset
    Dataset updated
    Oct 10, 2020
    Description

    https://www.kaggle.com/nih-chest-xrays

    Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations available. However, clinical diagnosis of a chest X-ray can be challenging and sometimes more difficult than diagnosis via chest CT imaging. The lack of large publicly available datasets with annotations means it is still very difficult, if not impossible, to achieve clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites with chest X-rays. One major hurdle in creating large X-ray image datasets is the lack of resources for labeling so many images. Prior to the release of this dataset, Openi was the largest publicly available source of chest X-ray images with 4,143 images available.

    This NIH Chest X-ray Dataset is comprised of 112,120 X-ray images with disease labels from 30,805 unique patients. To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning. The original radiology reports are not publicly available but you can find more details on the labeling process in this Open Access paper: "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases." (Wang et al.)

  9. nih-subset-with-reasoning-generated

    • huggingface.co
    Cite
    Thakor, nih-subset-with-reasoning-generated [Dataset]. https://huggingface.co/datasets/Manusinhh/nih-subset-with-reasoning-generated
    Authors
    Thakor
    Description

    NIH Chest X-ray Reasoning Dataset (Subset + GPT-4.1 Mini Outputs)

    This repository contains a curated subset of the NIH ChestX-ray14 dataset and corresponding reasoning outputs generated using the OpenAI GPT-4.1 Mini API.

      📦 Contents
    

    nih-dataset-subset_generation_16k.ipynb: Jupyter notebook used to create a balanced 16k subset from the full NIH ChestX-ray14 dataset
    nih_balanced_filtered_16K.csv: CSV file containing metadata for the 16,000-image balanced subset… See the full description on the dataset page: https://huggingface.co/datasets/Manusinhh/nih-subset-with-reasoning-generated.

  10. ChestX-Det

    • huggingface.co
    • opendatalab.com
    Updated Nov 20, 2024
    Cite
    Nathaniel Alberti (2024). ChestX-Det [Dataset]. https://huggingface.co/datasets/natealberti/ChestX-Det
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 20, 2024
    Authors
    Nathaniel Alberti
    Description

    ChestX-Det is a chest X-Ray dataset with instance-level annotations (boxes and masks). ChestX-Det is a subset of the public dataset NIH ChestX-ray14. It contains ~3500 images of 13 common disease categories labeled by three board-certified radiologists. I created segmentation masks for each image in the dataset. Each image is mapped to a unique RGB value. The repository from Deepwise AILab can be found at: https://github.com/Deepwise-AILab/ChestX-Det-Dataset. More information at:… See the full description on the dataset page: https://huggingface.co/datasets/natealberti/ChestX-Det.

  11. CXR dataset

    • kaggle.com
    Updated Mar 18, 2025
    Cite
    Refle_x7 (2025). CXR dataset [Dataset]. https://www.kaggle.com/datasets/reflex7/cxr-data-set
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Refle_x7
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The database comprises over 72,000 chest X-ray images collected from multiple sources, including NIH ChestX-ray14, COVIDx CXR-4, Shenzhen Chest X-ray Set, Montgomery County X-ray Set, Chest X-Ray Images (Pneumonia), and TB Portal. The images are classified into four primary classes: Normal, Tuberculosis (TB), COVID-19, and Pneumonia, covering a wide range of thoracic diseases. The dataset includes images in formats such as PNG, JPG, and DICOM, sourced from diverse clinical settings. It is a valuable resource for research and development in medical imaging, particularly for disease detection and classification tasks.

    Normal: 18,097 images
    Covid-19: 18,011 images
    Pneumonia: 18,187 images
    Tuberculosis: 18,003 images
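
The per-class counts can be checked against the "over 72,000" figure and used to gauge class balance. The short sketch below uses only the four numbers quoted above; the variable names are illustrative.

```python
# Per-class image counts as quoted in the description.
class_counts = {
    "Normal": 18_097,
    "Covid-19": 18_011,
    "Pneumonia": 18_187,
    "Tuberculosis": 18_003,
}

total = sum(class_counts.values())
shares = {name: count / total for name, count in class_counts.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Each class accounts for roughly a quarter of the images, so the four classes are close to balanced.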

    All datasets are publicly available. Researchers and developers are encouraged to review the licensing and usage terms for each dataset before downloading and using the images.

    This repository is a valuable resource for advancing the field of medical imaging and improving diagnostic accuracy for thoracic diseases.

  12. Data from: Automated detection of moderate and large pneumothorax on frontal...

    • datasetcatalog.nlm.nih.gov
    Updated Nov 20, 2018
    Cite
    Taylor, Andrew G.; Mielke, Clinton; Mongan, John (2018). Automated detection of moderate and large pneumothorax on frontal chest X-rays using deep convolutional neural networks: A retrospective study [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000618208
    Dataset updated
    Nov 20, 2018
    Authors
    Taylor, Andrew G.; Mielke, Clinton; Mongan, John
    Description

    Background

    Pneumothorax can precipitate a life-threatening emergency due to lung collapse and respiratory or circulatory distress. Pneumothorax is typically detected on chest X-ray; however, treatment is reliant on timely review of radiographs. Since current imaging volumes may result in long worklists of radiographs awaiting review, an automated method of prioritizing X-rays with pneumothorax may reduce time to treatment. Our objective was to create a large human-annotated dataset of chest X-rays containing pneumothorax and to train deep convolutional networks to screen for potentially emergent moderate or large pneumothorax at the time of image acquisition.

    Methods and findings

    In all, 13,292 frontal chest X-rays (3,107 with pneumothorax) were visually annotated by radiologists. This dataset was used to train and evaluate multiple network architectures. Images showing large- or moderate-sized pneumothorax were considered positive, and those with trace or no pneumothorax were considered negative. Images showing small pneumothorax were excluded from training. Using an internal validation set (n = 1,993), we selected the 2 top-performing models; these models were then evaluated on a held-out internal test set based on area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV). The final internal test was performed initially on a subset with small pneumothorax excluded (as in training; n = 1,701), then on the full test set (n = 1,990), with small pneumothorax included as positive. External evaluation was performed using the National Institutes of Health (NIH) ChestX-ray14 set, a public dataset labeled for chest pathology based on text reports. All images labeled with pneumothorax were considered positive, because the NIH set does not classify pneumothorax by size.

    In internal testing, our “high sensitivity model” produced a sensitivity of 0.84 (95% CI 0.78–0.90), specificity of 0.90 (95% CI 0.89–0.92), and AUC of 0.94 for the test subset with small pneumothorax excluded. Our “high specificity model” showed sensitivity of 0.80 (95% CI 0.72–0.86), specificity of 0.97 (95% CI 0.96–0.98), and AUC of 0.96 for this set. PPVs were 0.45 (95% CI 0.39–0.51) and 0.71 (95% CI 0.63–0.77), respectively. Internal testing on the full set showed expected decreased performance (sensitivity 0.55, specificity 0.90, and AUC 0.82 for high sensitivity model and sensitivity 0.45, specificity 0.97, and AUC 0.86 for high specificity model). External testing using the NIH dataset showed some further performance decline (sensitivity 0.28–0.49, specificity 0.85–0.97, and AUC 0.75 for both). Due to labeling differences between internal and external datasets, these findings represent a preliminary step towards external validation.

    Conclusions

    We trained automated classifiers to detect moderate and large pneumothorax in frontal chest X-rays at high levels of performance on held-out test data. These models may provide a high specificity screening solution to detect moderate or large pneumothorax on images collected when human review might be delayed, such as overnight. They are not intended for unsupervised diagnosis of all pneumothoraces, as many small pneumothoraces (and some larger ones) are not detected by the algorithm. Implementation studies are warranted to develop appropriate, effective clinician alerts for the potentially critical finding of pneumothorax, and to assess their impact on reducing time to treatment.
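
For readers less familiar with the metrics reported above, sensitivity, specificity, and PPV all derive from the four cells of a binary confusion matrix. The sketch below uses standard definitions; the function name is illustrative and the counts are made up for demonstration, not taken from the study.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that are flagged
        "specificity": tn / (tn + fp),  # share of true negatives that are cleared
        "ppv": tp / (tp + fp),          # share of flagged cases that are truly positive
    }


# Hypothetical counts chosen only for illustration (not the study's data):
# 100 positive and 1,000 negative images.
metrics = screening_metrics(tp=84, fp=100, tn=900, fn=16)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, sensitivity and specificity are both high while PPV stays below 0.5, because negatives outnumber positives ten to one and even a 10% false-positive rate produces more false alarms than true detections. This dependence on prevalence is why PPV is reported separately from sensitivity and specificity.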

  13. Chest Xray Masks and Labels Dataset

    • datasetninja.com
    Updated Nov 3, 2023
    Cite
    Stefan Jaeger; Sema Candemir; Sameer Antani (2023). Chest Xray Masks and Labels Dataset [Dataset]. https://datasetninja.com/chest-xray
    Dataset updated
    Nov 3, 2023
    Dataset provided by
    Dataset Ninja
    Authors
    Stefan Jaeger; Sema Candemir; Sameer Antani
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The provided Chest Xray Masks and Labels dataset includes X-rays along with their corresponding masks. Notably, some masks may be absent, so cross-referencing images and masks is recommended. This dataset is derived from a modification of an original dataset, which combines Shenzhen and Montgomery County publicly available chest X-ray datasets.
