27 datasets found

Lung Disease Classification Dataset (100+ images)
kaggle.com
Updated Sep 1, 2023
AyushTankha (2023). Lung Disease Classification Dataset (100+ images) [Dataset]. https://www.kaggle.com/datasets/ayushtankha/lung-disease-classification-dataset-100-images
Dataset updated
Sep 1, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
AyushTankha
Description
This file does not have a description yet.

Covid-19_and_Pneumonia_X-Ray_Detector

The aim of this project is to detect Covid-19 from X-ray images and to differentiate Covid-19 from viral pneumonia and bacterial pneumonia. I have created a custom dataset that contains Covid-19 X-ray images, viral pneumonia X-ray images, bacterial pneumonia X-ray images, and normal X-ray images. Each class contains 133 images.

Dataset

I have used data from https://github.com/ieee8023/covid-chestxray-dataset and https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia.

0 - Covid-19

1 - Normal X-ray

2 - Viral Pneumonia X-ray

3 - Bacterial Pneumonia X-ray
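A minimal sketch of how the four numeric labels above might be encoded for a single-label classifier. The names `LABELS`, `NAME_TO_ID`, `one_hot`, and `TOTAL_IMAGES` are illustrative and not part of the dataset:

```python
# Label IDs exactly as listed in the dataset description; the dict is illustrative.
LABELS = {
    0: "Covid-19",
    1: "Normal X-ray",
    2: "Viral Pneumonia X-ray",
    3: "Bacterial Pneumonia X-ray",
}
NAME_TO_ID = {name: label_id for label_id, name in LABELS.items()}
IMAGES_PER_CLASS = 133  # stated in the description


def one_hot(label_id: int) -> list[int]:
    """Return a one-hot target vector for one of the four classes."""
    if label_id not in LABELS:
        raise ValueError(f"unknown label id: {label_id}")
    return [1 if i == label_id else 0 for i in range(len(LABELS))]


# 133 images in each of 4 classes gives 532 images in total.
TOTAL_IMAGES = IMAGES_PER_CLASS * len(LABELS)
```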
Chest X Rays Dataset
universe.roboflow.com
zip
Updated Nov 4, 2022
Mohamed Traore (2022). Chest X Rays Dataset [Dataset]. https://universe.roboflow.com/mohamed-traore-2ekkp/chest-x-rays-qjmia/model/2
Available download formats: zip
Dataset updated
Nov 4, 2022
Dataset authored and provided by
Mohamed Traore
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Pneumonia
Description
This classification dataset is from Kaggle and was uploaded to Kaggle by Paul Mooney.

It contains over 5,000 images of chest x-rays in two categories: "PNEUMONIA" and "NORMAL."

Version 1 contains the raw images with only the "Auto-Orient" pre-processing step applied, which strips out EXIF data and ensures all images are "right side up."

Version 2 contains the raw images with the "Auto-Orient" and Resize (640 x 640) pre-processing steps applied.

Version 3 was trained with Roboflow's model architecture for classification datasets and contains the raw images with the "Auto-Orient" and Resize (640 x 640) pre-processing steps applied, plus the following augmentations:

Outputs per training example: 3

Shear: ±3° Horizontal, ±2° Vertical

Saturation: Between -5% and +5%

Brightness: Between -5% and +5%

Exposure: Between -5% and +5%
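The augmentation ranges above can be expressed as a parameter sampler. This is a sketch under assumptions: Roboflow's exact implementation is not described here, so the percentages are modeled as fractional offsets drawn uniformly from each stated range, and `AugParams`, `sample_aug_params`, and `expand_example` are hypothetical names. It only draws parameters; applying them to pixels is left to an image library.

```python
import random
from dataclasses import dataclass


@dataclass
class AugParams:
    shear_x_deg: float  # horizontal shear, degrees
    shear_y_deg: float  # vertical shear, degrees
    saturation: float   # fractional change, e.g. 0.03 means +3%
    brightness: float
    exposure: float


def sample_aug_params(rng: random.Random) -> AugParams:
    """Draw one parameter set uniformly within the ranges listed above."""
    return AugParams(
        shear_x_deg=rng.uniform(-3.0, 3.0),
        shear_y_deg=rng.uniform(-2.0, 2.0),
        saturation=rng.uniform(-0.05, 0.05),
        brightness=rng.uniform(-0.05, 0.05),
        exposure=rng.uniform(-0.05, 0.05),
    )


def expand_example(image_id: str, outputs_per_example: int = 3, seed: int = 0):
    """Produce `outputs_per_example` augmented variants (as parameters) per image.

    Seeding with the image id keeps the variants reproducible across runs.
    """
    rng = random.Random(f"{image_id}-{seed}")
    return [(image_id, sample_aug_params(rng)) for _ in range(outputs_per_example)]
```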

Below you will find the description provided on Kaggle:

Context

Figure S6. Illustrative Examples of Chest X-Rays in Patients with Pneumonia, Related to Figure 6. The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia (middle) typically exhibits a focal lobar consolidation, in this case in the right upper lobe (white arrows), whereas viral pneumonia (right) manifests with a more diffuse "interstitial" pattern in both lungs. Source: http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5

Content

The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
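A minimal sketch for checking an extracted copy against the folder layout described above. It assumes the top level directly contains `train/`, `test/`, and `val/`, each with one subfolder per category (e.g. `NORMAL` and `PNEUMONIA`); the accepted suffixes and the helper names `tally` and `tally_directory` are assumptions:

```python
from collections import Counter
from pathlib import Path, PurePosixPath

# Assumption: JPEG is the documented format; .png is accepted defensively.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def tally(relative_paths):
    """Count images per (split, category) from paths like 'train/NORMAL/x.jpeg'."""
    counts = Counter()
    for rel in relative_paths:
        path = PurePosixPath(rel)
        # Keep only split/category/file paths with an image suffix; this skips
        # hidden files such as '.DS_Store' and anything nested more deeply.
        if len(path.parts) != 3 or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        split, category, _ = path.parts
        counts[(split, category)] += 1
    return counts


def tally_directory(root):
    """Tally an extracted copy whose top level holds train/, test/ and val/."""
    root = Path(root)
    return tally(f.relative_to(root).as_posix() for f in root.rglob("*") if f.is_file())
```

Summing the values of `tally_directory(...)` and comparing the result with the 5,863 images stated above is a quick completeness check.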

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.

Acknowledgements

Data: https://data.mendeley.com/datasets/rscbjbr9sj/2

License: CC BY 4.0

Citation: http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5

Inspiration

Automated methods to detect and classify human diseases from medical images.
NIH Chest X-rays Bbox version
kaggle.com
Updated Jun 25, 2024
Huthayfa Hodeb (2024). NIH Chest X-rays Bbox version [Dataset]. https://www.kaggle.com/datasets/huthayfahodeb/nih-chest-x-rays-bbox-version
Dataset updated
Jun 25, 2024
Dataset provided by
Kaggle
Authors
Huthayfa Hodeb
License
CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
Description
NIH Chest X-ray Dataset

National Institutes of Health Chest X-Ray Dataset

Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations available. However, clinical diagnosis of a chest X-ray can be challenging and sometimes more difficult than diagnosis via chest CT imaging. The lack of large publicly available datasets with annotations means it is still very difficult, if not impossible, to achieve clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites with chest X-rays. One major hurdle in creating large X-ray image datasets is the lack of resources for labeling so many images. Prior to the release of this dataset, Openi was the largest publicly available source of chest X-ray images with 4,143 images available.

This NIH Chest X-ray Dataset is comprised of 112,120 X-ray images with disease labels from 30,805 unique patients. To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning. The original radiology reports are not publicly available but you can find more details on the labeling process in this Open Access paper: "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases." (Wang et al.)


Data limitations

The image labels are NLP extracted so there could be some erroneous labels but the NLP labeling accuracy is estimated to be >90%.

Very limited numbers of disease region bounding boxes (See BBox_list_2017.csv)

File contents

Image format: 880 total images with size 1024 x 1024

bbox_img: Contains 880 bbox images

README_ChestXray.pdf: Original README file

BBox_list_2017.csv: Bounding box coordinates. Note: Start at x,y, extend horizontally w pixels, and vertically h pixels

Image Index: File name

Finding Label: Disease type (Class label)

Bbox x

Bbox y

Bbox w

Bbox h
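The bounding-box note above (start at x,y, extend w pixels horizontally and h pixels vertically) maps to corner coordinates as follows. This sketch assumes (x, y) is the top-left corner with y growing downward, as is usual for image pixel coordinates; `rescale` is a hypothetical helper for use with resized copies such as 224 x 224 versions:

```python
from typing import NamedTuple


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def xywh_to_corners(x: float, y: float, w: float, h: float) -> Box:
    """Convert a BBox_list_2017.csv row (x, y, w, h) to corner coordinates.

    Assumes (x, y) is the top-left corner and y grows downward.
    """
    if w < 0 or h < 0:
        raise ValueError("width and height must be non-negative")
    return Box(x, y, x + w, y + h)


def rescale(box: Box, orig_size: int = 1024, new_size: int = 224) -> Box:
    """Scale a box from the 1024 x 1024 originals to a resized square image."""
    factor = new_size / orig_size
    return Box(*(v * factor for v in box))
```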

Data_entry_2017.csv: Class labels and patient data for the entire dataset

Image Index: File name

Finding Labels: Disease type (Class label)

Follow-up #

Patient ID

Patient Age

Patient Gender

View Position: X-ray orientation

OriginalImageWidth

OriginalImageHeight

OriginalImagePixelSpacing_x

OriginalImagePixelSpacing_y

label.csv: Class labels

tesnorlfow.csv: TensorFlow version of the dataset

Class descriptions

There are 8 classes. Images can be classified as one or more disease classes:

- Infiltrate
- Atelectasis
- Pneumonia
- Cardiomegaly
- Effusion
- Pneumothorax
- Mass
- Nodule
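Because an image can carry more than one of these classes, a multi-hot target is a natural encoding. A minimal sketch, assuming the rows come from the bounding-box CSV as (Image Index, Finding Label) pairs (for example read with `csv.reader`); the function name and the image file names in any usage are hypothetical:

```python
from collections import defaultdict

# The 8 classes listed above, in a fixed order for the output vector.
CLASSES = ["Infiltrate", "Atelectasis", "Pneumonia", "Cardiomegaly",
           "Effusion", "Pneumothorax", "Mass", "Nodule"]


def multi_hot_by_image(rows):
    """Group (Image Index, Finding Label) rows into one multi-hot vector per image.

    An image listed on several rows with different labels gets several 1s.
    """
    found = defaultdict(set)
    for image_index, finding_label in rows:
        if finding_label not in CLASSES:
            raise ValueError(f"unexpected label: {finding_label!r}")
        found[image_index].add(finding_label)
    return {image: [int(c in labels) for c in CLASSES] for image, labels in found.items()}
```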

Citations

Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017, ChestX-ray8_Hospital-Scale_Chest_CVPR_2017_paper.pdf

NIH News release: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community

Original source files and documents: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345

Acknowledgements

This work was supported by the Intramural Research Program of the NIH Clinical Center (clinicalcenter.nih.gov) and the National Library of Medicine (www.nlm.nih.gov).
Covid-19 X-Ray Classification Dataset
kaggle.com
Updated Jun 13, 2024
Sanidhya Goel (2024). Covid-19 X-Ray Classification Dataset [Dataset]. https://www.kaggle.com/datasets/sanidhyagoel/covid-19-x-ray-classification-dataset/code
Dataset updated
Jun 13, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sanidhya Goel
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains 284 human chest X-ray images belonging to 2 classes (Covid-19 Positive and Negative). The dataset has been divided into train and validation splits with 112 and 30 images respectively; these counts appear to be per class, since 2 x (112 + 30) = 284. The dataset is intended for training deep learning models such as CNNs.

Dataset Hierarchy

Dataset.zip
├── train
│   ├── normal
│   └── infected
└── val
    ├── normal
    └── infected
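Given this hierarchy, each image's split and binary label can be read off its path. A minimal sketch, assuming `infected` holds the Covid-19 positive images and `normal` the negatives (inferred from the two class names, not stated explicitly); `label_from_path` is a hypothetical helper:

```python
from pathlib import PurePosixPath

# Assumption: "infected" = Covid-19 positive (1), "normal" = negative (0).
CLASS_TO_LABEL = {"normal": 0, "infected": 1}
SPLITS = {"train", "val"}


def label_from_path(path: str) -> tuple[str, int]:
    """Return (split, label) for a path such as 'Dataset/train/infected/x.png'."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"path too short: {path}")
    split, class_dir = parts[-3], parts[-2]
    if split not in SPLITS or class_dir not in CLASS_TO_LABEL:
        raise ValueError(f"unexpected layout: {path}")
    return split, CLASS_TO_LABEL[class_dir]
```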

Citations:
- Covid-19 Positive Patient Chest X-ray images (Source: https://github.com/ieee8023/covid-chestxray-dataset/tree/master)
- Kaggle Human Lung X-ray Image Dataset (only "Normal" images extracted) (Source: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia)
HDSNE Chest X-ray Dataset Dataset
paperswithcode.com
Updated Feb 25, 2025
(2025). HDSNE Chest X-ray Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/hdsne-chest-x-ray-dataset
Dataset updated
Feb 25, 2025
Description

The continuous release of medical image databases, often featuring overlapping or identical categories, poses a significant challenge for the development of autonomous Computer-Aided Diagnostics (CAD) systems. These systems are essential for creating truly comprehensive medical diagnostics. However, one of the main obstacles lies in the frequent bulk release of datasets, which commonly suffer from two critical issues: image duplication and data corruption.

The Problem of Dataset Redundancy

Repeated releases of the same categories often fail to integrate or deduplicate similar images across databases, which can severely impact the effectiveness of machine learning models. Data duplication not only reduces the efficiency of learning models but also leads to overfitting, wastes computational resources, and increases the carbon footprint due to the energy required for training complex models.


Proposed Solution: Global Data Aggregation Model

In response to these challenges, we introduce a global data aggregation model that intelligently combines data from six distinct and reputable medical imaging databases. Each database was carefully curated to ensure the elimination of redundancies while preserving data diversity. Two robust algorithms were employed:

MD5 Hash Algorithm: This algorithm generates a hash value for each image file; byte-identical images produce the same hash, which enables effective detection and elimination of duplicate images.

t-SNE Algorithm: This technique is used for dimensionality reduction, with a tunable perplexity parameter to ensure accurate representation of high-dimensional data.
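The MD5 step can be sketched with Python's standard `hashlib`. This is an illustration, not the authors' code: it only catches byte-identical files, so re-encoded or resized copies of the same radiograph would slip through, and the t-SNE step is not reproduced here. The helper names are hypothetical:

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def deduplicate(images):
    """Keep the first file seen for each MD5 digest.

    images: iterable of (name, raw_bytes) pairs, e.g. (p.name, p.read_bytes()).
    Returns (kept_names, dropped), where dropped lists (duplicate, original) pairs.
    """
    first_seen = {}
    kept, dropped = [], []
    for name, data in images:
        digest = md5_hex(data)
        if digest in first_seen:
            dropped.append((name, first_seen[digest]))
        else:
            first_seen[digest] = name
            kept.append(name)
    return kept, dropped
```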

Dataset Categories

The final dataset includes an equal number of samples from three key categories of chest X-ray images:

Normal, Pneumonia, COVID-19

This uniform distribution ensures that the dataset is balanced, avoiding class imbalance—a common issue that can skew results in medical image analysis.

Dataset Application & Model Evaluation

The dataset was applied to the Inception V3 pre-trained model, a leading convolutional neural network (CNN) architecture known for its excellence in image classification tasks. The evaluation was conducted using the following performance metrics:

Accuracy: An exceptional accuracy rate of 98.48% was achieved.

Precision, Recall, and F1-score: The dataset showed strong performance across these metrics, reducing both false positives and false negatives.

Statistical Validation: A t-test was conducted to validate the results, and the resulting t-values and p-values confirm the statistical significance of the model's performance.
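For reference, the metrics named above follow the standard one-vs-rest definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A minimal pure-Python sketch (the function name is hypothetical and any labels passed to it are toy data, not results from this dataset):

```python
def evaluate(y_true, y_pred, classes):
    """Accuracy plus per-class precision, recall and F1 for single-label predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    report = {"accuracy": sum(t == p for t, p in pairs) / len(pairs)}
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```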

Conclusion

The HDSNE Chest X-ray Dataset offers a novel and effective approach to data aggregation, tackling the issues of redundancy and data duplication that have long plagued the field of medical imaging. By maintaining a balanced class distribution and eliminating unnecessary data, this dataset provides a cleaner and more efficient resource for training machine learning models.

This dataset is sourced from Kaggle.
NIH Chest X-ray Dataset - Dataset - NCHC Dataset Platform (國網中心Dataset平台)
scidm.nchc.org.tw
Updated Oct 10, 2020
(2020). NIH Chest X-ray Dataset - Dataset - NCHC Dataset Platform (國網中心Dataset平台) [Dataset]. https://scidm.nchc.org.tw/dataset/nih-chest-x-ray-dataset
Dataset updated
Oct 10, 2020
Description
Source: https://www.kaggle.com/nih-chest-xrays

Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations available. However, clinical diagnosis of a chest X-ray can be challenging and sometimes more difficult than diagnosis via chest CT imaging. The lack of large publicly available datasets with annotations means it is still very difficult, if not impossible, to achieve clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites with chest X-rays. One major hurdle in creating large X-ray image datasets is the lack of resources for labeling so many images. Prior to the release of this dataset, Openi was the largest publicly available source of chest X-ray images with 4,143 images available. This NIH Chest X-ray Dataset is comprised of 112,120 X-ray images with disease labels from 30,805 unique patients. To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning. The original radiology reports are not publicly available but you can find more details on the labeling process in this Open Access paper: "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases." (Wang et al.)
Lung Area Specific COVID-19 Xray Dataset
kaggle.com
Updated Mar 26, 2021
foram sanghavi (2021). Lung Area Specific COVID-19 Xray Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/2060331
Unique identifier
https://doi.org/10.34740/kaggle/dsv/2060331
Dataset updated
Mar 26, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
foram sanghavi
Description
In this dataset, the full radiographs are hand-cropped to obtain lung area-specific radiographs. If you use the lung area-specific dataset, please cite the following paper: "Automated Detection of COVID-19 cases on Radiographs using Shape-dependent Fibonacci-p Patterns"

The full radiographs were collected from the Kaggle dataset (M. E. Chowdhury et al., "Can AI help in screening viral and COVID-19 pneumonia?," arXiv preprint arXiv:2003.13145, 2020), and from the COVIDGR dataset (S. Tabik et al., "COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on Chest X-Ray images," IEEE journal of biomedical and health informatics, vol. 24, no. 12, pp. 3595-3605, 2020.)
ChestX-ray14 Dataset
paperswithcode.com
Updated Feb 19, 2021
Xiaosong Wang; Yifan Peng; Le Lu; Zhiyong Lu; Mohammadhadi Bagheri; Ronald M. Summers (2021). ChestX-ray14 Dataset [Dataset]. https://paperswithcode.com/dataset/chestx-ray14
Dataset updated
Feb 19, 2021
Authors
Xiaosong Wang; Yifan Peng; Le Lu; Zhiyong Lu; Mohammadhadi Bagheri; Ronald M. Summers
Description
ChestX-ray14 is a medical imaging dataset comprising 112,120 frontal-view X-ray images of 30,805 unique patients (collected from 1992 to 2015), with fourteen common disease labels text-mined from the radiological reports via NLP techniques. It expands on ChestX-ray8 by adding six additional thorax diseases: Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, and Hernia.
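The six added labels can be checked with a set difference between the 8-class list given for the NIH bounding-box subset and the 14-class list given for the 224 x 224 NIH release on this page. Spellings follow those lists, with the bounding-box list's `Infiltrate` normalized to `Infiltration`; the constant names are illustrative:

```python
# Eight ChestX-ray8 labels ("Infiltrate" in the bbox list, normalized here).
CHESTXRAY8 = {"Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
              "Mass", "Nodule", "Pneumonia", "Pneumothorax"}

# Fourteen disease labels as listed for the 224 x 224 NIH release.
CHESTXRAY14 = {"Atelectasis", "Consolidation", "Infiltration", "Pneumothorax",
               "Edema", "Emphysema", "Fibrosis", "Effusion", "Pneumonia",
               "Pleural_thickening", "Cardiomegaly", "Nodule", "Mass", "Hernia"}

# Labels present in the 14-class release but not in the 8-class one.
ADDED_IN_14 = sorted(CHESTXRAY14 - CHESTXRAY8)
```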
ChestX-ray8 Dataset
paperswithcode.com
opendatalab.com
Updated Feb 9, 2021
Xiaosong Wang; Yifan Peng; Le Lu; Zhiyong Lu; Mohammadhadi Bagheri; Ronald M. Summers (2021). ChestX-ray8 Dataset [Dataset]. https://paperswithcode.com/dataset/chestx-ray8
Dataset updated
Feb 9, 2021
Authors
Xiaosong Wang; Yifan Peng; Le Lu; Zhiyong Lu; Mohammadhadi Bagheri; Ronald M. Summers
Description
ChestX-ray8 is a medical imaging dataset comprising 108,948 frontal-view X-ray images of 32,717 unique patients (collected from 1992 to 2015), with eight common disease labels text-mined from the radiological reports via NLP techniques.
DECIMER Image classifier dataset
data.niaid.nih.gov
Updated Jul 9, 2022
M. Isabel agea (2022). DECIMER Image classifier dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6670745
Dataset updated
Jul 9, 2022
Dataset authored and provided by
M. Isabel agea
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An image dataset divided into train (10,905,114 images), validation (2,115,528 images), and test (544,946 images) folders, each containing a balanced number of images for two classes (chemical structures and non-chemical structures).
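A quick arithmetic check of these split sizes: they total 13,565,588 images, roughly an 80/16/4 train/validation/test split, and each split divides evenly into two equal halves, consistent with the stated class balance. The snippet below reproduces the calculation (plain Python; names are illustrative):

```python
# Split sizes as stated in the description above.
SPLITS = {"train": 10_905_114, "validation": 2_115_528, "test": 544_946}

TOTAL = sum(SPLITS.values())
FRACTIONS = {name: n / TOTAL for name, n in SPLITS.items()}

# Balanced between two classes: each split must halve exactly.
PER_CLASS = {}
for name, n in SPLITS.items():
    if n % 2:
        raise ValueError(f"{name} split cannot be divided evenly between two classes")
    PER_CLASS[name] = n // 2

for name in SPLITS:
    print(f"{name}: {SPLITS[name]:,} images ({FRACTIONS[name]:.1%}), {PER_CLASS[name]:,} per class")
```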

The chemical structures were generated by applying RanDepict to randomly picked compounds from the ChEMBL30 database and the COCONUT database.

The non-chemical structures were generated using Python or they were retrieved from several public datasets:

COCO dataset, MIT Places-205 dataset, Visual Genome dataset, Google Open labeled Images, MMU-OCR-21 (kaggle), HandWritten_Character (kaggle), CoronaHack -Chest X-Ray-dataset (kaggle), PANDAS Augmented Images (kaggle), Bacterial_Colony (kaggle), Ceylon Epigraphy Periods (kaggle), Chinese Calligraphy Styles by Calligraphers (kaggle), Graphs Dataset (kaggle), Function_Graphs Polynomial (kaggle), sketches (kaggle), Person Face Sketches (kaggle), Art Pictograms (kaggle), Russian handwritten letters (kaggle), Handwritten Russian Letters (kaggle), Covid-19 Misinformation Tweets Labeled Dataset (kaggle) and grapheme-imgs-224x224 (kaggle).

This data was used to build a CNN classification model by fine-tuning EfficientNetB0 as the base model. The model is available on GitHub.
Pneumonia_chest_xray
kaggle.com
Updated Nov 6, 2024
Adnan Alaref (2024). Pneumonia_chest_xray [Dataset]. https://www.kaggle.com/datasets/adnanalaref/pneumonia-chest-xray/data
Dataset updated
Nov 6, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Adnan Alaref
Description
Dataset

This dataset was created by Adnan Alaref

Released under Other (specified in description)

Contents
Chest X-Ray Worldwide Datasets
kaggle.com
Updated Dec 9, 2020
Homayoon khadivi (2020). Chest X-Ray Worldwide Datasets [Dataset]. https://www.kaggle.com/homayoonkhadivi/chest-xray-worldwide-datasets/tasks
Dataset updated
Dec 9, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Homayoon khadivi
Description
The ChestX-ray8 dataset contains 108,948 frontal-view X-ray images of 32,717 unique patients.

Each image in the dataset carries multiple text-mined labels identifying 14 different pathological conditions. These in turn can be used by physicians to diagnose 8 different diseases. We will use this data to develop a single model that provides binary classification predictions for each of the 14 labeled pathologies; in other words, it will predict 'positive' or 'negative' for each pathology. You can download the entire dataset for free at https://nihcc.app.box.com/v/ChestXray-NIHCC.
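One binary prediction per pathology is commonly realized as a multi-label model with one sigmoid output per label, each thresholded independently. A minimal sketch, assuming the model returns one raw score (logit) per pathology; the 0.5 threshold, the label order, and the function names are assumptions:

```python
import math

# The 14 NIH pathology labels; the order here is an illustrative choice.
PATHOLOGIES = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
               "Effusion", "Emphysema", "Fibrosis", "Hernia", "Infiltration",
               "Mass", "Nodule", "Pleural_Thickening", "Pneumonia", "Pneumothorax"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(logits, threshold=0.5):
    """Turn one model output per pathology into 'positive'/'negative' calls."""
    if len(logits) != len(PATHOLOGIES):
        raise ValueError(f"expected {len(PATHOLOGIES)} logits, got {len(logits)}")
    return {name: "positive" if sigmoid(z) >= threshold else "negative"
            for name, z in zip(PATHOLOGIES, logits)}
```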

I have provided a ~1000 image subset of the images here. The dataset includes a CSV file that provides the labels for each X-ray.

To make your job a bit easier, I have processed the labels for our small sample and generated three new files to get you started. These three files are:

- train-small-new.csv: 875 images from our dataset to be used for training.
- valid-small-new.csv: 109 images from our dataset to be used for validation.
- test-small-new.csv: 420 images from our dataset to be used for testing.

This dataset has been annotated by consensus among four different radiologists for 5 of our 14 pathologies:

Consolidation, Edema, Effusion, Cardiomegaly, Atelectasis
Mini NIH XRay Dataset for Binary Classification
kaggle.com
Updated Jan 4, 2023
Abby Morgan (2023). Mini NIH XRay Dataset for Binary Classification [Dataset]. https://www.kaggle.com/datasets/abbymorgan/create-mini-xray-dataset-binary-classification-100
Dataset updated
Jan 4, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abby Morgan
Description
The original full dataset contained 112,120 X-ray images with disease labels from 30,805 unique patients.

This notebook is modified from K Scott Mader's notebook here to create a mini chest x-ray dataset that is split 50:50 between normal and diseased images.

In my notebook I will use this dataset to test a pretrained model on a binary classification task (diseased vs. healthy xray), and then visualize which specific labels the model has the most trouble with.
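A minimal sketch of a 50:50 normal-versus-diseased sample as described above. It assumes the full dataset's `Finding Labels` column marks healthy images with the string `No Finding` (check this against your copy) and treats any other value as diseased; the function names are hypothetical and not taken from Mader's notebook:

```python
import random

HEALTHY_LABEL = "No Finding"  # assumption: label string for a normal X-ray


def binary_target(finding_labels: str) -> int:
    """0 for a healthy X-ray, 1 if any disease label is present."""
    return 0 if finding_labels.strip() == HEALTHY_LABEL else 1


def balanced_sample(rows, per_class, seed=0):
    """Draw a shuffled 50:50 healthy/diseased sample.

    rows: list of (image_index, finding_labels) pairs.
    """
    healthy = [r for r in rows if binary_target(r[1]) == 0]
    diseased = [r for r in rows if binary_target(r[1]) == 1]
    if min(len(healthy), len(diseased)) < per_class:
        raise ValueError("not enough rows to draw a balanced sample")
    rng = random.Random(seed)
    sample = rng.sample(healthy, per_class) + rng.sample(diseased, per_class)
    rng.shuffle(sample)
    return sample
```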

Also, because disease classification is such an important task to get right, it's likely that any AI/ML medical classification task will include a human-in-the-loop. In this way, this process more closely resembles how this sort of ML would be used in the real world.

Note that the original notebook on which this one was based had two versions: Standard and Equalized. In this notebook we will be using the equalized version in order to save ourselves the extra step of performing CLAHE during the tensor transformations.

The goal of this notebook, as originally stated by Mader, is "to make a much easier to use mini-dataset out of the Chest X-Ray collection. The idea is to have something akin to MNIST or Fashion MNIST for medical images." In order to do this, we will preprocess, normalize, and scale down the images, and then save them into an HDF5 file with the corresponding tabular data.

Data limitations: The image labels are NLP-extracted, so there could be some erroneous labels, but the NLP labeling accuracy is estimated to be >90%. Very limited numbers of disease region bounding boxes are available (see BBox_list_2017.csv). Chest x-ray radiology reports are not anticipated to be publicly shared. Parties who use this public dataset are encouraged to share their “updated” image labels and/or new bounding boxes in their own studies later, perhaps through manual annotation.

File Contents

The file is an HDF5 file of shape (200, 28). The main file contains a nested HDF5 file of X-ray images under the key "images". The main HDF5 file keys are:

- Image Index
- Finding Labels: list of disease labels
- Follow-up #
- Patient ID
- Patient Age
- Patient Gender: 'F'/'M'
- View Position: 'PA', 'AP'
- OriginalImageWidth
- OriginalImageHeight
- OriginalImagePixelSpacing_x
- Normal: Binary; if Xray finding is 'Normal'
- Atelectasis: Binary; if Xray finding includes 'Atelectasis'
- Cardiomegaly: Binary; if Xray finding includes 'Cardiomegaly'
- Consolidation: Binary; if Xray finding includes 'Consolidation'
- Edema: Binary; if Xray finding includes 'Edema'
- Effusion: Binary; if Xray finding includes 'Effusion'
- Emphysema: Binary; if Xray finding includes 'Emphysema'
- Fibrosis: Binary; if Xray finding includes 'Fibrosis'
- Hernia: Binary; if Xray finding includes 'Hernia'
- Infiltration: Binary; if Xray finding includes 'Infiltration'
- Mass: Binary; if Xray finding includes 'Mass'
- Nodule: Binary; if Xray finding includes 'Nodule'
- Pleural_Thickening: Binary; if Xray finding includes 'Pleural_Thickening'
- Pneumonia: Binary; if Xray finding includes 'Pneumonia'
- Pneumothorax: Binary; if Xray finding includes 'Pneumothorax'
NIH Chest X ray 14 (224x224 resized)
kaggle.com
zip
Updated Jul 8, 2020
Khan Fashee Monowar (Sawrup) (2020). NIH Chest X ray 14 (224x224 resized) [Dataset]. https://www.kaggle.com/khanfashee/nih-chest-x-ray-14-224x224-resized
Available download formats: zip (2,468,882,507 bytes)
Dataset updated
Jul 8, 2020
Authors
Khan Fashee Monowar (Sawrup)
License
CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
Description
National Institutes of Health Chest X-Ray Dataset

Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations available. However, clinical diagnosis of a chest X-ray can be challenging and sometimes more difficult than diagnosis via chest CT imaging. The lack of large publicly available datasets with annotations means it is still very difficult, if not impossible, to achieve clinically relevant computer-aided detection and diagnosis (CAD) in real world medical sites with chest X-rays. One major hurdle in creating large X-ray image datasets is the lack of resources for labeling so many images. Prior to the release of this dataset, Openi was the largest publicly available source of chest X-ray images with 4,143 images available.

This NIH Chest X-ray Dataset is comprised of 112,120 X-ray images with disease labels from 30,805 unique patients. To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning. The original radiology reports are not publicly available but you can find more details on the labeling process in this Open Access paper: "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases." (Wang et al.)

Data limitations:

The image labels are NLP-extracted, so there could be some erroneous labels, but the NLP labeling accuracy is estimated to be >90%. Very limited numbers of disease region bounding boxes are available (see BBox_list_2017.csv). Chest x-ray radiology reports are not anticipated to be publicly shared. Parties who use this public dataset are encouraged to share their “updated” image labels and/or new bounding boxes in their own studies later, perhaps through manual annotation.

File contents

Image format: 112,120 total images with size 1024 x 1024

- images_001.zip: Contains 4,999 images
- images_002.zip: Contains 10,000 images
- images_003.zip: Contains 10,000 images
- images_004.zip: Contains 10,000 images
- images_005.zip: Contains 10,000 images
- images_006.zip: Contains 10,000 images
- images_007.zip: Contains 10,000 images
- images_008.zip: Contains 10,000 images
- images_009.zip: Contains 10,000 images
- images_010.zip: Contains 10,000 images
- images_011.zip: Contains 10,000 images
- images_012.zip: Contains 7,121 images
- README_ChestXray.pdf: Original README file
- BBox_list_2017.csv: Bounding box coordinates. Note: Start at x,y, extend horizontally w pixels, and vertically h pixels
  - Image Index: File name
  - Finding Label: Disease type (Class label)
  - Bbox x
  - Bbox y
  - Bbox w
  - Bbox h
- Data_entry_2017.csv: Class labels and patient data for the entire dataset
  - Image Index: File name
  - Finding Labels: Disease type (Class label)
  - Follow-up #
  - Patient ID
  - Patient Age
  - Patient Gender
  - View Position: X-ray orientation
  - OriginalImageWidth
  - OriginalImageHeight
  - OriginalImagePixelSpacing_x
  - OriginalImagePixelSpacing_y

Class descriptions

There are 15 classes (14 diseases, and one for "No findings"). Images can be classified as "No findings" or one or more disease classes:

Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, Pleural_thickening, Cardiomegaly, Nodule, Mass, Hernia

Full Dataset Content

There are 12 zip files in total, ranging from ~2 GB to 4 GB in size. Additionally, we randomly sampled 5% of these images and created a smaller dataset for use in Kernels. The random sample contains 5,606 X-ray images and class labels.

Sample: sample.zip
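The archive counts and the sample size listed above are mutually consistent: 4,999 + 10 x 10,000 + 7,121 = 112,120 images, and 5% of that is 5,606, matching the sample. As a runnable check (constant names are illustrative):

```python
# Image counts per archive, as listed above (images_001.zip ... images_012.zip).
ARCHIVE_COUNTS = [4_999] + [10_000] * 10 + [7_121]

TOTAL_IMAGES = sum(ARCHIVE_COUNTS)       # should match the stated 112,120
SAMPLE_SIZE = TOTAL_IMAGES * 5 // 100    # 5% sample, exact integer arithmetic

assert len(ARCHIVE_COUNTS) == 12, "expected 12 zip files"
assert TOTAL_IMAGES == 112_120
assert SAMPLE_SIZE == 5_606
```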

Modifications to original data

Original TAR archives were converted to ZIP archives to be compatible with the Kaggle platform. CSV headers were slightly modified to be more explicit in comma separation and to make the fields self-explanatory.

Citations

Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017, ChestX-ray8_Hospital-Scale_Chest_CVPR_2017_paper.pdf

NIH News release: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community

Original source files and documents: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345
Classification accuracy comparison.
figshare.com
xls
Updated Sep 1, 2023
Weiguang Liu; Rafael Delalibera Rodrigues; Jianglong Yan; Yu-tao Zhu; Everson José de Freitas Pereira; Gen Li; Qiusheng Zheng; Liang Zhao (2023). Classification accuracy comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0290968.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0290968.t004
Dataset updated
Sep 1, 2023
Dataset provided by
PLOS ONE
Authors
Weiguang Liu; Rafael Delalibera Rodrigues; Jianglong Yan; Yu-tao Zhu; Everson José de Freitas Pereira; Gen Li; Qiusheng Zheng; Liang Zhao
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this work, we present a network-based technique for chest X-ray image classification to help the diagnosis and prognosis of patients with COVID-19. From visual inspection, we perceive that healthy and COVID-19 chest radiographic images present different levels of geometric complexity. Therefore, we apply fractal dimension and quadtree as feature extractors to characterize such differences. Moreover, real-world datasets often present complex patterns, which are hardly handled by only the physical features of the data (such as similarity, distance, or distribution). This issue is addressed by complex networks, which are suitable tools for characterizing data patterns and capturing spatial, topological, and functional relationships in data. Specifically, we propose a new approach combining complexity measures and complex networks to provide a modified high-level classification technique to be applied to COVID-19 chest radiographic image classification. The computational results on the Kaggle COVID-19 Radiography Database show that the proposed method can obtain high classification accuracy on X-ray images, being competitive with state-of-the-art classification techniques. Lastly, a set of network measures is evaluated according to their potential in distinguishing the network classes, which resulted in the choice of communicability measure. We expect that the present work will make significant contributions to machine learning at the semantic level and to combat COVID-19.
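The abstract does not spell out how the fractal dimension is computed. As a generic illustration only (not the authors' method, which also uses quadtrees and complex networks), here is the common box-counting estimate on a binary image: count the boxes of side s that contain foreground pixels, N(s), and fit the slope of log N(s) against log(1/s). The nested-list input and power-of-two side length are assumptions that keep the sketch dependency-free:

```python
import math


def box_count(grid, size):
    """Number of size x size boxes that contain at least one foreground pixel."""
    n = len(grid)
    count = 0
    for top in range(0, n, size):
        for left in range(0, n, size):
            if any(grid[r][c]
                   for r in range(top, min(top + size, n))
                   for c in range(left, min(left + size, n))):
                count += 1
    return count


def box_counting_dimension(grid):
    """Least-squares slope of log N(s) versus log(1/s) for s = 1, 2, 4, ... < n.

    Assumes a square binary grid whose side n is a power of two and n >= 4.
    """
    n = len(grid)
    if n < 4 or n & (n - 1) or any(len(row) != n for row in grid):
        raise ValueError("expected a square grid with power-of-two side >= 4")
    if not any(any(row) for row in grid):
        raise ValueError("grid has no foreground pixels")
    sizes = [2 ** k for k in range(int(math.log2(n)))]
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(grid, s)) for s in sizes]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

A filled square yields a dimension of 2 and a single straight line yields 1; irregular lung textures fall in between.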
5k trachea bifurcation on chest xray
kaggle.com
Updated Feb 20, 2021
Cite
dr. Konya (2021). 5k trachea bifurcation on chest xray [Dataset]. https://www.kaggle.com/sandorkonya/5k-trachea-bifurcation-on-chest-xray
Explore at: Croissant
Dataset updated
Feb 20, 2021
Dataset provided by
Kaggle
Authors
dr. Konya
License
CC0 1.0 Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

I made these annotations for the RANZCR CLiP - Catheter and Line Position Challenge.

Content

The dataset contains 5,281 ROIs for the trachea bifurcation, in VGG JSON and COCO-style JSON formats.

A Faster R-CNN trained with a 200 px bounding box around the point performs well: the average distance to the ground truth (GT) is below 50 px. See the histogram (X distance in pixels from GT):

Image: predicted trachea distance from GT (https://i.postimg.cc/4xZZQJYS/trachea-bifurcation.jpg)
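The evaluation above treats a point target (the bifurcation) as a detection problem. A minimal sketch of the two steps involved, assuming the annotations are (x, y) pixel coordinates: turning the point into a 200 px box clipped to the image, and measuring how far a predicted box's centre lies from the GT point. The helper names are hypothetical; the author reports the X distance, and the sketch also returns the Euclidean distance.

```python
import math


def point_to_box(x, y, img_w, img_h, size=200):
    """Return a COCO-style [x_min, y_min, width, height] box of side `size`
    centred on (x, y), clipped to the image bounds.

    Note: near an image border the clipped box is smaller and no longer
    centred exactly on the point.
    """
    half = size / 2
    x0 = max(0.0, x - half)
    y0 = max(0.0, y - half)
    x1 = min(float(img_w), x + half)
    y1 = min(float(img_h), y + half)
    return [x0, y0, x1 - x0, y1 - y0]


def box_center(box):
    """Centre (cx, cy) of a [x_min, y_min, width, height] box."""
    x0, y0, w, h = box
    return x0 + w / 2, y0 + h / 2


def distance_to_gt(pred_box, gt_x, gt_y):
    """Return (|dx|, Euclidean distance) in pixels between the predicted
    box centre and the ground-truth point."""
    cx, cy = box_center(pred_box)
    return abs(cx - gt_x), math.hypot(cx - gt_x, cy - gt_y)
```

Collecting the first value over a validation set gives the kind of X-distance histogram shown above.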

Inspiration

I hope this helps you determine abnormal positions of endotracheal (ET) tubes on X-rays!

If you use this dataset, please cite it as: Trachea bifurcation dataset by Kónya et al., 2021, https://www.kaggle.com/sandorkonya/5k-trachea-bifurcation-on-chest-xray, https://orcid.org/0000-0001-7356-0541

Thank you!
Data from: Covid19 Detection
kaggle.com
Updated May 27, 2021
Cite
donjon00 (2021). Covid19 Detection [Dataset]. https://www.kaggle.com/donjon00/covid19-detection/discussion
Explore at: Croissant
Dataset updated
May 27, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
donjon00
Description
Datasets Used

This dataset is built from multiple publicly available datasets, listed below:
1. NIH Chest X-ray Dataset of 14 Common Thorax Diseases
2. Tuberculosis (TB) Chest X-ray Database
3. COVID-19 Chest X-ray Database
4. Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification (https://data.mendeley.com/datasets/rscbjbr9sj/2)

Acknowledgements

NIH Chest X-ray dataset: Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald Summers, ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, IEEE CVPR, pp. 3462-3471, 2017.

TB dataset: Tawsifur Rahman, Amith Khandakar, Muhammad A. Kadir, Khandaker R. Islam, Khandaker F. Islam, Zaid B. Mahbub, Mohamed Arselene Ayari, Muhammad E. H. Chowdhury. (2020) "Reliable Tuberculosis Detection using Chest X-ray with Deep Learning, Segmentation and Visualization". IEEE Access, Vol. 8, pp. 191586 - 191601. DOI: 10.1109/ACCESS.2020.3031384.

COVID dataset:
- M.E.H. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M.A. Kadir, Z.B. Mahbub, K.R. Islam, M.S. Khan, A. Iqbal, N. Al-Emadi, M.B.I. Reaz, M. T. Islam, “Can AI help in screening Viral and COVID-19 pneumonia?” IEEE Access, Vol. 8, 2020, pp. 132665 - 132676.
- Rahman, T., Khandakar, A., Qiblawey, Y., Tahir, A., Kiranyaz, S., Kashem, S.B.A., Islam, M.T., Maadeed, S.A., Zughaier, S.M., Khan, M.S. and Chowdhury, M.E., 2020. Exploring the Effect of Image Enhancement Techniques on COVID-19 Detection using Chest X-ray Images. arXiv preprint arXiv:2012.02238.

Pneumonia dataset: Kermany, Daniel; Zhang, Kang; Goldbaum, Michael (2018), “Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification”, Mendeley Data, V2, doi: 10.17632/rscbjbr9sj.2

Inspiration

Automating the detection and classification of pulmonary diseases using CXR images.
Dataset (Covid-Bacterial-Viral-Normal-Emphysema)
kaggle.com
Updated Jun 13, 2024
Cite
Nhật Nguyễn Minh (2024). Dataset (Covid-Bacterial-Viral-Normal-Emphysema) [Dataset]. https://www.kaggle.com/datasets/minhnhat232/dataset-covid-bacterial-viral-normal-emphysema/discussion
Explore at: Croissant
Dataset updated
Jun 13, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
Nhật Nguyễn Minh
License
CC0 1.0 Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset contains lung X-ray images in the following classes:

Normal - 3,270 images

Covid-19 - 3,017 images

Viral-pneumonia - 3,013 images

Bacterial-pneumonia - 3,000 images

Emphysema - 2,550 images
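The five classes are close to balanced, with Emphysema the smallest. If a weighted loss is wanted, one common choice (an assumption, not something the dataset prescribes) is inverse-frequency weighting, w_c = N / (K · n_c), which is the formula behind scikit-learn's "balanced" class-weight mode. A sketch using the counts listed above; the class-name keys are illustrative.

```python
# Image counts per class, as listed in the dataset description.
CLASS_COUNTS = {
    "Normal": 3270,
    "Covid-19": 3017,
    "Viral-pneumonia": 3013,
    "Bacterial-pneumonia": 3000,
    "Emphysema": 2550,
}


def balanced_class_weights(counts):
    """Inverse-frequency weights w_c = N / (K * n_c).

    A perfectly balanced dataset gives every class a weight of 1.0;
    smaller classes receive weights above 1.0.
    """
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

With these counts N = 14,850 and N/K = 2,970, so the weights run from about 0.91 for Normal (2970/3270) to about 1.16 for Emphysema (2970/2550). The imbalance is mild, so weighting may make little difference in practice.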

Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15315323%2F8041ddd2485bfe9cdf2ba1f9d96bd7e5%2F6_Class_Img.jpg?generation=1741951756137022&alt=media

The dataset is compiled from several reputable sources:

Dataset 1 [1]: This dataset includes four classes: COVID-19, viral pneumonia, bacterial pneumonia, and normal. It has multiple versions, and we use the latest one (version 4). Previous studies, such as those by Hariri et al. [18] and Ahmad et al. [20], used earlier versions of this dataset.

Dataset 2 [2]: The National Institutes of Health (NIH) Chest X-Ray Dataset, which contains over 100,000 chest X-ray images from over 30,000 patients. It covers 14 disease classes, including atelectasis, consolidation, and infiltration. For this study, we selected 2,550 chest X-ray images from the Emphysema class.

Dataset 3 [3]: The COVQU dataset, which we have extended to include two additional classes: COVID-19 and viral pneumonia. This dataset has been widely used in previous studies by M.E.H. Chowdhury et al. [4] and T. Rahman et al. [5], establishing its reputation as a reliable resource.

In addition, we publish a modified version of the dataset that aims to remove image regions that do not contain lungs (abdomen, arms, etc.).
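The description does not say how the non-lung regions were removed. One simple approach, given a binary lung mask (for example from a segmentation model), is to crop the image to the mask's bounding box plus a small margin. This is a hypothetical sketch, not how the published version was produced; it assumes a NumPy image and a boolean mask of the same height and width, and the `margin` default is arbitrary.

```python
import numpy as np


def crop_to_mask(image, mask, margin=10):
    """Crop `image` to the bounding box of the True pixels in `mask`,
    expanded by `margin` pixels on each side and clipped to the image."""
    mask = np.asarray(mask, dtype=bool)
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing lung pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing lung pixels
    if rows.size == 0:
        # No lung pixels were found: leave the image unchanged.
        return image
    r0 = max(rows[0] - margin, 0)
    r1 = min(rows[-1] + margin + 1, mask.shape[0])
    c0 = max(cols[0] - margin, 0)
    c1 = min(cols[-1] + margin + 1, mask.shape[1])
    return image[r0:r1, c0:c1]
```

Cropping keeps the lung field but discards most of the abdomen and arms; zeroing out every non-lung pixel instead is a stricter alternative.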

References:

[1] U. Sait, K. G. Lal, S. P. Prajapati, R. Bhaumik, T. Kumar, S. Shivakumar, K. Bhalla, Curated dataset for COVID-19 posterior-anterior chest radiography images (X-rays), Mendeley Data V4 (2022). doi:10.17632/9xkhgts2s6.4.

[2] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R. M. Summers, ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases, IEEE CVPR (2017) 3462–3471. doi:10.1109/CVPR.2017.369.

[3] A. M. Tahir, M. E. Chowdhury, A. Khandakar, T. Rahman, Y. Qiblawey, U. Khurshid, S. Kiranyaz, N. Ibtehaz, M. S. Rahman, S. Al-Maadeed, S. Mahmud, M. Ezeddin, K. Hameed, T. Hamid, COVID-19 infection localization and severity grading from chest X-ray images, Computers in Biology and Medicine 139 (2021) 105002. doi:10.1016/j.compbiomed.2021.105002. URL: https://www.sciencedirect.com/science/article/pii/S0010482521007964.

[4] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir, Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. A. Emadi, M. B. I. Reaz, M. T. Islam, Can AI help in screening viral and COVID-19 pneumonia?, IEEE Access 8 (2020) 132665–132676. doi:10.1109/ACCESS.2020.3010287.

[5] T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S. B. A. Kashem, M. T. Islam, S. A. Maadeed, S. M. Zughaier, M. S. Khan, M. E. Chowdhury, Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images, Computers in Biology and Medicine 132 (2021). doi:10.1016/j.compbiomed.2021.104319.

UNET Lung Segmentation Weights for Chest X Rays

kaggle.com

Updated Dec 31, 2023

Cite

Farhan Hai Khan (2023). UNET Lung Segmentation Weights for Chest X Rays [Dataset]. http://doi.org/10.34740/kaggle/dsv/7312855

Explore at: Croissant

Unique identifier

https://doi.org/10.34740/kaggle/dsv/7312855

Dataset updated

Dec 31, 2023

Dataset provided by

Kaggle

Authors

Farhan Hai Khan

License

CC0 1.0 Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

CXRs often contain a lot of noise around the region of interest. For cardiovascular disease identification, the lung is an essential part of the CXR and often the only object of interest. To avoid learning from this noise, it is often advisable to first preprocess the dataset with UNET lung segmentation and then apply object detection or classification algorithms. This model is uploaded for that purpose.
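The preprocessing step described above can be sketched as follows. This sketch assumes the segmentation model returns a per-pixel lung probability map in [0, 1] with the same height and width as the input, as a single-channel sigmoid output layer does. The 0.5 threshold, the fill value, and the function name are assumptions.

```python
import numpy as np


def apply_lung_mask(image, prob_map, threshold=0.5, fill_value=0):
    """Keep only pixels whose predicted lung probability is >= `threshold`.

    Returns the masked image (non-lung pixels set to `fill_value`) and the
    boolean mask itself.
    """
    # Drop batch/channel axes, e.g. (1, H, W, 1) or (H, W, 1) -> (H, W).
    prob = np.squeeze(np.asarray(prob_map))
    image = np.asarray(image)
    if prob.shape != image.shape[:2]:
        raise ValueError("probability map must match the image height and width")
    mask = prob >= threshold
    out = np.full_like(image, fill_value)  # same shape and dtype as the input
    out[mask] = image[mask]
    return out, mask
```

In a Keras pipeline, `prob_map` could be one entry of `model.predict(batch)` for the segmentation model; the masked images are then passed to the downstream classifier or detector.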

Starter Code

I strongly recommend this notebook for training.

Model architecture:

```python
# Imports assume TensorFlow 2.x Keras; the original snippet omitted them.
from tensorflow.keras.layers import (Conv2D, Conv2DTranspose, Input,
                                     MaxPooling2D, concatenate)
from tensorflow.keras.models import Model


def unet(input_size=(256, 256, 1)):
    inputs = Input(input_size)

    # Encoder: two 3x3 convolutions per level, then 2x2 max pooling
    # halves the spatial resolution.
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool3)
    conv4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    # Bottleneck at 1/16 of the input resolution.
    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool4)
    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same')(conv5)

    # Decoder: a stride-2 transposed convolution doubles the resolution, and
    # the matching encoder output is concatenated along the channel axis
    # (skip connection).
    up6 = concatenate([Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(conv5), conv4], axis=3)
    conv6 = Conv2D(256, (3, 3), activation='relu', padding='same')(up6)
    conv6 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv6)

    up7 = concatenate([Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(conv6), conv3], axis=3)
    conv7 = Conv2D(128, (3, 3), activation='relu', padding='same')(up7)
    conv7 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv7)

    up8 = concatenate([Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(conv7), conv2], axis=3)
    conv8 = Conv2D(64, (3, 3), activation='relu', padding='same')(up8)
    conv8 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv8)

    up9 = concatenate([Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(conv8), conv1], axis=3)
    conv9 = Conv2D(32, (3, 3), activation='relu', padding='same')(up9)
    conv9 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv9)

    # 1x1 convolution with a sigmoid: per-pixel lung probability.
    conv10 = Conv2D(1, (1, 1), activation='sigmoid')(conv9)

    return Model(inputs=[inputs], outputs=[conv10])
```


Acknowledgements

This model would not be possible without [Nikhil Pandey](https://www.kaggle.com/nikhilpandey360).
Here is the [Source Notebook](https://www.kaggle.com/nikhilpandey360/lung-segmentation-from-chest-x-ray-dataset/output).
The model was trained on the [Chest Xray Masks and Labels](https://www.kaggle.com/nikhilpandey360/chest-xray-masks-and-labels) dataset.

Inspiration

Go forth and apply your own amazing deep neural networks!

2.3k tracheostomy tube annotated on chest x-ray
kaggle.com
Updated Feb 19, 2021
Cite
dr. Konya (2021). 2.3k tracheostomy tube annotated on chest x-ray [Dataset]. https://www.kaggle.com/sandorkonya/23k-tracheostomy-tube-annotated-on-chest-xray/code
Explore at: Croissant
Dataset updated
Feb 19, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
dr. Konya
License
CC0 1.0 Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

I made these segmentations for the RANZCR CLiP - Catheter and Line Position Challenge.

Content

The dataset contains 2,231 bounding boxes for tracheostomy tubes, in VGG JSON and COCO-style JSON formats.
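In the standard COCO layout, each entry in `annotations` carries an `image_id` and a `bbox` given as [x, y, width, height] in pixels, and `images` maps each `id` to a `file_name`. Assuming the COCO-style JSON here follows that layout, a minimal reader that groups boxes per image and converts them to corner format could look like this. The function name and the sample data are hypothetical, and the VGG JSON variant is not covered.

```python
import json
from collections import defaultdict


def load_coco_boxes(coco):
    """Map each image file name to a list of (x_min, y_min, x_max, y_max) boxes.

    `coco` is a parsed COCO-style dict with "images" and "annotations" lists.
    """
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        boxes[names[ann["image_id"]]].append((x, y, x + w, y + h))
    return dict(boxes)


# Tiny inline example; for a real file use json.load(open(path)).
sample = json.loads(
    '{"images": [{"id": 1, "file_name": "a.jpg"}],'
    ' "annotations": [{"image_id": 1, "bbox": [10, 20, 30, 40]}]}'
)
print(load_coco_boxes(sample))  # {'a.jpg': [(10, 20, 40, 60)]}
```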

Inspiration

I hope this helps others segment tracheostomy tubes on X-rays!

If you use this dataset, please cite it as: Tracheostomy tube segmentation dataset by Kónya et al., 2021, https://www.kaggle.com/sandorkonya/23k-tracheostomy-tube-annotated-on-chest-xray

