100+ datasets found

The IQ-OTHNCCD lung cancer dataset
data.mendeley.com
Updated Oct 19, 2020
Cite
hamdalla alyasriy (2020). The IQ-OTHNCCD lung cancer dataset [Dataset]. http://doi.org/10.17632/bhmdr45bh2.1
Unique identifier
https://doi.org/10.17632/bhmdr45bh2.1
Dataset updated
Oct 19, 2020
Authors
hamdalla alyasriy
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset was collected at these two specialist hospitals over three months in fall 2019. It includes CT scans of patients diagnosed with lung cancer at different stages, as well as of healthy subjects. IQ-OTH/NCCD slides were marked by oncologists and radiologists at both centers. The dataset contains a total of 1190 images representing CT scan slices of 110 cases (see Figure 1). These cases are grouped into three classes: normal, benign, and malignant. Of these, 40 cases are diagnosed as malignant, 15 as benign, and 55 as normal. The CT scans were originally collected in DICOM format using a Siemens SOMATOM scanner. The CT protocol was 120 kV with a slice thickness of 1 mm and breath hold at full inspiration; images were read with window widths ranging from 350 to 1200 HU and window centers from 50 to 600. All images were de-identified before analysis. Written consent was waived by the oversight review board, and the study was approved by the institutional review boards of the participating medical centers. Each scan contains 80 to 200 slices, each representing an image of the chest from a different side and angle. The 110 cases vary in gender, age, educational attainment, area of residence, and living status. Some are employees of the Iraqi Ministries of Transport and Oil; others are farmers and gainers. Most come from the middle region of Iraq, particularly the provinces of Baghdad, Wasit, Diyala, Salahuddin, and Babylon.
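Because the 1190 images are slices from only 110 cases, slices from the same patient are strongly correlated. A common precaution (not described by the dataset authors) is to split training and test data by case rather than by image. The sketch below assumes each image can be mapped to a case identifier; the identifiers and file paths are hypothetical, and hashing is just one deterministic way to do the assignment.

```python
import hashlib


def assign_split(case_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a whole case to 'train' or 'test'.

    Every slice sharing a case_id gets the same answer, so no patient
    contributes slices to both subsets.
    """
    digest = hashlib.sha256(case_id.encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "test" if bucket < test_fraction else "train"


def split_slices(slices, test_fraction: float = 0.2):
    """Group (case_id, image_path) pairs into train/test lists by case."""
    out = {"train": [], "test": []}
    for case_id, image_path in slices:
        out[assign_split(case_id, test_fraction)].append(image_path)
    return out


# Hypothetical naming: the dataset does not prescribe these identifiers.
example = [("malignant_case_01", f"malignant_case_01/slice_{i}.jpg") for i in range(3)]
example += [("normal_case_07", f"normal_case_07/slice_{i}.jpg") for i in range(3)]
print(split_slices(example))
```

A random split over all 1190 images would instead let near-identical neighbouring slices of one patient land in both subsets, which tends to inflate test scores.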
A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis
cancerimagingarchive.net
dicom, n/a, xlsx, xml
Cite
The Cancer Imaging Archive, A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis [Dataset]. http://doi.org/10.7937/TCIA.2020.NNC2-0461
Available download formats
xml, n/a, xlsx, dicom
Unique identifier
https://doi.org/10.7937/TCIA.2020.NNC2-0461
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Dec 22, 2020
Dataset funded by
National Cancer Institute http://www.cancer.gov/
Description
This dataset consists of CT and PET/CT DICOM images of lung cancer subjects, with XML annotation files that indicate tumor locations with bounding boxes. The images were retrospectively acquired from patients with suspected lung cancer who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to the histopathological diagnosis of tissue samples. Patients whose names/IDs contain the letter 'A' were diagnosed with adenocarcinoma, 'B' with small cell carcinoma, 'E' with large cell carcinoma, and 'G' with squamous cell carcinoma.
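The letter code above can be turned into a class label directly from the patient ID. A minimal sketch, assuming the class letter is the last letter before a trailing run of digits (for example a hypothetical ID such as `Lung_Dx-A0001`); the regex and helper name are illustrative, not part of the dataset's tooling.

```python
import re

# Letter-to-histology mapping as stated in the dataset description.
HISTOLOGY_BY_LETTER = {
    "A": "Adenocarcinoma",
    "B": "Small Cell Carcinoma",
    "E": "Large Cell Carcinoma",
    "G": "Squamous Cell Carcinoma",
}

# Assumed ID shape: any prefix, then one class letter, then trailing digits.
_CLASS_LETTER = re.compile(r"([ABEG])\d+$")


def histology_from_id(patient_id: str) -> str:
    """Return the histology group encoded in a patient ID (hypothetical format)."""
    match = _CLASS_LETTER.search(patient_id)
    if match is None:
        raise ValueError(f"no histology letter found in {patient_id!r}")
    return HISTOLOGY_BY_LETTER[match.group(1)]


print(histology_from_id("Lung_Dx-A0001"))  # Adenocarcinoma
```

Anchoring the pattern to the trailing digits avoids picking up a capital letter that happens to appear in the ID prefix.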
The images were analyzed with mediastinal (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, -700 HU) settings. Reconstructions were made at 2 mm slice thickness and with lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast-enhanced, and 3D reconstruction.
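The two window settings quoted above map Hounsfield units (HU) to display intensity by clipping to [level - width/2, level + width/2] and rescaling linearly. A minimal sketch in plain Python, assuming the pixel values have already been converted to HU (raw DICOM CT pixel data generally needs the RescaleSlope and RescaleIntercept attributes applied first); the sample values are made up.

```python
def window_value(hu: float, width: float, level: float) -> float:
    """Map one Hounsfield-unit value to [0, 1] under a width/level window."""
    lower = level - width / 2.0
    upper = level + width / 2.0
    clipped = min(max(hu, lower), upper)  # saturate values outside the window
    return (clipped - lower) / (upper - lower)


def window_image(pixels_hu, width: float, level: float):
    """Apply the window to a 2D image given as a list of rows of HU values."""
    return [[window_value(v, width, level) for v in row] for row in pixels_hu]


# Presets quoted in the description.
MEDIASTINUM = {"width": 350.0, "level": 40.0}
LUNG = {"width": 1400.0, "level": -700.0}

tiny_slice = [[-1000.0, -700.0], [40.0, 1000.0]]  # made-up HU values
print(window_image(tiny_slice, **LUNG))
print(window_image(tiny_slice, **MEDIASTINUM))
```

The lung window spreads contrast over air-filled parenchyma (-1400 to 0 HU), while the mediastinal window (-135 to 215 HU) saturates the lungs to black and separates soft tissues.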
Before the examination, each patient fasted for at least 6 hours, and each patient's blood glucose was below 11 mmol/L. Whole-body emission scans were acquired 60 minutes after intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 min), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method, with a CT protocol of 180 mAs, 120 kV, and a pitch of 1.0. Each study comprised one CT volume, one PET volume, and fused PET/CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, and the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to mid-femur. The PET images were reconstructed with the TrueX TOF method at a slice thickness of 1 mm.
The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer, to make this dataset a useful resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience, and the others had more than 5 years. After one radiologist labeled each subject, the other four performed a verification, so all five radiologists reviewed every annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed with the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded from the original dataset page.
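PASCAL VOC annotations are plain XML, so the bounding boxes can also be read with the standard library instead of the toolkit linked above. A minimal sketch, assuming the usual VOC layout (`object/name` and `object/bndbox/xmin..ymax`); the sample file content and the label `A` are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text: str):
    """Return (label, xmin, ymin, xmax, ymax) tuples from one PASCAL VOC file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; some tools write floats, so go via float.
        coords = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes


sample = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>A</name>
    <bndbox><xmin>120</xmin><ymin>200</ymin><xmax>168</xmax><ymax>251</ymax></bndbox>
  </object>
</annotation>"""
print(read_voc_boxes(sample))  # [('A', 120, 200, 168, 251)]
```

For real files, `ET.parse(path).getroot()` can replace `ET.fromstring`; matching each XML file to its DICOM slice depends on the dataset's file naming, which is not described here.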
Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, reaching a mean average precision (mAP) of about 0.87 on the validation set.
Air Quality-Lung Cancer Data
dataverse.harvard.edu
Updated Jan 31, 2020
Cite
Mithun Acharjee; Kumer Pial Das; Young S.Stanley (2020). Air Quality-Lung Cancer Data [Dataset]. http://doi.org/10.7910/DVN/HMOEJO
Unique identifier
https://doi.org/10.7910/DVN/HMOEJO
Dataset updated
Jan 31, 2020
Dataset provided by
Harvard Dataverse
Authors
Mithun Acharjee; Kumer Pial Das; Young S.Stanley
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data come from two sources. Population-based lung cancer incidence rates for 2010-2014 (the most recent available) were abstracted from the National Cancer Institute's State Cancer Profiles (Schwartz et al. 1996). This national county-level cancer database is compiled from state public health surveillance systems. Domain-specific county-level environmental quality index (EQI) data for 2000-2005 were abstracted from the United States Environmental Protection Agency (USEPA) profile; complete descriptions of the datasets used in the EQI are given in Lobdell's paper (Lobdell 2011). The data were merged on the Federal Information Processing Standards (FIPS) code. Of the 3144 counties in the United States, this study has information for 2602. Data were unavailable for four states (Kansas, Michigan, Minnesota, and Nevada) because state legislation and regulations prohibit releasing county-level data to outside entities. Counties with missing lung cancer mortality information were omitted, and Union County, Florida, an outlier in mortality, was deleted. The local control analysis produced two non-informative clusters (clusters 28 and 29), i.e. clusters for which either treatment or control group information is missing; these were deleted before analysis. Three types of variables are used: (i) lung cancer mortality as the outcome variable; (ii) a binary treatment indicator, PM2.5 high (greater than 10.59 µg/m³) vs. low (less than 10.59 µg/m³); and (iii) three potential confounders (X) for clustering, namely the land, sociodemographic, and built EQIs. For each index, higher values correspond to poorer environmental quality (Jagai et al. 2017).
Because PM2.5 is one of the indicators used to measure the air EQI, we do not consider the air EQI, to avoid confounding effects.
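The merge and treatment coding described above can be sketched as an inner join on FIPS code followed by thresholding PM2.5 at 10.59. This is a minimal illustration, not the authors' code: the dictionary layout, field names, and sample FIPS values are hypothetical, and values exactly at the cut-off are treated as "low" because the description only defines "greater than" and "less than".

```python
PM25_THRESHOLD = 10.59  # µg/m³, the cut-off stated in the description


def merge_by_fips(mortality_by_fips, environment_by_fips):
    """Inner-join county records on FIPS code and add the PM2.5 treatment flag.

    mortality_by_fips:   {fips: lung cancer mortality, or None if missing}
    environment_by_fips: {fips: {"pm25": ..., "land_eqi": ..., ...}}
    FIPS codes are kept as 5-character strings so leading zeros survive.
    """
    merged = {}
    for fips in sorted(mortality_by_fips.keys() & environment_by_fips.keys()):
        mortality = mortality_by_fips[fips]
        if mortality is None:  # counties with missing mortality are omitted
            continue
        row = {"mortality": mortality, **environment_by_fips[fips]}
        # 1 = high PM2.5, 0 = low; exactly 10.59 counts as low (an assumption).
        row["pm25_high"] = int(row["pm25"] > PM25_THRESHOLD)
        merged[fips] = row
    return merged


# Hypothetical county records.
mortality = {"01001": 61.2, "01003": None, "01005": 55.0}
environment = {
    "01001": {"pm25": 11.3, "land_eqi": 0.2, "sociodemographic_eqi": -0.1, "built_eqi": 0.4},
    "01003": {"pm25": 9.8, "land_eqi": -0.3, "sociodemographic_eqi": 0.5, "built_eqi": 0.1},
    "02013": {"pm25": 4.1, "land_eqi": 1.0, "sociodemographic_eqi": 0.0, "built_eqi": -0.2},
}
print(merge_by_fips(mortality, environment))
```

Only county 01001 survives: 01003 has no mortality value, and 01005 and 02013 each appear in only one source.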
lung-cancer
huggingface.co
Updated Jun 24, 2022
Cite
Nate Raw (2022). lung-cancer [Dataset]. https://huggingface.co/datasets/nateraw/lung-cancer
Dataset updated
Jun 24, 2022
Authors
Nate Raw
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for Lung Cancer

Dataset Summary

An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their cancer risk status. The data were collected from an online lung cancer prediction system website.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.
LNDb Dataset
zenodo.org
bin, csv, pdf, zip
Updated Jul 16, 2024
Cite
João Pedrosa; Guilherme; Carlos; Márcio; Patrícia; André; João; Eduardo; Isabel; António; Aurélio (2024). LNDb Dataset [Dataset]. http://doi.org/10.5281/zenodo.6613714
Available download formats
pdf, bin, zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.6613714
Dataset updated
Jul 16, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
João Pedrosa; Guilherme; Carlos; Márcio; Patrícia; André; João; Eduardo; Isabel; António; Aurélio
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Lung Nodule Database (LNDb) was developed as an external dataset complementary to LIDC-IDRI. The publication of this database gives continuity to LIDC-IDRI and allows the community to perform external and comparable validation of proposed computer-aided diagnosis (CAD) systems.

The LNDb contains 294 CT scans collected retrospectively at the Centro Hospitalar e Universitário de São João (CHUSJ) in Porto, Portugal, between 2016 and 2018. All data were acquired under approval from the CHUSJ Ethical Committee and were anonymised prior to any analysis, removing all personal information except patient birth year and gender.

The database served as the basis for the Grand Challenge on automatic lung cancer patient management, or LNDb challenge.

THIS DATASET IS PUBLICLY AVAILABLE UNDER A CREATIVE COMMONS CC BY-NC-ND LICENSE (ATTRIBUTION-NONCOMMERCIAL-NODERIVS).
ESSENTIALLY, YOU ARE GRANTED ACCESS TO THE DATASET FOR USE IN YOUR RESEARCH AS LONG AS YOU CREDIT OUR WORK/PUBLICATIONS(*), BUT YOU CANNOT CHANGE THEM IN ANY WAY OR USE THEM COMMERCIALLY.

(*) Pedrosa, João, et al. "LNDb: a lung nodule database on computed tomography." arXiv preprint arXiv:1911.08434 (2019).

(*) Pedrosa, João, et al. "LNDb challenge on automatic lung cancer patient management." Medical image analysis 70 (2021): 102027.
lung-cancer
huggingface.co
Updated Aug 13, 2024
Cite
Dorsa Rohani (2024). lung-cancer [Dataset]. https://huggingface.co/datasets/dorsar/lung-cancer
Dataset updated
Aug 13, 2024
Authors
Dorsa Rohani
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Lung Cancer CT Scan Dataset

Dataset Description

This dataset contains CT scan images for lung cancer detection and classification. It includes images of four different categories: adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal (non-cancerous) lung tissue.

Classes

Adenocarcinoma
Large Cell Carcinoma
Normal (non-cancerous)
Squamous Cell Carcinoma

Dataset Statistics

Total number of images: 315
Number of classes: 4
Class… See the full description on the dataset page: https://huggingface.co/datasets/dorsar/lung-cancer.

National Lung Screening Trial
cancerimagingarchive.net
stage.cancerimagingarchive.net
dicom, docx, n/a +2
Updated Sep 24, 2021
Cite
The Cancer Imaging Archive (2021). National Lung Screening Trial [Dataset]. http://doi.org/10.7937/TCIA.HMQ8-J677
Available download formats
docx, svs, dicom, n/a, sas, zip, and doc
Unique identifier
https://doi.org/10.7937/TCIA.HMQ8-J677
Dataset updated
Sep 24, 2021
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Sep 24, 2021
Dataset funded by
National Cancer Institute http://www.cancer.gov/
Description

Demographic Summary of Available Imaging

Characteristic	Value (N = 26254)
Age (years)	Mean ± SD: 61.4 ± 5; Median (IQR): 60 (57-65); Range: 43-75
Sex	Male: 15512 (59%); Female: 10742 (41%)
Race	White: 23969 (91.3%); Black: 1135 (4.3%); Asian: 547 (2.1%); American Indian/Alaska Native: 88 (0.3%); Native Hawaiian/Other Pacific Islander: 87 (0.3%); Unknown: 428 (1.6%)
Ethnicity	Not available

Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.

Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.

Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
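As a quick arithmetic check of the Results paragraph, the incidence rate ratio and the relative reduction in lung-cancer mortality can be recomputed from the published per-100,000 person-year rates. This is a minimal sketch; because those published rates are themselves rounded, the recomputed mortality reduction comes out at about 20.1% rather than the reported 20.0%, a gap that presumably reflects rounding.

```python
def relative_reduction(rate_treated: float, rate_control: float) -> float:
    """Fractional reduction relative to control: 1 - (treated rate / control rate)."""
    return 1.0 - rate_treated / rate_control


# Published rates per 100,000 person-years (rounded): low-dose CT vs radiography.
incidence_ratio = 645 / 572
mortality_reduction = relative_reduction(247, 309)

print(f"incidence rate ratio: {incidence_ratio:.2f}")
print(f"lung-cancer mortality reduction: {mortality_reduction:.1%}")
```

The rate ratio reproduces the reported 1.13; confidence intervals and P values cannot be recovered from the rates alone.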

Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).

Data Availability: A summary of the National Lung Screening Trial and its available datasets is provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), which the National Cancer Institute (NCI) contracts to store and statistically analyze the NLST trial data. The full NLST clinical data set is available through CDAS. TCIA users can download, without restriction, a publicly distributable subset of that clinical data, along with the CT and histopathology images collected during the trial. (These were previously restricted.)

Data from The Lung Image Database Consortium (LIDC) and Image Database...
cancerimagingarchive.net
dicom, n/a, xls, xlsx +1
Cite
The Cancer Imaging Archive, Data from The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans [Dataset]. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
Available download formats
xlsx, xls, n/a, xml and zip, dicom
Unique identifier
https://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Sep 21, 2020
Dataset funded by
National Cancer Institute http://www.cancer.gov/
Description
The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.
Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥3 mm," "nodule <3 mm," and "non-nodule ≥3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify, as completely as possible, all lung nodules in each CT scan without requiring forced consensus.
Note: Before developing custom tools to analyze the XML version, the TCIA team strongly encourages users to review pylidc and the Standardized representation of the TCIA LIDC-IDRI annotations using DICOM (DICOM-LIDC-IDRI-Nodules), which cover the annotations/segmentations included in this dataset.
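Because the unblinded phase did not force consensus, each nodule can carry marks from one to four readers, and analyses commonly keep only nodules marked by enough readers (the LUNA16 challenge, for example, used nodules ≥3 mm accepted by at least 3 of the 4 radiologists). A minimal sketch of that filter, assuming marks belonging to the same physical nodule have already been grouped under one nodule ID (pylidc's `Scan.cluster_annotations()` is one tool that performs such grouping); the nodule and reader IDs below are hypothetical.

```python
def nodules_with_agreement(marks, min_readers: int = 3):
    """Return sorted nodule IDs marked by at least `min_readers` distinct readers.

    marks: iterable of (nodule_id, reader_id) pairs; a reader marking the same
    nodule twice is counted once.
    """
    readers_per_nodule = {}
    for nodule_id, reader_id in marks:
        readers_per_nodule.setdefault(nodule_id, set()).add(reader_id)
    return sorted(
        nodule_id
        for nodule_id, readers in readers_per_nodule.items()
        if len(readers) >= min_readers
    )


marks = [
    ("n1", "r1"), ("n1", "r2"), ("n1", "r3"),
    ("n2", "r1"), ("n2", "r1"), ("n2", "r2"),
    ("n3", "r4"),
]
print(nodules_with_agreement(marks))                 # ['n1']
print(nodules_with_agreement(marks, min_readers=2))  # ['n1', 'n2']
```

Lowering `min_readers` trades label certainty for more positive examples; the size and category filters (e.g. "nodule ≥3 mm") would be applied separately.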
Lung Cancer Dataset
cubig.ai
Updated May 2, 2025
Cite
CUBIG (2025). Lung Cancer Dataset [Dataset]. https://cubig.ai/store/products/185/lung-cancer-dataset
Dataset updated
May 2, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction • The Lung Cancer dataset comprises medical imaging data of lung scans, annotated for binary classification of lung cancer: Yes (1) or No (0).

2) Data Utilization (1) Characteristics of the Lung Cancer data: • The dataset includes 1 continuous variable and 15 categorical variables. (2) The Lung Cancer data can be used for: • Model Learning: deep learning models such as convolutional neural networks (CNNs) can analyze lung scan images to build diagnostic systems that predict lung cancer. • Simulation Diagnostic Training: using medical imaging data, doctors can perform simulated diagnostic training and improve their diagnostic skills.
lung cancer data.xlsx
figshare.com
xlsx
Updated Jan 19, 2025
Cite
Jehan Al-Musawi; Farah Al-Shadeedi; Nabaa Shakir; Sabreen Ibrahim (2025). lung cancer data.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.28235576.v1
Available download formats
xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28235576.v1
Dataset updated
Jan 19, 2025
Dataset provided by
figshare
Authors
Jehan Al-Musawi; Farah Al-Shadeedi; Nabaa Shakir; Sabreen Ibrahim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract
Objective: To identify the socioepidemiologic and histopathologic patterns of lung cancer patients in the Middle Euphrates region.
Patients and Methods: This study analyzed medical information from lung cancer patients at the Middle Euphrates Cancer Center in Iraq from January 2018 to December 2023. Demographic information (age, gender, residency, and education level) and clinical details (histopathological categorization) were obtained. All confirmed lung cancer cases were included; cases with inadequate data or a non-lung-cancer diagnosis were excluded. The data were analyzed using IBM SPSS Statistics (version 26) and summarized with descriptive statistics; chi-square tests were used to identify associations between categorical variables at a significance level of p < 0.05. Ethical approval was obtained from the relevant institutional review board.
Results: A total of 1162 patients were included, with a mean age at diagnosis of 64.47 ± 11.45 years. Most patients were over 60 years old (64.4%), followed by those aged 40–60 years (34%); the least affected group was under 40 years (1.6%). Males accounted for the majority of cases (68%) and females for about 32%, with a male:female ratio fluctuating around 2:1. Illiterate patients and those with low education levels made up the largest proportion, about 87.9% of the study population. Squamous cell carcinoma (SCC) was the most frequent subtype (41.7%), followed closely by adenocarcinoma (AC) at 37% and small cell lung cancer (SCLC) at 10.5%. Although SCC was the predominant subtype overall, AC incidence increased over time (from 31.7% in 2018 to 41.4% in 2023), with predominance in females and in younger and more highly educated groups. While the percentages of SCLC and other less common subgroups remained relatively stable over time, NSCLC-NOS diagnoses decreased significantly (from 11.1% in 2018 to 3.2% in 2023).
Conclusions: In Iraq, specifically in the Middle Euphrates region, lung cancer is a major public health issue in older age groups. The two main subtypes, SCC and AC, are the main contributors, with a clear increase in AC cases in recent years. These shifting trends indicate an urgent need for improved screening strategies, focused preventive initiatives, and customized treatment plans in view of changing risk profiles.
scRNA-seq data of lung cancer
scidb.cn
Updated Jul 21, 2022
Cite
Weimin Li (2022). scRNA-seq data of lung cancer [Dataset]. http://doi.org/10.57760/sciencedb.02028
Unique identifier
https://doi.org/10.57760/sciencedb.02028
Dataset updated
Jul 21, 2022
Dataset provided by
Science Data Bank
Authors
Weimin Li
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
We collected 40 tumor and adjacent normal tissue samples from 19 pathologically diagnosed NSCLC patients (10 LUAD and 9 LUSC) during surgical resection, rapidly digested the tissues to obtain single-cell suspensions, and constructed cDNA libraries from these samples within 24 hours using the 10x Genomics protocol. The libraries were sequenced on the Illumina NovaSeq 6000 platform. Raw gene expression matrices were generated using CellRanger (version 3.0.1) and processed in R (version 3.6.0) using the Seurat R package (version 2.3.4).
Duke Lung Cancer Screening Dataset 2024
zenodo.org
Updated Feb 5, 2025
Cite
Avivah Wang; FAKRUL ISLAM TUSHAR; Michael R. Harowicz; Kyle J. Lafata; Tina D. Tailor; Joseph Y. Lo (2025). Duke Lung Cancer Screening Dataset 2024 [Dataset]. http://doi.org/10.5281/zenodo.13799069
Unique identifier
https://doi.org/10.5281/zenodo.13799069
Dataset updated
Feb 5, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Avivah Wang; FAKRUL ISLAM TUSHAR; Michael R. Harowicz; Kyle J. Lafata; Tina D. Tailor; Joseph Y. Lo
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Note - This is part 1 of the dataset.

Part 1 can be found at: https://zenodo.org/records/13799069
Part 2 can be found at: https://zenodo.org/records/12784601
Part 3 can be found at: https://zenodo.org/records/14659131

Background: Lung cancer risk classification is an increasingly important area of research as low-dose thoracic CT screening programs have become standard of care for patients at high risk for lung cancer. There is limited availability of large, annotated public databases for the training and testing of algorithms for lung nodule classification.

Methods: Screening chest CT scans done between January 1, 2015 and June 30, 2021 at Duke University Health System were considered for this study. Efficient nodule annotation was performed semi-automatically by using a publicly available deep learning nodule detection algorithm trained on the LUNA16 dataset to identify initial candidates, which were then accepted based on nodule location in the radiology text report or manually annotated by a medical student and a fellowship-trained cardiothoracic radiologist.

Results: The dataset contains 1613 CT volumes with 2487 annotated nodules, selected from a total dataset of 2061 patients, with the remaining data reserved for future testing. Radiologist spot-checking confirmed the semi-automated annotation had an accuracy rate of >90%.

Conclusions: The Duke Lung Cancer Screening Dataset 2024 is the first large dataset for CT screening for lung cancer reflecting the use of current CT technology. It is a useful resource for lung cancer risk classification research, and the efficient annotation methods described for its creation may be used to generate similar databases for research in the future.

Dataset part Details:
Part 1: DLCS subsets 1 to 7, metadata, and annotations.
Part 2: DLCS subsets 8 and 9, and CT image info metadata.
Part 3: DLCS subset 10.

Updates and Versions:

Part 1, Version 1.0 (published 03/05/2024): Released the initial dataset, including partial data subsets 1 to 7 and 3D bounding box annotations of the lung nodules.

Part 1, Version 1.1 (published 09/19/2024): Added a metadata file (DLCSD24_metadata_v1.1.xlsx) and updated the dataset description and title (DOI: 10.5281/zenodo.13799069).

Part 2, Version 1.0 (published 02/04/2025): Released DLCS subsets 8 and 9 and CT image info metadata (DLCSD24_CT_ImageInfo_v1.csv and metadata documentation).

Part 3, Version 1.0 (published 02/04/2025): Released DLCS subset 10.

Code Repository:
To support reproducible open-access research and benchmarking, we have shared several pre-trained models and baseline results in GitHub and GitLab repositories.

GitLab: https://gitlab.oit.duke.edu/cvit-public/ai_lung_health_benchmarking
GitHub: https://github.com/fitushar/AI-in-Lung-Health-Benchmarking-Detection-and-Diagnostic-Models-Across-Multiple-CT-Scan-Datasets

Funding:
This work was supported by the Duke Department of Radiology Charles E. Putman Vision Award, NIH/NIBIB P41-EB028744, and NIH/NCI R01-CA261457.
Chest X-Ray for Lung Cancer Detection Dataset - Dataset - Taiwan AI Data...
data.dmc.nycu.edu.tw
Updated Oct 5, 2022
Cite
(2022). Chest X-Ray for Lung Cancer Detection Dataset - Dataset - Taiwan AI Data Sharing Portal [Dataset]. https://data.dmc.nycu.edu.tw/dataset/d11-x-ray
Dataset updated
Oct 5, 2022
Description
The dataset includes annotated 2D CXR images for lung nodule detection, capturing the longest diameter and its perpendicular diameter, along with various characteristics such as density, margins, and location. This dataset is designed for using CXR as a first-line screening tool to assess the likelihood of nodule malignancy, aiding in the decision of whether further examinations, such as CT scans or biopsies, are necessary. This tool is especially beneficial for remote areas and telemedicine applications, where access to advanced diagnostic facilities may be limited. By providing an initial assessment of nodule characteristics and potential malignancy, the dataset helps prioritize patients who need immediate further investigation, improving early detection and treatment outcomes in underserved regions.
Cancer Lung Dataset
universe.roboflow.com
zip
Updated Jan 22, 2025
Cite
abod (2025). Cancer Lung Dataset [Dataset]. https://universe.roboflow.com/abod-0vl9b/cancer-lung/dataset/1
Available download formats
zip
Dataset updated
Jan 22, 2025
Dataset authored and provided by
abod
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Cancer Bounding Boxes
Description
Cancer Lung

Overview: Cancer Lung is a dataset for object detection tasks; it contains Cancer annotations for 972 images.
Getting Started: You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
License: This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
Lung Cancer Detection - Dataset.zip
figshare.com
zip
Updated Mar 2, 2025
Cite
Paarth Kapur (2025). Lung Cancer Detection - Dataset.zip [Dataset]. http://doi.org/10.6084/m9.figshare.28497596.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.28497596.v1
Dataset updated
Mar 2, 2025
Dataset provided by
figshare
Authors
Paarth Kapur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises medical images categorized into four distinct classes: Adenocarcinoma, Large Cell Carcinoma, Squamous Cell Carcinoma, and Normal. The dataset includes a total of 1,000 images, with 338 images labeled as Adenocarcinoma, 187 as Large Cell Carcinoma, 260 as Squamous Cell Carcinoma, and 215 as Normal. The images are primarily in PNG format (988 images) with a small fraction in JPG format (12 images). The average image dimensions are 258 pixels in height and 356 pixels in width. The dataset is structured into three subsets (training, validation, and test sets) to support proper evaluation of model generalization. Additionally, a separate category, referred to as "bad images," stores non-readable or corrupted images that are unsuitable for processing. The dataset provides a valuable resource for developing and evaluating deep learning models for lung cancer detection and classification.
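The stated class counts sum to 1,000 and are moderately imbalanced (Adenocarcinoma has about 1.8 times as many images as Large Cell Carcinoma: 338/187 ≈ 1.81). A minimal Python sketch that checks the total and derives inverse-frequency class weights, one common (but not the only) way to compensate for imbalance when training a classifier; the weighting scheme is our choice and is not part of the dataset:

```python
# Class counts as stated in the dataset description.
counts = {
    "Adenocarcinoma": 338,
    "Large Cell Carcinoma": 187,
    "Squamous Cell Carcinoma": 260,
    "Normal": 215,
}

total = sum(counts.values())


def class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * count).

    Rare classes get weights above 1, common classes below 1, and the
    weighted sample count over all classes equals the original total.
    """
    n = sum(counts.values())
    k = len(counts)
    return {name: n / (k * c) for name, c in counts.items()}


weights = class_weights(counts)
for name, c in counts.items():
    print(f"{name}: {c / total:.1%} of images, weight {weights[name]:.3f}")
```

Such weights can be passed to a weighted loss; resampling the training split is an alternative with a similar goal.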
lung cancer
ieee-dataport.org
Updated Nov 29, 2024
Cite
Jiajia Qin (2024). lung cancer [Dataset]. https://ieee-dataport.org/documents/lung-cancer
Dataset updated
Nov 29, 2024
Authors
Jiajia Qin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As a retrospective study
NSCLC-Radiomics
kaggle.com
Updated May 26, 2024
Cite
umut Karadurmuş (2024). NSCLC-Radiomics [Dataset]. https://www.kaggle.com/datasets/umutkrdrms/nsclc-radiomics
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 26, 2024
Dataset provided by
Kaggle
Authors
umut Karadurmuş
Description
DOI: 10.7937/K9/TCIA.2015.PF0M9REI

Summary

This collection contains images from 422 non-small cell lung cancer (NSCLC) patients. For these patients, pretreatment CT scans, a manual delineation by a radiation oncologist of the 3D gross tumor volume, and clinical outcome data are available. This dataset corresponds to the Lung1 dataset of the study published in Nature Communications.

In short, this publication applies a radiomic approach to computed tomography data of 1,019 patients with lung or head-and-neck cancer. Radiomics refers to the comprehensive quantification of tumour phenotypes by applying a large number of quantitative image features. In the present analysis, 440 features quantifying tumour image intensity, shape, and texture were extracted. We found that a large number of radiomic features have prognostic power in independent data sets, many of which were not identified as significant before. Radiogenomics analysis revealed that a prognostic radiomic signature, capturing intra-tumour heterogeneity, was associated with underlying gene-expression patterns. These data suggest that radiomics identifies a general prognostic phenotype existing in both lung and head-and-neck cancer. This may have a clinical impact, as imaging is routinely used in clinical practice, providing an unprecedented opportunity to improve decision support in cancer treatment at low cost. The DICOM Radiotherapy Structure Sets (RTSTRUCT) and DICOM Segmentation (SEG) files in this data contain a manual delineation by a radiation oncologist of the 3D volume of the primary gross tumor volume ("GTV-1") and selected anatomical structures (i.e., lung, heart, and esophagus). Of note, DICOM SEG objects contain a subset of the annotations available in RTSTRUCT.
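Intensity features of this kind are computed only over voxels inside a delineated region such as GTV-1. A minimal Python sketch of a few first-order intensity features (mean, variance, energy, and histogram entropy) over a masked region; it is illustrative only and is not the study's 440-feature pipeline. The flat-list inputs and the 25 HU bin width are assumptions, and dedicated libraries such as PyRadiomics implement standardized feature definitions:

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25.0):
    """Compute a few first-order intensity features over voxels where mask is truthy.

    image and mask are flat, equal-length sequences (e.g. CT values in HU
    and a binary tumour mask resampled to the same grid).
    """
    voxels = [v for v, m in zip(image, mask) if m]
    if not voxels:
        raise ValueError("mask selects no voxels")
    n = len(voxels)
    mean = sum(voxels) / n
    variance = sum((v - mean) ** 2 for v in voxels) / n
    energy = sum(v * v for v in voxels)
    # Discretize intensities into fixed-width bins before computing entropy.
    bins = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {"mean": mean, "variance": variance, "energy": energy, "entropy": entropy}


# Toy example: four voxels inside the mask, one (air, -1000 HU) outside it.
features = first_order_features([10, 20, 30, 40, -1000], [1, 1, 1, 1, 0])
print(features)
```

Shape and texture features additionally need the 3D voxel layout, which is why real pipelines work on volumes rather than flat lists.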

The dataset described here (Lung1) was used to build a prognostic radiomic signature. The Lung3 dataset, consisting of 89 NSCLC CT scans with outcome data and used to investigate the association of radiomic imaging features with gene-expression profiles, is available as the NSCLC-Radiomics-Genomics collection.

Other data sets in the Cancer Imaging Archive that were used in the same study published in Nature Communications: Head-Neck-Radiomics-HN1, NSCLC-Radiomics-Interobserver1, RIDER-LungCT-Seg.

For scientific or other inquiries about this dataset, please contact TCIA's Helpdesk.
Lung Cancer Prediction - Dataset - CKAN
data.poltekkes-smg.ac.id
Updated Oct 7, 2024
Cite
(2024). Lung Cancer Prediction - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/lung-cancer-prediction
Dataset updated
Oct 7, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains information on patients with lung cancer, including their age, gender, air pollution exposure, alcohol use, dust allergy, occupational hazards, genetic risk, chronic lung disease, balanced diet, obesity, smoking, passive smoker, chest pain, coughing of blood, fatigue, weight loss, shortness of breath, wheezing, swallowing difficulty, clubbing of finger nails, and snoring.
Lung Cancer Data Set and Brain Stroke Data Set
ieee-dataport.org
Updated May 29, 2025
Cite
Gobinath J (2025). Lung Cancer Data Set and Brain Stroke Data Set [Dataset]. https://ieee-dataport.org/documents/lung-cancer-data-set-and-brain-stroke-data-set
Dataset updated
May 29, 2025
Authors
Gobinath J
Description
It is an adenocarcinoma lung cancer image dataset.
Synthea lung cancer synthetic patient data series for ML
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Chen, AJ (2023). Synthea lung cancer synthetic patient data series for ML [Dataset]. http://doi.org/10.7910/DVN/Q5LK5A
Unique identifier
https://doi.org/10.7910/DVN/Q5LK5A
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Chen, AJ
Description
These synthetic patient datasets were created for a machine learning (ML) study of lung cancer risk prediction in a simulation of ML-enabled learning health systems (LHS). Five populations of 30K patients were generated by the Synthea patient generator. They were combined sequentially to form 5 populations of different sizes, from 30K to 150K patients. Patients with and without lung cancer were selected at a roughly 1:3 ratio, and their electronic health records (EHR) were processed into data table files ready for machine learning. In the ML-ready table files, continuous numeric values have also been converted to categorical values. Because Synthea patients closely resemble real patients, these ML-ready datasets can be used to develop and test ML algorithms and to train researchers. Unlike real patient data, these Synthea datasets can be shared with collaborators anywhere without privacy concerns. The first use of these datasets was in an LHS simulation study published in Scientific Reports (see https://www.nature.com/articles/s41598-022-23011-4).
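The description states the population sizes and selection ratio but not the exact procedure. A minimal Python sketch, under stated assumptions, of three steps it mentions: cumulative population sizes from five 30K populations, keeping all cases and sampling about 3 controls per case (reading "1:3" as cases to controls is our assumption), and binning a continuous value into categories. The function names and the example cut points are hypothetical, not the authors' code:

```python
import random

BASE_POPULATION = 30_000
NUM_POPULATIONS = 5

# Combining the five populations sequentially gives cumulative sizes 30K..150K.
cumulative_sizes = [BASE_POPULATION * k for k in range(1, NUM_POPULATIONS + 1)]


def select_case_control(cases, controls, ratio=3, seed=0):
    """Keep all cases and sample up to `ratio` controls per case."""
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    n_controls = min(len(controls), ratio * len(cases))
    return list(cases), rng.sample(list(controls), n_controls)


def to_category(value, cut_points, labels):
    """Map a continuous value to a label using ascending cut points.

    labels must have exactly one more entry than cut_points.
    """
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


print(cumulative_sizes)
cases, controls = select_case_control(range(10), range(100))
print(len(cases), len(controls))
print(to_category(27.0, [18.5, 25, 30], ["under", "normal", "over", "obese"]))
```

Converting continuous values to categories this way trades precision for models and tables that handle only discrete features.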

The IQ-OTHNCCD lung cancer dataset (http://doi.org/10.17632/bhmdr45bh2.1) is cited by 73 scholarly articles.


The IQ-OTHNCCD lung cancer dataset

A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis

Air Quality-Lung Cancer Data

lung-cancer

LNDb Dataset

lung-cancer

National Lung Screening Trial

Demographic Summary of Available Imaging

Data from The Lung Image Database Consortium (LIDC) and Image Database...

Lung Cancer Dataset

lung cancer data.xlsx

scRNA-seq data of lung cancer

Duke Lung Cancer Screening Dataset 2024

Chest X-Ray for Lung Cancer Detection Dataset - Dataset - Taiwan AI Data...

Cancer Lung Dataset

Cancer Lung

Lung Cancer Detection - Dataset.zip

lung cancer

NSCLC-Radiomics

Lung Cancer Prediction - Dataset - CKAN

Lung Cancer Data Set and Brain Stroke Data Set

Synthea lung cancer synthetic patient data series for ML
