100+ datasets found
  1. The IQ-OTHNCCD lung cancer dataset

    • data.mendeley.com
    Updated Oct 19, 2020
    Cite
    hamdalla alyasriy (2020). The IQ-OTHNCCD lung cancer dataset [Dataset]. http://doi.org/10.17632/bhmdr45bh2.1
    Dataset updated
    Oct 19, 2020
    Authors
    hamdalla alyasriy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset was collected in the above-mentioned specialist hospitals over a period of three months in fall 2019. It includes CT scans of patients diagnosed with lung cancer in different stages, as well as healthy subjects. IQ-OTH/NCCD slides were marked by oncologists and radiologists in these two centers. The dataset contains a total of 1190 images representing CT scan slices of 110 cases (see Figure 1). These cases are grouped into three classes: normal, benign, and malignant. Of these, 40 cases are diagnosed as malignant, 15 as benign, and 55 are classified as normal. The CT scans were originally collected in DICOM format. The scanner used is a SOMATOM from Siemens. The CT protocol includes 120 kV and a slice thickness of 1 mm, with breath hold at full inspiration; window widths ranging from 350 to 1200 HU and window centers from 50 to 600 were used for reading. All images were de-identified before analysis. Written consent was waived by the oversight review board. The study was approved by the institutional review board of the participating medical centers. Each scan contains several slices. The number of slices ranges from 80 to 200, and each slice represents an image of the human chest from different sides and angles. The 110 cases vary in gender, age, educational attainment, area of residence, and living status. Some of them are employees of the Iraqi ministries of Transport and Oil; others are farmers and gainers. Most of them come from the middle region of Iraq, particularly the provinces of Baghdad, Wasit, Diyala, Salahuddin, and Babylon.

  2. A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis

    • cancerimagingarchive.net
    dicom, n/a, xlsx, xml
    Cite
    The Cancer Imaging Archive, A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis [Dataset]. http://doi.org/10.7937/TCIA.2020.NNC2-0461
    Available download formats: xml, n/a, xlsx, dicom
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Dec 22, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
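
    The letter-to-diagnosis convention above can be sketched as a small lookup. This is only an illustration: the ID layout (`Lung_Dx-A0001`) and the names `HISTOLOGY_BY_LETTER` and `histology_from_id` are assumptions made here, not something the dataset description specifies.

```python
import re

# Letter-to-diagnosis mapping taken from the dataset description.
HISTOLOGY_BY_LETTER = {
    "A": "Adenocarcinoma",
    "B": "Small Cell Carcinoma",
    "E": "Large Cell Carcinoma",
    "G": "Squamous Cell Carcinoma",
}

# Assumption: IDs look like "Lung_Dx-A0001", i.e. the class letter is the
# first character after the final hyphen. Adjust the pattern if they differ.
_ID_PATTERN = re.compile(r"-([A-Z])\d+$")


def histology_from_id(patient_id: str) -> str:
    """Return the diagnosis encoded in a patient ID, or raise ValueError."""
    match = _ID_PATTERN.search(patient_id)
    if match is None or match.group(1) not in HISTOLOGY_BY_LETTER:
        raise ValueError(f"no known class letter in {patient_id!r}")
    return HISTOLOGY_BY_LETTER[match.group(1)]
```

    Anchoring the pattern to the end of the ID avoids matching stray letters elsewhere in the string; a plain "contains the letter" test would be ambiguous for any ID whose prefix itself contains an A, B, E, or G.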

    The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. The reconstructions were made with a 2 mm slice thickness and lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain, contrast, and 3D reconstruction.
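
    As a rough illustration of what the two window settings mean, the sketch below maps Hounsfield units to 8-bit display intensities with a linear window. The function name `apply_window` and the sample values are assumptions for illustration; real viewers may use slightly different rounding or the exact DICOM windowing formula.

```python
def apply_window(hu_values, width, level):
    """Map HU values to 0..255 display intensities for a given window.

    Values below (level - width/2) become 0, values above
    (level + width/2) become 255, and values in between scale linearly.
    """
    low = level - width / 2.0
    high = level + width / 2.0
    out = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)
        out.append(round(255 * (clipped - low) / (high - low)))
    return out


# Window settings quoted in the description.
MEDIASTINUM = {"width": 350, "level": 40}
LUNG = {"width": 1400, "level": -700}

# Hypothetical sample of HU values (air, lung tissue, water, soft tissue, bone).
sample_hu = [-1000, -700, 0, 40, 400]
lung_view = apply_window(sample_hu, **LUNG)
mediastinum_view = apply_window(sample_hu, **MEDIASTINUM)
```

    The lung window spreads its contrast over the range from -1400 to 0 HU, so air-filled tissue stays distinguishable, whereas the narrower mediastinum window saturates everything below -135 HU to black.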

    Before the examination, each patient fasted for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44 MBq/kg, 0.12 mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79 MBq (295.8 ± 64.8 MBq) and 27-171 min (70.4 ± 24.9 min), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method, using a CT protocol of 180 mAs, 120 kV, and a pitch of 1.0. Each study comprised one CT volume, one PET volume, and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1 mm × 1 mm, the PET resolution was 200 × 200 pixels at 4.07 mm × 4.07 mm, with a slice thickness and an interslice distance of 1 mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scans were acquired from the base of the skull to mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1 mm.
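
    The two dose figures quoted above are the same quantity in different units, which a short check makes explicit. The helper names and the 70 kg body weight are hypothetical values chosen for illustration; only the 4.44 MBq/kg dose and the reported activity range come from the description.

```python
MBQ_PER_MCI = 37.0  # 1 mCi = 37 MBq, from the definition of the curie


def mbq_to_mci(mbq):
    """Convert an activity from megabecquerels to millicuries."""
    return mbq / MBQ_PER_MCI


def injected_activity_mbq(weight_kg, mbq_per_kg=4.44):
    """Weight-based injected activity at the dose quoted in the description."""
    return weight_kg * mbq_per_kg


dose_per_kg_mci = mbq_to_mci(4.44)              # per-kilogram dose in mCi/kg
example_activity = injected_activity_mbq(70.0)  # hypothetical 70 kg patient
```

    Under these assumptions the per-kilogram dose converts to 0.12 mCi/kg, and a 70 kg patient would receive an activity inside the reported 168.72-468.79 MBq range.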

    The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four radiologists performed a verification, so that all five radiologists reviewed each annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
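
    For readers who prefer not to install an extra package, a PASCAL VOC annotation can also be read with the Python standard library. The sketch below assumes the usual VOC layout (`object` elements with a `name` and a `bndbox` holding `xmin`, `ymin`, `xmax`, `ymax`); the sample XML and the function name `parse_voc_boxes` are made up for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def parse_voc_boxes(xml_text):
    """Return a list of {label, xmin, ymin, xmax, ymax} dicts from VOC XML."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        box = {"label": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Some tools write coordinates as floats, so parse via float().
            box[key] = int(float(bnd.findtext(key)))
        boxes.append(box)
    return boxes


# Hypothetical annotation file content in the standard VOC layout.
EXAMPLE_XML = """<annotation>
  <filename>example.dcm</filename>
  <size><width>512</width><height>512</height><depth>1</depth></size>
  <object>
    <name>A</name>
    <bndbox><xmin>100</xmin><ymin>150</ymin><xmax>180</xmax><ymax>230</ymax></bndbox>
  </object>
</annotation>"""

boxes = parse_voc_boxes(EXAMPLE_XML)
```

    Each returned dict can then be drawn over the matching DICOM slice once the file name in the annotation has been mapped to the image, a step that depends on how the dataset names its files.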

    Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models which resulted in a maximum a posteriori probability (MAP) of around 0.87 on the validation set.

  3. Air Quality-Lung Cancer Data

    • dataverse.harvard.edu
    Updated Jan 31, 2020
    Cite
    Mithun Acharjee; Kumer Pial Das; Young S.Stanley (2020). Air Quality-Lung Cancer Data [Dataset]. http://doi.org/10.7910/DVN/HMOEJO
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jan 31, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Mithun Acharjee; Kumer Pial Das; Young S.Stanley
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data come from two different sources. Population-based lung cancer incidence rates for the period 2010-2014 (the most recent data) were abstracted from the National Cancer Institute state cancer profiles (Schwartz et al. 1996). This national county-level database of cancer data is collected by state public health surveillance systems. The domain-specific county-level environmental quality index (EQI) data for the period 2000-2005 were abstracted from the United States Environmental Protection Agency (USEPA) profile. Complete descriptions of the datasets used in the EQI are provided in Lobdell’s paper (Lobdell 2011). Data were merged based on the Federal Information Processing Standards (FIPS) code. Out of 3144 counties in the United States, this study has information available for 2602 counties, for the following reasons: data were not available for four states (Kansas, Michigan, Minnesota, and Nevada) due to state legislation and regulations that prohibit the release of county-level data to outside entities; counties whose lung cancer mortality information is missing were omitted from the data set; Union County, Florida, an outlier in terms of mortality information, was deleted from the data set; and the local control analysis produced two non-informative clusters (clusters 28 and 29; a non-informative cluster is one for which either treatment or control group information is missing), whose information was deleted from the data set for analysis. Three types of variables are used in this study: (i) lung cancer mortality as the outcome variable; (ii) a binary treatment indicator for PM2.5, high (greater than 10.59 mg/m3) vs. low (less than 10.59 mg/m3); and (iii) three potential X confounders for clustering, namely land EQI, sociodemographic EQI, and built EQI. For each index, higher values correspond to poorer environmental quality (Jagai et al. 2017). Because PM2.5 is one of the indicators used to measure air EQI, the air EQI is not considered, to avoid confounding effects.
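
    The merge-and-dichotomize step described above can be sketched in plain Python. All field names (`fips`, `mortality`, `pm25`, `land_eqi`) and the sample rows are hypothetical; only the FIPS join key, the dropping of counties with missing mortality, and the 10.59 threshold come from the description.

```python
PM25_THRESHOLD = 10.59  # cut-off quoted in the description


def merge_on_fips(mortality_rows, eqi_rows):
    """Inner-join two county tables on FIPS and add a binary PM2.5 indicator.

    Counties missing from either table, or with no mortality value, are
    dropped, mirroring the exclusions listed in the description.
    """
    eqi_by_fips = {row["fips"]: row for row in eqi_rows}
    merged = []
    for row in mortality_rows:
        eqi = eqi_by_fips.get(row["fips"])
        if eqi is None or row.get("mortality") is None:
            continue
        record = {**row, **eqi}
        # 1 = high PM2.5 (treatment), 0 = low PM2.5 (control).
        record["high_pm25"] = int(eqi["pm25"] > PM25_THRESHOLD)
        merged.append(record)
    return merged


# Hypothetical rows; FIPS codes are kept as strings to preserve leading zeros.
mortality = [
    {"fips": "01001", "mortality": 55.2},
    {"fips": "01003", "mortality": None},   # missing outcome, dropped
    {"fips": "12125", "mortality": 60.0},   # no EQI row, dropped
]
eqi = [
    {"fips": "01001", "pm25": 11.2, "land_eqi": 0.3},
    {"fips": "01003", "pm25": 9.8, "land_eqi": -0.1},
]
merged = merge_on_fips(mortality, eqi)
```

    The description does not say how a county sitting exactly at the threshold is classified; this sketch assigns it to the low group, which is an arbitrary choice.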

  4. lung-cancer

    • huggingface.co
    Updated Jun 24, 2022
    Cite
    Nate Raw (2022). lung-cancer [Dataset]. https://huggingface.co/datasets/nateraw/lung-cancer
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jun 24, 2022
    Authors
    Nate Raw
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Lung Cancer

      Dataset Summary
    

    An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on that risk status. The data was collected from the website of an online lung cancer prediction system.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/lung-cancer.
    
  5. LNDb Dataset

    • zenodo.org
    bin, csv, pdf, zip
    Updated Jul 16, 2024
    Cite
    João Pedrosa; Guilherme; Carlos; Márcio; Patrícia; André; João; Eduardo; Isabel; António; Aurélio (2024). LNDb Dataset [Dataset]. http://doi.org/10.5281/zenodo.6613714
    Available download formats: pdf, bin, zip, csv
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Pedrosa; Guilherme; Carlos; Márcio; Patrícia; André; João; Eduardo; Isabel; António; Aurélio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Lung Nodule Database (LNDb) was developed as an external dataset complementary to LIDC-IDRI. The publication of this database gives continuity to LIDC-IDRI and allows the community to perform an external and comparable validation of proposed computer-aided diagnosis (CAD) systems.

    The LNDb contains 294 CT scans collected retrospectively at the Centro Hospitalar e Universitário de São João (CHUSJ) in Porto, Portugal between 2016 and 2018. All data was acquired under approval from the CHUSJ Ethical Committee and was anonymised prior to any analysis to remove personal information except for patient birth year and gender.

    The database served as the basis for the Grand Challenge on automatic lung cancer patient management, or LNDb challenge.

    THIS DATASET IS PUBLICLY AVAILABLE UNDER A CREATIVE COMMONS CC BY-NC-ND LICENSE (ATTRIBUTION-NONCOMMERCIAL-NODERIVS).
    ESSENTIALLY, YOU ARE GRANTED ACCESS TO THE DATASET FOR USE IN YOUR RESEARCH AS LONG AS YOU CREDIT OUR WORK/PUBLICATIONS(*), BUT YOU CANNOT CHANGE THEM IN ANY WAY OR USE THEM COMMERCIALLY.

    • (*) Pedrosa, João, et al. "LNDb: a lung nodule database on computed tomography." arXiv preprint arXiv:1911.08434 (2019).
    • (*) Pedrosa, João, et al. "LNDb challenge on automatic lung cancer patient management." Medical image analysis 70 (2021): 102027.
  6. lung-cancer

    • huggingface.co
    Updated Aug 13, 2024
    Cite
    Dorsa Rohani (2024). lung-cancer [Dataset]. https://huggingface.co/datasets/dorsar/lung-cancer
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Aug 13, 2024
    Authors
    Dorsa Rohani
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Lung Cancer CT Scan Dataset

      Dataset Description
    

    This dataset contains CT scan images for lung cancer detection and classification. It includes images of four different categories: adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal (non-cancerous) lung tissue.

      Classes
    

    Adenocarcinoma, Large Cell Carcinoma, Normal (non-cancerous), Squamous Cell Carcinoma

      Dataset Statistics
    

    Total number of images: 315 Number of classes: 4 Class… See the full description on the dataset page: https://huggingface.co/datasets/dorsar/lung-cancer.

  7. National Lung Screening Trial

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    dicom, docx, n/a +2
    Updated Sep 24, 2021
    Cite
    The Cancer Imaging Archive (2021). National Lung Screening Trial [Dataset]. http://doi.org/10.7937/TCIA.HMQ8-J677
    Available download formats: docx, svs, dicom, n/a, sas, zip, and doc
    Dataset updated
    Sep 24, 2021
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 24, 2021
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description


    Demographic Summary of Available Imaging

    Characteristic   Value (N = 26254)
    Age (years)      Mean ± SD: 61.4 ± 5
                     Median (IQR): 60 (57-65)
                     Range: 43-75
    Sex              Male: 15512 (59%)
                     Female: 10742 (41%)
    Race             White: 23969 (91.3%)
                     Black: 1135 (4.3%)
                     Asian: 547 (2.1%)
                     American Indian/Alaska Native: 88 (0.3%)
                     Native Hawaiian/Other Pacific Islander: 87 (0.3%)
                     Unknown: 428 (1.6%)
    Ethnicity        Not Available

    Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.

    Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.

    Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
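
    The headline ratios in the Results can be approximately reproduced from the per-100,000 person-year rates quoted above. This is only a back-of-the-envelope check with variable names chosen here; because the quoted rates are already rounded, the recomputed figures may differ slightly from the published ones in the last digit.

```python
# Rates per 100,000 person-years, as quoted in the Results paragraph.
ct_incidence, xray_incidence = 645, 572
ct_deaths, xray_deaths = 247, 309

# Incidence rate ratio: low-dose CT group relative to radiography group.
rate_ratio = ct_incidence / xray_incidence

# Relative reduction in lung-cancer mortality with low-dose CT screening.
relative_reduction = (xray_deaths - ct_deaths) / xray_deaths

print(f"rate ratio: {rate_ratio:.2f}")
print(f"relative mortality reduction: {relative_reduction:.1%}")
```

    The rate ratio agrees with the reported 1.13, and the mortality reduction comes out at roughly 20%, in line with the reported 20.0% once rounding of the input rates is taken into account.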

    Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).

    Data Availability: A summary of the National Lung Screening Trial and its available datasets are provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), contracted by the National Cancer Institute (NCI) as keepers and statistical analyzers of the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and Histopathology images collected during the trial. (These previously were restricted.)

  8. Data from The Lung Image Database Consortium (LIDC) and Image Database...

    • cancerimagingarchive.net
    dicom, n/a, xls, xlsx +1
    Cite
    The Cancer Imaging Archive, Data from The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans [Dataset]. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
    Available download formats: xlsx, xls, n/a, xml and zip, dicom
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 21, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.

    Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥ 3 mm," "nodule < 3 mm," and "non-nodule ≥ 3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.

    Note: The TCIA team strongly encourages users to review pylidc and the standardized representation of the TCIA LIDC-IDRI annotations using DICOM (DICOM-LIDC-IDRI-Nodules) before developing custom tools to analyze the XML version of the annotations/segmentations included in this dataset.

  9. Lung Cancer Dataset

    • cubig.ai
    Updated May 2, 2025
    Cite
    CUBIG (2025). Lung Cancer Dataset [Dataset]. https://cubig.ai/store/products/185/lung-cancer-dataset
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Lung Cancer dataset comprises medical imaging data of lung scans, annotated for binary classification indicating the presence (Yes, 1) or absence (No, 0) of lung cancer.

    2) Data Utilization
    (1) Lung Cancer data has the following characteristics:
    • The dataset includes 1 continuous variable and 15 categorical variables.
    (2) Lung Cancer data can be used for:
    • Model Learning: Deep learning models such as convolutional neural networks (CNNs) can be used to analyze lung scan images and develop diagnostic systems that predict lung cancer.
    • Simulation Diagnostic Training: Using medical imaging data, doctors can perform simulated diagnostic training and improve their diagnostic capabilities.

  10. lung cancer data.xlsx

    • figshare.com
    xlsx
    Updated Jan 19, 2025
    Cite
    Jehan Al-Musawi; Farah Al-Shadeedi; Nabaa Shakir; Sabreen Ibrahim (2025). lung cancer data.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.28235576.v1
    Available download formats: xlsx
    Dataset updated
    Jan 19, 2025
    Dataset provided by
    figshare
    Authors
    Jehan Al-Musawi; Farah Al-Shadeedi; Nabaa Shakir; Sabreen Ibrahim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Objective: To identify the socioepidemiologic and histopathologic patterns of lung cancer patients in the Middle Euphrates region.

    Patients and Methods: This study analyzed medical information from lung cancer patients at the Middle Euphrates Cancer Center in Iraq from January 2018 to December 2023. Demographic information (age, gender, residency, and education level) as well as clinical details (histopathological categorization) were obtained. All confirmed lung cancer cases were included, while cases with inadequate data or a non-lung cancer diagnosis were omitted. The data were analyzed using IBM SPSS Statistics (version 26). The data were summarized using descriptive statistics, and chi-square tests were used to identify correlations between categorical variables at a significance level of p < 0.05. Ethical approval was obtained from the relevant institutional review board.

    Results: A total of 1162 patients were included, with a mean age at diagnosis of 64.47 ± 11.45 years. The majority of patients are over 60 years (64.4%), followed by those aged 40–60 years (34%); the least affected group is under 40 years (1.6%). Males account for the majority of cases (68%) and females for about 32%, with a male:female ratio that fluctuates around 2:1. Illiterate patients and those with low education levels represent the largest proportion, accounting for about 87.9% of the study population. Squamous Cell Carcinoma (SCC) is the most frequent subtype (41.7%), followed closely by Adenocarcinoma (AC) at 37% and Small Cell Lung Cancer (SCLC) at 10.5%. Although SCC is the predominant subtype overall, AC incidence is increasing over time (from 31.7% in 2018 to 41.4% in 2023), with predominance in females and in younger and more highly educated groups. While the percentage of SCLC and other less common subgroups remained relatively stable over time, there is a significant reduction in NSCLC-NOS diagnoses (from 11.1% in 2018 to 3.2% in 2023).

    Conclusions: In Iraq, specifically in the Middle Euphrates region, lung cancer is a major public health issue in the older age groups. The two main subtypes, SCC and AC, are the main contributors, with an obvious increase in AC cases in recent years. The shifting trends indicate the urgent need for improved screening strategies, focused preventative initiatives, and customized treatment plans in view of changing risk profiles.

  11. scRNA-seq data of lung cancer

    • scidb.cn
    Updated Jul 21, 2022
    Cite
    Weimin Li (2022). scRNA-seq data of lung cancer [Dataset]. http://doi.org/10.57760/sciencedb.02028
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    Science Data Bank
    Authors
    Weimin Li
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We collected 40 tumor and adjacent normal tissue samples from 19 pathologically diagnosed NSCLC patients (10 LUAD and 9 LUSC) during surgical resections, rapidly digested the tissues to obtain single-cell suspensions, and constructed the cDNA libraries of these samples within 24 hours using the 10x Genomics protocol. These libraries were sequenced on the Illumina NovaSeq 6000 platform. Finally, the raw gene expression matrices were generated using CellRanger (version 3.0.1). The data were processed in R (version 3.6.0) using the Seurat R package (version 2.3.4).

  12. Duke Lung Cancer Screening Dataset 2024

    • zenodo.org
    Updated Feb 5, 2025
    Cite
    Avivah Wang; FAKRUL ISLAM TUSHAR; Michael R. Harowicz; Kyle J. Lafata; Tina D. Tailor; Joseph Y. Lo (2025). Duke Lung Cancer Screening Dataset 2024 [Dataset]. http://doi.org/10.5281/zenodo.13799069
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Avivah Wang; FAKRUL ISLAM TUSHAR; Michael R. Harowicz; Kyle J. Lafata; Tina D. Tailor; Joseph Y. Lo
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Note - This is part 1 of the dataset.

    Part 1 can be found at : https://zenodo.org/records/13799069
    Part 2 can be found at : https://zenodo.org/records/12784601
    Part 3 can be found at : https://zenodo.org/records/14659131

    Background: Lung cancer risk classification is an increasingly important area of research as low-dose thoracic CT screening programs have become standard of care for patients at high risk for lung cancer. There is limited availability of large, annotated public databases for the training and testing of algorithms for lung nodule classification.

    Methods: Screening chest CT scans done between January 1, 2015 and June 30, 2021 at Duke University Health System were considered for this study. Efficient nodule annotation was performed semi-automatically by using a publicly available deep learning nodule detection algorithm trained on the LUNA16 dataset to identify initial candidates, which were then accepted based on nodule location in the radiology text report or manually annotated by a medical student and a fellowship-trained cardiothoracic radiologist.

    Results: The dataset contains 1613 CT volumes with 2487 annotated nodules, selected from a total dataset of 2061 patients, with the remaining data reserved for future testing. Radiologist spot-checking confirmed the semi-automated annotation had an accuracy rate of >90%.

    Conclusions: The Duke Lung Cancer Screening Dataset 2024 is the first large dataset for CT screening for lung cancer reflecting the use of current CT technology. This represents a useful resource for lung cancer risk classification research, and the efficient annotation methods described for its creation may be used to generate similar databases for research in the future.

    Dataset part details:
    Part 1: DLCS subsets 1 to 7, metadata, and annotations.
    Part 2: DLCS subsets 8 and 9, and CT image info metadata.
    Part 3: DLCS subset 10.

    Updates and Versions:

    1. Part 1, Version 1.0 (Published on [03/05/2024]): Released initial dataset, including partial data subsets 1 to 7 and 3D bounding box annotations of the lung nodules.
    2. Part 1, Version 1.1 (Published on [09/19/2024]): Added metadata file (DLCSD24_metadata_v1.1.xlsx) and updated the dataset description and title. 10.5281/zenodo.13799069
    3. Part 2, Version 1.0 (Published on [02/04/2025]): Released DLCS subset 8,9, CT image info metadata (DLCSD24_CT_ImageInfo_v1.csv and metadata documentation).
    4. Part 3, Version 1.0 (Published on [02/04/2025]): Released DLCS subset 10.


    Code Repository:

    To support reproducible open-access research and benchmarking, we have shared several pre-trained models and baseline results in a GitHub and GitLab repository.

    GitLab: https://gitlab.oit.duke.edu/cvit-public/ai_lung_health_benchmarking
    GitHub: https://github.com/fitushar/AI-in-Lung-Health-Benchmarking-Detection-and-Diagnostic-Models-Across-Multiple-CT-Scan-Datasets

    Funding:
    This work was supported by the Duke Department of Radiology Charles E. Putman Vision Award, NIH/NIBIB P41-EB028744, and NIH/NCI R01-CA261457.



  13. Chest X-Ray for Lung Cancer Detection Dataset - Dataset - Taiwan AI Data...

    • data.dmc.nycu.edu.tw
    Updated Oct 5, 2022
    Cite
    (2022). Chest X-Ray for Lung Cancer Detection Dataset - Dataset - Taiwan AI Data Sharing Portal [Dataset]. https://data.dmc.nycu.edu.tw/dataset/d11-x-ray
    Dataset updated
    Oct 5, 2022
    Description

    The dataset includes annotated 2D CXR images for lung nodule detection, capturing the longest diameter and its perpendicular diameter, along with various characteristics such as density, margins, and location. This dataset is designed for using CXR as a first-line screening tool to assess the likelihood of nodule malignancy, aiding in the decision of whether further examinations, such as CT scans or biopsies, are necessary. This tool is especially beneficial for remote areas and telemedicine applications, where access to advanced diagnostic facilities may be limited. By providing an initial assessment of nodule characteristics and potential malignancy, the dataset helps prioritize patients who need immediate further investigation, improving early detection and treatment outcomes in underserved regions.

  14. Cancer Lung Dataset

    • universe.roboflow.com
    zip
    Updated Jan 22, 2025
    + more versions
    abod (2025). Cancer Lung Dataset [Dataset]. https://universe.roboflow.com/abod-0vl9b/cancer-lung/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    abod
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cancer Bounding Boxes
    Description

    Cancer Lung

    ## Overview
    
    Cancer Lung is a dataset for object detection tasks - it contains Cancer annotations for 972 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  15. Lung Cancer Detection - Dataset.zip

    • figshare.com
    zip
    Updated Mar 2, 2025
    Paarth Kapur (2025). Lung Cancer Detection - Dataset.zip [Dataset]. http://doi.org/10.6084/m9.figshare.28497596.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 2, 2025
    Dataset provided by
    figshare
    Authors
    Paarth Kapur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises medical images categorized into four distinct classes: Adenocarcinoma, Large Cell Carcinoma, Squamous Cell Carcinoma, and Normal. The dataset includes a total of 1,000 images, with 338 images labeled as Adenocarcinoma, 187 as Large Cell Carcinoma, 260 as Squamous Cell Carcinoma, and 215 as Normal. The images are primarily in PNG format (988 images) with a small fraction in JPG format (12 images). The average image dimensions are 258 pixels in height and 356 pixels in width. The dataset is structured into three subsets (training, validation, and test sets) to support proper evaluation and model generalization. Additionally, a separate category, referred to as "bad images," stores non-readable or corrupted images that are unsuitable for processing. The dataset provides a valuable resource for developing and evaluating deep learning models for lung cancer detection and classification.
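
The class counts quoted above are moderately imbalanced, which matters when training a classifier. The following is a minimal sketch, not part of the dataset itself, that computes each class's share and an inverse-frequency weight from the stated counts. The function name `inverse_frequency_weights` and the weighting formula (total / (number of classes × class count)) are illustrative assumptions; the dataset page does not prescribe any weighting scheme.

```python
# Class counts exactly as stated in the dataset description.
class_counts = {
    "Adenocarcinoma": 338,
    "Large Cell Carcinoma": 187,
    "Squamous Cell Carcinoma": 260,
    "Normal": 215,
}

total_images = sum(class_counts.values())


def inverse_frequency_weights(counts):
    """Hypothetical helper: weight_c = total / (num_classes * count_c).

    Rarer classes receive larger weights; a perfectly balanced
    dataset would give every class a weight of 1.0.
    """
    num_classes = len(counts)
    total = sum(counts.values())
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(class_counts)

if __name__ == "__main__":
    for name, n in class_counts.items():
        share = n / total_images
        print(f"{name}: {n} images ({share:.1%}), weight {weights[name]:.3f}")
```

Weights of this kind could be passed to a weighted loss function, though whether that helps depends on the model and on how the training, validation, and test subsets are actually split.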

  16. lung cancer

    • ieee-dataport.org
    Updated Nov 29, 2024
    Jiajia Qin (2024). lung cancer [Dataset]. https://ieee-dataport.org/documents/lung-cancer
    Explore at:
    Dataset updated
    Nov 29, 2024
    Authors
    Jiajia Qin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As a retrospective study

  17. NSCLC-Radiomics

    • kaggle.com
    Updated May 26, 2024
    + more versions
    umut Karadurmuş (2024). NSCLC-Radiomics [Dataset]. https://www.kaggle.com/datasets/umutkrdrms/nsclc-radiomics
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 26, 2024
    Dataset provided by
    Kaggle
    Authors
    umut Karadurmuş
    Description

    DOI: 10.7937/K9/TCIA.2015.PF0M9REI

    Summary

    This collection contains images from 422 non-small cell lung cancer (NSCLC) patients. For these patients pretreatment CT scans, manual delineation by a radiation oncologist of the 3D volume of the gross tumor volume and clinical outcome data are available. This dataset refers to the Lung1 dataset of the study published in Nature Communications.

    In short, this publication applies a radiomic approach to computed tomography data of 1,019 patients with lung or head-and-neck cancer. Radiomics refers to the comprehensive quantification of tumour phenotypes by applying a large number of quantitative image features. In the present analysis, 440 features quantifying tumour image intensity, shape and texture were extracted. We found that a large number of radiomic features have prognostic power in independent data sets, many of which were not identified as significant before. Radiogenomics analysis revealed that a prognostic radiomic signature, capturing intra-tumour heterogeneity, was associated with underlying gene-expression patterns. These data suggest that radiomics identifies a general prognostic phenotype existing in both lung and head-and-neck cancer. This may have a clinical impact as imaging is routinely used in clinical practice, providing an unprecedented opportunity to improve decision-support in cancer treatment at low cost.

    The DICOM Radiotherapy Structure Sets (RTSTRUCT) and DICOM Segmentation (SEG) files in this data contain a manual delineation by a radiation oncologist of the 3D volume of the primary gross tumor volume ("GTV-1") and selected anatomical structures (i.e., lung, heart and esophagus). Of note, DICOM SEG objects contain a subset of annotations available in RTSTRUCT.

    The dataset described here (Lung1) was used to build a prognostic radiomic signature. The Lung3 dataset, which was used to investigate the association of radiomic imaging features with gene-expression profiles and consists of 89 NSCLC CT scans with outcome data, is available as NSCLC-Radiomics-Genomics.

    Other data sets in the Cancer Imaging Archive that were used in the same study published in Nature Communications: Head-Neck-Radiomics-HN1, NSCLC-Radiomics-Interobserver1, RIDER-LungCT-Seg.

    For scientific or other inquiries about this dataset, please contact TCIA's Helpdesk.

  18. Lung Cancer Prediction - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    (2024). Lung Cancer Prediction - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/lung-cancer-prediction
    Explore at:
    Dataset updated
    Oct 7, 2024
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains information on patients with lung cancer, including their age, gender, air pollution exposure, alcohol use, dust allergy, occupational hazards, genetic risk, chronic lung disease, balanced diet, obesity, smoking, passive smoking, chest pain, coughing of blood, fatigue, weight loss, shortness of breath, wheezing, swallowing difficulty, clubbing of finger nails, and snoring.

  19. Lung Cancer Data Set and Brain Stroke Data Set

    • ieee-dataport.org
    Updated May 29, 2025
    Gobinath J (2025). Lung Cancer Data Set and Brain Stroke Data Set [Dataset]. https://ieee-dataport.org/documents/lung-cancer-data-set-and-brain-stroke-data-set
    Explore at:
    Dataset updated
    May 29, 2025
    Authors
    Gobinath J
    Description

    It is an adenocarcinoma lung cancer image.

  20. Synthea lung cancer synthetic patient data series for ML

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    + more versions
    Chen, AJ (2023). Synthea lung cancer synthetic patient data series for ML [Dataset]. http://doi.org/10.7910/DVN/Q5LK5A
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chen, AJ
    Description

    These synthetic patient datasets were created for machine learning (ML) studies of lung cancer risk prediction in simulations of ML-enabled learning health systems (LHS). Five populations of 30K patients were generated by the Synthea patient generator. They were combined sequentially to form five populations of increasing size, from 30K to 150K patients. Patients with or without lung cancer were selected at a ratio of roughly 1:3, and their electronic health records (EHR) were processed into data table files ready for machine learning. The ML-ready table files also have the continuous numeric values converted to categorical values. Because Synthea patients closely resemble real patients, these ML-ready datasets can be used to develop and test ML algorithms and to train researchers. Unlike real patient data, these Synthea datasets can be shared with collaborators anywhere without privacy concerns. The first use of these datasets was in an LHS simulation study, which was published in Nature Scientific Reports (see https://www.nature.com/articles/s41598-022-23011-4).
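
The sequential combination described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the 1:3 ratio is read here as lung cancer cases to non-cancer controls, and the figure of 1,000 cases in the usage lines is made up purely for demonstration.

```python
from itertools import accumulate

BATCH_SIZE = 30_000   # each generated population, per the description
NUM_BATCHES = 5       # five populations were generated


def cumulative_population_sizes(batch_size, num_batches):
    """Sizes obtained by adding one batch at a time (30K, 60K, ...)."""
    return list(accumulate([batch_size] * num_batches))


def controls_for_cases(n_cases, controls_per_case=3):
    """Hypothetical helper: controls needed for a 1:3 case-to-control ratio.

    Assumes the stated ratio means one patient with lung cancer for
    every three patients without it.
    """
    return n_cases * controls_per_case


sizes = cumulative_population_sizes(BATCH_SIZE, NUM_BATCHES)

if __name__ == "__main__":
    print("Combined population sizes:", sizes)
    # Illustrative only: 1,000 is not a figure from the dataset page.
    print("Controls for 1,000 cases:", controls_for_cases(1_000))
```

The actual number of selected patients per population is not given in the description, so any concrete case or control counts would have to come from the data files themselves.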

hamdalla alyasriy (2020). The IQ-OTHNCCD lung cancer dataset [Dataset]. http://doi.org/10.17632/bhmdr45bh2.1

The IQ-OTHNCCD lung cancer dataset

Explore at:
73 scholarly articles cite this dataset (View in Google Scholar)
