https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Prostate cancer T1- and T2-weighted magnetic resonance images (MRIs) were acquired on a 1.5 T Philips Achieva by combined surface and endorectal coil, including dynamic contrast-enhanced images obtained prior to, during and after I.V. administration of 0.1 mmol/kg body weight of Gadolinium-DTPA (pentetic acid). Corresponding clinical metadata (XLS format) and 3D segmentation files (NRRD format) are offered as a supplement to this image collection. The XLS file contains pathology biopsy and excised gland tissue reports and the MRI radiology report for most subjects.
The Multi-component NRRD Segmentations allow visualization and downstream analysis in 3D Slicer of the following prostate components: prostate gland boundary; internal capsule; central gland; peripheral zone; seminal vesicles; urethra; cancer (dominant nodule); neurovascular bundle; penile bulb; ejaculatory duct; verumontanum; and rectum. See our tutorial on Using 3D Slicer with the Prostate-Diagnosis data if you are not familiar with using this kind of data.
The Seminal vesicles (SV) and neurovascular bundle (NVB) Segmentations delineate the neurovascular bundle and seminal vesicles as MHA files. These were provided as part of a planned challenge competition that did not materialize.
The Third Party Analysis dataset mentioned beneath the Data Access table was added later as part of the NCI-ISBI 2013 Challenge - Automated Segmentation of Prostate Structures. It includes NRRD-format segmentations for 30 Prostate-Diagnosis subjects that mark the boundaries of the central gland and peripheral zone.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Cancer Genome Atlas Glioblastoma Multiforme (TCGA-GBM) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Glioma Phenotype Research Group.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD
Contains data from HNSCC-01-0019 in DICOM format. Can be used for testing purposes.
The HNSCC collection is distributed under the Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/).
By downloading the data, you agree to abide by terms of this license.
Data Usage Policy
Any user accessing TCIA data must agree to:
- Not use the requested datasets, either alone or in concert with any other information, to identify or contact individual participants from whom data and/or samples were collected and follow all other conditions specified in the TCIA Site Disclaimer. Approved Users also agree not to generate and use information (e.g., facial images or comparable representations) in a manner that could allow the identities of research participants to be readily ascertained. These provisions do not apply to research investigators operating with specific IRB approval, pursuant to 45 CFR 46, to contact individuals within datasets or to obtain and use identifying information under an IRB-approved research protocol. All investigators including any Approved User conducting “human subjects research” within the scope of 45 CFR 46 must comply with the requirements contained therein.
- Acknowledge in all oral or written presentations, disclosures, or publications the specific dataset(s) or applicable accession number(s) and the NIH-designated data repositories through which the investigator accessed any data. Citation guidelines for doing this are outlined below.
- If you are considering mirroring a copy of our publicly available datasets or providing direct access to any of the TCIA data via another tool or website using the REST API (https://wiki.cancerimagingarchive.net/x/NIIiAQ) please review our Data Analysis Centers (DACs) page (https://wiki.cancerimagingarchive.net/x/x49XAQ) for more information. DACs must provide attribution and links back to this TCIA data use policy and must require downstream users to do the same.
The summary page for every TCIA dataset includes a Citations & Data Usage Policy tab. Please consult the Citation & Data Usage Policy for each Collection before using them.
- Most data are freely available to browse, download, and use for commercial, scientific and educational purposes as outlined in the Creative Commons Attribution 3.0 Unported License or the Creative Commons Attribution 4.0 International License. In rare circumstances commercial use may be prohibited using Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) or Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
- Most data are immediately accessible and do not require account registration. A small subset of collections do require registration and special permission to gain access. Refer to the "Access" column on https://www.cancerimagingarchive.net/collections/ for more details.
5 bone segmentation masks and 15 annotations of anatomical landmarks for pelvis bones in each of 90 Computed Tomography (CT) cases extracted from the CT Lymph nodes and CT Colonography collections from The Cancer Imaging Archive (TCIA).
(from the original page) Many cancers routinely identified by imaging haven’t yet benefited from recent advances in computer science. Approaches such as machine learning and deep learning can generate quantitative tumor 3D volumes, complex features and therapy-tracking temporal dynamics. However, cross-disciplinary researchers striving to develop new approaches often lack disease understanding or sufficient contacts within the medical community. Their research can greatly benefit from labeling and annotating basic information in the images such as tumor locations, which are obvious to radiologists.
Crowd-sourcing the creation of publicly-accessible reference data sets could address this challenge. In 2011 the National Cancer Institute funded development of The Cancer Imaging Archive (TCIA), a free and open-access database of medical images. However, most of these collections lack the labeling and annotations needed by image processing researchers for progress in deep learning and radiomics. As a result, TCIA has partnered with the Radiological Society of North America (RSNA) and numerous academic centers to harness the vast knowledge of RSNA meeting attendees to generate these tumor markups.
The CSV file contains a list of all annotations on the images, organized by author, disease type, location and patient. There are two subfolders
The original dataset was downloaded from https://wiki.cancerimagingarchive.net/plugins/servlet/mobile?contentId=33948774#content/view/33948774
The citation for the data should be used as below:
Jayashree Kalpathy-Cramer, Andrew Beers, Artem Mamonov, Erik Ziegler, Rob Lewis, Andre Botelho Almeida, Gordon Harris, Steve Pieper, Ashish Sharma, Lawrence Tarbox, Jeff Tobler, Fred Prior, Adam Flanders, Jamie Dulkowski, Brenda Fevrier-Sullivan, Carl Jaffe, John Freymann, Justin Kirby. Crowds Cure Cancer: Data collected at the RSNA 2017 annual meeting. The Cancer Imaging Archive. doi: 10.7937/K9/TCIA.2018.OW73VLO2
The work was done by unpaid volunteers, both radiologists and non-radiologists, which makes it a very unreliable dataset. Even in the example image it is clear that the definition of a tumor and the placement of its boundaries vary from person to person.
The biggest question is how do you perform quality control?
How can you determine which annotators create the best data?
Are bad annotations useful or should they be deleted?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data belonging to the 'Predicting the 1p/19q co-deletion status of presumed low grade glioma with an externally validated machine learning algorithm' paper, as published in Clinical Cancer Research. When using this data please cite: (Citation follows later).
Data includes trained SVM models, image features derived for all patients, labels for all patients, PCE models used to derive feature importance and segmentations made for the LGG-1p19qDeletion dataset from The Cancer Imaging Archive (https://wiki.cancerimagingarchive.net/display/Public/LGG-1p19qDeletion#a888d85b04c640eeaf802e12db2dc8ad)
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.
Seven academic centers and eight medical imaging companies collaborated to create this data set which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule > or =3 mm," "nodule <3 mm," and "non-nodule > or =3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.
Note: The TCIA team strongly encourages users to review pylidc and the Standardized representation of the TCIA LIDC-IDRI annotations using DICOM (DICOM-LIDC-IDRI-Nodules) of the annotations/segmentations included in this dataset before developing custom tools to analyze the XML version.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Computational Precision Medicine (CPM) 2018 was held on September 16 in Granada, Spain, in conjunction with MICCAI 2018. As part of the CPM program, a series of imaging Grand Challenges were offered, hosted by Kaggle. This competition, "18F-FDG PET Radiomics Risk Stratifiers in Head and Neck Cancer", was organized as a Medical Image Computing and Computer Assisted Intervention (MICCAI) Computational Precision Medicine (CPM) grand challenge. Contestants were tasked to predict, using primary tumor 18F-FDG PET-derived radiomics features with or without matched clinical data, whether a tumor arising from the oropharynx will be controlled by definitive radiation treatment (RT). The head and neck radiation oncology team from University of Texas MD Anderson Cancer Center (MDACC) curated and harmonized a multi-institutional dataset of 248 oropharynx cancer (OPC) patients, using our in-house 'LAMBDA-RAD' data management platform. Scans came from six institutions in the US (MDACC), Canada [four different clinical institutions in Québec: Hôpital Général Juif de Montréal (HGJ), Centre Hospitalier Universitaire de Sherbrooke (CHUS), Centre Hospitalier de l'Université de Montréal (CHUM) and Hôpital Maisonneuve-Rosemont de Montréal (HMR)], and Europe (MAASTRO Clinic, The Netherlands). The challenge was open from June 15, 2018, 11:59 p.m. to Aug. 30, 2018, midnight UT; this repository serves as a FAIR, durable (re)use repository for challenge data.
Details on the 2018 CPM Challenges can be found at: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=37224869
The "18F-FDG PET Radiomics Risk Stratifiers in Head and Neck Cancer" challenge website (archived) can be viewed at: https://web.archive.org/web/20190106050801/http://miccai.cloudapp.net/competitions/77 and at: https://web.archive.org/web/20210418000112/https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=37224869
The Kaggle-in-class host page for the challenge and results can be found at: https://www.kaggle.com/c/pet-radiomics-challenges.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Citations & Data Usage Policy
Users of this data must abide by the TCIA Data Usage Policy and the Creative Commons Attribution 3.0 Unported License under which it has been published. Attribution should include references to the following citations:
Data Citation
Grossberg A, Elhalawani H, Mohamed A, Mulder S, Williams B, White AL, Zafereo J, Wong AJ, Berends JE, AboHashem S, Aymard JM, Kanwar A, Perni S, Rock CD, Chamchod S, Kantor M, Browne T, Hutcheson K, Gunn GB, Frank SJ, Rosenthal DI, Garden AS, Fuller CD, M.D. Anderson Cancer Center Head and Neck Quantitative Imaging Working Group. (2020) HNSCC [ Dataset ]. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/k9/tcia.2020.a8sh-7363
In addition to the dataset citation above, please be sure to cite the following if you utilize these data in your research:
Publication Citation
Grossberg A, Mohamed A, Elhalawani H, Bennett W, Smith K, Nolan T, Williams B, Chamchod S, Heukelom J, Kantor M, Browne T, Hutcheson K, Gunn G, Garden A, Morrison W, Frank S, Rosenthal D, Freymann J, Fuller C. (2018) Imaging and Clinical Data Archive for Head and Neck Squamous Cell Carcinoma Patients Treated with Radiotherapy. Scientific Data 5:180173. DOI: 10.1038/sdata.2018.173
Publication Citation
Elhalawani, H., Mohamed, A., White, A. et al. Matched computed tomography segmentation and demographic data for oropharyngeal cancer radiomics challenges. Sci Data 4, 170077 (2017). DOI: 10.1038/sdata.2017.77
TCIA Citation
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite our data paper published in "Data in Brief": https://www.sciencedirect.com/science/article/pii/S2352340923007473
Background
Liver cancer ranks as the third leading cause of cancer-related mortality worldwide [1] and, alarmingly, both the incidence and mortality rates of liver cancer are increasing [2; 3]. Among the various types of primary liver cancer, hepatocellular carcinoma (HCC) stands out as the most prevalent, accounting for approximately 70-85% of liver cancer cases [4]. Leveraging the advantages of magnetic resonance (MR) imaging, HCC can be reliably detected and diagnosed without the requirement of an invasive biopsy [5]. MR imaging offers high tissue contrast, which can be further enhanced through contrast-enhanced multiphasic magnetic resonance imaging (mpMRI) techniques. This enables accurate identification and non-invasive diagnosis of HCC [6].
Objective
Precise segmentation of the liver plays a crucial role in volumetry assessment and serves as a vital pre-processing step for subsequent tumor detection algorithms [7]. However, accurate liver segmentation can be particularly challenging in patients with cancer-related tissue alterations and deformations in shape [8]. Accurate HCC tumor segmentation is essential for extracting quantitative imaging biomarkers such as radiomics features, supports studies on treatment response assessment and prognosis evaluation, and provides critical information about tumor biology. To enhance the reproducibility of liver and tumor segmentation, automated methods utilizing image analysis techniques and machine learning have been developed. These methods have demonstrated promising results [7; 8]; however, most algorithms were tested only on small internal test sets and therefore do not guarantee generalizable and consistent performance on external data. Publicly available datasets allow for fair and objective comparisons between different algorithms, techniques, or approaches. Researchers can evaluate the strengths and weaknesses of their methods in relation to existing solutions and establish benchmarks for performance evaluation. In addition to providing a benchmark with this dataset, we also assess the inter-rater variability between two different sets of tumor segmentations. This analysis serves as a measure of reproducibility for human segmentations, highlighting the consistency or variability that may exist among different human raters. Understanding the reproducibility of human segmentations is essential in assessing the reliability of manual annotations and establishing a baseline for algorithm performance comparison. By introducing LiverHccSeg, we aim to address the lack of publicly available mpMRI HCC datasets and offer researchers and developers a valuable resource for algorithmic evaluation on external data and imaging biomarker analyses.
Materials and Methods
Inclusion of Patients
All available scans from The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC) (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=6885436) were downloaded [9]. One multiphasic MRI scan (pre and triphasic post contrast) per patient was included. Patients who did not exhibit a tumor or residual tumor were excluded from the tumor segmentation dataset; however, they were included in the liver segmentation dataset.
MR Imaging Data
Subsequently, all imaging data were converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format with the dcm2nii (v2.1.53) package [10] and available header information was extracted using the pydicom (v.2.1.2) package [11]. Multiparametric MR sequences were labeled with a consistent syntax ('pre', 'art', 'pv', 'del', for the pre-contrast, arterial, portal-venous and delayed contrast phases, respectively). All images were already de-identified by the TCIA website. Images were acquired between the years 1993 and 2007 on Philips and Siemens scanners with field strengths of 1.5 and 3 Tesla. Full details of the imaging parameters can be found in Table 5. Briefly, the median repetition time (TR) and median echo time (TE) were 365.8 ms and 26.4 ms, respectively. The median slice thickness was 9.5 mm and the median bandwidth was 536.9 Hz.
Scientific Reading
After conversion, all images were read in a scientific reading by two board-certified abdominal radiologists (S.A. and S.H. with 9 and 10 years of experience, respectively). Any disagreement between the two raters was discussed in a consensus meeting. All HCC lesions were classified according to LI-RADS criteria [6].
Image Registration
The co-registration of pre-contrast, portal-venous, and delayed-phase images with arterial phase images was performed using the software BioImage Suite (v3.5) [12]. A non-rigid intensity-based registration approach was applied, employing a parameterized free-form deformation (FFD) with 3D B-splines [13]. The optimal FFD transformation was estimated by maximizing the normalized mutual information similarity metric [14] through gradient descent optimization. To enhance the optimization process, a multi-resolution image pyramid with three levels was utilized. The final B-spline control point spacing was set to 80 mm. The estimated transformation was then employed to warp the moving images (pre-contrast, portal-venous, and delayed-phase) into the reference image space, specifically the arterial phase image.
Liver and Tumor Segmentation and Statistical Analysis
All livers and tumors were manually segmented under the supervision of two board-certified abdominal radiologists using the software 3D Slicer (v4.10.2) [15]. To compare the segmentation agreement between the two sets of liver and tumor segmentations, we calculated segmentation metrics using the Python package seg-metrics (v1.0.0) [16]. All segmentation metrics and statistics were calculated in Python (v3.7).
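As an illustration of what a segmentation agreement metric measures, the sketch below computes the Dice similarity coefficient between two binary masks. This is a minimal standalone example, not the seg-metrics implementation, and the description above does not state which metrics were reported; Dice is assumed here only because it is a commonly used overlap measure. The masks are hypothetical flat lists of 0/1 voxel labels.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical toy masks standing in for two raters' segmentations.
rater1 = [0, 1, 1, 1, 0, 0]
rater2 = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(rater1, rater2)
print(round(dice, 3))
```

A value of 1 means the two masks are identical and 0 means they share no voxels, so the score summarizes how closely two raters agree on the same structure.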
Data description
The data in this article include:
- dicoms.zip: This zip file contains all the raw MR images from The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC) [1] in the Digital Imaging and Communications in Medicine (DICOM) format used for the curation of this dataset. The data is structured as Patient-ID/DATE/SEQUENCE where Patient-ID is the unique de-identified patient ID, DATE is the date of the image acquisition, and SEQUENCE is the name of the MR sequence.
- LiverHccSeg_MetaData.xlsx: This spreadsheet contains all the metadata from the DICOM headers along with the data from the scientific image readings.
- nifti_and_segms.zip: This zip file contains all MR images along with the liver and tumor segmentations in the Neuroimaging Informatics Technology Initiative (NIfTI) format. The data is structured as Patient-ID/DATE/SEQUENCE where Patient-ID is the unique anonymized patient identifier, DATE is the date of the image acquisition, and SEQUENCE is the name of the MRI sequence or segmentation image. The NIfTI files are named as follows:
  - pre.nii.gz: Pre-contrast T1-weighted MRI
  - art.nii.gz: Arterial-phase T1-weighted MRI
  - pv.nii.gz: Portal-venous-phase T1-weighted MRI
  - del.nii.gz: Delayed-phase T1-weighted MRI
  - art_pre.nii.gz: Pre-contrast T1-weighted MRI registered to the corresponding arterial-phase T1-weighted MRI
  - art_pv.nii.gz: Portal-venous-phase T1-weighted MRI registered to the corresponding arterial-phase T1-weighted MRI
  - art_del.nii.gz: Delayed-phase T1-weighted MRI registered to the corresponding arterial-phase T1-weighted MRI
  The corresponding manual segmentations are named after the rater and the type of segmentation and follow the format 'RATER_ROI.nii.gz' where RATER denotes the human rater and ROI denotes the region of interest that was segmented, for example, 'rater1_liver.nii.gz', 'rater2_liver.nii.gz', 'rater1_tumor1.nii.gz', and 'rater2_tumor1.nii.gz'. For tumor segmentations, an integer indicates the tumor identification number for different tumor ROIs, for example, 'rater1_tumor1.nii.gz' and 'rater2_tumor1.nii.gz'. The segmentations can be used for the arterial phase NIfTI file as well as the corresponding co-registered pre-contrast (art_pre.nii.gz), portal-venous (art_pv.nii.gz), and delayed-phase (art_del.nii.gz) images.
- segm_metrics.xlsx: This spreadsheet summarizes the segmentation agreement between the two sets of liver and tumor segmentations by the two board-certified abdominal radiologists.
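The file naming convention described above can be turned into a small parser, which is handy when walking the Patient-ID/DATE/SEQUENCE folders. The sketch below is an assumption-laden illustration: the regular expressions are inferred only from the example names given in this description, and the function name `describe` and the returned dictionary keys are hypothetical.

```python
import re

# Phase abbreviations as listed in the description.
PHASES = {
    "pre": "pre-contrast",
    "art": "arterial",
    "pv": "portal-venous",
    "del": "delayed",
}

# 'RATER_ROI.nii.gz', e.g. rater1_liver.nii.gz or rater2_tumor1.nii.gz
SEG_PATTERN = re.compile(
    r"^(?P<rater>rater\d+)_(?P<roi>liver|tumor)(?P<index>\d*)\.nii\.gz$"
)
# 'pre.nii.gz', or 'art_pre.nii.gz' for images registered to the arterial phase
IMG_PATTERN = re.compile(r"^(?:(?P<ref>art)_)?(?P<phase>pre|art|pv|del)\.nii\.gz$")


def describe(filename):
    """Classify a NIfTI file name as an image or a segmentation, else None."""
    seg = SEG_PATTERN.match(filename)
    if seg:
        return {
            "kind": "segmentation",
            "rater": seg["rater"],
            "roi": seg["roi"],
            "tumor_id": int(seg["index"]) if seg["index"] else None,
        }
    img = IMG_PATTERN.match(filename)
    if img:
        return {
            "kind": "image",
            "phase": PHASES[img["phase"]],
            "registered_to_arterial": img["ref"] is not None,
        }
    return None


for name in ["art.nii.gz", "art_pv.nii.gz", "rater1_tumor1.nii.gz"]:
    print(name, describe(name))
```

Returning `None` for unrecognized names keeps the parser conservative, since the archive may contain files that do not follow the documented pattern.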
References
[1] Sung H, Ferlay J, Siegel RL et al (2021) Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 71:209-249
[2] Siegel RL, Miller KD, Jemal A (2019) Cancer statistics, 2019. CA Cancer J Clin 69:7-34
[3] White DL, Thrift AP, Kanwal F, Davila J, El-Serag HB (2017) Incidence of Hepatocellular Carcinoma in All 50 United States, From 2000 Through 2012. Gastroenterology 152:812-820.e815
[4] Perz JF, Armstrong GL, Farrington LA, Hutin YJ, Bell BP (2006) The contributions of hepatitis B virus and hepatitis C virus infections to cirrhosis and primary liver cancer worldwide. J Hepatol 45:529-538
[5] Hamer OW, Schlottmann K, Sirlin CB, Feuerbach S (2007) Technology insight: advances in liver imaging. Nat Clin Pract Gastroenterol Hepatol 4:215-228
[6] Chernyak V, Fowler KJ, Kamaya A et al (2018) Liver Imaging Reporting and Data System (LI-RADS) Version 2018: Imaging of Hepatocellular Carcinoma in At-Risk Patients. Radiology 289:816-830
[7] Bousabarah K, Letzen B, Tefera J et al (2020) Automated detection and delineation of hepatocellular carcinoma on multiphasic contrast-enhanced MRI using deep learning. Abdom Radiol. 10.1007/s00261-020-02604-5
[8] Gross M, Spektor M, Jaffe A et al (2021) Improved performance and consistency of deep learning 3D liver segmentation with heterogeneous cancer stages in magnetic resonance imaging. PLoS One 16:e0260630
[9] Erickson BJ, Kirk S, Lee Y et al (2016) Radiology Data from The Cancer Genome Atlas
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This data set is part of the public development data for the 2023 Automated Universal Classification Challenge (AUC23). The data set concerns the classification of breast cancer molecular subtypes on dynamic contrast-enhanced magnetic resonance imaging (MRI) and was derived from Duke Hospital. The data set is a subset of the data originally introduced and described by Saha et al. (2018), with no additional images or patient information. Data was restructured in compliance with the AUC23 challenge format. The dataset is a single-institutional, retrospective collection of 737 biopsy-confirmed patients from 1 January 2000 to 23 March 2014 with invasive breast cancer and available pre-operative MRI at Duke Hospital.
Images are 3D tensors:
Classification labels:
Folder structure:
imagesTr (root folder with all patients and studies)
├── Breast_MRI_0001_0000.mha (3D T1-subtraction MRI imaging for study 0001)
├── Breast_MRI_0003_0000.mha (3D T1-subtraction MRI imaging for study 0003)
├── ...
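The file names in the listing above can be indexed by study ID with a short script. This is only a sketch: the pattern is inferred from the two example names shown, the meaning of the trailing four-digit field is not stated in this description and is treated as an opaque suffix, and `index_by_study` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pattern inferred from the two example names in the folder listing.
NAME_PATTERN = re.compile(r"^Breast_MRI_(?P<study>\d{4})_(?P<suffix>\d{4})\.mha$")


def index_by_study(filenames):
    """Map each study ID to its matching file names; ignore other files."""
    index = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match:
            index[match["study"]].append(name)
    return dict(index)


# Hypothetical directory listing for illustration.
listing = ["Breast_MRI_0001_0000.mha", "Breast_MRI_0003_0000.mha", "README.txt"]
studies = index_by_study(listing)
print(sorted(studies))
```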
Please cite the following article if you are using the Duke-Breast-Cancer-MRI Dataset:
A. Saha, M. R. Harowicz, L. J. Grimm, C. E. Kim, S. V. Ghate, R. Walsh, M. A. Mazurowski, "A machine learning approach to radiogenomics of breast cancer: a study of 922 subjects and 529 DCE-MRI features". Br J Cancer. 2018 Aug;119(4):508-516. doi: 10.1038/s41416-018-0185-8. Epub 2018 Jul 23. PMID: 30033447; PMCID: PMC6134102.
The Ivy Glioblastoma Atlas Project (Ivy GAP) is a detailed anatomically based transcriptomic atlas of human glioblastoma tumors. As collaborators, the Ivy Foundation funded the Allen Institute and the Swedish Neuroscience Institute to design and create the atlas. The Paul G. Allen Family Foundation also supported the project. This resource consists of a viewer interface that resolves the manually- and machine-annotated histologic images (H&E and RNA in situ hybridization) at 0.5 µm/pixel, a transcriptome browser to view and mine the anatomically-based RNA-Seq samples, an application programming interface, help documentation that describes the methods and how to use the resource, as well as SNP array data and the supporting longitudinal clinical information and MRI time course data. The resource is made available to the public without charge as part of the Ivy GAP (http://glioblastoma.alleninstitute.org/) via the Allen Institute data portal (http://www.brain-map.org), the Ivy GAP Clinical and Genomic Database (http://ivygap.org/) via the Swedish Neuroscience Institute (http://www.swedish.org/services/neuroscience-institute), and The Cancer Imaging Archive (https://wiki.cancerimagingarchive.net/display/Public/Ivy+GAP). The Ivy GAP processed data at GEO includes normalized RNA-Seq FPKM files used for analysis in "An anatomic transcriptional atlas of glioblastoma," which is under review. Other processed data files as well as sample and donor meta-data and QC metrics are available at http://glioblastoma.alleninstitute.org/static/download.html. The raw RNA-Seq and SNP array data will be submitted to dbGaP.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the download data for the LUNA16 challenge available at https://luna16.grand-challenge.org/
The dataset is larger than Zenodo currently allows for a single record; the remaining files can be found in LUNA16 Part 2/2.
This dataset includes the images from the LIDC/IDRI dataset in a different format, together with additional annotations. The LIDC/IDRI dataset is available at https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI under a Creative Commons Attribution 3.0 Unported License.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We downloaded Head-Neck-Radiomics-HN1, aligned the CT and segmentation images, and used PyRadiomics to extract all kinds of features for data analysis.
The CSV contains all the extracted features, with one row per patient. Some columns hold extraction settings rather than features, and these should be removed before any data analysis. They can be identified with a feature selection algorithm or simply by checking the values: every settings column has the same value in all rows.
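The value check described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the authors' procedure: the function name `drop_constant_columns`, the column names, and the sample rows are all hypothetical and are not taken from the actual CSV.

```python
import csv
import io


def drop_constant_columns(rows):
    """Drop every column whose value is identical in all rows.

    rows is a list of dicts as produced by csv.DictReader.
    Returns (kept_column_names, cleaned_rows).
    """
    if not rows:
        return [], []
    columns = list(rows[0].keys())
    # A column is kept only if it takes more than one distinct value.
    kept = [c for c in columns if len({r[c] for r in rows}) > 1]
    cleaned = [{c: r[c] for c in kept} for r in rows]
    return kept, cleaned


# Hypothetical stand-in for the real file: one settings column, one feature.
sample = io.StringIO(
    "patient_id,setting_binWidth,feature_mean\n"
    "P001,25,41.2\n"
    "P002,25,38.7\n"
    "P003,25,45.0\n"
)
rows = list(csv.DictReader(sample))
kept, cleaned = drop_constant_columns(rows)
print(kept)
```

Note that this rule also removes any genuine feature that happens to be constant across patients, which is usually harmless because such a column carries no information for analysis. With a single-row file every column would be dropped, so the check is only meaningful with several patients.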
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A lower-limb bioheat model built in COMSOL Multiphysics® (Massachusetts, USA). The model is based on a computed tomography dataset acquired from The Cancer Imaging Archive (Subject ID TCGA-CV-A6JU) [1,2,3].
This COMSOL model simulates the peripheral thermal behavior using the Pennes bioheat equation and accounts for blood flow in the main arterial structure.
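For reference, the Pennes bioheat equation in its standard textbook form is given below. The description does not state which terms or parameter values the COMSOL model actually uses, so this is the general form only, not the model's configuration.

```latex
\rho c \,\frac{\partial T}{\partial t}
  = \nabla \cdot \left( k \,\nabla T \right)
  + \rho_b c_b \,\omega_b \left( T_a - T \right)
  + Q_m
```

Here $T$ is the tissue temperature, $\rho$, $c$ and $k$ are the tissue density, specific heat and thermal conductivity, $\rho_b$ and $c_b$ are the density and specific heat of blood, $\omega_b$ is the blood perfusion rate, $T_a$ is the arterial blood temperature, and $Q_m$ is the metabolic heat generation per unit volume.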
[1] Zuley, M. L., Jarosz, R., Kirk, S., Lee, Y., Colen, R., Garcia, K., … Aredes, N. D. (2016). Radiology Data from The Cancer Genome Atlas Head-Neck Squamous Cell Carcinoma [TCGA-HNSC] collection. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2016.LXKQ47MS
[2] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This collection contains subjects from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium Uterine Corpus Endometrial Carcinoma (CPTAC-UCEC) cohort. CPTAC is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. Radiology and pathology images from CPTAC patients are being collected and made publicly available by The Cancer Imaging Archive to enable researchers to investigate cancer phenotypes which may correlate to corresponding proteomic, genomic and clinical data.
Imaging from each cancer type will be contained in its own TCIA Collection, with the collection name "CPTAC-cancertype". Radiology imaging is collected from standard of care imaging performed on patients immediately before the pathological diagnosis, and from follow-up scans where available. For this reason the radiology image data sets are heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. Pathology imaging is collected as part of the CPTAC qualification workflow.
All CPTAC cohorts are released as either a single combined cohort, or split into Discovery and Confirmatory where applicable. There are two main types of proteomic studies: discovery proteomics and targeted proteomics. The term "discovery proteomics" is in reference to "untargeted" identification and quantification of a maximal number of proteins in a biological or clinical sample. The term “targeted proteomics” refers to quantitative measurements on a defined subset of total proteins in a biological or clinical sample, often following the completion of discovery proteomics studies to confirm interesting targets selected. Commonly used proteomic technologies and platforms are different types of mass spectrometry and protein microarrays depending on the needs, throughput and sample input requirement of an analysis, with further development on nanotechnologies and automation in the pipeline in order to improve the detection of low abundance proteins, increase throughput, and selectively reach a target protein in vivo. Once the protein targets of interest are identified, high-throughput targeted assays are developed for confirmatory studies: tests to affirm that the initial tests were accurate. A summary of CPTAC imaging efforts can be found on the CPTAC Imaging Proteomics page.
You can join the CPTAC Imaging Special Interest Group to be notified of webinars & data releases, collaborate on common data wrangling tasks and seek out partners to explore research hypotheses! Artifacts from previous webinars such as slide decks and video recordings can be found on the CPTAC SIG Webinars page.
On January 14, 2020, Emily Kawaler presented the consortium's proteogenomic analyses of the CPTAC-UCEC cohort. This deep dive into the UCEC genomic and proteomic datasets will help researchers better understand how they can be correlated with features derived from the imaging data. The slides are available for download.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain the BRATS2013 brain tumour data and belong to the International BRATS 2013 Challenge in Image Segmentation, held at the MICCAI 2013 conference. Public data can be found at https://www.virtualskeleton.ch/BRATS/Start2013. More information can be found at https://wiki.cancerimagingarchive.net/display/Public/NCI-MICCAI+2013+Grand+Challenges+in+Image+Segmentation.
The BRATS2013_CHALLENGE.zip file contains the 10 test cases released for the evaluation of the methods that participated in the Challenge.
The BRATS_Leaderboard.zip file contains the Leaderboard set used to perform an initial ranking of the best methods before the final test set evaluation.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This collection contains subjects from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium Cutaneous Melanoma (CPTAC-CM) cohort. CPTAC is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. Radiology and pathology images from CPTAC patients are being collected and made publicly available by The Cancer Imaging Archive to enable researchers to investigate cancer phenotypes which may correlate to corresponding proteomic, genomic and clinical data.
Imaging from each cancer type will be contained in its own TCIA Collection, with the collection name "CPTAC-cancertype". Radiology imaging is collected from standard of care imaging performed on patients immediately before the pathological diagnosis, and from follow-up scans where available. For this reason the radiology image data sets are heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. Pathology imaging is collected as part of the CPTAC qualification workflow.
All CPTAC cohorts are released as either a single combined cohort, or split into Discovery and Confirmatory where applicable. There are two main types of proteomic studies: discovery proteomics and targeted proteomics. The term "discovery proteomics" is in reference to "untargeted" identification and quantification of a maximal number of proteins in a biological or clinical sample. The term “targeted proteomics” refers to quantitative measurements on a defined subset of total proteins in a biological or clinical sample, often following the completion of discovery proteomics studies to confirm interesting targets selected. Commonly used proteomic technologies and platforms are different types of mass spectrometry and protein microarrays depending on the needs, throughput and sample input requirement of an analysis, with further development on nanotechnologies and automation in the pipeline in order to improve the detection of low abundance proteins, increase throughput, and selectively reach a target protein in vivo. Once the protein targets of interest are identified, high-throughput targeted assays are developed for confirmatory studies: tests to affirm that the initial tests were accurate. A summary of CPTAC imaging efforts can be found on the CPTAC Imaging Proteomics page.
You can join the CPTAC Imaging Special Interest Group to be notified of webinars & data releases, collaborate on common data wrangling tasks and seek out partners to explore research hypotheses! Artifacts from previous webinars such as slide decks and video recordings can be found on the CPTAC SIG Webinars page.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Prostate cancer T1- and T2-weighted magnetic resonance images (MRIs) were acquired on a 1.5 T Philips Achieva by combined surface and endorectal coil, including dynamic contrast-enhanced images obtained prior to, during and after I.V. administration of 0.1 mmol/kg body weight of Gadolinium-DTPA (pentetic acid). Corresponding clinical metadata (XLS format) and 3D segmentation files (NRRD format) are offered as a supplement to this image collection. The XLS file contains pathology biopsy and excised gland tissue reports and the MRI radiology report for most subjects.
The Multi-component NRRD Segmentations allow visualization and downstream analysis in 3D Slicer of the following prostate components: prostate gland boundary; internal capsule; central gland; peripheral zone; seminal vesicles; urethra; cancer – dominant nodule; neurovascular bundle; penile bulb; ejaculatory duct; verumontanum; and rectum. See our tutorial on Using 3D Slicer with the Prostate-Diagnosis data if you are not familiar with this kind of data.
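NRRD files begin with a plain-text header (a magic line such as NRRD0004, then "field: value" and "key:=value" lines, ending at the first blank line), so basic properties of a segmentation can be inspected before loading it into 3D Slicer. The following is a minimal Python sketch using only the standard library. The function name `read_nrrd_header` is illustrative, and the sample header is hypothetical; it is not taken from the Prostate-Diagnosis files, whose actual fields may differ.

```python
import io


def read_nrrd_header(stream):
    """Parse the text header of an NRRD file opened in binary mode.

    Returns (magic, fields), where fields maps header names to raw strings.
    Reading stops at the blank line that separates header from data.
    """
    magic = stream.readline().decode("ascii", errors="replace").strip()
    if not magic.startswith("NRRD"):
        raise ValueError("not an NRRD file")
    fields = {}
    while True:
        raw = stream.readline()
        line = raw.decode("ascii", errors="replace").strip()
        if not raw or not line:  # end of file or blank line: header is over
            break
        if line.startswith("#"):  # comment line
            continue
        idx = line.find(":")
        if idx == -1:
            continue
        key = line[:idx].strip()
        # "key:=value" pairs use ':=', standard fields use ': '.
        value = line[idx + 2:] if line[idx + 1:idx + 2] == "=" else line[idx + 1:]
        fields[key] = value.strip()
    return magic, fields


# Hypothetical in-memory file standing in for a real segmentation.
sample = (
    b"NRRD0004\n"
    b"# hypothetical header for illustration\n"
    b"type: short\n"
    b"dimension: 3\n"
    b"sizes: 4 5 6\n"
    b"encoding: raw\n"
    b"\n"
) + bytes(240)

magic, fields = read_nrrd_header(io.BytesIO(sample))
print(magic, fields["sizes"])
```

For a file on disk, the same function can be called on a handle opened with `open(path, "rb")`. Reading the voxel data itself depends on the `encoding` field and is better left to 3D Slicer or a dedicated NRRD reader.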
The Seminal vesicles (SV) and neurovascular bundle (NVB) Segmentations delineate the neurovascular bundle and seminal vesicles as MHA files. These were provided as part of a planned challenge competition that did not materialize.
The Third Party Analysis dataset mentioned beneath the Data Access table was added later as part of the NCI-ISBI 2013 Challenge - Automated Segmentation of Prostate Structures. It includes segmentations in NRRD format for 30 Prostate-Diagnosis subjects, marking the boundaries of the central gland and peripheral zone.