The NCI DIS 3D database is a collection of 3D structures for over 400,000 drugs. The database is an extension of the NCI Drug Information System (DIS). The structural information stored in the DIS is only the connection table for each drug, which is just a list of which atoms are connected and how they are connected. A searchable database of three-dimensional structures has been developed from the chemistry database of the DIS, a file of about 450,000 primarily organic compounds which have been tested by NCI for anticancer activity. The DIS database is very similar in size and content to the proprietary databases used in the pharmaceutical industry. Its development began in the 1950s, and this history led to a number of problems in the generation of 3D structures. The connection-table information can be searched to find drugs that share similar patterns of connections, which can correlate with similar biological activity. But the cellular targets for drug action, as well as the drugs themselves, are three-dimensional objects, and advances in computer hardware and software have reached the point where they can be represented as such. In many cases the important points of interaction between a drug and its target can be represented by a 3D arrangement of a small number of atoms. Such a group of atoms is called a pharmacophore. The pharmacophore can be used to search 3D databases, and drugs that match the pharmacophore could have similar biological activity yet have very different patterns of atomic connections. Having a diverse set of lead compounds increases the chances of finding an active compound with acceptable properties for clinical development. Sponsor: The ICBG are supported by the Cooperative Agreement mechanism, with funds from nine components of the NIH, the National Science Foundation, and the Foreign Agricultural Service of the USDA.
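The difference between a connection table and a pharmacophore can be sketched in a few lines of Python. This is only an illustration: the atoms, bonds, feature labels, coordinates, distance ranges and function names below are all hypothetical and do not reflect the actual DIS data model or the search software used with the 3D database.

```python
from itertools import permutations
from math import dist

# Connection table (2D): which atoms are bonded and with what bond order.
# Hypothetical fragment; bonds are (atom_i, atom_j, bond_order).
connection_table = {
    "atoms": ["C", "C", "O", "N"],
    "bonds": [(0, 1, 1), (1, 2, 2), (1, 3, 1)],
}

def neighbors(table, atom_index):
    """List the atoms bonded to atom_index according to the connection table."""
    out = []
    for i, j, _order in table["bonds"]:
        if i == atom_index:
            out.append(j)
        elif j == atom_index:
            out.append(i)
    return out

# Pharmacophore (3D): a few feature types plus allowed distance ranges
# (in angstroms) between them. All values are made up for illustration.
pharmacophore = {
    "features": ["donor", "acceptor", "aromatic"],
    "distances": {(0, 1): (2.5, 3.5), (0, 2): (4.0, 6.0), (1, 2): (3.0, 5.0)},
}

def matches(query, molecule_features):
    """Return True if some assignment of molecule features satisfies every
    feature type and every pairwise distance range of the query."""
    wanted = query["features"]
    for combo in permutations(molecule_features, len(wanted)):
        if any(f["type"] != w for f, w in zip(combo, wanted)):
            continue
        if all(lo <= dist(combo[i]["xyz"], combo[j]["xyz"]) <= hi
               for (i, j), (lo, hi) in query["distances"].items()):
            return True
    return False

# Two hypothetical molecules with the same feature types but different geometry.
molecule_a = [
    {"type": "donor", "xyz": (0.0, 0.0, 0.0)},
    {"type": "acceptor", "xyz": (3.0, 0.0, 0.0)},
    {"type": "aromatic", "xyz": (3.0, 4.0, 0.0)},
]
molecule_b = [
    {"type": "donor", "xyz": (0.0, 0.0, 0.0)},
    {"type": "acceptor", "xyz": (8.0, 0.0, 0.0)},
    {"type": "aromatic", "xyz": (3.0, 4.0, 0.0)},
]
```

In this toy setup `matches(pharmacophore, molecule_a)` succeeds and `matches(pharmacophore, molecule_b)` fails, because the match depends only on 3D geometry and feature types, not on how the atoms are bonded.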
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The Welsh Cancer Intelligence & Surveillance Unit (WCISU) is the National Cancer Registry for Wales. Its primary role is to record, store and report on all incidence of cancer for the resident population of Wales, wherever they are treated. Cancer registration in Wales began almost five decades ago, and today’s electronic database, which holds records going back to 1972, contains in the region of 686,000 records.
WCISU collects data about occurrences of cancer in Welsh residents via direct or indirect submissions from Welsh Hospitals.
Staging of malignant melanoma (ICD 10 code C43), breast (C50), colorectal (C18-C20) and cervix (C53) started in 2001 since this was when we started receiving pathological information. Staging for all other cancers started in 2010.
Treatment information started in 1995.
SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.
Imaging Data Commons (IDC) is a repository within the Cancer Research Data Commons (CRDC) that manages imaging data and enables its integration with the other components of CRDC. IDC hosts a growing number of imaging collections that are contributed either by funded US National Cancer Institute (NCI) data collection activities or by individual researchers. Image data hosted by IDC is stored in DICOM format.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Outcomes of patients undergoing HIPEC and CRS by gender-NCDB.
Background: Limited data exist demonstrating the clinical benefit of proton radiotherapy (PRT) in breast cancer. Using the National Cancer Database, we evaluated predictors associated with PRT use for patients with breast cancer. An exploratory analysis also investigates the impact of PRT on overall survival (OS).
Methods: Patients with non-metastatic breast cancer treated with adjuvant radiotherapy from 2004 to 2014 were identified. Patients were stratified based on receipt of PRT or non-PRT (i.e., photons ± electrons). A logistic regression model was used to determine predictors for PRT utilization. For OS, multivariable analysis (MVA) was performed using a Cox proportional hazards model.
Results: A total of 724,492 patients were identified: 871 received PRT and 723,621 received non-PRT. 58.3% of the PRT patients were group stage 0–1. Median follow-up time was 62.2 months. On multivariate logistic analysis, the following factors were found to be significant for receipt of PRT (all p < 0.05): academic facility (odds ratio [OR] = 2.50), South (OR = 2.01) and West location (OR = 12.43), left-sided (OR = 1.21), ER-positive (OR = 1.59), and mastectomy (OR = 1.47); pT2-T4 disease predicted decreased use (OR = 0.79). PRT was not associated with OS on MVA for all patients: hazard ratio 0.85, p = 0.168. PRT remained not significant on MVA after stratifying for subsets likely associated with higher heart radiation doses, including: left-sided (p = 0.140), inner-quadrant (p = 0.173), mastectomy (p = 0.095), node positivity (p = 0.680), N2-N3 disease (p = 0.880), and lymph node irradiation (LNI) (p = 0.767).
Conclusions: Receipt of PRT was associated with left-sided, ER+ tumors, mastectomy, South and West location, and academic facilities, but not higher group stages or LNI. PRT was not associated with OS, including in subsets likely at risk for higher heart doses. Further studies are required to determine non-OS benefits of PRT. In the interim, given the high cost of protons, only well-selected patients should receive PRT unless enrolled on a clinical trial.
The Veterans Affairs Central Cancer Registry (VACCR) receives and stores information on cancer diagnosis and treatment constraints compiled and sent in by the local cancer registry staff at each of the 132 Veterans Affairs Medical Centers that diagnose and/or treat Veterans with cancer. The information sent is encoded to meet the site-specific requirements for registry inclusion as established by several oversight bodies, including the North American Association of Central Cancer Registries, the American College of Surgeons' Commission on Cancer, and the American Joint Committee on Cancer, among others. The information is obtained from a wide variety of medical record documents at the local medical center pertaining to each Veterans Health Administration (VHA) cancer patient. The information is then transmitted to the VACCR. Details collected include extensive demographics, cancer identification, extent of disease and staging, first course of treatment, and outcomes. Data extraction is available to researchers with VA approved Institutional Review Board studies, peer review, and Data Use Agreements.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Characteristics of the study population by type of cancer.
The United States Cancer Statistics (USCS) online databases in WONDER provide cancer incidence and mortality data for the United States for the years since 1999, by year, state and metropolitan areas (MSA), age group, race, ethnicity, sex, childhood cancer classifications and cancer site. The databases report case counts, deaths, crude and age-adjusted incidence and death rates, and 95% confidence intervals for rates. The USCS data are the official federal statistics on cancer incidence from registries having high-quality data and cancer mortality statistics for 50 states and the District of Columbia. USCS are produced by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), in collaboration with the North American Association of Central Cancer Registries (NAACCR). Mortality data are provided by the Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Vital Statistics System (NVSS).
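As a rough illustration of the reported measures, the sketch below computes a crude rate, a directly age-adjusted rate, and an approximate 95% confidence interval. The counts, populations and standard-population weights are hypothetical, the function names are invented for this example, and the interval uses a simple normal approximation that is not necessarily the method USCS applies.

```python
from math import sqrt

def crude_rate(cases, population, per=100_000):
    """Cases divided by population, expressed per `per` people."""
    return cases / population * per

def age_adjusted_rate(strata, per=100_000):
    """Direct standardization: weight each age-specific rate by the standard
    population's share for that age group (weights should sum to 1)."""
    return sum(s["weight"] * s["cases"] / s["population"] for s in strata) * per

def crude_rate_ci(cases, population, per=100_000, z=1.96):
    """Approximate 95% CI, treating the case count as Poisson-distributed."""
    rate = crude_rate(cases, population, per)
    se = rate / sqrt(cases)
    return rate - z * se, rate + z * se

# Hypothetical age strata: observed cases, population at risk, standard weight.
strata = [
    {"cases": 10, "population": 100_000, "weight": 0.5},
    {"cases": 60, "population": 200_000, "weight": 0.3},
    {"cases": 200, "population": 100_000, "weight": 0.2},
]

total_cases = sum(s["cases"] for s in strata)
total_population = sum(s["population"] for s in strata)

crude = crude_rate(total_cases, total_population)
adjusted = age_adjusted_rate(strata)
low, high = crude_rate_ci(total_cases, total_population)
```

With these made-up numbers the adjusted rate is lower than the crude rate because the standard weights give less influence to the oldest, highest-rate stratum than its actual population share does.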
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: CCDI-MCI. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
The Molecular Characterization Initiative (MCI) [2] is a component of the National Cancer Institute’s (NCI) Childhood Cancer Data Initiative (CCDI). It offers state-of-the-art molecular testing at no cost to newly diagnosed children, adolescents, and young adults (AYAs) with central nervous system (CNS) tumors, soft tissue sarcomas (STS), certain rare childhood cancers (RAR), and certain neuroblastomas (NBL) treated at a Children’s Oncology Group (COG)–affiliated hospital. The goal of MCI is to enhance the understanding of genetic factors in pediatric cancers and to provide timely, clinically relevant findings to doctors and families to aid in treatment decisions and determine eligibility for certain planned COG clinical trials.
The original images in vendor-specific format were collected on IRB-approved clinical trials or tissue banking studies from Children’s Oncology Group (COG) patients enrolled in EveryChild APEC14B1 protocol.
Those images, augmented with the metadata describing their content, were provided to the IDC team for the purposes of archival, and were converted into DICOM Whole Slide Microscopy (SM) representation [3,4] using custom open source scripts and tools as described in [5]. The resulting converted images were released in IDC in the CCDI-MCI collection with the IDC data release v19.
To learn how to access related clinical and genomic data accompanying this collection please see the CCDI-MCI page and CCDI Hub.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
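The naming convention described above can be parsed mechanically. The following is a minimal sketch, assuming file names always follow the `<collection_id>-idc_v<release>-<target>.<extension>` shape seen in this record; the `ManifestName` type and `parse_manifest_name` function are hypothetical helpers, not part of any IDC tool.

```python
import re
from typing import NamedTuple

class ManifestName(NamedTuple):
    collection_id: str
    idc_release: int   # IDC data release in which this version was introduced
    target: str        # "aws", "gcs", or "dcf"
    extension: str     # "s5cmd" or "dcf"

# Assumed pattern, inferred from the example names in this record.
_PATTERN = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)"
    r"\.(?P<ext>s5cmd|dcf)$"
)

def parse_manifest_name(filename):
    """Split a manifest file name into its documented components."""
    m = _PATTERN.match(filename)
    if m is None:
        raise ValueError(f"not a recognized manifest name: {filename!r}")
    return ManifestName(m["collection"], int(m["release"]), m["target"], m["ext"])

example = parse_manifest_name("collection_id-idc_v8-aws.s5cmd")
```

Here `example.idc_release` is the integer 8, matching the statement that the file corresponds to content introduced in IDC data release v8.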
ccdi_mci-idc_v19-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
ccdi_mci-idc_v19-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
ccdi_mci-idc_v19-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
install the idc-index package: pip install --upgrade idc-index
download the files by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using .dcf manifest, see manifest header.
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W. L., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National cancer institute imaging data commons: Toward transparency, reproducibility, and scalability in imaging artificial intelligence. Radiographics 43, (2023).
[3] National Electrical Manufacturers Association (NEMA). DICOM PS3.3 - Information Object Definitions: A.32.8 VL Whole Slide Microscopy Image IOD. at <https://dicom.nema.org/medical/dicom/current/output/html/part03.html#sect_A.32.8>
[4] Herrmann, M. D., Clunie, D. A., Fedorov, A., Doyle, S. W., Pieper, S., Klepeis, V., Le, L. P., Mutter, G. L., Milstone, D. S., Schultz, T. J., Kikinis, R., Kotecha, G. K., Hwang, D. H., Andriole, K. P., John Lafrate, A., Brink, J. A., Boland, G. W., Dreyer, K. J., Michalski, M., Golden, J. A., Louis, D. N. & Lennerz, J. K. Implementing the DICOM standard for digital pathology. J. Pathol. Inform. 9, 37 (2018).
[5] Clunie, D., Fedorov, A. & Herrmann, M. D. ImagingDataCommons/idc-wsi-conversion: Initial release. (Zenodo, 2023). doi:10.5281/ZENODO.8240154
The US National Cancer Institute (NCI) maintains and administers data elements, forms, models, and components of these items in a metadata registry referred to as the Cancer Data Standards Registry and Repository, or caDSR.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
| Characteristic | Value (N = 26254) |
|---|---|
| Age (years) | Mean ± SD: 61.4 ± 5; Median (IQR): 60 (57-65); Range: 43-75 |
| Sex | Male: 15512 (59%); Female: 10742 (41%) |
| Race | White: 23969 (91.3%) |
| Ethnicity | Not Available |
Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.
Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.
Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).
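The headline figures in the Results can be approximately reproduced from the rates quoted there. This small Python check uses only the rounded per-100,000 person-year rates given in the text, so it recovers the reported values only to within rounding; the function and variable names are illustrative.

```python
def relative_reduction(rate_control, rate_screened):
    """Fraction by which the screened-group rate is lower than the control rate."""
    return (rate_control - rate_screened) / rate_control

def rate_ratio(rate_a, rate_b):
    """Ratio of two rates expressed in the same units."""
    return rate_a / rate_b

# Lung-cancer deaths per 100,000 person-years: radiography 309, low-dose CT 247.
mortality_reduction = relative_reduction(309, 247)

# Lung-cancer incidence per 100,000 person-years: low-dose CT 645, radiography 572.
incidence_ratio = rate_ratio(645, 572)

print(f"relative mortality reduction: {mortality_reduction:.1%}")
print(f"incidence rate ratio: {incidence_ratio:.2f}")
```

The small gap between the recomputed mortality reduction and the published 20.0% is expected, since the published figure was presumably derived from unrounded rates.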
Data Availability: A summary of the National Lung Screening Trial and its available datasets is provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), contracted by the National Cancer Institute (NCI) as keepers and statistical analyzers of the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and histopathology images collected during the trial. (These were previously restricted.)
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Characteristics for young breast cancer patients, NCDB 2007–2013.
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
The English Cancer Patient Experience Survey (CPES) is commissioned by NHS England and administered on their behalf by an external survey provider organisation (Quality Health). The survey provides insights into the care experienced by cancer patients across England who were treated as day cases or inpatients. Data from CPES has been linked to cancer registration records recorded by the National Cancer Registration and Analysis Service (the cancer registry in England). Individual responses to Wave 3 of CPES are recorded, alongside characteristics of the patient who completed the survey. Wave 3 of the National Cancer Patient Experience Survey is limited to patients discharged from cancer care between 01/09/2012 – 30/09/2012.
Data within the file:
--PATIENT_PSEUDO_ID (Project specific Pseudonymised Patient ID)
--GENDER (coded Male, Female)
--QUINTILE2010 (Deprivation quintile [1-5], describing the Income Deprivation Domain where 1= least deprived and 5= most deprived)
--FINAL_ROUTE (One of eight Routes to Diagnosis; methodology for the assignment of each route is described in Elliss-Brookes L, McPhail S, Greenslade M, Shelton J, Hiom S, Richards M (2012) Routes to diagnosis for cancer – determining the patient journey using multiple routine data sets. British Journal of Cancer 107: 1220–1226.)
--AGE (aggregated in 4 categories: <55, 55-64, 65-74, 75+)
--STAGE (stage of the cancer coded as I, II, III, IV, missing)
--CANCER_SITE (Cancer sites coded in accordance with ICD 10: C00-C14, C15, C16, C18, C19-C20, C25, C33-C34, C43, C49, C50, C54, C56, C61, C64, C67, C73, C82, C83, C85, C90, C91-C95, D05 and ‘all other ICD-10 codes’)
Specific disclosure controls applied:
--Gender omitted from the data specification in the following cancer sites:
• Female only for C50, D05 and C73
• Male only for C49
--Self-reported ethnicity (from the CPES surveys) aggregated into white British / non-white British / not specified.
--Self-reported ethnicity omitted for C49, C64, C73 (replaced as “missing”).
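The documented codings lend themselves to a simple sanity check when loading the file. The sketch below is a hypothetical validator, assuming each row is available as a Python dict keyed by the field names above with QUINTILE2010 already converted to an integer; the on-disk format of the file is not described here, and FINAL_ROUTE is skipped because its eight values are not listed.

```python
# Allowed values taken from the field descriptions; anything else is flagged.
ALLOWED = {
    "GENDER": {"Male", "Female"},
    "QUINTILE2010": {1, 2, 3, 4, 5},
    "AGE": {"<55", "55-64", "65-74", "75+"},
    "STAGE": {"I", "II", "III", "IV", "missing"},
}

def validate_record(record):
    """Return (field, value) pairs that fall outside the documented coding.

    Fields absent from the record are not flagged, since gender is omitted
    for some cancer sites under the disclosure controls.
    """
    problems = []
    for field, allowed in ALLOWED.items():
        if field in record and record[field] not in allowed:
            problems.append((field, record[field]))
    return problems

# Hypothetical rows for illustration only.
good_row = {"GENDER": "Female", "QUINTILE2010": 3, "AGE": "55-64", "STAGE": "II"}
bad_row = {"QUINTILE2010": 6, "AGE": "55-64", "STAGE": "V"}
```

Calling `validate_record(bad_row)` flags the out-of-range quintile and stage while accepting the missing GENDER field.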
In particular the data is used to help:
The National Cancer Waiting Times Monitoring Data Set is managed by NHS England.
https://data.go.kr/ugs/selectPortalPolicyView.do
This data is from the [2023 Cancer Registration Statistics] collected by the Central Cancer Registry (designated by the National Cancer Center) through the National Cancer Registry Statistics Project. It provides cancer incidence statistics from 1999 to 2023. (Unit: persons, incidence rate per 100,000 people)
It provides annual statistical information on cancer cases in Korea, broken down by gender, age group, and cancer type. It serves as a foundational resource for understanding the scale and distribution of cancer incidence, and is widely used in health care policymaking and disease prevention research. Because it is built on the National Cancer Registry, it can be used as a reliable basis for analyzing cancer incidence trends and conducting related research.
The quarterly emergency presentations of cancer data has been updated by PHE’s National Cancer Registration and Analysis Service (NCRAS).
Data estimates are for all malignant cancers (excluding non-melanoma skin cancer) and are at CCG level, with England as a whole for comparison.
This latest publication includes quarterly data for July 2019 to September 2019 (quarter 2 of financial year 2019 to 2020) and an update of the one year rolling average.
The proportion of emergency presentations for cancer is an indicator of patient outcomes.
This data was collected by the National Cancer Institute in 2021. This dataset provides detailed information on lung cancer patients, covering demographic attributes, medical history, treatment specifics, and survival outcomes.