CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can access data about cancer statistics in the United States, including but not limited to searches by type of cancer and by race, sex, ethnicity, age at diagnosis, and age at death.
Background: The mission of the Surveillance, Epidemiology, and End Results (SEER) database is to provide information on cancer statistics to help reduce the burden of disease in the U.S. population. The SEER database is a project of the National Cancer Institute. It collects information on incidence, prevalence, and survival from specific geographic areas representing 28 percent of the United States population.
User functionality: Users can access a variety of resources. Cancer Stat Fact Sheets summarize statistics by major cancer type. Cancer Statistics Reviews are available for 1975-2008 in table format. Users can also build their own tables and graphs using Fast Stats. The Cancer Query system provides more flexibility and a larger set of cancer statistics than Fast Stats but requires more input from the user. State Cancer Profiles include dynamic maps and graphs for investigating cancer trends at the county, state, and national levels. SEER research data files and SEER*Stat software can be downloaded over the Internet (SEER*Stat’s client-server mode) or obtained on discs shipped directly to you. A signed data agreement form is required to access the SEER data.
Data notes: Data are available in different formats depending on the type of data accessed. Some data are available in table, PDF, and HTML formats. Detailed information about the data is available under “Data Documentation and Variable Recodes”.
https://www.wiki.ed.ac.uk/display/CAN/Governance
The Edinburgh Ovarian Cancer Database was founded by Professor John Smyth in 1984 with the main aim of tracking the disease course of every ovarian cancer patient in the South-East of Scotland (Lothian, Fife, Borders, and Dumfries and Galloway). Clinical, pathological, genetic, surgical and treatment information is recorded. The database tracks each patient’s disease course, including therapies, responses to treatment, progression episodes, radiological investigations, tumour marker results and, ultimately, cause of death. It has been, and continues to be, a major resource for retrospective research, sample collection and uniform prospective data collection. The data help identify patients suitable for particular therapy options and clinical trials. Over 4500 patients have been documented to date. Data are curated by a team of two data managers who source data from patient case notes, electronic patient records, SCI-Store, APEX, the Scottish morbidity registers and Scotland’s genetic services. Going forward, some areas of the database will be populated using automated feeds from various national, regional and bespoke databases and EPRs.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Cancer Registries (RT, from the Italian Registri Tumori) are the structures responsible for collecting and registering all tumors occurring in a given territory. The primary role of a Cancer Registry is to establish and maintain over time an archive of all newly diagnosed cancer cases and to ensure that the data are recorded rigorously, continuously and systematically.
Most Italian registries are population-based registries; that is, they collect data on cancer in all residents of a given territory (a single city, a province, an entire region, or the territory of an ASL, a local health authority).
Alongside the general population registries, there are specialized registries that collect information on a single type of tumor, on specific age groups, or on occupational cancers.
The Veterans Affairs Central Cancer Registry (VACCR) receives and stores information on cancer diagnosis and treatment constraints compiled and sent in by the local cancer registry staff at each of the 132 Veterans Affairs Medical Centers that diagnose and/or treat Veterans with cancer. The information sent is encoded to meet the site-specific requirements for registry inclusion established by several oversight bodies, including the North American Association of Central Cancer Registries, the American College of Surgeons' Commission on Cancer, and the American Joint Committee on Cancer, among others. The information is obtained from a wide variety of medical record documents at the local medical center pertaining to each Veterans Health Administration (VHA) cancer patient and is then transmitted to the VACCR. Details collected include extensive demographics, cancer identification, extent of disease and staging, first course of treatment, and outcomes. Data extraction is available to researchers with VA-approved Institutional Review Board studies, peer review, and Data Use Agreements.
SEER Limited-Use cancer incidence data with associated population data. Available geographic areas are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high-quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.
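Because these records pair stage at diagnosis with follow-up for vital status, stage-specific survival can be estimated from them. The sketch below shows observed survival per stage using the Kaplan-Meier estimator, assuming a simplified, hypothetical record layout of (stage, months of follow-up, died). SEER's own published stage-specific figures are typically relative survival computed in SEER*Stat, which also adjusts for expected background mortality; this sketch does not do that.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Kaplan-Meier survival steps as a list of (time, survival) pairs.

    times: follow-up time for each patient (here, months since diagnosis)
    events: 1 if the patient died at that time, 0 if alive at last contact (censored)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Probability of surviving past t, given alive just before t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_by_stage(records):
    """Estimate one Kaplan-Meier curve per stage from (stage, months, died) records."""
    groups = defaultdict(lambda: ([], []))
    for stage, months, died in records:
        groups[stage][0].append(months)
        groups[stage][1].append(died)
    return {stage: kaplan_meier(t, e) for stage, (t, e) in groups.items()}


# Hypothetical records for illustration only.
records = [
    ("Localized", 12, 0), ("Localized", 24, 1), ("Localized", 36, 0),
    ("Distant", 3, 1), ("Distant", 6, 1), ("Distant", 9, 0),
]
curves = survival_by_stage(records)
print(curves["Localized"])  # [(24, 0.5)]
```

Censored patients (alive at last contact) leave the risk set without lowering the curve, which is why the estimator is preferred over a simple "fraction still alive".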
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. These successful machine learning models are powered by high-quality training datasets with sufficient data volume and adequate preprocessing. However, although several public data portals exist, including The Cancer Genome Atlas (TCGA) multi-omics initiative, and open databases such as LinkedOmics, these databases are not ready to use off the shelf with existing machine learning models. We propose MLOmics, an open cancer multi-omics database that aims to better serve the development and evaluation of bioinformatics and machine learning models. MLOmics contains 8,314 patient samples covering all 32 cancer types, with four omics types, stratified features, and extensive baselines. Complementary support for downstream analysis and bio-knowledge linking is also included to support interdisciplinary analysis.
BioXpress is a gene expression and cancer association database in which expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, the International Cancer Genome Consortium, Expression Atlas and publications. BioXpress can be searched by gene name or by cancer type. To search by gene name, select the appropriate identifier type from the dropdown menu and enter the corresponding identifier in the adjacent text box. The results are computed and presented with information such as variable expression levels and tumor expression. To search by cancer type, select the desired option from the dropdown menu, such as "Cancer Type", "Significant", "Expression", "Adjusted p-value" and "p-value". Results are shown in a graph displaying the top 10 differentially expressed genes for the specified cancer type, ranked by how often expression is significantly altered between tumor and normal pairs.
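The ranking described above (top genes by how often expression is altered across tumor/normal pairs) can be illustrated with a minimal sketch. It assumes paired expression values per patient and uses a hypothetical fixed log2 fold-change cutoff as a stand-in for a per-pair significance call; it is not BioXpress's actual pipeline, which the listing does not describe in detail.

```python
import math


def top_deg_by_pair_frequency(pairs, n=10, min_abs_log2fc=1.0):
    """Rank genes by the fraction of tumor/normal pairs in which they are altered.

    pairs: list of dicts, one per patient, mapping gene -> (tumor_expr, normal_expr)
    min_abs_log2fc: hypothetical cutoff; a pair counts as altered when
        |log2(tumor + 1) - log2(normal + 1)| >= min_abs_log2fc
    Returns up to n (gene, fraction) tuples, highest fraction first.
    """
    altered = {}
    seen = {}
    for pair in pairs:
        for gene, (tumor, normal) in pair.items():
            # Pseudocount of 1 avoids log2(0) for unexpressed genes.
            lfc = math.log2(tumor + 1) - math.log2(normal + 1)
            seen[gene] = seen.get(gene, 0) + 1
            if abs(lfc) >= min_abs_log2fc:
                altered[gene] = altered.get(gene, 0) + 1
    freq = {gene: altered.get(gene, 0) / seen[gene] for gene in seen}
    # Sort by frequency (descending), then by gene name for a stable order.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical expression values for two tumor/normal pairs.
pairs = [
    {"GENE_A": (3.0, 15.0), "GENE_B": (63.0, 7.0), "GENE_C": (100.0, 99.0)},
    {"GENE_A": (1.0, 7.0), "GENE_B": (31.0, 31.0), "GENE_C": (50.0, 52.0)},
]
print(top_deg_by_pair_frequency(pairs, n=2))  # [('GENE_A', 1.0), ('GENE_B', 0.5)]
```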
Cancer rates for Lake County, Illinois. Explanation of field attributes (each value is a rate per 100,000):
*Colorectal Cancer: cancer that develops in the colon (the longest part of the large intestine) and/or the rectum (the last several inches of the large intestine).
*Lung Cancer: cancer that forms in tissues of the lung, usually in the cells lining air passages.
*Breast Cancer: cancer that forms in tissues of the breast.
*Prostate Cancer: cancer that forms in tissues of the prostate.
*Urinary System Cancer: cancer that forms in the organs of the body that produce and discharge urine, including the kidneys, ureters, bladder, and urethra.
*All Cancer: all cancers, including but not limited to colorectal, lung, breast, prostate, and urinary system cancer.
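A rate per 100,000 is the number of cases divided by the population at risk, scaled by 100,000. The listing does not say whether the Lake County figures are crude or age-adjusted, so the sketch below shows both: a crude rate and a directly age-standardized rate. The age strata and standard-population weights in the example are hypothetical.

```python
def rate_per_100k(cases, population):
    """Crude rate: cases per 100,000 people in the population at risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep round numbers exact where possible.
    return cases * 100_000 / population


def age_adjusted_rate_per_100k(strata):
    """Direct age standardization.

    strata: list of (cases, population, standard_weight) per age group, where the
    weights come from a chosen standard population and sum to 1.
    """
    return sum(rate_per_100k(c, p) * w for c, p, w in strata)


print(rate_per_100k(350, 700_000))  # 50.0
# Two hypothetical age groups with equal standard weights.
print(age_adjusted_rate_per_100k([(10, 100_000, 0.5), (30, 100_000, 0.5)]))  # 20.0
```

Age adjustment matters when comparing areas with different age structures; crude rates are simpler but can differ between areas purely because one population is older.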
A database of oncogenes and tumor suppressor genes. Users can search by gene, chromosome, and keyword. The completion of the human genome sequence allows researchers to rapidly identify and analyze genes of interest using computational approaches. The available annotations of known tumor-related genes, including physical characteristics and functional domains, can therefore be used to study the role of genes involved in carcinogenesis. The tumor-associated gene (TAG) database was designed to use information from well-characterized oncogenes and tumor suppressor genes to facilitate cancer research. All target genes were identified from the PubMed database using a text-mining approach. A semi-automatic information retrieval engine was built to collect specific information on these target genes from various resources and store it in the TAG database. At the current stage, 519 TAGs have been collected, including 198 oncogenes, 170 tumor suppressor genes, and 151 genes related to oncogenesis. Information in the TAG database can be browsed through user-friendly web interfaces that support searching for genes by chromosome or by keyword. The “consensus domain analysis” tool identifies conserved protein domains and GO terms among selected TAG genes. In addition, the “oncogenic domain analysis” tool can analyze the oncogenic potential of any user-provided protein based on a weighted term frequency table calculated from the TAG proteins. This study was supported by a grant from the National Research Program for Genomic Medicine (NRPGM) and by personnel from the Bioinformatics Center of the Center for Biotechnology and Biosciences at National Cheng Kung University, Taiwan.
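The two analysis tools can be illustrated with a deliberately simplified sketch. It assumes each TAG protein is represented as a set of domain/GO term labels, weights each term by the fraction of TAG proteins carrying it, scores a query protein by the mean weight of its terms, and treats "consensus" as terms shared by every selected gene. These weighting and scoring rules, and all gene and term names, are hypothetical; the listing does not specify TAG's actual formulas.

```python
from collections import Counter


def build_term_weights(tag_proteins):
    """Weight each term by the fraction of TAG proteins that carry it.

    tag_proteins: dict mapping gene name -> set of domain or GO term labels
    """
    counts = Counter(term for terms in tag_proteins.values() for term in set(terms))
    total = len(tag_proteins)
    return {term: n / total for term, n in counts.items()}


def oncogenic_score(query_terms, weights):
    """Mean weight of the query protein's terms (0 for terms absent from TAG)."""
    if not query_terms:
        return 0.0
    return sum(weights.get(t, 0.0) for t in query_terms) / len(query_terms)


def consensus_terms(tag_proteins, genes):
    """Terms shared by every selected gene (a strict notion of 'consensus')."""
    sets = [set(tag_proteins[g]) for g in genes]
    return set.intersection(*sets) if sets else set()


# Hypothetical genes and term labels for illustration only.
tag = {
    "GENE1": {"kinase", "SH2"},
    "GENE2": {"kinase"},
    "GENE3": {"ras_like", "SH2"},
    "GENE4": {"kinase", "ras_like"},
}
weights = build_term_weights(tag)
print(weights["kinase"])                                # 0.75
print(oncogenic_score({"kinase", "unknown"}, weights))  # 0.375
print(consensus_terms(tag, ["GENE1", "GENE2", "GENE4"]))  # {'kinase'}
```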
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset was collected at these two specialist centers over a period of three months in fall 2019. It includes CT scans of patients diagnosed with lung cancer at different stages, as well as of healthy subjects. IQ-OTH/NCCD slides were marked by oncologists and radiologists at the two centers. The dataset contains a total of 1190 images representing CT scan slices of 110 cases (see Figure 1). These cases are grouped into three classes: normal, benign, and malignant. Of these, 40 cases are diagnosed as malignant, 15 as benign, and 55 as normal. The CT scans were originally collected in DICOM format using a Siemens SOMATOM scanner. Scans were acquired at 120 kV with a slice thickness of 1 mm and breath hold at full inspiration; window widths ranging from 350 to 1200 HU and window centers from 50 to 600 were used for reading. All images were de-identified before analysis. Written consent was waived by the oversight review board. The study was approved by the institutional review boards of the participating medical centers. Each scan contains 80 to 200 slices, each representing an image of the human chest from a different side and angle. The 110 cases vary in gender, age, educational attainment, area of residence and living status. Some are employees of the Iraqi ministries of Transport and Oil; others are farmers and gainers. Most come from the middle region of Iraq, particularly the provinces of Baghdad, Wasit, Diyala, Salahuddin, and Babylon.
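The window width and center quoted above determine how Hounsfield units (HU) are mapped to display gray levels. A minimal sketch of standard linear windowing is shown below, assuming stored DICOM pixel values are first converted to HU with the Rescale Slope and Rescale Intercept attributes. The function names and example pixel values are hypothetical, and the listing does not say how the released images were windowed.

```python
def to_hu(raw_values, slope, intercept):
    """Convert stored DICOM pixel values to Hounsfield units (HU)."""
    return [v * slope + intercept for v in raw_values]


def apply_window(hu_values, center, width):
    """Linearly map HU values inside [center - width/2, center + width/2] to 0-255.

    Values below the window become 0 (black); values above it become 255 (white).
    """
    low = center - width / 2
    high = center + width / 2
    display = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)
        display.append(round((clipped - low) / (high - low) * 255))
    return display


hu = to_hu([0, 899, 1074, 1249, 1424], slope=1, intercept=-1024)
print(hu)  # [-1024, -125, 50, 225, 400]
# Window center 50 and width 350 lie within the ranges quoted for this dataset.
print(apply_window(hu, center=50, width=350))  # [0, 0, 128, 255, 255]
```

A narrow window increases contrast within a small HU range at the cost of clipping everything outside it, which is why readers use several window settings for the same scan.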
The NCI DIS 3D database is a collection of 3D structures for over 400,000 drugs. It is an extension of the NCI Drug Information System (DIS). The structural information stored in the DIS is only the connection table for each drug, that is, a list of which atoms are connected and how they are connected. The 3D database is essentially a searchable database of three-dimensional structures developed from the chemistry database of the DIS, a file of about 450,000 primarily organic compounds that have been tested by NCI for anticancer activity. The DIS database is very similar in size and content to the proprietary databases used in the pharmaceutical industry; its development began in the 1950s, and this history led to a number of problems in generating 3D structures. Connection tables can be searched to find drugs that share similar patterns of connections, which can correlate with similar biological activity. However, the cellular targets of drug action, as well as the drugs themselves, are three-dimensional objects, and advances in computer hardware and software have reached the point where they can be represented as such. In many cases, the important points of interaction between a drug and its target can be represented by a 3D arrangement of a small number of atoms. Such a group of atoms is called a pharmacophore. The pharmacophore can be used to search 3D databases, and drugs that match it could have similar biological activity despite very different patterns of atomic connections. Having a diverse set of lead compounds increases the chances of finding an active compound with acceptable properties for clinical development. Sponsor: The ICBG are supported by the Cooperative Agreement mechanism, with funds from nine components of the NIH, the National Science Foundation, and the Foreign Agricultural Service of the USDA.
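A pharmacophore search asks whether a molecule's 3D structure contains atoms of the required types at the required distances from one another. The sketch below is a brute-force illustration of that idea, assuming each molecule is a list of (feature type, coordinates in angstroms) and a pharmacophore is a tuple of feature types plus target pairwise distances with a tolerance. The feature labels, tolerance, and data layout are hypothetical; this is not the search method used by the NCI DIS 3D system.

```python
import itertools
import math


def matches_pharmacophore(atoms, pharmacophore, tolerance=0.5):
    """Return True if some assignment of atoms satisfies every pharmacophore distance.

    atoms: list of (feature_type, (x, y, z)) for one molecule
    pharmacophore: (feature_types, distances), where distances maps index pairs
        (i, j) of feature_types to the required distance between those points
    tolerance: allowed deviation in angstroms for each distance
    """
    types, distances = pharmacophore
    # Candidate atom indices for each pharmacophore point, by feature type.
    candidates = [[k for k, (t, _) in enumerate(atoms) if t == ft] for ft in types]
    for combo in itertools.product(*candidates):
        # One atom cannot satisfy two pharmacophore points.
        if len(set(combo)) < len(combo):
            continue
        coords = [atoms[k][1] for k in combo]
        if all(abs(math.dist(coords[i], coords[j]) - d) <= tolerance
               for (i, j), d in distances.items()):
            return True
    return False


# Hypothetical molecule: three features forming a 3-4-5 triangle.
molecule = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
]
query = (("donor", "acceptor", "aromatic"), {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0})
print(matches_pharmacophore(molecule, query))  # True
```

Because only the types and spatial arrangement are compared, two molecules with very different connection tables can both match the same query.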
https://kankerregister.org/Research
The Belgian Cancer Registry collects information about all new cancer diagnoses in Belgium and their follow-up. Based on this information, it maps out the nature and extent of cancer in Belgium, and it regularly publishes this information in reports. The Belgian Cancer Registry also collects all anatomopathological test results from the early screening programs for certain cancers (cervical, breast and colon cancer).
The Belgian Cancer Registry data are an important source of information for
The Massachusetts Cancer Registry (MCR) collects information on all newly diagnosed cases of cancer in the state.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites around the world to reach the project's accrual targets, usually around 500 specimens per cancer type. For this reason, the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases, the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
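Linking the two archives comes down to joining on the patient identifier. TCGA barcodes begin with the project, tissue source site and participant fields (for example `TCGA-XX-0001-01A-...`), so the first three dash-separated fields identify the participant. The sketch below assumes clinical records keyed by full or partial TCGA barcodes and imaging rows keyed by participant ID; the record shapes and all barcodes shown are hypothetical.

```python
def participant_id(barcode):
    """Return the participant part of a TCGA barcode (its first three fields)."""
    parts = barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(parts[:3])


def match_imaging_to_clinical(clinical, imaging):
    """Join clinical records to imaging series by TCGA participant ID.

    clinical: dict mapping a TCGA barcode -> clinical record (e.g. exported from the GDC)
    imaging: list of (patient_id, series_description) rows (e.g. exported from TCIA)
    Returns {participant_id: {"clinical": record, "series": [...]}} for participants
    present in both sources.
    """
    by_patient = {participant_id(b): rec for b, rec in clinical.items()}
    merged = {}
    for pid, series in imaging:
        key = participant_id(pid)
        if key in by_patient:
            entry = merged.setdefault(key, {"clinical": by_patient[key], "series": []})
            entry["series"].append(series)
    return merged


# Hypothetical identifiers for illustration only.
clinical = {"TCGA-XX-0001-01A-11D": {"stage": "II"}, "TCGA-XX-0002-01A": {"stage": "IV"}}
imaging = [("TCGA-XX-0001", "CT abdomen"), ("TCGA-XX-0001", "CT chest"), ("TCGA-XX-0003", "MR")]
print(match_imaging_to_clinical(clinical, imaging))
```

Participants present in only one archive are dropped, which mirrors the fact that not every TCGA subject has matching images.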
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information on 213 cancer patients undergoing clinical or surgical treatment, characterized by sociodemographic and clinical data as well as data from the Care Transition Measure (CTM 15-Brazil). Data were collected 7 to 30 days after the patients' discharge from hospital, from June to August 2019. Understanding these data can help improve the quality of care transitions and avoid hospital readmissions. To this end, the dataset contains a broad array of variables:
*gender
*age group
*place of residence
*race
*marital status
*schooling
*paid work activity
*type of treatment
*cancer staging
*metastasis
*comorbidities
*main complaint
*continuous use of medication
*diagnosis
*cancer type
*year of diagnosis
*oncology treatment
*first hospitalization
*readmission in the last 30 days
*number of hospitalizations in the last 30 days
*readmission in the last 6 months
*number of hospitalizations in the last 6 months
*readmission in the last year
*number of hospitalizations in the last year
*questions 1-15 from CTM 15-Brazil
The data are presented as a single Excel XLSX file: cancer patient´s care transitions dataset.xlsx.
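Questions 1-15 of the CTM are usually summarized as a single 0-100 score. The sketch below follows the conventional CTM scoring as we understand it: items coded 1 (strongly disagree) to 4 (strongly agree), "don't know / not applicable" answers excluded, and the mean of answered items linearly transformed to 0-100. It assumes the spreadsheet uses that 1-4 coding; the column layout of the XLSX file is not described here, so the function simply takes one patient's 15 answers as a list.

```python
def ctm_score(responses):
    """Score one patient's CTM-15 answers on a 0-100 scale.

    responses: list of item answers coded 1-4; None marks
        "don't know / don't remember / not applicable" (excluded from the mean)
    Returns None if no item was answered.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    if any(r not in (1, 2, 3, 4) for r in answered):
        raise ValueError("CTM items must be coded 1-4")
    mean = sum(answered) / len(answered)
    # Linear transformation of the 1-4 mean onto 0-100.
    return (mean - 1) / 3 * 100


print(ctm_score([4] * 15))  # 100.0
print(ctm_score([1] * 15))  # 0.0
print(ctm_score([None] * 15))  # None
```

Check the dataset's own coding (for example, whether "not applicable" has a numeric code) before applying this to the file.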
Analyses of this dataset could inform hospital readmission prevention strategies to be implemented by the hospital team. Researchers interested in the care transitions (CTs) of cancer patients can explore the variables described here in depth.
The project from which these data were extracted was approved by the institution’s research ethics committee (approval n. 3.266.259/2019) at Associação Hospital de Caridade Ijuí, Rio Grande do Sul, Brazil.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Rapid Cancer Registration Data (RCRD) provides a quick, indicative source of cancer data. It is provided to support the planning and provision of cancer services. The data is based on a rapid processing of cancer registration data sources, in particular on Cancer Outcomes and Services Dataset (COSD) information. In comparison, National Cancer Registration Data (NCRD) relies on additional data sources, enhanced follow-up with trusts and expert processing by cancer registration officers. The Rapid Cancer Registration Data (RCRD) may be useful for service improvement projects including healthcare planning and prioritisation. However, it is poorly suited for epidemiological research due to limitations in the data quality and completeness.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Delays in time to treatment initiation (TTI) for new cancer diagnoses cause patient distress and may adversely affect outcomes. We investigated trends in TTI for common solid tumors treated with curative intent, determinants of increased TTI, and association with overall survival. Methods and findings: We utilized prospective data from the National Cancer Database for newly diagnosed United States patients with early-stage breast, prostate, lung, colorectal, renal and pancreas cancers from 2004–13. TTI was defined as days from diagnosis to first treatment (surgery, systemic or radiation therapy). Negative binomial regression and Cox proportional hazards models were used for analysis. The study population of 3,672,561 patients included breast (N = 1,368,024), prostate (N = 944,246), colorectal (N = 662,094), non-small cell lung (N = 363,863), renal (N = 262,915) and pancreas (N = 71,419) cancers. Median TTI increased from 21 to 29 days (P
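The TTI definition above (days from diagnosis to the first of surgery, systemic therapy or radiation therapy) translates directly into date arithmetic. The sketch below assumes each patient record has a diagnosis date and optional treatment start dates; the field names and dates are hypothetical and do not reflect the National Cancer Database's actual variables.

```python
from datetime import date
from statistics import median


def time_to_treatment(diagnosis, surgery=None, systemic=None, radiation=None):
    """Days from diagnosis to the earliest recorded treatment start.

    Returns None when no treatment date is recorded.
    """
    starts = [d for d in (surgery, systemic, radiation) if d is not None]
    if not starts:
        return None
    return (min(starts) - diagnosis).days


# Hypothetical patients for illustration only.
patients = [
    dict(diagnosis=date(2010, 3, 1), surgery=date(2010, 3, 30)),
    dict(diagnosis=date(2010, 5, 10), systemic=date(2010, 5, 31), radiation=date(2010, 6, 15)),
    dict(diagnosis=date(2010, 7, 1)),  # untreated: excluded from the median
]
ttis = [time_to_treatment(**p) for p in patients]
print(ttis)  # [29, 21, None]
print(median(t for t in ttis if t is not None))  # 25.0
```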
PHS does NOT host these data. This listing is for information only.
The Greater Bay Area Cancer Registry (GBACR), in compliance with California state law, gathers information about all cancers diagnosed or treated in a nine-county area (Alameda, Contra Costa, Marin, Monterey, San Benito, San Francisco, San Mateo, Santa Clara and Santa Cruz). This information is obtained from medical records provided by hospitals, doctors’ offices, and other related facilities.
The information, stored under secure conditions with strict regulations that protect confidentiality, helps the GBACR understand cancer occurrence and survival in the Greater Bay Area. For each patient, the information includes basic demographic facts like age, gender, and race/ethnicity, as well as cancer type, extent of disease, treatment and survival. Combined over the diverse Bay Area population, this information gives the GBACR and all users an opportunity to learn how such characteristics may be related to cancer causes, mortality, care and prevention.
In addition to its local use, information collected by the GBACR becomes part of state and federal population-based registries whose mission is to monitor cancer occurrence at the state and national levels, respectively. Data from the GBACR have contributed to the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) program since 1973. The nine counties are also part of the statewide California Cancer Registry (CCR), which conducts essential monitoring of cancer occurrence and survival in California.
GBACR data are of the highest quality, as recognized by national and international registry standard-setting organizations, including SEER, the National Program of Cancer Registries, and the North American Association of Central Cancer Registries (NAACCR).
The CPIC has also started collecting data on environmental factors. These data are available in the California Neighborhoods Data System, a new resource for examining the impact of neighborhood characteristics on cancer incidence and outcomes in populations. It includes a compilation of existing geospatial and other secondary data for characterizing contextual factors.
A summary and description of social and built environment data and measures in the California Neighborhoods Data System (2010) can be found here: Social and Built Environment Data and Measures
More information about this new data source can be found here: The California Neighborhoods Data System
Patient characteristics: All reported cancer cases in the state of California.
Data overview: the data categories are:
*Socioeconomic status
*Racial/ethnic composition
*Immigration/acculturation characteristics
*Racial/ethnic residential segregation
*Population density
*Urbanicity (rural/urban)
*Housing
*Businesses
*Commuting
*Street connectivity
*Parks
*Farmers markets
*Traffic density
*Crime
*Tapestry Segmentation
Notes: To apply for these data, see the instructions here: https://www.ccrcal.org/retrieve-data/data-for-researchers/how-to-request-ccr-data/