TCGA Testicular Germ Cell Cancer. Source data from GDAC Firehose. Previously known as TCGA Provisional. This dataset contains summary data visualizations and clinical data from a broad sampling of 156 carcinomas from 150 patients. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. The clinical data includes mutation count, information about mutated genes, patient demographics, sample type, disease code, Adjuvant Postoperative Pharmaceutical Therapy Administered Indicator, American Joint Committee on Cancer Lymph Node Stage Code, American Joint Committee on Cancer Lymph Node Stage Code.1, American Joint Committee on Cancer Metastasis Stage Code, American Joint Committee on Cancer Publication Version Type, American Joint Committee on Cancer Tumor Stage Code, Bilateral Diagnosis Timing Type, Cause of death source, and Days to bilateral tumor dx. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which offers a graphical environment for exploring clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
This dataset contains summary data visualizations and clinical data from a broad sampling of 182 esophageal adenocarcinomas.
TCGA Esophageal Carcinoma. Source data from GDAC Firehose. Previously known as TCGA Provisional. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. This dataset contains summary data visualizations and clinical data from a broad sampling of 186 carcinomas from 185 patients. The clinical data includes mutation count, information about mutated genes, patient demographics, disease status, tumor typing, numbers of samples per patient, Adjuvant Postoperative Pharmaceutical Therapy Administered Indicator, Alcohol Consumption Frequency, Alcohol History Documented, American Joint Committee on Cancer Lymph Node Stage Code, American Joint Committee on Cancer Metastasis Stage Code, American Joint Committee on Cancer Publication Version Type, American Joint Committee on Cancer Tumor Stage Code, Antireflux treatment type, and the presence of Barrett's esophagus. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which offers a graphical environment for exploring clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the increasing focus on patient-centred care, this study sought to understand the priorities that patients and healthcare providers identify from their experience with head and neck cancer treatment, and to compare patients’ priorities with those of healthcare providers. Group concept mapping was used to actively identify priorities from participants (patients and healthcare providers) in two phases. In phase one, participants brainstormed statements reflecting considerations related to their experience with head and neck cancer treatment. In phase two, statements were sorted based on their similarity in theme and rated in terms of their priority. Multidimensional scaling and cluster analysis were performed to produce multidimensional maps to visualize the findings. Participants generated 250 statements in the brainstorming phase, which were consolidated into 94 statements for phase two. From the sorting activity, a two-dimensional map with a stress value of 0.2213 was generated, and eight clusters were created to encompass all statements. Timely care, education, and person-centred care were the highest rated priorities for patients and healthcare providers. Overall, there was a strong correlation between patients’ and healthcare providers’ ratings (r = 0.80). Our findings support the complexity of the treatment planning process in head and neck cancer, evident in the complex maps and highly interconnected statements related to the experience of treatment. Implications for improving the quality of care delivered and the care experience of head and neck cancer are discussed.
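The sorting-and-mapping step described above can be illustrated with a small sketch. It assumes each participant's sort is a list of piles of statement indices, counts how often two statements share a pile, and embeds the resulting distances with classical (Torgerson) multidimensional scaling. Classical MDS is a stand-in chosen for brevity; the study reports a stress value, which suggests a different MDS variant, and all names and data below are hypothetical.

```python
import numpy as np

def cooccurrence(sorts, n_statements):
    """Count how many participants placed each pair of statements in the same pile."""
    counts = np.zeros((n_statements, n_statements))
    for piles in sorts:          # one participant's complete sort
        for pile in piles:       # one pile = list of statement indices
            for i in pile:
                for j in pile:
                    counts[i, j] += 1
    return counts

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: embed a distance matrix in n_dims dimensions."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))

# Hypothetical sorts from three participants over four statements.
sorts = [
    [[0, 1], [2, 3]],
    [[0, 1, 2], [3]],
    [[0], [1], [2, 3]],
]
counts = cooccurrence(sorts, 4)
dist = len(sorts) - counts       # statements sorted together more often are closer
coords = classical_mds(dist)     # one 2-D point per statement
```

The points in `coords` could then be passed to a clustering routine to obtain statement clusters comparable in spirit to the eight reported.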
TCGA Acute Myeloid Leukemia. Source data from GDAC Firehose. Previously known as TCGA Provisional. This dataset contains summary data visualizations and clinical data from a broad sampling of 200 carcinomas from 200 patients. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. The clinical data includes mutation count, information about mutated genes, patient demographics, sample type, disease code, Abnormal Lymphocyte Percent, Atra Exposure, Basophils Cell Count, Blast Count, Cytogenetic abnormality type, and FAB. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which offers a graphical environment for exploring clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
TCGA Glioblastoma Multiforme. Source data from GDAC Firehose. Previously known as TCGA Provisional. This dataset contains summary data visualizations and clinical data from a broad sampling of 619 glioblastoma multiforme samples from 606 patients. The data was gathered as part of the Broad Institute of MIT and Harvard Firehose initiative, a cancer analysis pipeline. The clinical data includes mutation count, information about mutated genes, patient demographics, disease status, tumor typing, chromosomal gain or loss, Adjuvant Postoperative Pharmaceutical Therapy Administered Indicator, Days to Sample Collection, whether the patient started adjuvant postoperative radiotherapy, Disease Free (Months), Disease Free Status, and First Pathologic Diagnosis Biospecimen Acquisition Method Type. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM) viewable via an embedded NG-CHM Heat Map Viewer, provided by MD Anderson Cancer Center, which offers a graphical environment for exploring clustered or non-clustered heat map data. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
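For readers who want to inspect the copy-number segment files outside the Integrative Genomics Viewer, the following is a minimal sketch of a reader. It assumes a tab-delimited file with one header row, where the first four columns are sample, chromosome, start, and end, and the last column is the segment value. The header names and sample rows are hypothetical, and the files distributed with these datasets may differ.

```python
import csv
import io

def read_seg(handle):
    """Parse a tab-delimited .seg-style file into a list of segment dicts.

    Assumed layout: header row, then sample, chromosome, start, end,
    optional extra columns, and the segment value in the last column.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row; columns are read by position
    segments = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        segments.append({
            "sample": row[0],
            "chrom": row[1],
            "start": int(float(row[2])),  # float() first, in case of scientific notation
            "end": int(float(row[3])),
            "value": float(row[-1]),
        })
    return segments

# Hypothetical two-segment example.
example = io.StringIO(
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
    "SAMPLE-A\t1\t3218610\t95674710\t53225\t0.0055\n"
    "SAMPLE-A\t1\t95676511\t95676518\t2\t-1.6636\n"
)
segments = read_seg(example)
```

Reading columns by position rather than by header name keeps the sketch tolerant of differing header spellings, at the cost of silently misreading files with another column order.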
description: This map service displays all air-related layers used in the USEPA Community/Tribal-Focused Exposure and Risk Screening Tool (C/T-FERST) mapping application (http://cfpub.epa.gov/cferst/index.cfm). The following data sources (and layers) are contained in this service: USEPA's 2005 National-Scale Air Toxic Assessment (NATA) data. Data are shown at the census tract level (2000 census tract boundaries, US Census Bureau) for Cumulative Cancer and Non-Cancer risks (Neurological and Respiratory) from 139 air toxics. In addition, individual pollutant estimates of Ambient Concentration, Exposure Concentration, Cancer, and Non-Cancer risks (Neurological and Respiratory) are provided for: Acetaldehyde, Acrolein, Arsenic, Benzene, 1,3-Butadiene, Chromium, Diesel PM, Formaldehyde, Lead, Naphthalene, and Polycyclic Aromatic Hydrocarbon (PAH). The original Access tables were downloaded from USEPA's Office of Air and Radiation (OAR) http://www.epa.gov/ttn/atw/nata2005/tables.html. The data classification (defined interval) for this map service was developed for USEPA's Office of Research and Development's (ORD) Community-Focused Exposure and Risk Screening Tool (C-FERST) per guidance provided by OAR. The 2005 NATA provides information on 177 of the 187 Clean Air Act air toxics (http://www.epa.gov/ttn/atw/nata2005/05pdf/2005polls.pdf) plus diesel particulate matter (diesel PM was assessed for non-cancer only). For additional information about NATA, go to http://www.epa.gov/ttn/atw/nata2005/05pdf/nata_tmd.pdf or contact Ted Palma, USEPA (palma.ted@epa.gov). NATA data disclaimer: USEPA strongly cautions that these modeling results are most meaningful when viewed at the state or national level, and should not be used to draw conclusions about local exposures or risks (e.g., to compare local areas, to identify the exact location of "hot spots", or to revise or design emission reduction programs). 
Substantial uncertainties with the input data for these models may cause the results to misrepresent actual risks, especially at the census tract level. However, we believe the census tract data and maps can provide a useful approximation of geographic patterns of variation in risk within counties. For example, a cluster of census tracts with higher estimated risks may suggest the existence of a "hot spot," although the specific tracts affected will be uncertain. More refined assessments based on additional data and analysis would be needed to better characterize such risks at the tract level. (http://www.epa.gov/ttn/atw/nata2005/countyxls/cancer_risk02_county_042009.xls). Note that these modeled estimates are derived from outdoor sources only; indoor sources are not included in these examples, but may be significant in some cases. The modeled exposure estimates are for a median individual in the geographic area shown. Note that in some cases the estimated relationship between human exposure and health effect may be calculated as a high end estimate, and thus may be more likely to overestimate than underestimate actual health effects for the median individual in the geographic area shown. Other limitations to consider when looking at the results are detailed on the EPA 2005 NATA website. For these reasons, the NATA maps included in C-FERST are provided for screening purposes only. See the 2005 National Air Toxic Assessment website for recommended usage and limitations on the estimated cancer and noncancer data provided above. USEPA's NonAttainment areas data. C-FERST displays Ozone for 8-hour Ozone based on the 1997 standard for reporting and Particulate Matter PM-2.5 based on the 2006 standard for reporting. These are areas of the country where air pollution levels consistently exceed the national ambient air quality standards. Details about the USEPA's NonAttainment data are available at http://www.epa.gov/airquality/greenbook/index.html. 
Centers for Disease Control and Prevention's (CDC) Environmental Public Health Tracking (EPHT) data. Averaged over three years (2004 - 2006). The USEPA's ORD calculated a three-year average (2004 - 2006) using the values for Ozone (number of days with the maximum 8-hour average above the National Ambient Air Quality Standards (NAAQS)) and PM 2.5 (annual ambient concentration). These data were extracted by the CDC from the USEPA's ambient air monitors and are displayed at the county level. USEPA received the Monitor and Modeled data from the CDC and calculated the three-year average displayed in the web service. For more details about the CDC EPHT data, go to http://ephtracking.cdc.gov/showHome.action.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space as the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection. To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from Array-Express. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively. Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3.
For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of ties, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat 3, providing the anchor set determined by STACAS, and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets. Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction="umap", k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat 3, using respectively the prcomp function from the base R package “stats” and the “umap” R package (https://github.com/tkonopka/umap).
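The reason for recomputing PCA with a method that supports prediction can be illustrated with a short sketch: once the reference centering vector and loadings are stored, new cells can be placed in the same coordinates without refitting. The example below is written in Python with NumPy rather than the R functions named above, uses random matrices as stand-in data, and is not the authors' implementation. It centers without scaling, which matches the prcomp defaults.

```python
import numpy as np

def fit_pca(reference, n_components):
    """Fit PCA on a cells x genes matrix and keep what is needed for projection."""
    center = reference.mean(axis=0)
    # Rows of vt are the principal axes (unit-length, mutually orthogonal).
    _, _, vt = np.linalg.svd(reference - center, full_matrices=False)
    loadings = vt[:n_components].T      # genes x components
    return center, loadings

def project(cells, center, loadings):
    """Embed cells into the reference PCA space using the stored center and loadings."""
    return (cells - center) @ loadings

# Stand-in data: 200 reference cells and 10 query cells over 50 genes.
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 50))
query = rng.normal(size=(10, 50))

center, loadings = fit_pca(reference, n_components=10)
ref_embedding = project(reference, center, loadings)
query_embedding = project(query, center, loadings)   # same coordinates as the reference
```

Because the query is centered with the reference mean rather than its own, its coordinates are directly comparable to those of the reference cells.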
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Material 1
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The quantity of mRNA transcripts in a cell is determined by a complex interplay of cooperative and counteracting biological processes. Independent Component Analysis (ICA) is one of a small number of unsupervised algorithms that have been applied to microarray gene expression data in an attempt to understand phenotype differences in terms of changes in the activation/inhibition patterns of biological pathways. While the ICA model has been shown to outperform other linear representations of the data such as Principal Components Analysis (PCA), a validation using explicit pathway and regulatory element information has not yet been performed. We apply a range of popular ICA algorithms to six of the largest microarray cancer datasets and use pathway-knowledge and regulatory-element databases for validation. We show that ICA outperforms PCA and clustering-based methods in that ICA components map closer to known cancer-related pathways, regulatory modules, and cancer phenotypes. Furthermore, we identify cancer signalling and oncogenic pathways and regulatory modules that play a prominent role in breast cancer and relate the differential activation patterns of these to breast cancer phenotypes. Importantly, we find novel associations linking immune response and epithelial–mesenchymal transition pathways with estrogen receptor status and histological grade, respectively. In addition, we find associations linking the activity levels of biological pathways and transcription factors (NF1 and NFAT) with clinical outcome in breast cancer. ICA provides a framework for a more biologically relevant interpretation of genomewide transcriptomic data. Adopting ICA as the analysis tool of choice will help understand the phenotype–pathway relationship and thus help elucidate the molecular taxonomy of heterogeneous cancers and of other complex genetic diseases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration of these tumors into the surrounding normal brain makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method that takes into account the spatial and spectral characteristics of hyperspectral images to help neurosurgeons accurately determine the tumor boundaries in surgical time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The proposed algorithm is a hybrid framework that combines supervised and unsupervised machine learning methods. First, a supervised pixel-wise classification is performed using a Support Vector Machine classifier. The generated classification map is spatially homogenized using a one-band representation of the hyperspectral (HS) cube, obtained with the Fixed Reference t-Stochastic Neighbors Embedding dimensionality reduction algorithm, followed by K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five in vivo hyperspectral images of the surface of the brain affected by glioblastoma, from five different patients, were used. The final classification maps were analyzed and validated by specialists.
These preliminary results are promising, obtaining an accurate delineation of the tumor area.
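The majority voting fusion described above can be sketched as follows. The example assumes two integer label images of the same shape: a supervised class map and an unsupervised cluster map. Each cluster is relabelled with the class that occurs most often among its pixels. The arrays and function name are hypothetical, and the tie-breaking rule (lowest class label wins) is an assumption of this sketch, not something stated in the study.

```python
import numpy as np

def majority_vote_fusion(class_map, cluster_map):
    """Assign every pixel of a cluster the most frequent supervised class in that cluster."""
    fused = np.empty_like(class_map)
    for cluster_id in np.unique(cluster_map):
        mask = cluster_map == cluster_id
        labels, counts = np.unique(class_map[mask], return_counts=True)
        # argmax returns the first maximum, so ties go to the lowest class label.
        fused[mask] = labels[np.argmax(counts)]
    return fused

# Hypothetical 3x3 maps: supervised classes (0, 1, 2) and cluster ids (0, 1, 2).
class_map = np.array([[0, 0, 1],
                      [0, 1, 1],
                      [2, 2, 1]])
cluster_map = np.array([[0, 0, 0],
                        [0, 1, 1],
                        [2, 2, 1]])
fused = majority_vote_fusion(class_map, cluster_map)
```

In this toy case the single class-1 pixel inside cluster 0 is outvoted and relabelled as class 0, which is the smoothing effect the fusion step is meant to provide.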
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1 : Table S1. The 26 stem gene sets used for identification of BLCA subtype. BLCA: Bladder cancer. Figure S1. Clustering heat map of stem cell subtype in (A) E-MTAB-4321, (B) GSE13507, (C) GSE31684, (D) GSE32548, and (E) GSE32894. Figure S2. Evaluation of immune cell infiltration level, tumor purity, and stromal content in BLCA. (A–F) Immune score, (G–L) stromal score (stromal content), and (M–R) tumor purity in all six datasets. *P < 0.05, **P < 0.01, ***P < 0.001; ns means not significant. BLCA: bladder cancer. Figure S3. Comparisons of the expression levels of immune-related genes between BLCA subtypes. (A–C) Expression levels of HLA genes between BLCA subtypes in TCGA, E-MTAB-4321 and GSE32894. (D–E) Expression levels of immune cell subgroup marker genes between BLCA subtypes. Kruskal–Wallis test, *P < 0.05, **P < 0.01, ***P < 0.001; ns means not significant. BLCA: bladder cancer. Figure S4. Difference analysis of 22 human immune cell subgroups of BLCA stem cell subtypes in CIBERSORT. Immune cell subgroups with significant differences in BLCA stem cell subtypes in (A) TCGA, (B) GSE32894, (C) GSE31684, (D) E-MTAB-4321, (E) GSE13507, and (F) GSE32548 cohort with CIBERSORT. Fraction of different immune cell subgroups among the four subtypes evaluated using Kruskal–Wallis tests, * P < 0.05, ** P < 0.01, *** P < 0.001. Kaplan–Meier survival curve based on median ssGSEA score for (G) TCGA, (H) GSE13507, (I) GSE32548, and (J) GSE32894, and best cut-off for (K) E-MTAB-4321 cohort in OS for macrophage M0, together with median ssGSEA score for (L) TCGA in OS for macrophage M2. BLCA: bladder cancer; TCGA: The Cancer Genome Atlas. Table S2. Univariate Cox analysis for all six datasets. Table S3. GSEA for BLCA stem cell subtypes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Gastric cancer is a common gastrointestinal malignancy. Since it is often diagnosed in the advanced stage, its mortality rate is high. Traditional therapies (such as continuous chemotherapy) are not satisfactory for advanced gastric cancer, but immunotherapy has shown great therapeutic potential. Gastric cancer has high molecular and phenotypic heterogeneity. New strategies for accurate prognostic evaluation and patient selection for immunotherapy are urgently needed. Methods: Weighted gene coexpression network analysis (WGCNA) was used to identify hub genes related to gastric cancer progression. Based on the hub genes, the samples were divided into two subtypes by consensus clustering analysis. After obtaining the differentially expressed genes between the subtypes, a gastric cancer risk model was constructed through univariate Cox regression, least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox regression analysis. The differences in prognosis, clinical features, tumor microenvironment (TME) components and immune characteristics were compared between subtypes and risk groups, and the connectivity map (CMap) database was applied to identify potential treatments for high-risk patients. Results: WGCNA and screening revealed nine hub genes closely related to gastric cancer progression. Unsupervised clustering according to hub gene expression grouped gastric cancer patients into two subtypes related to disease progression, and these patients showed significant differences in prognoses, TME immune and stromal scores, and suppressive immune checkpoint expression. Based on the different expression patterns between the subtypes, we constructed a gastric cancer risk model and divided patients into a high-risk group and a low-risk group based on the risk score. High-risk patients had a poorer prognosis, higher TME immune/stromal scores, higher inhibitory immune checkpoint expression, and more immune characteristics suitable for immunotherapy. Multivariate Cox regression analysis including the age, stage and risk score indicated that the risk score can be used as an independent prognostic factor for gastric cancer. On the basis of the risk score, we constructed a nomogram that relatively accurately predicts gastric cancer patient prognoses and screened potential drugs for high-risk patients. Conclusions: Our results suggest that the 7-gene signature related to tumor progression could predict the clinical prognosis and tumor immune characteristics of gastric cancer.
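A risk model of this kind is commonly applied as a weighted sum of gene expression values, with the weights taken from the fitted Cox regression. The sketch below illustrates that calculation and a split into high-risk and low-risk groups. The coefficients, expression values, and the choice of the median as cut-off are all assumptions made for illustration; the abstract does not state the actual coefficients or cut-off.

```python
import numpy as np

def risk_scores(expression, coefficients):
    """Risk score per patient: sum over genes of coefficient * expression."""
    return expression @ coefficients

def split_by_median(scores):
    """Label patients 'high' if their score exceeds the cohort median, else 'low'.

    The median cut-off is an assumption of this sketch.
    """
    return np.where(scores > np.median(scores), "high", "low")

# Hypothetical cohort: 4 patients x 2 signature genes, with made-up coefficients.
expression = np.array([[1.0, 2.0],
                       [3.0, 0.5],
                       [0.2, 0.1],
                       [2.0, 2.0]])
coefficients = np.array([0.5, -0.25])

scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
```

A positive coefficient raises the score as expression increases, and a negative one lowers it, so the sign of each coefficient indicates whether a gene is associated with higher or lower modelled risk.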
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Text: Supplementary Methods and Supplementary Results. Supplementary Tables: Table S1. Primers used for RT-qPCR. Table S2. List of genes selected for expression analysis by PCR array. Table S3. Number of AA and GBM patient samples in training set, test set and three independent cohorts of patient samples (TCGA, GSE1993 and GSE4422). Table S4. Expression of 16 genes in AA (n = 20) and GBM (n = 54) samples of the test set. Table S5. Expression of 16 genes in Grade III glioma (n = 27) and GBM (n = 152) samples of the TCGA dataset. Table S6. Expression of 16 genes in AA (n = 19) and GBM (n = 39) samples of GSE1993 dataset. Table S7. Expression of 16 genes in AA (n = 5) and GBM (n = 71) samples of the GSE4422 dataset. Supplementary Figures: Figure S1. Heat map of one-way hierarchical clustering of 16 PAM-identified genes in AA (n = 20) and GBM (n = 54) patient samples in the test set. A dual-color code was used, with red and green indicating up- and down regulation, respectively. Figure S2. Heat map of one-way hierarchical clustering of 16 PAM-identified genes in grade III glioma (n = 27) and GBM (n = 152) patient samples in TCGA dataset. A dual-color code was used, with red and green indicating up- and down regulation, respectively. Figure S3. A. Heat map of one-way hierarchical clustering of 16 PAM-identified genes in AA (n = 19) and GBM (n = 39) patient samples in GSE1993 dataset. A dual-color code was used, with red and green indicating up- and down regulation, respectively. B. PCA was performed using expression values of 16-PAM identified genes between AA and GBM samples in GSE1993 dataset. A scatter plot is generated using the first two principal components for each sample. The color of the samples is as indicated. C. The detailed probabilities of 10-fold cross-validation for the samples of GSE1993 dataset based on the expression values of 16 genes are shown. 
For each sample, its probability as AA (orange color) and GBM (blue color) are shown and it was predicted by the PAM program as either AA or GBM based on which grade's probability is higher. The original histological grade of the samples is shown on the top. Figure S4. A. Heat map of one-way hierarchical clustering of 16 PAM-identified genes in AA (n = 5) and GBM (n = 71) patient samples in GSE4422 dataset. A dual-color code was used, with red and green indicating up- and down regulation, respectively. B. PCA was performed using expression values of 16-PAM identified genes between AA and GBM samples in GSE4422 dataset. A scatter plot is generated using the first two principal components for each sample. The color of the samples is as indicated. C. The detailed probabilities of 10-fold cross-validation for the samples of GSE4422 dataset based on the expression values of 16 genes are shown. For each sample, its probability as AA (orange color) and GBM (blue color) are shown and it was predicted by the PAM program as either AA or GBM based on which grade's probability is higher. The original histological grade of the samples is shown on the top. Figure S5. A. The detailed probabilities of 10-fold cross-validation for the samples of GSE4271 dataset based on the expression values of 16 genes are shown. For each sample, its probability as AA (orange color) and GBM (blue color) are shown and it was predicted by the PAM program as either AA or GBM based on which grade's probability is higher. The original histological grade of the samples is shown on the top. B. The average Age at Diagnosis along with standard deviation is plotted for Authentic AAs (n = 12), Authentic GBMs (n = 68), Discordant AAs (n = 10) and Discordant GBMs (n = 8) of GSE4271 dataset. C. The Kaplan Meier survival analysis of samples of GSE4271 dataset. Figure S6. PAM analysis of the Petalidis-gene signature in TCGA dataset. A. Plot showing classification error for the Petalidis gene set in TCGA dataset. 
The threshold value of 0.0 corresponded to all 54 genes, which classified AA (n = 27) and GBM (n = 604) samples with a classification error of 0.000. B. The detailed probabilities of 10-fold cross-validation for the samples of the TCGA dataset, based on the Petalidis gene set, are shown. For each sample, the probabilities of being AA (green) and GBM (red) are shown; the PAM program predicted each sample as AA or GBM according to which grade had the higher probability. The original histological grade of the samples is shown at the top. Figure S7. PAM analysis of the Phillips gene signature in our dataset. A. Plot showing classification error for the Phillips gene set in our dataset. The threshold value of 0.0 corresponded to all 5 genes, which classified AA (n = 50) and GBM (n = 132) samples with a classification error of 0.159. B. The detailed probabilities of 10-fold cross-validation for the samples of our dataset, based on the Phillips gene set, are shown. For each sample, the probabilities of being AA (orange) and GBM (blue) are shown; the PAM program predicted each sample as AA or GBM according to which grade had the higher probability. The original histological grade of the samples is shown at the top. Figure S8. PAM analysis of the Phillips gene signature in the Phillips dataset. A. Plot showing classification error for the Phillips gene set in the Phillips dataset. The threshold value of 0.0 corresponded to all 8 genes, which classified AA (n = 24) and GBM (n = 76) samples with a classification error of 0.169. B. The detailed probabilities of 10-fold cross-validation for the samples of our dataset, based on the Phillips gene set, are shown. For each sample, the probabilities of being AA (orange) and GBM (blue) are shown; the PAM program predicted each sample as AA or GBM according to which grade had the higher probability. The original histological grade of the samples is shown at the top. Figure S9. PAM analysis of the Phillips gene signature in the GSE4422 dataset. A.
Plot showing classification error for the Phillips gene set in the GSE4422 dataset. The threshold value of 0.0 corresponded to all 8 genes, which classified AA (n = 5) and GBM (n = 76) samples with a classification error of 0.065. B. The detailed probabilities of 10-fold cross-validation for the samples of our dataset, based on the Phillips gene set, are shown. For each sample, the probabilities of being AA (orange) and GBM (blue) are shown; the PAM program predicted each sample as AA or GBM according to which grade had the higher probability. The original histological grade of the samples is shown at the top. Figure S10. PAM analysis of the Phillips gene signature in the TCGA dataset. A. Plot showing classification error for the Phillips gene set in the TCGA dataset. The threshold value of 0.0 corresponded to all 8 genes, which classified AA (n = 27) and GBM (n = 604) samples with a classification error of 0.008. B. The detailed probabilities of 10-fold cross-validation for the samples of the TCGA dataset, based on the Phillips gene set, are shown. For each sample, the probabilities of being AA (orange) and GBM (blue) are shown; the PAM program predicted each sample as AA or GBM according to which grade had the higher probability. The original histological grade of the samples is shown at the top. Figure S11. Network obtained by using the 16 genes of the classification signature as input genes to the Bisogenet plugin in Cytoscape. The generated network had 252 nodes (genes) and 1498 edges (interactions between genes/proteins). This network consisted of the seed proteins and their immediate interacting neighbors. The nodes corresponding to the input genes are highlighted by a larger node size than the rest of the interacting partners. The color code is as indicated in the scale. (PDF)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Protein-Protein Interaction Gene Sets and KEGG Pathway Results Post-STRING Clustering.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figure S1. Levels of free hemoglobin were measured in plasma samples by spectrophotometry at wavelengths from 350 to 650 nm. A dilution series of lysed red blood cells in plasma was prepared (below the chart). The degree of hemolysis was determined from the optical density (OD) at 414 nm (the absorbance peak of free hemoglobin, called the Soret band), with additional peaks at 541 and 576 nm. Samples were classified as hemolysed if the OD at 414 nm exceeded 0.25. The integrated curve of BC plasma samples comprises values from 0.08 to 0.20, indicating that the samples were non-hemolysed. Figure S2. Hierarchical clustering is shown as a heat map of median-centered ΔCq values of exosomal miRNAs (in rows) derived from plasma samples of 435 BC patients before treatment and 20 healthy women (in columns). The red and green colors indicate ΔCq values below (relatively high expression) and above (relatively low expression) the median of all ΔCq values in the study, respectively. Bottom: clustering of samples. Left: clustering of probes. The scale bar provides information on the degree of regulation. The 5 clinically relevant miRNAs derived from the microRNA array cards containing 384 different miRNAs are indicated by a red arrow. Figure S3. Exosomal miRNAs differ between HER2-positive and TNBC patients. ROC analyses show the sensitivity and specificity profiles of exosomal miR-335, miR-422a, and miR-628 and their combinations for distinguishing TNBC from HER2-positive BC patients. The table below the ROC curves summarizes the sensitivities and specificities of exosomal miR-335, miR-422a, miR-628, and their combinations. Table S1. Patient characteristics at the time of primary diagnosis of breast cancer (continuous variables). Table S2. Significant associations between the plasma levels of exosomal miRNAs and clinicopathological risk parameters (continuous variables). (ZIP 2140 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
(a) Analyzed microarray data of the A549 cell line overexpressing STAT3C [52] using SBEAMS [72]. (b) Analyzed data of the cluster in the Cancer Module Map 51.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic characteristics of multiple myeloma patients with and without depression.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance of 25 classification algorithms.