Background: Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the resulting large amounts of data. Clustering techniques have been widely applied in analyzing microarray gene-expression data. However, normal mixture model-based cluster analysis has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we do not cluster gene-expression patterns but a summary statistic, the t-statistic. Results: The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found, two of which contain more than 95% of the genes with almost no altered gene-expression levels, whereas the third one has 30 genes with more or less differential gene-expression levels. Conclusions: Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool to exploit differential gene expression for microarray data.
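As a rough illustration of the idea described above, the following Python sketch fits a normal mixture to simulated t-statistics with a hand-written EM loop. The simulated data, the choice of two components (the study itself reports three clusters), the starting values, and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_normal_mixture(values, start_means, n_iter=100):
    """Fit a 1-D normal mixture by EM; returns weights, means, variances, labels."""
    k = len(start_means)
    means = list(start_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            if total == 0.0:
                resp.append([1.0 / k] * k)  # numerical underflow: stay neutral
            else:
                resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)
    # Assign each value to its most probable component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Simulated t-statistics (assumption): 970 unchanged genes, 30 shifted genes.
random.seed(0)
t_stats = [random.gauss(0.0, 1.0) for _ in range(970)]
t_stats += [random.gauss(5.0, 1.0) for _ in range(30)]

weights, means, variances, labels = fit_normal_mixture(t_stats, start_means=[0.0, 4.0])
shifted = max(range(len(means)), key=lambda j: means[j])
n_flagged = sum(1 for lab in labels if lab == shifted)
print(f"component means: {means}; genes in shifted component: {n_flagged}")
```

In practice the number of components would be chosen by a model-selection criterion rather than fixed in advance as it is here.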
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Microarray analysis results. Clusters generated through K-means clustering in Genesis for the microarray. (XLSX 330 kb)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is based on National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) DataSet accession GDS2778.
The dataset originates from a microarray experiment measuring global gene expression under specific experimental conditions.
Raw and processed expression data (for all probes/genes) are included, enabling downstream analysis such as normalization, differential expression, and clustering.
The dataset has been used to perform differential gene expression (DGE) analysis to identify genes that are up- or down-regulated under the experimental condition compared to control.
Data processing steps typically include normalization (e.g., log-transformation), quality control, probe-to-gene mapping, and statistical testing for significance (e.g., using packages such as limma or other DGE tools).
Resulting differentially expressed genes (DEGs) include statistics such as log fold change (logFC), adjusted p‑values (adj.P.Val), and possibly other metrics (e.g., B-statistic), allowing assessment of both magnitude and significance of changes.
The dataset also includes a visualization file (heatmap image) that displays expression patterns of DEGs (or top variable genes) across samples — enabling clustering and pattern recognition across samples and genes.
The heatmap helps illustrate sample-wise and gene-wise expression variation: clustering groups together samples (e.g. control vs treatment) and genes with similar expression dynamics.
This dataset is suitable for further bioinformatics analysis: e.g. functional enrichment (GO/Pathway), co‑expression analysis, gene signature identification, or integration with other datasets.
Users who download this dataset can reproduce or extend analyses, such as re-normalization, alternative clustering, custom DEG thresholds, or downstream biological interpretation (pathway, network analysis).
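The log fold change and custom DEG thresholds mentioned above can be sketched in a few lines of Python. The column names `logFC` and `adj.P.Val` follow the description above; the function names, the cutoffs, and all numeric values are hypothetical and are not taken from GDS2778.

```python
import math


def log2_fold_change(control, treatment):
    """Difference of mean log2 expression (treatment minus control)."""
    mean_ctrl = sum(math.log2(v) for v in control) / len(control)
    mean_trt = sum(math.log2(v) for v in treatment) / len(treatment)
    return mean_trt - mean_ctrl


def select_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes passing both the |logFC| and the adjusted p-value cutoff."""
    return [
        row["gene"]
        for row in rows
        if abs(row["logFC"]) >= lfc_cutoff and row["adj.P.Val"] < padj_cutoff
    ]


# Toy values, invented for illustration only.
example_lfc = log2_fold_change(control=[100.0, 100.0], treatment=[400.0, 400.0])
toy_table = [
    {"gene": "geneA", "logFC": 2.1, "adj.P.Val": 0.001},
    {"gene": "geneB", "logFC": 0.3, "adj.P.Val": 0.0004},
    {"gene": "geneC", "logFC": -1.6, "adj.P.Val": 0.20},
    {"gene": "geneD", "logFC": -1.2, "adj.P.Val": 0.03},
]
selected = select_degs(toy_table)
print(example_lfc, selected)
```

Requiring both cutoffs filters on magnitude and significance together: geneB is significant but barely changes, and geneC changes strongly but is not significant, so neither is selected.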
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We present a large-scale analysis of mRNA coexpression based on 60 large human data sets containing a total of 3924 microarrays. We sought pairs of genes that were reliably coexpressed (based on the correlation of their expression profiles) in multiple data sets, establishing a high-confidence network of 8805 genes connected by 220,649 “coexpression links” that are observed in at least three data sets. Confirmed positive correlations between genes were much more common than confirmed negative correlations. We show that confirmation of coexpression in multiple data sets is correlated with functional relatedness, and show how cluster analysis of the network can reveal functionally coherent groups of genes. Our findings demonstrate how the large body of accumulated microarray data can be exploited to increase the reliability of inferences about gene function.
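The notion of a coexpression link confirmed in several data sets can be sketched as follows. This is a minimal Python illustration, assuming a fixed Pearson correlation cutoff decides whether a pair counts as coexpressed in one data set; the cutoff value, the function names, and the toy profiles are assumptions, and the study's actual selection criteria are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def confirmed_links(datasets, r_cutoff=0.8, min_datasets=3):
    """Count, per gene pair, the data sets in which the pair is coexpressed."""
    counts = Counter()
    for data in datasets:
        for g1, g2 in combinations(sorted(data), 2):
            if pearson(data[g1], data[g2]) >= r_cutoff:
                counts[(g1, g2)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_datasets}


# Toy data sets (hypothetical): each maps a gene name to its expression profile.
base = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.5], "C": [4.0, 1.0, 3.0, 2.0]}
extra = {"A": [1.0, 2.0, 3.0], "C": [1.5, 2.5, 3.5]}
links = confirmed_links([base, base, base, extra])
print(links)
```

Here the A-C pair correlates in only one data set and is therefore dropped, while A-B is retained because it is observed in three.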
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The total number of genes identified by microarray analysis as significantly altered in the offspring due to parental adolescent binge EtOH exposure. Genes are grouped according to unique functional gene clusters.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related.
It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
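The mutual-nearest-neighbor requirement described above can be illustrated with a small Python sketch. It covers only the construction of the mutual neighbor graph, not the overlapping-clique step, and it assumes Euclidean distance between profiles; the function name, the value of `k`, and the toy profiles are hypothetical, and this is not the published NNN implementation.

```python
import math


def mutual_knn_edges(profiles, k):
    """Return edges (a, b) where a is among b's k nearest genes and vice versa."""
    names = sorted(profiles)
    neighbours = {}
    for gene in names:
        ranked = sorted(
            (other for other in names if other != gene),
            key=lambda other: math.dist(profiles[gene], profiles[other]),
        )
        neighbours[gene] = set(ranked[:k])
    return {
        (a, b)
        for a in names
        for b in neighbours[a]
        if a < b and a in neighbours[b]
    }


# Toy expression profiles (hypothetical values).
toy_profiles = {
    "g1": [0.0, 0.0],
    "g2": [0.0, 1.0],
    "g3": [10.0, 10.0],
    "g4": [10.0, 11.0],
    "g5": [5.0, 30.0],
}
edges = mutual_knn_edges(toy_profiles, k=1)
connected = {gene for edge in edges for gene in edge}
unclustered = sorted(set(toy_profiles) - connected)
print(edges, unclustered)
```

The gene `g5` selects `g4` as its nearest neighbor, but the choice is not reciprocated, so `g5` stays unclustered instead of being forced into a group.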
This file contains example R scripts used for the study presented here. The several sections in the file correspond to the different DE and clustering analyses performed. (TXT)
Bio Resource for array genes (BioRag) is a free online resource for easy access to collective and integrated information from various public biological resources for human, mouse, rat, fly and C. elegans genes. The resource includes information about the genes that are represented in Unigene clusters. It provides interactive tools to selectively view, analyze and interpret gene expression patterns against the background of gene and protein functional information. Different query options are provided to mine the biological relationships represented in the underlying database. The Search button leads to the list of available query tools. The resource is designed as an online platform to assist researchers in analyzing the results of microarray experiments and developing a biological interpretation of those results. The site is mainly intended for interpreting unique gene expression patterns as biological changes that can lead to new diagnostic procedures and drug targets. It allows users to selectively view a variety of information about gene functions that is stored in an underlying database. Although other online resources provide comprehensive annotation and summaries of genes, this resource differs from them by further enabling researchers to mine biological relationships amongst the genes captured in the database using new query tools, thus providing a unique way of interpreting microarray results based on the knowledge available about the cellular roles of genes and proteins. A total of six different query tools are provided, and each offers different search features, analysis options and forms of display and visualization of data. The data is collected in a relational database from public resources: Unigene, LocusLink, OMIM, NCBI dbEST, protein domains from NCBI CDD, Gene Ontology, Pathways (KEGG, GenMAPP and BioCarta) and BIND (protein interactions).
Data is dynamically collected and compiled twice a week from public databases. Search options offer the capability to organize and cluster genes based on their interactions in biological pathways, their association with Gene Ontology terms, tissue/organ-specific expression or any other user-chosen functional grouping of genes. A color coding scheme is used to highlight differential gene expression patterns against a background of gene functional information. Concept hierarchies (Anatomy and Diseases) of MeSH (Medical Subject Headings) terms are used to organize and display the data related to tissue-specific expression and diseases. Sponsors: The BioRag database is maintained by the Bioinformatics group at Arizona Cancer Center. The material presented here is compiled from different public databases. BioRag is hosted by the Biotechnology Computing Facility of the University of Arizona. 2002, 2003 University of Arizona.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Protein-Protein, Genetic, and Chemical Interactions for Powell DW (2004): Cluster analysis of mass spectrometry data reveals a novel component of SAGA. Curated by BioGRID (https://thebiogrid.org); ABSTRACT: The SAGA histone acetyltransferase and TFIID complexes play key roles in eukaryotic transcription. Using hierarchical cluster analysis of mass spectrometry data to identify proteins that copurify with components of the budding yeast TFIID transcription complex, we discovered that an uncharacterized protein corresponding to the YPL047W open reading frame significantly associated with shared components of the TFIID and SAGA complexes. Using mass spectrometry and biochemical assays, we show that YPL047W (SGF11, 11-kDa SAGA-associated factor) is an integral subunit of SAGA. However, SGF11 does not appear to play a role in SAGA-mediated histone acetylation. DNA microarray analysis showed that SGF11 mediates transcription of a subset of SAGA-dependent genes, as well as SAGA-independent genes. SAGA purified from a sgf11 Delta deletion strain has reduced amounts of Ubp8p, and a ubp8 Delta deletion strain shows changes in transcription similar to those seen with the sgf11 Delta deletion strain. Together, these data show that Sgf11p is a novel component of the yeast SAGA complex and that SGF11 regulates transcription of a subset of SAGA-regulated genes. Our data suggest that the role of SGF11 in transcription is independent of SAGA's histone acetyltransferase activity but may involve Ubp8p recruitment to or stabilization in SAGA.
Background: Biliary atresia (BA) is a severe cholangiopathy of early infancy that destroys cholangiocytes, obstructs ductular pathways and, if left untreated, culminates in liver cirrhosis. Mechanisms underlying the etiological heterogeneity remain elusive and few studies have attempted phenotyping BA. We applied machine learning to identify distinct subtypes of BA which correlate with the underlying pathogenesis. Methods: The BA microarray dataset GSE46995 was downloaded from the Gene Expression Omnibus (GEO) database. Unsupervised hierarchical cluster analysis was performed to identify BA subtypes. Then, functional enrichment analysis was applied and hub genes identified to explore molecular mechanisms associated with each subtype. An independent dataset, GSE15235, was used for validation. Results: Based on unsupervised cluster analysis, BA patients can be classified into three distinct subtypes: Autoimmune, Viral and Embryonic subtypes. Functional analysis of Subtype 1 correlated with Fc Gamma Receptor (FCGR) activation and hub gene FCGR2A, suggesting an autoimmune response targeting bile ducts. Subtype 2 was associated with immune receptor activity, cytokine receptor, signaling by interleukins, and viral protein interaction, suggesting BA is associated with viral infection. Subtype 3 was associated with signaling and regulation of expression of Robo receptors and hub gene ITGB2, corresponding to embryonic BA. Moreover, Reactome pathway analysis showed Neutrophil degranulation pathway enrichment in all subtypes, suggesting it may result from an early insult that leads to biliary stasis. Conclusions: The classification of BA into different subtypes improves our current understanding of the underlying pathogenesis of BA and provides new insights for future studies.
Clustering of genes and/or samples is a common task in gene expression analysis. The goals in clustering can vary, but an important scenario is that of finding biologically meaningful subtypes within the samples. This is an application that is particularly appropriate when there are large numbers of samples, as in many human disease studies. With the increasing popularity of single-cell transcriptome sequencing (RNA-Seq), many more controlled experiments on model organisms are similarly creating large gene expression datasets with the goal of detecting previously unknown heterogeneity within cells. It is common in the detection of novel subtypes to run many clustering algorithms, as well as to rely on subsampling and ensemble methods to improve robustness. We introduce a Bioconductor R package, clusterExperiment, that implements a general and flexible strategy we call Resampling-based Sequential Ensemble Clustering (RSEC). RSEC enables the user to easily create multiple, competing clusterings of the data based on different techniques and associated tuning parameters, including easy integration of resampling and sequential clustering, and then provides methods for consolidating the multiple clusterings into a final consensus clustering. The package is modular and allows the user to separately apply the individual components of the RSEC procedure, i.e., apply multiple clustering algorithms, create a consensus clustering or choose tuning parameters, and merge clusters. Additionally, clusterExperiment provides a variety of visualization tools for the clustering process, as well as methods for the identification of possible cluster signatures or biomarkers. The R package clusterExperiment is publicly available through the Bioconductor Project, with a detailed manual (vignette) as well as well-documented help pages for each function.
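One common building block for consolidating several clusterings is a co-clustering matrix, which records how often each pair of samples lands in the same cluster. The Python sketch below shows that general idea only; it is not the clusterExperiment API and not the RSEC procedure itself, and the function name, the use of -1 to mark unassigned samples, and the toy label vectors are assumptions.

```python
def co_clustering_matrix(clusterings, unassigned=-1):
    """Proportion of clusterings in which each pair of samples shares a cluster.

    Clusterings in which either sample is unassigned are ignored for that pair.
    """
    n = len(clusterings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            both = 0  # clusterings in which both samples are assigned
            same = 0  # ... and in which they share a cluster label
            for labels in clusterings:
                if labels[i] == unassigned or labels[j] == unassigned:
                    continue
                both += 1
                if labels[i] == labels[j]:
                    same += 1
            matrix[i][j] = same / both if both else 0.0
    return matrix


# Four toy clusterings of four samples (hypothetical labels).
toy_clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 2, 2],
    [0, 0, -1, 1],
]
co = co_clustering_matrix(toy_clusterings)
for row in co:
    print([round(value, 2) for value in row])
```

A consensus clustering could then group samples whose pairwise proportions exceed a chosen threshold; the threshold and the grouping rule are design choices left open here.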
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of significantly enriched GO functions for the cluster shown in Fig. 7. (XLSX 17 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Motivation: Gene clustering and sample clustering are commonly used to find patterns in gene expression datasets. However, in heterogeneous samples (e.g. different tissues or disease states), genes may cluster differently. Biclustering algorithms aim to solve this issue by performing sample clustering and gene clustering simultaneously. Existing reviews of biclustering algorithms have yet to include a number of more recent algorithms and have based comparisons on simplistic simulated datasets without specific evaluation of biclusters in real datasets, using less robust metrics.
Results: In this study we compared four classes of sparse biclustering algorithms on a range of simulated and real datasets. In particular we use a knockout mouse RNA-seq dataset to evaluate each algorithm’s ability to simultaneously cluster genes and cluster samples. We found that Bayesian algorithms with strict sparsity constraints had high accuracy on the simulated datasets and didn't require any post-processing, but were considerably slower than other algorithm classes. We assessed whether non-negative matrix factorisation algorithms can be repurposed for biclustering and found that, although the raw output was poor, after using a sparsity-inducing post-processing procedure we introduce, one such algorithm was one of the most highly ranked on real datasets. We also exhibit the limitations of biclustering algorithms by varying the complexity of simulated datasets. The algorithms generally struggled on simulated datasets with a large number of implanted factors, or with a large number of genes. In real datasets, the algorithms rarely returned clusters containing samples from multiple tissues, which highlights the need for further thought in the design and analysis of multi-tissue studies to avoid differences between tissues dominating the analysis.
Code to run the analysis is available at https://github.com/nichollskc/biclust_comp, including wrappers for each algorithm, implementations of evaluation metrics, and code to simulate datasets and perform pre- and post-processing.
Colon dataset
Authors: U. Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, A. Levine
Please cite: (URL): U. Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, A. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Nat. Acad. Sci. 96 (12) (1999) 6745-6750
PAPER 1: 'Identification of novel subgroups of high-risk pediatric precursor B acute lymphoblastic leukemia (B-ALL) by unsupervised microarray analysis: clinical correlates and therapeutic implications. A Children's Oncology Group (COG) study.' ABSTRACT We examined gene expression profiles of pre-treatment specimens from 207 patients from the COG P9906 study to identify signatures of children with high risk B-precursor acute lymphoblastic leukemia (ALL) and to determine whether the resulting clusters are associated with either specific clinical features or treatment response characteristics. Four unsupervised clustering methods were utilized to classify patients into similar groups. The different clustering algorithms showed significant overlap in cluster membership. Two clusters contained all cases with either t(1;19)(q23;p13) translocations or MLL rearrangements. The other six clusters were novel and had no recurring chromosomal abnormalities or distinctive clinical features. Members of two of these novel clusters had significant survival differences when compared to the overall 4-year relapse-free survival (RFS) of 61%. These included clusters of patients with either significantly better (94.7%) or worse (21.0%) RFS at 4 years. Children of Hispanic/Latino ethnicity were disproportionately present in the poor outcome cluster. The poor outcome cluster represents a novel biologically distinctive subset of B-precursor ALL that may occur at least as frequently as BCR/ABL. Further molecular characterization of this cluster may lead to the discovery of genomic abnormalities that can be targeted to improve the currently dismal outcome for children with this gene signature. The sample data have also been used in another study: PAPER 2: 'Gene expression classifiers for minimal residual disease and relapse free survival improve outcome prediction and risk classification in children with high risk acute lymphoblastic leukemia. A Children's Oncology Group study'.
ABSTRACT Background. Nearly 25% of children with B-precursor ALL present with 'high-risk' disease (HR-ALL) that is resistant to current therapies. Gene expression profiling may yield molecular classifiers for outcome prediction that can be used to improve risk classification and therapeutic targeting. Methods. Expression profiles were obtained in pre-treatment leukemic samples from 207 uniformly treated children with HR-ALL. Relapse free survival (RFS) was 61% at 4 years and flow cytometric measures of minimal residual disease (MRD) at the end of induction (day 29) were predictive of outcome (P<0.001). Molecular classifiers predictive of RFS and MRD were developed using extensive cross-validation procedures. Results. A 38 gene molecular risk classifier predictive of RFS (MRC-RFS) distinguished two groups in HR-ALL with different relapse risks: low (4 yr RFS: 81%, n=109) vs. high (4 yr RFS: 50%, n=98) (P<0.0001). In multivariate analysis, the best predictor combined MRC-RFS and day 29 flow MRD data, classifying children into low (87% RFS), intermediate (62% RFS), or high risk (29% RFS) groups (P<0.0001). A 21 gene molecular classifier predictive of MRD could effectively substitute for day 29 flow MRD, yielding a combined classifier that similarly distinguished three risk groups at pre-treatment (low: 82% RFS; intermediate: 63% RFS; and high risk: 45% RFS) (P<0.0001). This combined molecular classifier was further validated on an independent cohort of 84 children with HR-ALL (P = 0.006). Conclusions. Molecular classifiers predictive of RFS and MRD can be used to distinguish distinct prognostic groups within HR-ALL, significantly improving risk classification schemes and the ability to prospectively identify children at diagnosis who will respond to or fail current treatment regimens. NOTE: Due to Children's Oncology Group (COG) restrictions, outcome and MRD data cannot be provided as part of the covariate data for this dataset at the present time. 
If you would like to arrange individual access to this data, please contact COG or the PI of this study, Dr. Cheryl Willman, at the University of New Mexico Cancer Center (cwillman@unm.edu) to arrange a collaboration. Overall design: Unsupervised clustering and supervised risk classification analyses of 207 diagnostic samples and associated clinical covariate data. See the Summary for greater details. The data were analyzed using Microarray Suite version 5.0 (MAS 5.0) in the Affymetrix Gene Chip Operating Software Version 1.4. Probe masking was used (see 9906_TT207_Affymetrix_probe_mask.msk, linked below as a supplementary file). Otherwise all Affymetrix default parameter settings were used. Global scaling as the normalization method, with the default target intensity of 500, was used.
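The global scaling step mentioned above can be illustrated with a small Python sketch. It assumes that scaling multiplies every signal on an array by one factor, chosen so that a trimmed mean of the signals equals the target intensity of 500. The 2% trim at each end is an assumption about the procedure, and the function names and toy signal values are hypothetical; this is not the Affymetrix software.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    cut = int(len(ordered) * trim)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return sum(kept) / len(kept)


def scale_to_target(signals, target=500.0, trim=0.02):
    """Rescale one array so that its trimmed mean equals the target intensity."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals], factor


# Toy signal values for a single array (hypothetical).
raw_signals = [float(v) for v in range(1, 101)]
scaled_signals, scaling_factor = scale_to_target(raw_signals)
print(scaling_factor, trimmed_mean(scaled_signals))
```

Because every array is brought to the same trimmed mean, overall brightness differences between arrays are removed while the relative ordering of probes within each array is preserved.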
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: It is still uncertain whether small cell lung carcinomas (SCLCs), pulmonary carcinoids, and the gastrointestinal neuroendocrine tumors (GI-NETs) have a common origin. MicroRNA (miRNA) expression may clarify their genetic relationships and origin. Methods: First, we compared the miRNA expression signature of formalin-fixed paraffin-embedded (FFPE) samples with frozen samples to verify the applicability of microarray analysis. Second, we compared the comprehensive miRNA expression patterns of pulmonary carcinoids and GI-NETs as well as other types of tumors and normal tissues from each organ using FFPE samples. These data were analyzed by hierarchical clustering and consensus clustering with nonnegative matrix factorization. Results: We confirmed that FFPE samples retained the miRNA signatures. In the first hierarchical clustering comparing carcinoids/NETs with adenocarcinomas and normal tissues, most of the carcinoids (48/50) formed 1 major cluster with loose subpartitioning into each organ type, while all the adenocarcinomas (9/9) and normal tissues (15/15) formed another major cluster. The nonnegative matrix factorization approach largely matched the classification of the hierarchical clustering. In the additional cluster analysis comparing carcinoids/NETs with SCLCs, most carcinoids/NETs (17/22) formed a major cluster, while SCLCs (9/9) grouped together with pulmonary adenocarcinomas (3/3) and normal tissues (6/6) in another major cluster. Furthermore, a subset of miRNAs was successfully identified that exhibited significant expression in carcinoids/NETs. Conclusion: Carcinoids/NETs had a characteristic pattern of miRNA expression, suggesting a common origin for pulmonary carcinoids and GI-NETs. The expression profiles of pulmonary carcinoids and SCLCs were quite different, indicating the distinct histogenesis of these neuroendocrine neoplasms.
The vascular endothelium is considered a key cell compartment for the response of normal tissues and tumors to ionizing radiation, and a promising target to improve the differential effect of radiotherapy in the future. Following radiation exposure, the global endothelial cell response covers a wide range of gene, miRNA, protein and metabolite expression modifications. Changes occur at the transcriptional, translational and post-translational levels and impact cell phenotype as well as the microenvironment by the production and secretion of soluble factors such as reactive oxygen species, chemokines, cytokines and growth factors. These radiation-induced dynamic modifications of molecular networks may control the endothelial cell phenotype and govern recruitment of immune cells, stressing the importance of clearly understanding the mechanisms which underlie these temporal processes. A wide variety of time series data is commonly used in bioinformatics studies, including gene expression, protein concentrations and metabolomics data. How best to cluster such data remains an open problem. Here, we introduce kernels between Gaussian processes modeling time series, and subsequently introduce a spectral clustering algorithm. We apply the methods to the study of human primary endothelial cells (HUVECs) exposed to a radiotherapy dose fraction (2 Gy). Time windows of differential expressions of 301 genes involved in key cellular processes such as angiogenesis, inflammation, apoptosis, immune response and protein kinase were determined from 12 hours to 3 weeks post-irradiation. Then, 43 temporal clusters corresponding to profiles of similar expressions, including 49 genes out of 301 initially measured, were generated according to the proposed method. Forty-seven transcription factors (TFs) responsible for the expression of clusters of genes were predicted from sequence regulatory elements using the MotifMap system.
Their temporal profiles of occurrences were established and clustered. Dynamic network interactions and molecular pathways of TFs and differential genes were finally explored, revealing key node genes and putative important cellular processes involved in tissue infiltration by immune cells following exposure to a radiotherapy dose fraction.
Cronobacter species are opportunistic pathogens capable of causing life-threatening infections in humans, with serious complications arising in neonates, infants, immuno-compromised individuals, and elderly adults. The genus comprises seven species: Cronobacter sakazakii, Cronobacter malonaticus, Cronobacter turicensis, Cronobacter muytjensii, Cronobacter dublinensis, Cronobacter universalis, and Cronobacter condimenti. Despite a multiplicity of genomic data for the genus, little is known about likely transmission vectors. Using DNA microarray analysis, in parallel with whole genome sequencing and targeted PCR analyses, the total gene content of two C. malonaticus, three C. turicensis, and 14 C. sakazakii strains isolated from various filth flies was assessed. Phylogenetic relatedness among these and other strains obtained during surveillance and outbreak investigations was comparatively assessed. Specifically, microarray analysis (MA) demonstrated its utility to cluster strains according to species-specific and sequence type (ST) phylogenetic relatedness, and showed that the fly strains clustered among strains obtained from clinical, food and environmental sources from the United States, Europe, and Southeast Asia. This combinatorial approach was useful in data mining for virulence factor genes, and phage genes and gene clusters. In addition, results of plasmidotyping were in agreement with the species identity for each strain as determined by species-specific PCR assays, MA, and whole genome sequencing. Microarray and BLAST analyses of Cronobacter fly sequence datasets were corroborative and showed that the presence and absence of virulence factors followed species and ST evolutionary lines even though such genes were orthologous. Additionally, zebrafish infectivity studies showed that these pathotypes were as virulent to zebrafish embryos as other clinical strains.
In summary, these findings support a striking phylogeny amongst fly, clinical, and surveillance strains isolated during 2010–2015, suggesting that flies are capable vectors for transmission of virulent Cronobacter spp., that these strains continue to circulate among United States and European populations and environments, and that this “pattern of circulation” has continued over decades.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The demographic shift towards an older population presents significant challenges for kidney transplantation (KTx), particularly due to the vulnerability of aged donor kidneys to ischemic damage, delayed graft function, and reduced graft survival. KTx rejection poses a significant threat to allograft function and longevity of the kidney graft. The relationship between senescence and rejection remains elusive and controversial. Methods: Gene Expression Omnibus (GEO) provided microarray and single-cell RNA sequencing datasets. After integrating Senescence-Related Genes (SRGs) from multiple established databases, differential expression analysis, weighted gene co-expression network analysis (WGCNA), and machine learning algorithms were applied to identify predictive SRGs (pSRGs). A cluster analysis of rejection samples was conducted using the consensus clustering algorithm. Subsequently, we utilized multiple machine learning methods (RF, SVM, XGB, GLM and LASSO) based on pSRGs to develop the optimal Acute Rejection (AR) diagnostic model and long-term graft survival predictive signatures. Finally, we validated the role of pSRGs and senescence in kidney rejection through the single-cell landscape. Results: Thirteen pSRGs were identified, correlating with rejection. Rejection samples were divided into two clusters (C1 and C2). GSVA analysis of the two clusters underscored a positive correlation between senescence, KTx rejection occurrence and worse graft survival. A non-invasive diagnostic model (AUC = 0.975) and a prognostic model (1-year AUC = 0.881; 2-year AUC = 0.880; 3-year AUC = 0.883) for graft survival were developed, demonstrating significant predictive capabilities for early detection of acute rejection and for long-term graft outcomes.
Single-cell sequencing analysis provided a detailed cellular-level landscape of rejection, supporting the conclusions drawn above. Conclusion: Our comprehensive analysis underscores the pivotal role of senescence in KTx rejection, highlighting the potential of SRGs as biomarkers for diagnosing rejection and predicting graft survival, which may enhance personalized treatment strategies and improve transplant outcomes.
Attribution 1.0 (CC BY 1.0) https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation. Little is known about the contribution of genome level changes in neighboring bystander cells to tissue and organ stress after irradiation. The timing of these changes is critical in the physiological context and these questions can only be answered by studying signaling and global transcriptomics in a chronological way. Here, we present a strategy to identify different biologically important signaling modules that act in concert in the radiation and bystander responses. We used time series gene expression analysis of normal human fibroblast cells measured at 0.5 hour, 1 hour, 2 hours, 4 hours, 6 hours and 24 hours after exposure to radiation coupled with a novel clustering method targeted to short time series, Feature Based Partitioning around medoids Algorithm (FBPA), to look for genes that were potentially co-regulated. This method uses biologically meaningful features of the expression profile and dimension augmentation to address the analysis of sparse data sets such as ours. We applied FBPA and Short Time series Expression Miner (STEM) to the same datasets and present the results of our comparisons using computational metrics as well as biological enrichment. Enrichment showed that gene expression in irradiated cells fell into broad categories of signal transduction, cell cycle/cell death and inflammation/immunity; but only FBPA clustered functions well. In bystander cells, the gene expression response was also broadly categorized into functions associated with cell communication and motility, signal transduction and inflammation; but neither STEM nor FBPA separated biological functions as well as in irradiated samples. Network analysis revealed that p53 and NF-kappaB were central players in gene expression in both irradiated and bystander gene clusters. 
Analysis of individual clusters also suggested new regulators of gene expression in the radiation and bystander response that may act at the epigenetic level such as histone deacetylases (HDAC1 and HDAC2) and methylases (KDM5B) that can act as strong transcription repressors. Based on these results, we propose a novel time series clustering method, FBPA, as a powerful approach that can be applied to sparse data sets (including genomic profiling data), where the choice of features selected for clustering and stringent statistical outcome analysis can augment our knowledge of the underlying cellular mechanisms in biological processes. There are 72 samples in total: 4 biological replicates of IMR90 cells for each of three conditions, not irradiated (control = C), irradiated (alpha = A) and bystander (B), harvested at each of six time points (0.5 hour, 1 hour, 2 hours, 4 hours, 6 hours and 24 hours after treatment).
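The general idea of clustering short time series on derived features, rather than on raw expression values, can be sketched as follows. This is an illustration under stated assumptions, not the authors' FBPA implementation. The four features in `features` are hypothetical because the abstract does not list the ones FBPA uses, the dimension augmentation step is omitted, and the medoid initialisation is deliberately naive. The profiles are invented.

```python
import math


def features(profile):
    """Hypothetical features of one short expression time series.
    (Assumption: the abstract does not say which features FBPA uses.)"""
    peak = max(range(len(profile)), key=lambda i: abs(profile[i]))
    return (
        sum(profile) / len(profile),   # mean expression
        max(profile) - min(profile),   # range of the response
        profile[1] - profile[0],       # change over the first interval
        float(peak),                   # index of the largest absolute response
    )


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Partitioning around medoids: swap a medoid with a non-medoid
    whenever the swap lowers the total cost, until no swap helps."""
    medoids = list(range(k))  # naive initialisation: the first k points
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Invented profiles at six time points (two responsive genes, two flat genes).
profiles = [
    [0.1, 0.9, 1.8, 2.0, 1.9, 0.3],
    [0.0, 1.0, 2.0, 2.1, 1.8, 0.2],
    [0.0, -0.1, 0.1, 0.0, 0.1, 0.0],
    [0.1, 0.0, -0.1, 0.1, 0.0, 0.1],
]
points = [features(p) for p in profiles]
medoids, labels = pam(points, k=2)
print(medoids, labels)
```

The features here sit on different scales, so in practice they would likely need to be standardised before computing distances. The swap loop recomputes the full cost for every candidate, which is fine for a sketch but slow for thousands of genes.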
Background Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the resulting large amounts of data. Clustering techniques have been widely applied in analyzing microarray gene-expression data. However, normal mixture model-based cluster analysis has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we do not cluster gene-expression patterns but a summary statistic, the t-statistic. Results The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found, two of which contain more than 95% of the genes with almost no altered gene-expression levels, whereas the third one has 30 genes with more or less differential gene-expression levels. Conclusions Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool to exploit differential gene expression for microarray data.
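Clustering a per-gene summary statistic with a normal mixture model can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' method. It uses a Welch-style t-statistic, which may differ from the one in the paper, and fits a univariate normal mixture by plain EM with a simple range-based initialisation. The statistics being clustered are simulated stand-ins (300 values near 0, 20 values near 6), not the rat data, and all function names are hypothetical.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Welch two-sample t-statistic for one gene (assumed form)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_mixture(xs, k, n_iter=200):
    """EM for a k-component univariate normal mixture.
    Returns (weights, means, sds)."""
    n, lo, hi = len(xs), min(xs), max(xs)
    # Initialise means evenly across the data range, with a common sd.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    sds = [statistics.pstdev(xs)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty components
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor to avoid collapse
    return weights, means, sds


def assign(x, weights, means, sds):
    """Index of the component with the highest posterior probability."""
    return max(
        range(len(means)),
        key=lambda j: weights[j] * normal_pdf(x, means[j], sds[j]),
    )


# Simulated stand-ins for per-gene t-statistics (not real data).
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(6, 1) for _ in range(20)]
weights, means, sds = fit_mixture(xs, k=2)
upper = means.index(max(means))
n_flagged = sum(1 for x in xs if assign(x, weights, means, sds) == upper)
print(weights, means, sds, n_flagged)
```

Genes assigned to the component whose mean lies far from zero are the candidates for differential expression. Choosing the number of components (three in the abstract) would need a model-selection step that is not shown here.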