Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Meta-analysis of gene expression array databases has the potential to reveal information about gene function. Gene-gene interactions may be inferred from gene expression information, but such meta-analysis is often limited to a single microarray platform. To address this limitation, we developed a gene-centered approach to analyze differential expression across thousands of gene expression experiments and created the CO-Regulation Database (CORD) to determine which genes are correlated with a queried gene.
Results: Using the GEO and ArrayExpress databases, we analyzed over 120,000 group by group experiments from gene microarrays to determine the correlating genes for over 30,000 different genes or hypothesized genes. CORD output data is presented for sample queries with focus on genes with well-known interaction networks, including p16 (CDKN2A), vimentin (VIM), and MyoD (MYOD1). CDKN2A, VIM, and MYOD1 all displayed gene correlations consistent with known interacting genes.
Conclusions: We developed a facile, web-enabled program to determine gene-gene correlations across different gene expression microarray platforms. Using well-characterized genes, we illustrate how CORD's identification of co-expressed genes contributes to a better understanding of a gene's potential function. The website is found at http://cord-db.org.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains data files and identifiers for original data sources for 39 gene expression datasets from over 7,000 individuals with estrogen receptor positive (ER-positive) Breast Cancer (BC).
Background: The related study developed a novel in silico approach to assess activation of different signalling pathways. The phosphatidylinositol 3-kinase (PI3K)/AKT/mTOR signalling pathway mediates key cellular functions, including growth, proliferation and survival, and is frequently involved in carcinogenesis, tumor progression and metastases. This research seeks to assess the relative contribution of AKT and mTOR (downstream of PI3K) to BC outcomes using the in silico approach via integrated reverse phase protein array (RPPA) and matched gene expression.
Methods and sample size: The methodology includes the development of gene signatures that reflect the level of expression of pAKT and p-mTOR separately. Pooled analysis of gene expression data from over 7,000 patients with ER-positive BC was then performed. This data record holds links to the repositories holding these data, as well as the R-data files for each data record used in the analysis. All gene signatures developed are captured in Supplementary Data Sonnenblick.pdf.xlsx
Data sources: The dataset name, relevant DOI, accession number or access requirements are listed alongside the file type and repository name or other source where applicable.
GEO = Gene Expression Omnibus
EGA = European Genome-phenome Archive
This data table is available to download as NPJBCANCER-00304R1-data-sources.xlsx, including more detailed information and web urls to each data source. data_db.tab contains more detailed technical metadata for each data source.
Dataset | Data location | Permanent identifier/url
NKI | CCB NKI | http://ccb.nki.nl/data/van-t-Veer_Nature_2002/
UCSF | GEO | GSE123833
STNO2 | GEO | GSE4335
NCI | Research Article (Supplementary files) | 10.1073/pnas.1732912100
UNC4 | GEO | GSE18229
CAL | ArrayExpress | E-TABM-158
MDA4 | GEO | GSE123832
KOO | GEO | GSE123831
HLP | ArrayExpress | E-TABM-543
EXPO | GEO | GSE2109
VDX | GEO | GSE2034/GSE5327
MSK | GEO | GSE2603
UPP | GEO | GSE3494
STK | GEO | GSE1456
UNT | GEO | GSE2990
DUKE | GEO | GSE3143
TRANSBIG | GEO | GSE7390
DUKE2 | GEO | GSE6961
MAINZ | GEO | GSE11121
LUND2 | GEO | GSE5325
LUND | GEO | GSE5325
FNCLCC | GEO | GSE7017
EMC2 | GEO | GSE12276
MUG | GEO | GSE10510
NCCS | GEO | GSE5364
MCCC | GEO | GSE19177
EORTC10994 | GEO | GSE1561
DFHCC | GEO | GSE19615
DFHCC2 | GEO | GSE18864
DFHCC3 | GEO | GSE3744
DFHCC4 | GEO | GSE5460
MAQC2 | GEO | GSE20194
TAM | GEO | GSE6532/GSE9195
MDA5 | GEO | GSE17705
VDX3 | GEO | GSE12093
METABRIC | EGA | EGAS00000000083
TCGA | TCGA | https://tcga-data.nci.nih.gov/docs/publications/brca_2012/
DNA methylation (Dedeurwaerder et al. 2011) | GEO | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20713
Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.
Agilent gene expression arrays were used for intrinsic subtyping and to measure changes after anastrozole (C1D1) and after the combination of anastrozole and palbociclib (C1D15). 118 breast cancer patient samples (baseline, n=32; C1D1, n=33; C1D15, n=29; surgery, n=24) from clinical trial NCT01723774 were arrayed, subtyped, and used to detect changes with treatment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top 20 Genes Co-expressed with vimentin (VIM) identified by CORD.
Collection of gene expression and similar datasets related to brain tumors, in particular medulloblastoma. Medulloblastoma is the most common malignant brain tumor in childhood. The files are typically CSV matrices of genes x samples.
GSE124814 WOW! Integration of many (all?) medulloblastoma datasets(!): 1641 samples, of which 1350 samples represent primary medulloblastomas and 291 samples represent normal brain
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE124814 Weishaupt H, Johansson P, Sundström A, Lubovac-Pilav Z et al. Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes. Bioinformatics 2019 Sep 15;35(18):3357-3364. PMID: 30715209 https://doi.org/10.1093/bioinformatics/btz066 We downloaded a total of 1796 CEL files from previously published GEO or ArrayExpress records: GSE85217(n=763), GSE25219(n=154), GSE60862(n=130), GSE12992(n=40), GSE67850(n=22), GSE10327(n=62), GSE30074(n=30), E-MTAB-292(n=19), GSE74195(n=30), GSE37418(n=76), GSE4036(n=14), GSE62803(n=52), GSE21140(n=103), GSE37382(n=50), GSE22569(n=24), GSE35974(n=50), GSE73038(n=46), GSE50161(n=24), GSE3526(n=9), GSE50765(n=12), GSE49243(n=58), GSE41842(n=19), GSE44971(n=9). After preprocessing of all CEL files, we averaged the expression profiles of samples that mapped to the same patient in a single dataset, producing a final expression array comprising 1641 samples, of which 1350 samples represent primary medulloblastomas and 291 samples represent normal brain (cerebellum/upper rhombic lip). Also discussed in paper: A transcriptome-based classifier to determine molecular subtypes in medulloblastoma https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008263
GSE85217 (Cavalli ... Taylor) 768 samples, 2016 (Affymetrix Human Gene 1.1 ST Array) Cavalli FMG, Remke M, Rampasek L, Peacock J et al. Intertumoral Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 2017 Jun 12;31(6):737-754.e6. PMID: 28609654 Ramaswamy V, Taylor MD. Bioinformatic Strategies for the Genomic and Epigenomic Characterization of Brain Tumors. Methods Mol Biol 2019;1869:37-56. PMID: 30324512 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85217
GSE202043 (Pomeroy) 214 samples, 2011 (Expression profiling by array) Cho YJ, Tsherniak A, Tamayo P, Santagata S et al. Integrative genomic analysis of medulloblastoma identifies a molecular subgroup that drives poor clinical outcome. J Clin Oncol 2011 Apr 10;29(11):1424-30. PMID: 21098324 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE202043
GSE12992 (Fattet ... Delattre) 72 samples, 2009 (Expression profiling by array) Fattet S, Haberler C, Legoix P, Varlet P et al. Beta-catenin status in paediatric medulloblastomas: correlation of immunohistochemical expression with mutational status, genetic profiles, and clinical characteristics. J Pathol 2009 May;218(1):86-94. PMID: 19197950 A series of 72 pediatric medulloblastoma tumors has been studied at the genomic level (array-CGH), screened for CTNNB1 mutations and beta-catenin expression (immunohistochemistry). A subset of 40 tumor samples has been analyzed at the RNA expression level (Affymetrix HG U133 Plus 2.0). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12992
GSE37382 (Northcott ... Taylor) 2012 (Expression profiling by array, Affymetrix Human Gene 1.1 ST Array profiling of 285 primary medulloblastoma samples.) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37382
GSE10327 (M. Kool) 62 samples, 2008 (Expression profiling by array). Beware: it is sometimes referred to as GSE10237 in the original paper and in several references; that accession is an error. Kool M, Koster J, Bunt J, Hasselt NE et al. Integrated genomics identifies five medulloblastoma subtypes with distinct genetic profiles, pathway signatures and clinicopathological features. PLoS One 2008 Aug 28;3(8):e3088. PMID: 18769486 Rack PG, Ni J, Payumo AY, Nguyen V et al. Arhgap36-dependent activation of Gli transcription factors. Proc Natl Acad Sci U S A 2014 Jul 29;111(30):11061-6. PMID: 25024229 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10327
Other datasets (not yet loaded):
(47.1 Gb, 2012) (Expression profiling by array, Genome variation profiling by SNP array, SNP genotyping by SNP array ) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 Here we report somatic copy number aberrations (SCNAs) in 1087 unique medulloblastomas. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37385
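The GSE124814 description above mentions averaging the expression profiles of samples that map to the same patient within a dataset. The sketch below shows one way that step could look. It is an illustration only, not the authors' pipeline: the function name, the dictionary-based inputs, and the sample and patient identifiers are all hypothetical.

```python
from collections import defaultdict


def average_by_patient(profiles, sample_to_patient):
    """Average expression profiles of samples that map to the same patient.

    profiles: dict of sample_id -> list of expression values (same gene order).
    sample_to_patient: dict of sample_id -> patient_id (hypothetical mapping).
    """
    grouped = defaultdict(list)
    for sample_id, values in profiles.items():
        grouped[sample_to_patient[sample_id]].append(values)

    averaged = {}
    for patient_id, rows in grouped.items():
        n = len(rows)
        # zip(*rows) walks gene by gene across this patient's samples
        averaged[patient_id] = [sum(col) / n for col in zip(*rows)]
    return averaged


# Hypothetical toy data: two samples from patient p1, one from patient p2
profiles = {"s1": [2.0, 4.0], "s2": [4.0, 8.0], "s3": [1.0, 1.0]}
sample_to_patient = {"s1": "p1", "s2": "p1", "s3": "p2"}
merged = average_by_patient(profiles, sample_to_patient)
```

With real data the same grouping would be done per dataset before merging, since the description states that averaging happens within a single dataset.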
Array Manufacturer: Affymetrix, Distribution: commercial, Technology: in situ oligonucleotide. Affymetrix submissions are typically submitted to GEO using the GEOarchive method described at http://www.ncbi.nlm.nih.gov/projects/geo/info/geo_affy.html Based on this UniGene build and associated annotations, the HG-U95Av2 array represents approximately 10,000 full-length genes.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains microarray-based gene expression profiles of granulosa cells collected from women diagnosed with Polycystic Ovary Syndrome (PCOS) and from healthy controls. It originates from the NCBI GEO DataSet GDS4399, which was generated to study the molecular mechanisms underlying PCOS pathogenesis and its relationship to insulin resistance, steroidogenesis, and oocyte maturation.
The data were collected using the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570 platform). Each sample corresponds to an RNA expression profile of granulosa cells isolated from ovarian aspirates of PCOS and non-PCOS women undergoing in-vitro fertilization (IVF).
Key Details
NCBI GEO Accession: GDS4399
Source: Gene Expression Omnibus (GEO), NCBI
Title: Polycystic ovary syndrome: granulosa cells
Platform: Affymetrix Human Genome U133 Plus 2.0 Array (GPL570)
Authors: Wood JR, et al. (Original study contributors)
National Center for Biotechnology Information, U.S. National Library of Medicine.
Available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399
Recommended citation style (IEEE): [1] J. R. Wood et al., “Polycystic ovary syndrome: granulosa cells,” Gene Expression Omnibus (GEO), GDS4399, NCBI, Bethesda, MD, USA. [Online]. Available: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GDS4399
License: This dataset is part of the public NCBI GEO database and is distributed under the Public Domain / CC0 License for research and educational use. Please cite the original GEO entry when reusing this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To systematically review the literature on human gene expression data of placental tissue in pre-eclampsia and to characterize a meta-signature of differentially expressed genes in order to identify novel putative diagnostic markers.
Data Sources: Medline through 11 February 2011 using MeSH terms and keywords related to placenta, gene expression and gene expression arrays; GEO database using the term “placent*”; and reference lists of eligible primary studies, without constraints.
Methods: From 1068 studies retrieved from the search, we included original publications that had performed gene expression array analyses of placental tissue in the third trimester and that reported on differentially expressed genes in pre-eclampsia versus normotensive controls. Two reviewers independently identified eligible studies, extracted descriptive and gene expression data and assessed study quality. Using a vote-counting method based on a comparative meta-profiling algorithm, we determined a meta-signature that characterizes the significant intersection of differentially expressed genes from the collection of independent gene signatures.
Results: We identified 33 eligible gene expression array studies of placental tissue in the 3rd trimester comprising 30 datasets on mRNA expression and 4 datasets on microRNA expression. The pre-eclamptic placental meta-signature consisted of 40 annotated gene transcripts and 17 microRNAs. At least half of the mRNA transcripts encode a protein that is secreted from the cell and could potentially serve as a biomarker.
Conclusions: In addition to well-known and validated genes, we identified 14 transcripts not reported previously in relation to pre-eclampsia of which the majority is also expressed in the 1st trimester placenta, and three encode a secreted protein.
No matter how much you wash your hands, you are still susceptible to airborne flu or cold viruses when in close proximity to others who have a cold or flu. The flu vaccine is a treatment many folks get in hopes of not getting sick that cold/flu season. The flu vaccine is somewhat like a math cheat sheet for your body: it prepares you for a math course final without having to know all of the formulas offhand, only the ones that are on the exam. If you have a crooked teacher/TA who decided not to allow the cheat sheet to be a good representation of the content of the final exam, then you could assume that is how your body will be with a flu vaccine that doesn't have the strain(s) of flu your body is likely to encounter that flu season. I found this data set while munging the GEO database sets of NCBI and searching for 'flu vaccines'. I wanted some microarray gene expression data sets whose values I could also compare to other blood microarray samples from separate studies on females using EGCG for obesity, and males who do/don't have heart disease. This data can be blended with the other data sets here or in my github repositories at janjanjan2018.
Blood gene expressions of microarray samples.
NCBI and the GEO grant funded data repositories of gene expression data.
Sick people.
We developed and validated a small-footprint array of miniature chemostats built from readily available parts for low cost. Physiological and experimental evolution results were similar to larger volume chemostats. The ministat array provides a compact, inexpensive, and accessible platform for traditional chemostat experiments, functional genomics, and chemical screening applications. Three experiments are gene expression comparisons between three ministat cultures and a single Sixfors sample. The four CGH arrays are individual clones evolved in four sulfate limitation ministats compared to a wt ancestor strain.
Gene-expression microarray datasets generated as part of the Immunological Genome Project (ImmGen). Primary cells from multiple immune lineages are isolated ex-vivo, primarily from young adult B6 male mice, and double-sorted to >99% purity. RNA is extracted from cells in a centralized manner, amplified and hybridized to Affymetrix 1.0 ST MuGene arrays. Protocols are rigorously standardized for all sorting and RNA preparation. Data is released monthly in batches of cell populations. This Series record provides access to Immunological Genome Project data submitted to GEO.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The delineation of genomic copy number abnormalities (CNAs) from cancer samples has been instrumental for identification of tumor suppressor genes and oncogenes and proven useful for clinical marker detection. An increasing number of projects have mapped CNAs using high-resolution microarray based techniques. So far, no single resource provides a global collection of readily accessible oncogenomic array data.
Methodology/Principal Findings: We here present arrayMap, a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides a platform for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. To date, the resource incorporates more than 40,000 arrays in 224 cancer types extracted from several resources, including the NCBI’s Gene Expression Omnibus (GEO), EBI’s ArrayExpress (AE), The Cancer Genome Atlas (TCGA), publication supplements and direct submissions. For the majority of the included datasets, probe level and integrated visualization facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools.
Conclusions/Significance: To our knowledge, currently no data source provides an extensive collection of high resolution oncogenomic CNA data which readily could be used for genomic feature mining, across a representative range of cancer entities. arrayMap represents our effort for providing a long term platform for oncogenomic CNA data independent of specific platform considerations or specific project dependence. The online database can be accessed at http://www.arraymap.org.
Array Manufacturer: Affymetrix, Distribution: commercial, Technology: in situ oligonucleotide. Tiling array submissions are typically submitted to GEO using the GEOarchive method described at http://www.ncbi.nlm.nih.gov/projects/geo/info/geo_affy.html The GeneChip C. elegans Tiling 1.0R Array is designed for identifying novel transcripts or mapping sites of protein/DNA interaction in chromatin immunoprecipitation (ChIP) experiments, or other Caenorhabditis elegans whole-genome experiments. The C. elegans 1.0R Array is a single array comprised of over 3.2 million perfect match/mismatch probe pairs tiled through the complete non-repetitive Caenorhabditis elegans genome. Sequences used in the design of the C. elegans Tiling 1.0R Array were selected from the WormBase web site, www.wormbase.org, release WS140, March 26, 2005. Probes are tiled at an average resolution of 25 base pairs, as measured from the central position of adjacent 25-mer oligos. BPMAP and other files can be downloaded from the Affymetrix Web site below.
ABX464, a new drug for curing HIV and treating inflammatory diseases, induces upregulation of the anti-inflammatory miR-124. We used microarrays to show the implication of ABX464 in the biogenesis of small noncoding RNAs. So, we decided to evaluate if miRNAs or small nucleolar RNAs (snoRNAs) were differentially regulated by ABX464. We performed a microarray analysis for these RNAs from the PBMCs of 6 donors. Cells that were infected with the YU2 strain, followed by treatment with ABX464, were compared with uninfected and untreated controls. A total of 104 human miRNAs and 40 snoRNAs were significantly differentially expressed in infected PBMCs, when compared to uninfected PBMCs (data file S4), with a false discovery rate lower than 0.05 and fold change higher than 1.5.
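The selection rule stated above (false discovery rate below 0.05 and fold change above 1.5) can be sketched as follows. This is a minimal illustration under assumptions: the description does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment here is a stand-in, and treating down-regulation symmetrically (0.5-fold counts as a 2-fold change) is also an assumption. The RNA names and numbers are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_hits(names, pvalues, fold_changes, fdr=0.05, min_fc=1.5):
    """Keep entries with adjusted p < fdr and fold-change magnitude > min_fc."""
    adjusted = benjamini_hochberg(pvalues)
    hits = []
    for name, q, fc in zip(names, adjusted, fold_changes):
        # Assumption: fold changes are positive ratios; invert values below 1
        magnitude = fc if fc >= 1 else 1 / fc
        if q < fdr and magnitude > min_fc:
            hits.append(name)
    return hits


# Hypothetical toy input, not values from the study
names = ["rna_a", "rna_b", "rna_c", "rna_d"]
pvalues = [0.001, 0.02, 0.03, 0.5]
fold_changes = [2.0, 1.2, 0.5, 3.0]
hits = select_hits(names, pvalues, fold_changes)
```

Here rna_b fails the fold-change cutoff and rna_d fails the FDR cutoff, so only rna_a and rna_c are kept.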
We present a meta-dataset comprising a total of 237 samples, including both primary tumors and tumor-free prostate tissues, from six independent GEO datasets. To minimise inter-platform variation, only datasets generated from the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) were processed to develop the meta-dataset. Using multiple open source R packages implemented in our previously developed bioinformatics pipeline, each dataset has been preprocessed with RMA normalisation, merged, and batch effect-corrected via the ComBat method. With increased sample size, the present meta-dataset serves as an excellent 'discovery cohort' for discovering genes differentially expressed in the diseased phenotype.
Current prognostic gene expression profiles for breast cancer mainly reflect proliferation status and are most useful in ER-positive cancers. Triple-negative breast cancers (TNBCs) are clinically heterogeneous, and prognostic markers and biology-based therapies are needed to better treat this disease. We assembled Affymetrix gene expression data for 579 TNBCs and performed unsupervised analysis to define metagenes that distinguish molecular subsets within TNBC. We used n=394 cases for discovery and n=185 cases for validation. Sixteen metagenes emerged that identified basal-like, apocrine and claudin-low molecular subtypes, or reflected various non-neoplastic cell populations including immune cells, blood, adipocytes, stroma, angiogenesis, and inflammation within the cancer. The expressions of these metagenes were correlated with survival and multivariate analysis was performed including routine clinical and pathological variables. 73% of TNBCs displayed basal-like molecular subtype that correlated with high histological grade and younger age. Survival of basal-like TNBC was not different from non-basal-like TNBC. High expression of immune cell metagenes was associated with good prognosis, and high expression of inflammation and angiogenesis-related metagenes was associated with poor prognosis. A ratio of high B-cell and low IL-8 metagenes identified 32% of TNBC with good prognosis (HR 0.37, 95% CI 0.22-0.61; P<0.001) and was the only significant predictor in multivariate analysis including routine clinicopathological variables. We describe a ratio of high B-cell presence and low IL-8 activity as a powerful new prognostic marker for TNBC. Inhibition of the IL-8 pathway also represents an attractive novel therapeutic target for this disease. Analysis of primary breast cancer biopsies from patients before treatment. No replicates. No control or reference samples are included.
The set of 579 TNBCs includes: (1) 67 new GEO Samples (GSM782523-GSM782589), (2) 489 re-analyzed GEO Samples (see 'Relation' links below), and (3) 23 re-analyzed ArrayExpress Samples.
Cohorts:
HH = University of Hamburg
FRA = University of Frankfurt, adjuvant chemotherapy
FRA-2 = University of Frankfurt, neoadjuvant chemotherapy
FRA-3 = University of Frankfurt, no adjuvant chemotherapy
Data processing of the 579 TNBC Samples: MAS5 values were taken from GEO if available. For samples with no MAS5 values, CEL files were downloaded from GEO and the affy package from Bioconductor was used to generate MAS5 values. Next, MAS5 values corresponding only to the 22283 probesets from the U133A array were compiled. Subsequently, normalization of MAS5 data was performed using the command line version of the program CLUSTER 3.0 (Michael Eisen; updated by Michiel de Hoon; http://bonsai.hgc.jp/~mdehoon/software/cluster/command.txt). The following three steps were performed in this order:
1. log2 transformation of MAS5 values
2. median centering of arrays
3. magnitude normalization of arrays
These three steps correspond to the following commands:
cluster.com filename -l
cluster.com filename -ca m
cluster.com filename -na
The resulting dataset, which is linked below as a supplementary file, was used for the subsequent analyses.
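The three normalization steps listed above (log2 transformation, median centering of arrays, magnitude normalization of arrays) can be sketched in plain Python to show what each step does to the data. This is not the CLUSTER 3.0 implementation. It assumes that arrays are the columns of a probeset-by-array matrix, that all MAS5 values are positive, and that magnitude normalization means scaling each array so the sum of its squared values is 1. The function name and the toy matrix are hypothetical, and missing values are not handled.

```python
import math
from statistics import median


def normalize_arrays(matrix):
    """Apply log2, per-array median centering, and per-array magnitude scaling.

    matrix: list of rows (probesets); each row lists positive values,
    one per array (column).
    """
    # 1. log2 transformation of every value
    logged = [[math.log2(v) for v in row] for row in matrix]

    normalized_columns = []
    for column in zip(*logged):
        # 2. median centering: subtract the array's median from its values
        med = median(column)
        centered = [v - med for v in column]
        # 3. magnitude normalization: scale so the sum of squares is 1
        norm = math.sqrt(sum(v * v for v in centered))
        scaled = [v / norm for v in centered] if norm > 0 else centered
        normalized_columns.append(scaled)

    # Transpose back to probesets x arrays
    return [list(row) for row in zip(*normalized_columns)]


# Hypothetical toy matrix: 3 probesets x 2 arrays
mas5 = [[2.0, 8.0], [4.0, 16.0], [8.0, 64.0]]
result = normalize_arrays(mas5)
```

After these steps every array has a median of zero on the log2 scale and unit magnitude, which puts arrays from different cohorts on a comparable scale before unsupervised analysis.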
We examine how different transcriptional network structures can evolve from a common, ancestral network. We show that regulatory protein modularity, conversion of one cis-regulatory sequence to another, distribution of binding energy among protein-protein and protein-DNA interactions, and exploitation of ancestral network features all contribute to the evolution of a novel mode of regulation at a conserved gene set. The formation of this derived mode of regulation did not disrupt the ancestral mode and thereby created a hybrid regulatory state where both means of transcription regulation (ancestral and derived) contribute to the conserved expression pattern of the network. Finally, we show how this hybrid regulatory state has resolved in different ways in different lineages to generate the diversity of regulatory network structures observed in modern species. a2 KO and alpha2 KO mRNA abundance was measured relative to a WT cell of the same mating type. 2 replicates each. Dye-swaps were performed.
Array Manufacturer: Agilent, Distribution: custom-commercial, Technology: in situ oligonucleotide. We tiled the entire DMD gene, in both sense and antisense directions, using the web-based Agilent eArray database, Version 4.5 (Agilent Technologies), with 60-mer oligos every 66 bp of repeat-masked genome sequence. We defined probe sets for both orientations, encompassing the DMD exons, promoters, introns, predicted miRNA (identified by PromiRII) and conserved non-coding sequences (CNSs) identified within dystrophin introns using the VISTA programme (http://genome.lbl.gov/vista/index.shtml). Two specific sets of probes were designed to cover, in both directions, the cDNA sequences of a group of control genes (Supplementary Table S1) identified in the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) and expressed equally in both normal and dystrophic muscles. Each probe set was appropriately distributed and replicated several times in order to obtain two 4x44k microarrays, referred to as DMD GEx Sense and DMD GEx Antisense, respectively, able to detect transcripts in the same and opposite directions as that of DMD gene transcription.
Experiments conducted on this tiling array are used to (1) validate the frozen gene sets of the current genome annotation, (2) improve the predicted gene structures by empirically determining UTRs and intron-exon boundaries, identifying missing upstream, internal, and downstream exons and alternative transcripts, (3) propose gene structure models in transcribed regions containing no predicted genes and (4) delineate transcriptionally active regions of the genome from intergenic, intronic and genic regions. Signal to background ratios were determined by first calling probes that fluoresced at intensities greater than 99% of the random probes’ signal intensities; therefore, only 1% of fluorescing experimental probes should be false positives. We conducted two-color competitive hybridizations that measure differential expression from three replicates, each using RNA from independent biological extractions. Transcriptionally active regions (TARs) were defined by stringing together overlapping probes showing fluorescence above a 1% false positive rate (FPR). Positive probes were joined into a TAR if they were adjacent (maxgap=0, no intermittent non-positive probe) and a TAR’s length had to be at least 45 bp (minrun=45, mid-point first positive probe to mid-point last positive probe, resulting in at least 3 adjacent positive probes for a TAR). The data analysis to measure differential expression of genes and of unannotated TARs was performed using the statistical software package R and Bioconductor with additions and modifications. The signal distributions across chips, samples and replicates were adjusted to be equal according to the mean fluorescence of the random probes on each array. All probes including random probes were quantile-normalized across replicates. Expression-level scores were assigned for each predicted gene based on the median log2 fluorescence over background intensity of probes falling within the exon boundaries.
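The TAR rule described above (join adjacent positive probes with maxgap=0, then require a mid-point to mid-point length of at least 45 bp) can be sketched as a single pass over probes sorted by position. This is an illustrative reading of the rule, not the authors' R/Bioconductor code. The function name, the single-chromosome input, and the 30 bp probe spacing in the example are assumptions.

```python
def call_tars(midpoints, positive, minrun=45):
    """Join runs of adjacent positive probes into TARs.

    midpoints: probe mid-point coordinates, sorted, on one chromosome.
    positive: parallel list of booleans (probe above the 1% FPR cutoff).
    Returns (start, end) mid-point pairs for runs spanning >= minrun bp.
    """
    tars = []
    start = None  # mid-point of the first positive probe in the current run
    last = None   # mid-point of the most recent positive probe in the run
    for mid, is_positive in zip(midpoints, positive):
        if is_positive:
            if start is None:
                start = mid
            last = mid
        else:
            # maxgap=0: any non-positive probe ends the current run
            if start is not None and last - start >= minrun:
                tars.append((start, last))
            start = None
    # Close a run that extends to the final probe
    if start is not None and last - start >= minrun:
        tars.append((start, last))
    return tars


# Hypothetical probes spaced 30 bp apart
midpoints = [0, 30, 60, 90, 120, 150, 180, 210]
positive = [True, True, True, False, True, True, False, True]
tars = call_tars(midpoints, positive)
```

With 30 bp spacing, two adjacent positive probes span only 30 bp and are rejected, while three span 60 bp and pass, which matches the statement that a TAR needs at least 3 adjacent positive probes.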