100+ datasets found

Table_5_Testing Proximity of Genomic Regions to Transcription Start Sites...
frontiersin.figshare.com
xlsx
Updated Jun 9, 2023
Cite
Christopher Lee; Kai Wang; Tingting Qin; Maureen A. Sartor (2023). Table_5_Testing Proximity of Genomic Regions to Transcription Start Sites and Enhancers Complements Gene Set Enrichment Testing.xlsx [Dataset]. http://doi.org/10.3389/fgene.2020.00199.s006
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fgene.2020.00199.s006
Dataset updated
Jun 9, 2023
Dataset provided by
Frontiers
Authors
Christopher Lee; Kai Wang; Tingting Qin; Maureen A. Sartor
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Large sets of genomic regions are generated by the initial analysis of various genome-wide sequencing data, such as ChIP-seq and ATAC-seq experiments. Gene set enrichment (GSE) methods are commonly employed to determine the pathways associated with them. Given the pathways and other gene sets (e.g., GO terms) of significance, it is of great interest to know the extent to which each is driven by binding near transcription start sites (TSS) or near enhancers. Currently, no tool performs such an analysis. Here, we present a method that addresses this question to complement GSE methods for genomic regions. Specifically, the new method tests whether the genomic regions in a gene set are significantly closer to a TSS (or to an enhancer) than expected by chance given the total list of genomic regions, using a non-parametric test. Combining the results from a GSE test with our novel method provides additional information regarding the mode of regulation of each pathway, and additional evidence that the pathway is truly enriched. We illustrate our new method with a large set of ENCODE ChIP-seq data, using the chipenrich Bioconductor package. The results show that our method is a powerful complementary approach to help researchers interpret large sets of genomic regions.
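The description above names only "a non-parametric test" and does not say which one. As a rough illustration of the idea, and not the chipenrich implementation, the following sketch uses a simple one-sided permutation test on the median distance to the nearest TSS. The function name, its inputs, and the choice of a permutation test are all assumptions made for this example.

```python
import random
import statistics


def proximity_permutation_test(distances, in_set, n_perm=2000, seed=0):
    """One-sided permutation test (illustrative stand-in, not the published method).

    distances: distance from each genomic region to its nearest TSS (hypothetical input)
    in_set:    parallel list of booleans, True if the region maps to the gene set
    Returns an estimated p-value that the gene set's regions are closer to a TSS
    than a random subset of the same size drawn from all regions.
    """
    rng = random.Random(seed)
    set_size = sum(in_set)
    observed = statistics.median(d for d, s in zip(distances, in_set) if s)

    hits = 0
    for _ in range(n_perm):
        # Draw a random subset of regions with the same size as the gene set.
        sample = rng.sample(distances, set_size)
        if statistics.median(sample) <= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value here would suggest that the regions assigned to the gene set sit closer to TSSs than expected given the full list of regions; the same sketch applies to enhancer distances by swapping the input.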
STAR-NN performance comparison on four gene sets in testing dataset.
datasetcatalog.nlm.nih.gov
Updated Nov 8, 2024
Cite
Wu, Qing; Morrow, Eric M.; Uzun, Ece D. Gamsiz (2024). STAR-NN performance comparison on four gene sets in testing dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001438268
Dataset updated
Nov 8, 2024
Authors
Wu, Qing; Morrow, Eric M.; Uzun, Ece D. Gamsiz
Description
The four gene sets are the selected features, the SFARI genes, the combination of SFARI genes and selected features, and the full gene list. (CSV)
Table_1_SCIA: A Novel Gene Set Analysis Applicable to Data With Different...
frontiersin.figshare.com
application/cdfv2
Updated Jun 1, 2023
Cite
Yiqun Li; Ying Wu; Xiaohan Zhang; Yunfan Bai; Luqman Muhammad Akthar; Xin Lu; Ming Shi; Jianxiang Zhao; Qinghua Jiang; Yu Li (2023). Table_1_SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics.DOC [Dataset]. http://doi.org/10.3389/fgene.2019.00598.s001
Available download formats: application/cdfv2
Unique identifier
https://doi.org/10.3389/fgene.2019.00598.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers Media http://www.frontiersin.org/
Authors
Yiqun Li; Ying Wu; Xiaohan Zhang; Yunfan Bai; Luqman Muhammad Akthar; Xin Lu; Ming Shi; Jianxiang Zhao; Qinghua Jiang; Yu Li
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most current methods are competitive tests, which assume that each gene is independent of the others. However, the false discovery rates of competitive methods are inflated when they are applied to datasets with high inter-gene correlations. Self-contained testing methods could solve this problem, but they impose other restrictions on data characteristics. Therefore, a statistically rigorous testing method applicable to datasets with various complex characteristics is needed to obtain unbiased and comparable results. We propose a self-contained and competitive incorporated analysis (SCIA) to alleviate the bias caused by the limited application scope of existing gene set analysis methods. This is accomplished through a novel permutation strategy using a priori biological networks to selectively permute gene labels with different probabilities. In simulation studies, SCIA was compared with four representative analysis methods (GSEA, CAMERA, ROAST, and NES), and produced the best performance in both false discovery rate and sensitivity under most conditions with different parameter settings. Further, KEGG pathway analysis on two real lung cancer datasets showed that SCIA found many more results than GSEA in both datasets, and most of them are supported by the literature. Overall, SCIA offers researchers more reliable and comparable results across different datasets.
Intermediate results objects for reproducing unbiased methylation gene set...
zenodo.org
application/gzip
Updated Apr 21, 2021
Cite
Jovana Maksimovic; Alicia Oshlack; Belinda Phipson (2021). Intermediate results objects for reproducing unbiased methylation gene set testing paper analyses [Dataset]. http://doi.org/10.5281/zenodo.4005288
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.4005288
Dataset updated
Apr 21, 2021
Dataset provided by
Zenodo http://zenodo.org/
Authors
Jovana Maksimovic; Alicia Oshlack; Belinda Phipson
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Intermediate results objects that can be used to replicate the analyses presented in "Gene set enrichment analysis for genome-wide DNA methylation data" available at https://www.biorxiv.org/content/10.1101/2020.08.24.265702v1.
Instructions for how to incorporate the objects into the analysis can be found at: http://oshlacklab.com/methyl-geneset-testing/gettingStarted.html and the complete analysis code can be cloned from: https://github.com/Oshlack/methyl-geneset-testing.
Unfiltered GSEA results showing all GO categories tested in each cell type.
datasetcatalog.nlm.nih.gov
Updated Jun 8, 2022
Cite
Shu, Huan; Donnard, Elisa; Garber, Manuel (2022). Unfiltered GSEA results showing all GO categories tested in each cell type. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000280607
Dataset updated
Jun 8, 2022
Authors
Shu, Huan; Donnard, Elisa; Garber, Manuel
Description
NAME = GO identifier for the category tested; SIZE = Number of genes in the gene set after filtering out those genes not in the expression dataset; ES = Enrichment score for the gene set; that is, the degree to which this gene set is overrepresented at the top or bottom of the ranked list of genes in the expression dataset; NES = Normalized enrichment score; that is, the enrichment score for the gene set after it has been normalized across analyzed gene sets; NOM p-value = Nominal p value; that is, the statistical significance of the enrichment score; FDR q-value = False discovery rate; that is, the estimated probability that the normalized enrichment score represents a false positive finding; RANK AT MAX = The position in the ranked list at which the maximum enrichment score occurred; LEADING EDGE = Displays the three statistics used to define the leading edge subset (Tags: The percentage of gene hits before (for positive ES) or after (for negative ES) the peak in the running enrichment score; List: The percentage of genes in the ranked gene list before (for positive ES) or after (for negative ES) the peak in the running enrichment score; Signal: The enrichment signal strength that combines the two previous statistics); collapsedNAME = non-redundant category name for GO terms with high overlap in genes (see Methods); signal = text indicating the direction of the gene expression change (upreg = genes tend to be overexpressed in FMRP-KO; downreg = genes tend to be downregulated in FMRP-KO); Description = Full GO category name; core_enrichment = list of Gene IDs for the genes present in the leading edge that are annotated in the GO category tested. (TSV)
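A table with the columns described above can be filtered with a few lines of Python. This is a minimal sketch that assumes the TSV header uses exactly the column names given in the description (NAME, NES, FDR q-value); the function name and the 0.05 cutoff are illustrative choices, not part of the dataset.

```python
import csv
import io


def significant_gene_sets(tsv_text, fdr_cutoff=0.05):
    """Return rows with FDR q-value below the cutoff, strongest |NES| first.

    tsv_text: contents of a GSEA-style TSV file as a string.
    Column names are assumed to match the dataset description.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in reader if float(row["FDR q-value"]) < fdr_cutoff]
    # Sort by absolute normalized enrichment score, largest effect first.
    return sorted(kept, key=lambda row: abs(float(row["NES"])), reverse=True)
```

Sorting on the absolute NES keeps both up- and down-regulated categories in one ranked list; the signal column described above can then be used to split them by direction.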
Summary statistics
figshare.com
txt
Updated Nov 20, 2024
Cite
Sabrina Henne (2024). Summary statistics [Dataset]. http://doi.org/10.6084/m9.figshare.27862137.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.27862137.v1
Dataset updated
Nov 20, 2024
Dataset provided by
Figshare http://figshare.com/
Authors
Sabrina Henne
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary statistics of:
Differential expression analysis (mRNA or microRNA)
Gene set analysis (mRNA or microRNA)
Pathway-based MPHL PRS association test with gene expression change (mRNA and microRNA combined, split by treatment group due to size)
Gene set enrichment analysis of genes from black module.
datasetcatalog.nlm.nih.gov
Updated Feb 3, 2022
Cite
Yarmolinsky, James; Vincent, Emma E.; Pigeyre, Marie; Walker, Venexia M.; Gallinger, Steven; Moreno, Victor; Sjaarda, Jennifer; Obón-Santacana, Mireia; Amos, Christopher I.; Hampel, Heather; Tan, Vanessa Y.; Martin, Richard M.; Zheng, Wei; Paré, Guillaume; Albanes, Demetrius; Gsur, Andrea; Smith, George Davey; Díez-Obrero, Virginia; Richardson, Tom G.; Jenkins, Mark; Casey, Graham; Pai, Rish K.; Hampe, Jochen (2022). Gene set enrichment analysis of genes from black module. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000304406
Dataset updated
Feb 3, 2022
Authors
Yarmolinsky, James; Vincent, Emma E.; Pigeyre, Marie; Walker, Venexia M.; Gallinger, Steven; Moreno, Victor; Sjaarda, Jennifer; Obón-Santacana, Mireia; Amos, Christopher I.; Hampel, Heather; Tan, Vanessa Y.; Martin, Richard M.; Zheng, Wei; Paré, Guillaume; Albanes, Demetrius; Gsur, Andrea; Smith, George Davey; Díez-Obrero, Virginia; Richardson, Tom G.; Jenkins, Mark; Casey, Graham; Pai, Rish K.; Hampe, Jochen
Description
Caption: Category = one of 9 “major collections” included in the MSigDB, GeneSet = name of gene set as provided by MSigDB, N_genes = number of genes in gene set, N_overlap = number of genes located in the black module of the coexpression network in gene set, p = p-value (unadjusted for multiple testing), adjP = p-value (adjusted for multiple testing), genes = genes from black module of the coexpression network that overlap with gene set, link = link to further information on gene set. MSigDB, Molecular Signatures Database. (XLSX)
Functional enrichment analysis (genes).
datasetcatalog.nlm.nih.gov
Updated Feb 28, 2022
Cite
D’Antonio, Matteo; Matsui, Hiroko; Donovan, Margaret K. R.; Frazer, Kelly A.; Arthur, Timothy D.; D’Antonio-Chronowska, Agnieszka; Nguyen, Jennifer P. (2022). Functional enrichment analysis (genes). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000222034
Dataset updated
Feb 28, 2022
Authors
D’Antonio, Matteo; Matsui, Hiroko; Donovan, Margaret K. R.; Frazer, Kelly A.; Arthur, Timothy D.; D’Antonio-Chronowska, Agnieszka; Nguyen, Jennifer P.
Description
The table shows the functional enrichment analysis for genes differentially expressed between each pair of CVS tissues (S3 Table). For each gene set, we report: the tested tissues (tissue 1 and tissue 2); the gene set collection, as defined by MSigDB, the gene set name and its URL; the number of tested genes in the gene set; the average effect size for all the tested genes in the gene set and for all the other expressed genes; p-value (t-test); and FDR (Benjamini-Hochberg). Only significant gene sets are reported (FDR < 0.05). The differential expression analysis of gene sets can be found at https://doi.org/10.6084/m9.figshare.13537343. (CSV)
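The FDR column in this table is described as Benjamini-Hochberg. For readers who want to recompute it from the p-value column, the standard step-up adjustment can be sketched as follows; the function name is hypothetical, and this is the textbook procedure rather than the authors' own code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Gene sets whose adjusted value falls below 0.05 would correspond to the "FDR < 0.05" filter mentioned in the description.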
Data from: A hybrid gene selection approach to create the S1500+ targeted...
catalog.data.gov
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics [Dataset]. https://catalog.data.gov/dataset/a-hybrid-gene-selection-approach-to-create-the-s1500-targeted-gene-sets-for-use-in-high-th
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Description
The U.S. Tox21 Federal collaboration, which currently quantifies the biological effects of nearly 10,000 chemicals via quantitative high-throughput screening (qHTS) in in vitro model systems, is now making an effort to incorporate gene expression profiling into the existing battery of assays. Whole transcriptome analyses performed on large numbers of samples using microarrays or RNA-Seq are currently cost-prohibitive. Accordingly, the Tox21 Program is pursuing a high-throughput transcriptomics (HTT) method that focuses on the targeted detection of gene expression for a carefully selected subset of the transcriptome, which could potentially reduce the cost 10-fold and allow the analysis of larger numbers of samples. To identify the optimal transcriptome subset, genes were sought that are (1) representative of the highly diverse biological space, (2) capable of serving as a proxy for expression changes in unmeasured genes, and (3) sufficient to provide coverage of well-described biological pathways. A hybrid method for gene selection is presented herein that combines data-driven and knowledge-driven concepts into one cohesive method. This dataset is associated with the following publication: Mav, D., R.R. Shah, B.E. Howard, S.S. Auerbach, P.R. Bushel, J.B. Collins, D.L. Gerhold, R. Judson, A.L. Karmaus, E.A. Maull, D.L. Mendrick, B.A. Merrick, N.S. Sipes, D. Svoboda, and R.S. Paules. A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 13(2): 1-17, (2018).
Additional file 2 of Comprehensive enhancer-target gene assignments improve...
springernature.figshare.com
xlsx
Updated Feb 15, 2024
Cite
Tingting Qin; Christopher Lee; Shiting Li; Raymond G. Cavalcante; Peter Orchard; Heming Yao; Hanrui Zhang; Shuze Wang; Snehal Patil; Alan P. Boyle; Maureen A. Sartor (2024). Additional file 2 of Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data [Dataset]. http://doi.org/10.6084/m9.figshare.19663872.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19663872.v1
Dataset updated
Feb 15, 2024
Dataset provided by
Figshare http://figshare.com/
Authors
Tingting Qin; Christopher Lee; Shiting Li; Raymond G. Cavalcante; Peter Orchard; Heming Yao; Hanrui Zhang; Shuze Wang; Snehal Patil; Alan P. Boyle; Maureen A. Sartor
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2:
Table S1: Overview of the top 19 EnTDefs, including the ranks, enhancer/enhancer-gene link methods, and basic summary statistics.
Table S2: The 31 ENCODE ChIP-seq datasets from 9 completely different cell lines and 14 completely different transcription factors.
Table S3: The nine ChIA-PET datasets used for generating cell-type-specific EnTDefs (CT-EnTDefs) and number of TFs assayed by ENCODE ChIP-seq in each particular cell type, which were used to evaluate the performance of the CT-EnTDefs.
Table S4: Overview of the seven independent datasets used for the comparative analysis.
Table S5: ChIA-PET datasets used by “ChIA” and “Loop” methods to assign enhancer to target genes in a cell-type independent manner (general EnTDefs).
Table S6: The 87 ENCODE ChIP-seq datasets used for EnTDef evaluation (evaluation ChIP-seq) (tab 1) and the TF vs. cell type matrix (tab 2).
Table S7: The 13 ENCODE ChIP-seq datasets from 4 different cell lines (testing ChIP-seq).
New feature subset selection procedures for classification of expression...
gimi9.com
Updated Apr 1, 2002
Cite
(2002). New feature subset selection procedures for classification of expression profiles | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_new-feature-subset-selection-procedures-for-classification-of-expression-profiles/
Dataset updated
Apr 1, 2002
Description
Background: Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared the performance relative to two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier.
Results: We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes.
Conclusion: When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information.
Clust_100_GE_datasets
zenodo.org
zip
Updated Jan 24, 2020
Cite
Basel Abu-Jamous; Steven Kelly (2020). Clust_100_GE_datasets [Dataset]. http://doi.org/10.5281/zenodo.1298541
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1298541
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Basel Abu-Jamous; Steven Kelly
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
100 microarray and RNA-seq gene expression datasets from five model species (human, mouse, fruit fly, arabidopsis plants, and baker's yeast). These datasets represent the benchmark set that was used to test our clust clustering method and to compare it with seven widely used clustering methods (Cross-Clustering, k-means, self-organising maps, MCL, hierarchical clustering, CLICK, and WGCNA). This data resource includes raw data files, pre-processed data files, clustering results, clustering results evaluation, and scripts.

The files are split into eight zipped parts, 100Datasets_0.zip to 100Datasets_7.zip. The contents of all the zipped parts should be extracted to a single folder (e.g. 100Datasets).
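The extraction step can be scripted. The sketch below assumes the eight archives have already been downloaded into one local directory under the names given above; the function name and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_parts(download_dir, target_dir="100Datasets", n_parts=8):
    """Extract 100Datasets_0.zip ... 100Datasets_{n_parts-1}.zip into one folder.

    download_dir: directory holding the downloaded archives (assumed layout)
    target_dir:   single destination folder for all extracted contents
    Returns the archive names that were extracted, in order.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    extracted = []
    for part in range(n_parts):
        archive = Path(download_dir) / f"100Datasets_{part}.zip"
        # ZipFile raises FileNotFoundError if a part is missing.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

Extracting every part into the same target folder is what lets the D###/ and D###_Res/ folders described below end up side by side.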

Below is a thorough description of the files and folders in this data resource.

Scripts

The scripts used to apply each one of the clustering methods to each one of the 100 datasets and to evaluate their results are all included in the folder (scripts/).

Datasets and clustering results (folders starting with D)

The datasets are labelled as D001 to D100. Each dataset has two folders: D###/ and D###_Res/, where ### is the number of the dataset. The first folder only includes the raw dataset while the second folder includes the results of applying the clustering methods to that dataset. The files ending with _B.tsv include clustering results in the form of a partition matrix. The files ending with _E include metrics evaluating the clustering results. The files ending with _go and _go_E respectively include the enriched GO terms in the clustering results and evaluation metrics of these GO terms. The files ending with _REACTOME and _REACTOME_E are similar to the GO term files but for the REACTOME pathway enrichment analysis. Each of these D###_Res/ folders includes a sub-folder "ParamSweepClust" which includes the results of applying clust multiple times to the same dataset while sweeping some parameters.

Large datasets analysis results

The folder LargeDatasets/ includes data and results for what we refer to as "large" datasets. These are 19 datasets that have more than 50 samples including replicates and have therefore not been included in the set of 100 datasets. However, they fit all of the other dataset selection criteria. We have compared clust with the other clustering methods over these datasets to demonstrate that clust still outperforms the other methods on larger datasets. This folder includes folders LD001/ to LD019/ and LD001_Res/ to LD019_Res/. These have a similar format and contents to the D###/ and D###_Res/ folders described above.

Simultaneous analysis of multiple datasets (folders starting with MD)

As our clust method is designed to be able to extract clusters from multiple datasets simultaneously, we also tested it over multiple datasets. All folders starting with MD_ are related to "multiple datasets (MD)" results. Each MD experiment simultaneously analyses d randomly selected datasets either out of a set of 10 arabidopsis datasets or out of a set of 10 yeast datasets. For each one of the two species, all d values from 2 to 10 were tested, and at each one of these d values, 10 different runs were conducted, where at each run a different subset of d datasets is selected randomly.

The folders MD_10A and MD_10Y include the full sets of 10 arabidopsis or 10 yeast datasets, respectively. Each folder with the format MD_10#_d#_Res## includes the results of applying the eight clustering methods at one of the 10 random runs of one of the selected d values. For example, the "MD_10A_d4_Res03/" folder includes the clustering results of the 3rd random selection of 4 arabidopsis datasets (the letter A in the folder's name refers to arabidopsis).

Our clust method is applied directly over multiple datasets where each dataset is in a separate data file. Each "MD_10#_d#_Res##" folder includes these individual files in a sub-folder named "Processed_Data/". However, the other clustering methods only accept a single input data file. Therefore, the datasets are merged first before being submitted to these methods. Each "MD_10#_d#_Res##" folder includes a file "X_merged.tsv" for the merged data.

Evaluation metrics (folders starting with Metrics)

Each clustering results folder (D###_Res or MD_10#_d#_Res##) includes some clustering evaluation files ending with _E. This information is combined into tables for all datasets, and these tables appear in the folders starting with "Metrics_".

Other files and folders

The GO folder includes the reference GO term annotations for arabidopsis and yeast. Similarly, the REACTOME folder includes the reference REACTOME pathway annotations for arabidopsis and yeast. The Datasets file includes a TAB delimited table describing the 100 datasets. The SearchCriterion file includes the objective methodology of searching the NCBI database to select these 100 datasets. The Specials file includes some special considerations for a couple of datasets that differ a bit from what is described in the SearchCriterion file. The Norm### files and the files in the Reps/ folder describe normalisation codes and replicate structures for the datasets and were fed to the clust method as inputs. The Plots/ folder includes plots of the gene expression profiles of the individual genes in the clusters generated by each one of the eight methods over each one of the 100 datasets. Only up to 14 clusters per method are plotted.
Altered Hematopoietic Cell Gene Expression Precedes Development of...
ebi.ac.uk
Updated Dec 11, 2011
Cite
Ravi Bhatia; Smita Bhatia; Liang Li; Sierra Li (2011). Altered Hematopoietic Cell Gene Expression Precedes Development of Therapy-Related Myelodysplasia and Identifies Patients at Risk [Dataset]. https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-23025
Dataset updated
Dec 11, 2011
Authors
Ravi Bhatia; Smita Bhatia; Liang Li; Sierra Li
Description
Therapy-related myelodysplasia or acute myeloid leukemia (t-MDS/AML) is a lethal complication of cancer treatment. Although t-MDS/AML development is associated with known genotoxic exposures, its pathogenesis is not well understood and methods to predict risk of development of t-MDS/AML in individual cancer survivors are not available. We performed microarray analysis of gene expression in samples from patients who developed t-MDS/AML after autologous hematopoietic cell transplantation (aHCT) for Hodgkin lymphoma (HL) or non-Hodgkin lymphoma (NHL) and controls that did not develop t-MDS/AML after aHCT. CD34+ progenitor cells from peripheral blood stem cell (PBSC) samples obtained pre-aHCT from t-MDS/AML cases and matched controls, and bone marrow (BM) samples obtained at time of development of t-MDS/AML, were studied. Significant differences in gene expression were seen in PBSC obtained pre-aHCT from patients who subsequently developed t-MDS/AML compared to controls. Genetic alterations in pre-aHCT samples were related to mitochondrial function, protein synthesis, metabolic regulation and hematopoietic regulation. Progression to overt t-MDS/AML was associated with additional alterations in DNA repair and DNA-damage checkpoint genes. Altered gene expression in PBSC samples was validated in an independent group of patients. An optimal 63-gene PBSC classifier derived from the training set accurately distinguished patients who did or did not develop t-MDS/AML in the independent test set. These results indicate that genetic programs associated with t-MDS/AML are perturbed long before disease onset, and can accurately identify those at risk of developing this complication. PBSC samples obtained pre-aHCT and BM samples at the time of development of t-MDS/AML post-HCT were studied. The training set consisted of 18 patients who developed t-MDS/AML (“cases”) after aHCT, matched with 37 controls who underwent aHCT, but did not develop t-MDS/AML.
One to three controls were selected per case, matched for primary diagnosis (HL/NHL), age at aHCT (±10 years), and ethnicity (Caucasians, African-Americans, Hispanics, other). The length of follow-up after aHCT for controls was longer than the time to t-MDS/AML in the corresponding case. The results of the training set were validated in an independent group of 36 patients (test set) consisting of 16 cases that developed t-MDS/AML post-aHCT and 20 matched controls. In the training set, 55 PBSC samples from 18 cases and 37 matched controls were studied. BM samples from time of development of t-MDS/AML were available for 12 cases, and from 21 matched controls obtained at a comparable time from aHCT. For validation, 36 PBSC samples from 16 cases and 20 matched controls were studied. All samples had been cryopreserved as mononuclear cells. After thawing, samples were labeled with anti-CD34-APC and anti-CD45-FITC and CD34+CD45dim cells were selected using flow cytometry. Total RNA was extracted using the RNeasy kit. RNA from 1000 cells was amplified and labeled using GeneChip® Two-Cycle Target Labeling and Control Reagents from Affymetrix. 15 µg of cRNA each was hybridized to Affymetrix HG U133 plus 2.0 Arrays. Microarray data were analyzed using R (version 2.9) with genomic analysis packages from Bioconductor (version 2.4). Data for PBSC and BM samples were normalized separately using robust multiarray averages with consideration of GC content (GCRMA). Probesets with low expression or variability were filtered. Expression of genes represented by multiple probesets was set as the median of the probesets.
Using a conditional logistic model (CLM) to retain matching between cases and controls, we analyzed the magnitude of association [expressed as odds ratio (OR)] between t-MDS/AML and i) gene expression levels in PBSC at the pre-aHCT time point; ii) gene expression levels in BM at time of t-MDS/AML; and iii) change of expression of individual genes from PBSC to time of t-MDS/AML. False discovery rate (FDR) was applied to adjust for multiple testing. Gene set enrichment analysis (GSEA) was performed on ranked lists of genes differentially expressed between cases and controls. Where multiple significant gene sets were related to each other, analysis was performed to identify a subset of common enriched genes. Average gene expression was calculated for each set and heatmaps plotted to show the contrasts between cases and controls. Gene Ontology (GO) and pathway analysis was performed using DAVID 2008 and Ingenuity IPA 7.5 respectively, retaining genes with z-scores ≥1.8 or ≤-1.8, and ≥1.5-fold change in OR between cases and controls. The association between gene expression in the PBSC product and subsequent development of t-MDS/AML identified in the training set was validated in an independent test set of 36 PBSC samples procured from patients who developed t-MDS/AML after aHCT (16 cases) or did not (20 controls). Pre-processing, normalization and filtering procedures for the test set were identical to the training set. Differential expression between cases and controls was analyzed using CLM. GSEA analysis was performed on the ranked list of differentially expressed genes. Prediction analysis of microarray (PAM) was used to derive a prognostic gene signature from the training set to classify patients as case or control. PAM uses the “nearest shrunken centroid” approach and 10-fold cross-validation to select a parsimonious gene expression signature that can classify samples with minimal misclassification.
PAM was applied to genes common to both datasets. Based on the misclassification error in cross-validation, a 63-gene signature was selected for prediction using the test data.
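One preprocessing step in the description above, setting the expression of a gene with multiple probesets to the median of those probesets, is easy to illustrate. The following is a minimal sketch for a single sample; the function name, the dictionary-based inputs, and any identifiers passed in are hypothetical, and the original analysis was carried out in R with Bioconductor rather than with this code.

```python
from collections import defaultdict
from statistics import median


def collapse_probesets(probe_values, probe_to_gene):
    """Collapse probeset-level expression to one value per gene using the median.

    probe_values:  {probeset_id: expression value} for one sample (hypothetical input)
    probe_to_gene: {probeset_id: gene symbol} annotation mapping (hypothetical input)
    """
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        by_gene[probe_to_gene[probe]].append(value)
    # A gene with a single probeset keeps that probeset's value.
    return {gene: median(values) for gene, values in by_gene.items()}
```

Using the median rather than the mean limits the influence of a single poorly performing probeset on the gene-level value.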
Data from: Significance Analysis of Prognostic Signatures
search.dataone.org
Updated Apr 19, 2025
Cite
Andrew H. Beck; Nicholas W. Knoblauch; Marco M. Hefti; Jennifer Kaplan; Stuart J. Schnitt; Aedin C. Culhane; Markus S. Schroeder; John Quackenbush; Benjamin Haibe-Kains; Thomas Risch (2025). Significance Analysis of Prognostic Signatures [Dataset]. http://doi.org/10.5061/dryad.mk471
Unique identifier
https://doi.org/10.5061/dryad.mk471
Dataset updated
Apr 19, 2025
Dataset provided by
Dryad Digital Repository
Authors
Andrew H. Beck; Nicholas W. Knoblauch; Marco M. Hefti; Jennifer Kaplan; Stuart J. Schnitt; Aedin C. Culhane; Markus S. Schroeder; John Quackenbush; Benjamin Haibe-Kains; Thomas Risch
Time period covered
Jan 25, 2013
Description
A major goal in translational cancer research is to identify biological signatures driving cancer progression and metastasis. A common technique applied in genomics research is to cluster patients using gene expression data from a candidate prognostic gene set, and if the resulting clusters show statistically significant outcome stratification, to associate the gene set with prognosis, suggesting its biological and clinical importance. Recent work has questioned the validity of this approach by showing in several breast cancer data sets that "random" gene sets tend to cluster patients into prognostically variable subgroups. This work suggests that new rigorous statistical methods are needed to identify biologically informative prognostic gene sets. To address this problem, we developed Significance Analysis of Prognostic Signatures (SAPS) which integrates standard prognostic tests with a new prognostic significance test based on stratifying patients into prognostic subtypes with random ...
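The concern raised in this description, that random gene sets can also look prognostic, is commonly addressed by comparing a candidate gene set against random sets of the same size. The following Python sketch shows such an empirical p-value under stated assumptions: the per-gene scores and the mean-score statistic are hypothetical, and this is not the SAPS procedure itself, whose details are truncated in this listing.

```python
import random


def set_score(gene_set, gene_scores):
    """Hypothetical set-level statistic: mean of per-gene scores."""
    return sum(gene_scores[g] for g in gene_set) / len(gene_set)


def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as observed (with +1 correction)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


def random_set_p_value(candidate, gene_scores, n_random=1000, seed=0):
    """Compare a candidate set with random gene sets of the same size."""
    rng = random.Random(seed)
    genes = sorted(gene_scores)
    observed = set_score(candidate, gene_scores)
    null_scores = [
        set_score(rng.sample(genes, len(candidate)), gene_scores)
        for _ in range(n_random)
    ]
    return empirical_p_value(observed, null_scores)


# Toy data (hypothetical): 20 genes with increasing scores.
gene_scores = {"g%d" % i: float(i) for i in range(20)}
p_value = random_set_p_value(["g17", "g18", "g19"], gene_scores)
```

The +1 in numerator and denominator keeps the estimate strictly positive, so a finite number of random sets never yields a p-value of exactly zero.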
Roast gene-set enrichment test for the endotoxin tolerance signature.
datasetcatalog.nlm.nih.gov
Updated Oct 14, 2022
Cite
Blimkie, Travis; Lee, Amy Huei-Yi; Falsafi, Reza; Hancock, Robert E. W.; Sedivy-Haley, Katharine (2022). Roast gene-set enrichment test for the endotoxin tolerance signature. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000262236
Explore at:
Dataset updated
Oct 14, 2022
Authors
Blimkie, Travis; Lee, Amy Huei-Yi; Falsafi, Reza; Hancock, Robert E. W.; Sedivy-Haley, Katharine
Description
Roast gene-set enrichment test for the endotoxin tolerance signature.
Results of gene set enrichment analysis of genes involved in reproduction.
datasetcatalog.nlm.nih.gov
Updated May 7, 2020
Cite
Koch, Eva L.; Guillaume, Frédéric (2020). Results of gene set enrichment analysis of genes involved in reproduction. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000479561
Explore at:
Dataset updated
May 7, 2020
Authors
Koch, Eva L.; Guillaume, Frédéric
Description
Gene set enrichment test was conducted in edgeR using the roast function [70]. Prop.Down and Prop.Up give the proportion of genes that are down- and up-regulated. The direction of change is determined from the significance of changes in each direction and is shown in the Direction column. The P-value provides evidence for whether the majority of genes in the set are DE in the specified direction. The genes (N = 56) were selected based on [76–79].
Data from: Differential abundance and gene set enrichment in plasma of...
zenodo.org
txt
Updated May 22, 2023
Cite
Annelien Morlion; Annelien Morlion (2023). Differential abundance and gene set enrichment in plasma of cancer patients versus controls [Dataset]. http://doi.org/10.5281/zenodo.7953708
Explore at:
txt
Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.7953708
Dataset updated
May 22, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Annelien Morlion; Annelien Morlion
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DESeq2 differential abundance output for genes with q < 0.05 and |log₂ fold change| > 1 in cancer vs control plasma samples:

differentialabundance_pancancer.txt: tables with differentially abundant genes (|log2(fold change)|>1 and adjusted p<0.05) per cancer-control comparison (cancertype) in a pan-cancer plasma sample cohort (25 locally advanced to metastatic cancer types - 7 or 8 patients per type - vs 8 cancer-free control donors)

differentialabundance_threecancer.txt: tables with differentially abundant genes (|log2(fold change)|>1 and adjusted p<0.05) per cancer-control comparison (cancertype) in the three-cancer plasma cohort (ovarian, prostate and uterine cancer - 11 or 12 patients per type - vs 20 cancer-free controls)

Gene_id: Ensembl gene id (GRCh38 v91); baseMean: mean of normalized counts for all samples; log2FoldChange: log2 fold change for cancer vs control; lfcSE: standard error of the log2 fold change for cancer vs control; stat: Wald statistic for cancer vs control; pvalue: Wald test p-value for cancer vs control; padj: Benjamini-Hochberg corrected p-value; cancertype: respective cancer type abbreviation of cancer patient plasma samples that were compared to plasma samples of controls.

Gene set enrichment analyses based on fold change ranked gene lists (cancer versus control) - results obtained with fgsea (v1.22.0):

customgenesets.txt: custom gene set lists based on RNA Atlas (and Human Protein Atlas), Tabula Sapiens, GTEX, and TCGA data.

Reference: reference to create gene sets (including RNA Atlas, Human Protein Atlas, Tabula Sapiens, GTEX, and TCGA); set: set name; genes: gene list for set

GSEA_pancancer.txt & GSEA_threecancer.txt: gene set enrichment results based on fold change ranked gene list (specific cancer type versus controls) in pan-cancer cohort and three-cancer cohort, respectively

Sets: gene set category (HALLMARK and KEGG: Hallmark and Canonical Pathways gene sets obtained from MSigDB (v2022.1); CUSTOM: custom tissue and cell type specific gene sets as defined in customgenesets.txt); pathway: pathway/set name; pval: enrichment p-value; padj: Benjamini-Hochberg adjusted p-value; log2err: expected error for the standard deviation of the P-value logarithm; ES: enrichment score, same as in Broad GSEA implementation; NES: enrichment score normalized to mean enrichment of random samples of the same size; size: size of the pathway after removing genes without statistic values; leadingEdge: leading edge genes that drive the enrichment; Disease: respective cancer type abbreviation of cancer patient plasma samples that were compared to plasma samples of controls
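The ES column described above follows the running-sum statistic commonly described for GSEA. As a rough illustration, the Python sketch below computes that statistic for a gene list ranked by fold change. The weight exponent of 1, the function name, and the toy gene list are assumptions made for illustration; this is not the fgsea code and it does not compute NES or p-values.

```python
def enrichment_score(ranked_stats, gene_set, weight=1.0):
    """Weighted running-sum enrichment score.

    ranked_stats: list of (gene, statistic) pairs sorted by statistic, descending.
    gene_set: set of gene identifiers.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    in_set = [gene in gene_set for gene, _ in ranked_stats]
    n_hits = sum(in_set)
    n_miss = len(ranked_stats) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_total = sum(
        abs(stat) ** weight for (_, stat), hit in zip(ranked_stats, in_set) if hit
    )
    if hit_total == 0:
        raise ValueError("all statistics in the gene set are zero")

    running = 0.0
    best = 0.0
    for (_, stat), hit in zip(ranked_stats, in_set):
        if hit:
            # Step up in proportion to the gene's statistic.
            running += abs(stat) ** weight / hit_total
        else:
            # Step down by a constant amount for genes outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (hypothetical log2 fold changes, descending).
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
es_top = enrichment_score(ranked, {"A", "B"})
es_bottom = enrichment_score(ranked, {"D", "E"})
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, which matches how the sign of ES and NES is usually read in these result tables.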
WGCNA results on chemical perturbagens tested with the S1500+ sentinel gene...
zenodo.org
application/gzip
Updated Aug 25, 2025
Cite
Matthew Meier; Matthew Meier (2025). WGCNA results on chemical perturbagens tested with the S1500+ sentinel gene set [Dataset]. http://doi.org/10.5281/zenodo.16944904
Explore at:
application/gzip
Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.16944904
Dataset updated
Aug 25, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Matthew Meier; Matthew Meier
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
WGCNA results (wgcna_results.tar.gz) on chemical perturbagens tested with the S1500+ sentinel gene set.

This version also includes the raw, intermediate, and processed files (wgcna_data.tar.gz) for:

gene expression count matrices downloaded from GEO

metadata downloaded from GEO

gene information on the TempO-Seq S1500+ panel
Summary of gene and pathway level extrapolation performance of the S1500 and...
datasetcatalog.nlm.nih.gov
Updated Feb 20, 2018
Cite
Svoboda, Daniel; Howard, Brian E.; Paules, Richard S.; Bushel, Pierre R.; Merrick, B. Alex; Collins, Jennifer B.; Auerbach, Scott S.; Gerhold, David L.; Mav, Deepak; Shah, Ruchir R.; Sipes, Nisha S.; Maull, Elizabeth A.; Mendrick, Donna L.; Karmaus, Agnes L.; Judson, Richard S. (2018). Summary of gene and pathway level extrapolation performance of the S1500 and S1500+ gene sets using independent test set. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000681261
Explore at:
Dataset updated
Feb 20, 2018
Authors
Svoboda, Daniel; Howard, Brian E.; Paules, Richard S.; Bushel, Pierre R.; Merrick, B. Alex; Collins, Jennifer B.; Auerbach, Scott S.; Gerhold, David L.; Mav, Deepak; Shah, Ruchir R.; Sipes, Nisha S.; Maull, Elizabeth A.; Mendrick, Donna L.; Karmaus, Agnes L.; Judson, Richard S.
Description
Summary of gene and pathway level extrapolation performance of the S1500 and S1500+ gene sets using independent test set.
Gene Sets Analysis Results for Selected Gene Ontology (GO) Terms.
datasetcatalog.nlm.nih.gov
Updated Apr 1, 2015
Cite
Blangero, John; Duggirala, Ravindranath; Lehman, Donna M.; Göring, Harald H. H.; Curran, Joanne E.; Abdul-Ghani, Muhammad A.; DeFronzo, Ralph A.; Carless, Melanie; Farook, Vidya S.; Norton, Luke; Arya, Rector; Fourcaudot, Marcel; Hu, Shirley L.; Chittoor, Geetha; Puppala, Sobha; Dyer, Thomas D.; Cromack, Douglas T.; Winnier, Deidre A.; Kumar, Satish; Jenkinson, Christopher P.; Coletta, Dawn K.; Tripathy, Devjit (2015). Gene Sets Analysis Results for Selected Gene Ontology (GO) Terms. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001910760
Explore at:
Dataset updated
Apr 1, 2015
Authors
Blangero, John; Duggirala, Ravindranath; Lehman, Donna M.; Göring, Harald H. H.; Curran, Joanne E.; Abdul-Ghani, Muhammad A.; DeFronzo, Ralph A.; Carless, Melanie; Farook, Vidya S.; Norton, Luke; Arya, Rector; Fourcaudot, Marcel; Hu, Shirley L.; Chittoor, Geetha; Puppala, Sobha; Dyer, Thomas D.; Cromack, Douglas T.; Winnier, Deidre A.; Kumar, Satish; Jenkinson, Christopher P.; Coletta, Dawn K.; Tripathy, Devjit
Description
The final list of 29 most highly significant adipose genes was analyzed using Gene Profiler to detect enrichment of genes in various biological functional categories. A global set of all genes was used as the control group. P values are shown for several relevant GO categories containing ADH1A and ADH1B. Several categories contained only these two genes. The genes were analyzed as an ordered list based on FDR values and numbers of shared traits. Only manually curated data was used for the analysis to avoid potentially spurious results. Significance for these GO terms was increased approximately 10-fold when analysis was restricted to the two genes ADH1A and ADH1B. Essentially identical results were obtained using the WebGestalt analytical engine. All significance values were corrected for multiple testing. Gene Sets Analysis Results for Selected Gene Ontology (GO) Terms.


Table_5_Testing Proximity of Genomic Regions to Transcription Start Sites...

STAR-NN performance comparison on four gene sets in testing dataset.

Table_1_SCIA: A Novel Gene Set Analysis Applicable to Data With Different...

Intermediate results objects for reproducing unbiased methylation gene set...

Unfiltered GSEA results showing all GO categories tested in each cell type.

Summary statistics

Gene set enrichment analysis of genes from black module.

Functional enrichment analysis (genes).

Data from: A hybrid gene selection approach to create the S1500+ targeted...

Additional file 2 of Comprehensive enhancer-target gene assignments improve...

New feature subset selection procedures for classification of expression...

Clust_100_GE_datasets

Altered Hematopoietic Cell Gene Expression Precedes Development of...

Data from: Significance Analysis of Prognostic Signatures

Roast gene-set enrichment test for the endotoxin tolerance signature.

Results of gene set enrichment analysis of genes involved in reproduction.

Data from: Differential abundance and gene set enrichment in plasma of...

WGCNA results on chemical perturbagens tested with the S1500+ sentinel gene...

Summary of gene and pathway level extrapolation performance of the S1500 and...

Gene Sets Analysis Results for Selected Gene Ontology (GO) Terms.
