100+ datasets found
  1. f

    Table_5_Testing Proximity of Genomic Regions to Transcription Start Sites...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 9, 2023
    Cite
    Christopher Lee; Kai Wang; Tingting Qin; Maureen A. Sartor (2023). Table_5_Testing Proximity of Genomic Regions to Transcription Start Sites and Enhancers Complements Gene Set Enrichment Testing.xlsx [Dataset]. http://doi.org/10.3389/fgene.2020.00199.s006
    Dataset provided by
    Frontiers
    Authors
    Christopher Lee; Kai Wang; Tingting Qin; Maureen A. Sartor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large sets of genomic regions are generated by the initial analysis of various genome-wide sequencing data, such as ChIP-seq and ATAC-seq experiments. Gene set enrichment (GSE) methods are commonly employed to determine the pathways associated with them. Given the pathways and other gene sets (e.g., GO terms) of significance, it is of great interest to know the extent to which each is driven by binding near transcription start sites (TSS) or near enhancers. Currently, no tool performs such an analysis. Here, we present a method that addresses this question to complement GSE methods for genomic regions. Specifically, the new method tests whether the genomic regions in a gene set are significantly closer to a TSS (or to an enhancer) than expected by chance given the total list of genomic regions, using a non-parametric test. Combining the results from a GSE test with our novel method provides additional information regarding the mode of regulation of each pathway, and additional evidence that the pathway is truly enriched. We illustrate our new method with a large set of ENCODE ChIP-seq data, using the chipenrich Bioconductor package. The results show that our method is a powerful complementary approach to help researchers interpret large sets of genomic regions.
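
    The proximity idea described above can be sketched as a permutation test. This is only an illustration and not the chipenrich implementation: the function name `proximity_permutation_test`, the choice of the median distance as the test statistic, and the toy distances are all assumptions made for this example.

```python
import random
from statistics import median

def proximity_permutation_test(set_distances, all_distances, n_perm=2000, seed=0):
    """One-sided permutation p-value that regions in a gene set lie closer
    to a TSS than equally sized random draws from the full region list.

    set_distances: distances (bp) to the nearest TSS for regions in the gene set
    all_distances: distances for every region in the full list (gene set included)
    """
    rng = random.Random(seed)
    observed = median(set_distances)
    k = len(set_distances)
    hits = 0
    for _ in range(n_perm):
        # Draw k regions without replacement from the full list.
        sample = rng.sample(all_distances, k)
        if median(sample) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): gene-set regions sit near TSSs, the rest are spread out.
gene_set = [120, 300, 450, 800, 950]
background = gene_set + [5000 + 700 * i for i in range(45)]
p_value = proximity_permutation_test(gene_set, background)
print(f"p = {p_value:.4f}")
```

    Sampling from the total list of regions, rather than from the whole genome, mirrors the "expected by chance given the total list of genomic regions" wording of the description.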

  2. Z

    QuantSeq 3' mRNA-Seq testing data set based on samples from NCBI Bioproject...

    • data.niaid.nih.gov
    Updated Jan 26, 2024
    Cite
    Lähnemann, David (2024). QuantSeq 3' mRNA-Seq testing data set based on samples from NCBI Bioproject PRJNA509074 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10572745
    Dataset authored and provided by
    Lähnemann, David
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As described in the contained README.md file, this is a QuantSeq 3’ mRNA-Seq testing dataset that has been derived from data published in this study: Corley, S.M., Troy, N.M., Bosco, A. et al. QuantSeq. 3′ Sequencing combined with Salmon provides a fast, reliable approach for high throughput RNA expression analysis. Sci Rep 9, 18895 (2019). https://doi.org/10.1038/s41598-019-55434-x

    In short, the respective dataset has been downloaded, mapped against the human transcriptome, and reduced to the reads that map to two smaller gene sets. One of the gene sets was found to be enriched for differentially expressed genes in the original publication; the other was not. This yields a reasonably small testing dataset that nevertheless has useful expected results. For details of the data generation, see the contained README.md file and the self-contained workflow used for generating it:

    https://doi.org/10.5281/zenodo.10572324

    The MSigDB gene sets are used according to their Creative Commons Attribution 4.0 International License, which is given here: https://www.gsea-msigdb.org/gsea/msigdb_license_terms.jsp

  3. H

    Sparse Linear Discriminant Analysis for Simultaneous Gene Set/Pathway...

    • dataverse.harvard.edu
    Updated Oct 12, 2010
    Cite
    Michael Wu; Lingsong Zhang; Zhaoxi Wang; David Christiani; Xihong Lin (2010). Sparse Linear Discriminant Analysis for Simultaneous Gene Set/Pathway Significance Test and Gene Selection [Dataset]. http://doi.org/10.7910/DVN/FBPZQC
    Dataset provided by
    Harvard Dataverse
    Authors
    Michael Wu; Lingsong Zhang; Zhaoxi Wang; David Christiani; Xihong Lin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    MOTIVATION: Pathway and gene set based approaches for the analysis of gene expression profiling experiments have become increasingly popular for addressing problems associated with individual gene analysis. Since most genes are not differentially expressed, existing gene set tests, which consider all the genes within a gene set, are subject to considerable noise and power loss, a concern exacerbated in studies in which the degree of differential expression is moderate for truly differentially expressed genes. For a significantly differentially expressed pathway, it is also of substantial interest to select important genes that drive the differential expression of the pathway.

    METHODS: We develop a unified framework to jointly test the significance of a pathway and to select a subset of genes that drive the significant pathway effect. To achieve dimension reduction and gene selection, we decompose each gene pathway into a single score by using a regularized form of linear discriminant analysis, called sparse linear discriminant analysis (sLDA). Testing for the significance of the pathway effect proceeds via permutation of the sLDA score. The sLDA based test is compared to competing approaches with simulations and two applications: a study on the effect of metal fume exposure on immune response and a study of gene expression profiles among Type II Diabetes patients.

    RESULTS: Our results show that sLDA based testing provides a powerful approach to test for the significance of a differentially expressed pathway and gene selection.

    AVAILABILITY: An implementation of the proposed sLDA based pathway test in the R statistical computing environment is available at http://www.hsph.harvard.edu/~mwu/software/

    CONTACT: xlin@hsph.harvard.edu

  4. f

    Recurrent functional misinterpretation of RNA-seq data caused by...

    • figshare.com
    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Shir Mandelboum; Zohar Manber; Orna Elroy-Stein; Ran Elkon (2023). Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias [Dataset]. http://doi.org/10.1371/journal.pbio.3000481
    Dataset provided by
    PLOS Biology
    Authors
    Shir Mandelboum; Zohar Manber; Orna Elroy-Stein; Ran Elkon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a critical step in RNA sequencing (RNA-seq) analysis, aiming to remove systematic effects from the data to ensure that technical biases have minimal impact on the results. Analyzing numerous RNA-seq datasets, we detected a prevalent sample-specific length effect that leads to a strong association between gene length and fold-change estimates between samples. This stochastic sample-specific effect is not corrected by common normalization methods, including reads per kilobase of transcript length per million reads (RPKM), Trimmed Mean of M values (TMM), relative log expression (RLE), and quantile and upper-quartile normalization. Importantly, we demonstrate that this bias causes recurrent false positive calls by gene-set enrichment analysis (GSEA) methods, thereby leading to frequent functional misinterpretation of the data. Gene sets characterized by markedly short genes (e.g., ribosomal protein genes) or long genes (e.g., extracellular matrix genes) are particularly prone to such false calls. This sample-specific length bias is effectively removed by the conditional quantile normalization (cqn) and EDASeq methods, which allow the integration of gene length as a sample-specific covariate. Consequently, using these normalization methods led to substantial reduction in GSEA false results while retaining true ones. In addition, we found that application of gene-set tests that take into account gene–gene correlations attenuates false positive rates caused by the length bias, but statistical power is reduced as well. Our results advocate the inspection and correction of sample-specific length biases as default steps in RNA-seq analysis pipelines and reiterate the need to account for intergene correlations when performing gene-set enrichment tests to lessen false interpretation of transcriptomic data.
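
    For reference, the RPKM normalization named above divides a gene's read count by its length in kilobases and by the library size in millions of reads. The sketch below uses that standard formula; the gene names and counts are hypothetical and do not come from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical counts: gene name -> (reads, gene length in bp)
genes = {"short_gene": (500, 1_000), "long_gene": (5_000, 10_000)}
total_reads = 20_000_000

for name, (reads, length) in genes.items():
    print(name, rpkm(reads, length, total_reads))
```

    Both toy genes receive the same RPKM because the formula applies one fixed length scaling per gene. According to the description, the sample-specific length effect is a separate bias that this scaling does not remove.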

  5. f

    Table_6_Testing Proximity of Genomic Regions to Transcription Start Sites...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Christopher Lee; Kai Wang; Tingting Qin; Maureen A. Sartor (2023). Table_6_Testing Proximity of Genomic Regions to Transcription Start Sites and Enhancers Complements Gene Set Enrichment Testing.xlsx [Dataset]. http://doi.org/10.3389/fgene.2020.00199.s007
    Dataset provided by
    Frontiers
    Authors
    Christopher Lee; Kai Wang; Tingting Qin; Maureen A. Sartor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large sets of genomic regions are generated by the initial analysis of various genome-wide sequencing data, such as ChIP-seq and ATAC-seq experiments. Gene set enrichment (GSE) methods are commonly employed to determine the pathways associated with them. Given the pathways and other gene sets (e.g., GO terms) of significance, it is of great interest to know the extent to which each is driven by binding near transcription start sites (TSS) or near enhancers. Currently, no tool performs such an analysis. Here, we present a method that addresses this question to complement GSE methods for genomic regions. Specifically, the new method tests whether the genomic regions in a gene set are significantly closer to a TSS (or to an enhancer) than expected by chance given the total list of genomic regions, using a non-parametric test. Combining the results from a GSE test with our novel method provides additional information regarding the mode of regulation of each pathway, and additional evidence that the pathway is truly enriched. We illustrate our new method with a large set of ENCODE ChIP-seq data, using the chipenrich Bioconductor package. The results show that our method is a powerful complementary approach to help researchers interpret large sets of genomic regions.

  6. d

    Data from: Significance Analysis of Prognostic Signatures

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Apr 19, 2025
    Cite
    Andrew H. Beck; Nicholas W. Knoblauch; Marco M. Hefti; Jennifer Kaplan; Stuart J. Schnitt; Aedin C. Culhane; Markus S. Schroeder; John Quackenbush; Benjamin Haibe-Kains; Thomas Risch (2025). Significance Analysis of Prognostic Signatures [Dataset]. http://doi.org/10.5061/dryad.mk471
    Dataset provided by
    Dryad Digital Repository
    Authors
    Andrew H. Beck; Nicholas W. Knoblauch; Marco M. Hefti; Jennifer Kaplan; Stuart J. Schnitt; Aedin C. Culhane; Markus S. Schroeder; John Quackenbush; Benjamin Haibe-Kains; Thomas Risch
    Time period covered
    Jan 25, 2013
    Description

    A major goal in translational cancer research is to identify biological signatures driving cancer progression and metastasis. A common technique applied in genomics research is to cluster patients using gene expression data from a candidate prognostic gene set, and if the resulting clusters show statistically significant outcome stratification, to associate the gene set with prognosis, suggesting its biological and clinical importance. Recent work has questioned the validity of this approach by showing in several breast cancer data sets that "random" gene sets tend to cluster patients into prognostically variable subgroups. This work suggests that new rigorous statistical methods are needed to identify biologically informative prognostic gene sets. To address this problem, we developed Significance Analysis of Prognostic Signatures (SAPS) which integrates standard prognostic tests with a new prognostic significance test based on stratifying patients into prognostic subtypes with random ...

  7. f

    DataSheet_1_An Unbiased Machine Learning Exploration Reveals Gene Sets...

    • figshare.com
    • frontiersin.figshare.com
    docx
    Updated Jun 8, 2023
    Cite
    Qiang Fu; Divyansh Agarwal; Kevin Deng; Rudy Matheson; Hongji Yang; Liang Wei; Qing Ran; Shaoping Deng; James F. Markmann (2023). DataSheet_1_An Unbiased Machine Learning Exploration Reveals Gene Sets Predictive of Allograft Tolerance After Kidney Transplantation.docx [Dataset]. http://doi.org/10.3389/fimmu.2021.695806.s001
    Dataset provided by
    Frontiers
    Authors
    Qiang Fu; Divyansh Agarwal; Kevin Deng; Rudy Matheson; Hongji Yang; Liang Wei; Qing Ran; Shaoping Deng; James F. Markmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Efforts at finding potential biomarkers of tolerance after kidney transplantation have been hindered by limited sample size, as well as the complicated mechanisms underlying tolerance and the potential risk of rejection after immunosuppressant withdrawal. In this work, three different publicly available genome-wide expression data sets of peripheral blood lymphocyte (PBL) from 63 tolerant patients were used to compare 14 different machine learning models for their ability to predict spontaneous kidney graft tolerance. We found that the Best Subset Selection (BSS) regression approach was the most powerful with a sensitivity of 91.7% and a specificity of 93.8% in the test group, and a specificity of 86.1% and a sensitivity of 80% in the validation group. A feature set with five genes (HLA-DOA, TCL1A, EBF1, CD79B, and PNOC) was identified using the BSS model. EBF1 downregulation was also an independent factor predictive of graft rejection and graft loss. An AUC value of 84.4% was achieved using the two-gene signature (EBF1 and HLA-DOA) as an input to our classifier. Overall, our systematic machine learning exploration suggests novel biological targets that might affect tolerance to renal allografts, and provides clinical insights that can potentially guide patient selection for immunosuppressant withdrawal.
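
    Sensitivity and specificity, as quoted above, are ratios taken from a confusion matrix. The sketch below shows the arithmetic only; the counts are hypothetical, chosen to give percentages of a similar size, and are not the study's actual patient numbers.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive cases that the classifier flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of truly negative cases that the classifier clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical confusion-matrix counts (not taken from the study).
tp, fn = 11, 1
tn, fp = 15, 1
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```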

  8. u

    Data from: Diaphorina citri Official Gene Set v1.0

    • agdatacommons.nal.usda.gov
    application/x-gzip
    Updated Feb 8, 2024
    Cite
    Surya Saha; Prashant Hosmani; Krystal Villalobos-Ayala; Sherry Miller; Teresa D. Shippy; Mirella Flores; Andrew J. Rosendale; Chris Cordola; Tracey J. Bell; Hannah Mann; Gabe DeAvila; Daniel DeAvila; Zachary Moore; Kyle Buller; Kathryn Ciolkevich; Samantha Nandyal; Robert Mahoney; Joshua Voorhis; Megan E. Dunlevy; David W. Farrow; David Hunter; Taylar Morgan; Kayla Shore; Victoria Guzman; Allison Izsak; Danielle Dixon; Andrew Cridge; Liliana Cano; Xiaolong Cao; Haobo Jiang; Nan Leng; Shannon Johnson; Brandi Cantarel; Stephen Richards; Adam English; Robert Shatters; Christopher Childers; Mei-Ju Chen; Wayne B. Hunter; Michelle Cilia; Lukas A. Mueller; Monica Munoz-Torres; David R. Nelson; Monica F. Poelchau; Joshua B. Benoit; Helen Wiersma-Koch; Tom D'Elia; Susan Brown (2024). Diaphorina citri Official Gene Set v1.0 [Dataset]. http://doi.org/10.15482/USDA.ADC/1345524
    Dataset provided by
    Ag Data Commons
    Authors
    Surya Saha; Prashant Hosmani; Krystal Villalobos-Ayala; Sherry Miller; Teresa D. Shippy; Mirella Flores; Andrew J. Rosendale; Chris Cordola; Tracey J. Bell; Hannah Mann; Gabe DeAvila; Daniel DeAvila; Zachary Moore; Kyle Buller; Kathryn Ciolkevich; Samantha Nandyal; Robert Mahoney; Joshua Voorhis; Megan E. Dunlevy; David W. Farrow; David Hunter; Taylar Morgan; Kayla Shore; Victoria Guzman; Allison Izsak; Danielle Dixon; Andrew Cridge; Liliana Cano; Xiaolong Cao; Haobo Jiang; Nan Leng; Shannon Johnson; Brandi Cantarel; Stephen Richards; Adam English; Robert Shatters; Christopher Childers; Mei-Ju Chen; Wayne B. Hunter; Michelle Cilia; Lukas A. Mueller; Monica Munoz-Torres; David R. Nelson; Monica F. Poelchau; Joshua B. Benoit; Helen Wiersma-Koch; Tom D'Elia; Susan Brown
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory, and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models.

    This project was funded by the U.S. Department of Agriculture under the DEVELOPING AN INFRASTRUCTURE AND PRODUCT TEST PIPELINE TO DELIVER NOVEL THERAPIES FOR CITRUS GREENING DISEASE grant.

    This Official Gene Set was generated as a merge of NCBI's Diaphorina citri Annotation Release 100 and a gff3 file resulting from manual curation efforts of the Diaphorina citri annotation community in the Apollo software (Apollo URL: https://apollo.nal.usda.gov/diacit/jbrowse/). Initially, QC of the manually curated genes was performed using the NAL's QC prototype software (description is available here: https://github.com/NAL-i5K/I5KNAL_OGS/wiki/QC-phase; software is available on request). Then, the cleaned manual annotations were merged with the protein-coding genes from the NCBI Diaphorina citri Annotation Release 100 using the NAL's Merge prototype software (description is available here: https://github.com/NAL-i5K/I5KNAL_OGS/wiki/Merge-phase; software is available on request). Non-coding RNAs from the NCBI Diaphorina citri Annotation Release 100 were added to the OGS after this merge. New consortium IDs for the OGS were generated, but Dbxref attributes referring to the original NCBI accessions were maintained when the model was not altered manually. CDS sequences for all protein-coding models, and protein and rna sequences from manually curated models were generated from the OGS gff3 file using the NAL's gff3_to_fasta.py program (available here: https://github.com/NAL-i5K/GFF3toolkit) and the underlying genome sequence. All other sequences were derived from NCBI's Diaphorina citri Annotation Release 100, primarily because some protein and rna sequences predicted by NCBI contain additional sequence not present in the genome sequence. Note and exception attributes from NCBI were ported to the OGS gff3 file when sequence not derived from the genome sequence was used for the final model.

    Files included in this Official Gene Set:

    Gff3 file: Dcitr_OGSv1.0.gff3

    Protein fasta: Dcitr_OGSv1.0_pep.fa

    RNA fasta: Dcitr_OGSv1.0_rna.fa

    CDS fasta: Dcitr_OGSv1.0_cds.fa

    Mapping file describing the changes between the original NCBI annotations and the OGS: Dcitr_NCBI_to_OGSv1.0_id_mapFile.txt

    Resources in this dataset:

    Resource Title: Diaphorina citri Official Gene Set v1.0. File Name: Dcitr_OGSv1.0.tar.gz. Resource Description: the five files of this Official Gene Set listed above.

    Resource Title: Curation workflow. File Name: Workflow_Fig3.png
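
    A gff3 file such as Dcitr_OGSv1.0.gff3 stores one feature per line in nine tab-separated columns, with the last column holding `key=value` attributes such as `ID`, `Parent`, and `Dbxref`. The sketch below counts feature types from such lines. The example records are hypothetical and not taken from the OGS, and the helper names `parse_gff3_line` and `count_feature_types` are introduced here for illustration only.

```python
from collections import Counter

def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine tab-separated columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Attributes are semicolon-separated key=value pairs.
    attributes = dict(
        field.split("=", 1) for field in attrs.split(";") if "=" in field
    )
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end), "strand": strand,
        "attributes": attributes,
    }

def count_feature_types(lines):
    """Count features per type, skipping comment/directive and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[parse_gff3_line(line)["type"]] += 1
    return counts

# Hypothetical records; real ones would be read from Dcitr_OGSv1.0.gff3.
example = [
    "##gff-version 3",
    "scaffold1\tOGS\tgene\t100\t900\t.\t+\t.\tID=gene1;Dbxref=GeneID:000001",
    "scaffold1\tOGS\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold1\tOGS\tCDS\t150\t600\t.\t+\t0\tID=cds1;Parent=mrna1",
]
print(count_feature_types(example))
```

    This sketch does not handle an embedded `##FASTA` section or URL-escaped attribute values, which a full parser would need to cover.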

  9. n

    GeneSigDB

    • neuinfo.org
    Updated Jan 29, 2022
    Cite
    (2022). GeneSigDB [Dataset]. http://identifiers.org/RRID:SCR_013275
    Description

    Database of traceable, standardized, annotated gene signatures which have been manually curated from publications that are indexed in PubMed. The Advanced Gene Search performs a one-tailed Fisher exact test (equivalent to a test based on the hypergeometric distribution) to determine whether your gene list is over-represented in any gene signature in GeneSigDB. Gene expression studies typically result in a list of genes (gene signature) which reflect the many biological pathways that are concurrently active. We have created a Gene Signature Data Base (GeneSigDB) of published gene expression signatures or gene sets which we have manually extracted from published literature. GeneSigDB was created following a thorough search of PubMed using a defined set of cancer gene signature search terms. We would be delighted to accept or update your gene signature. Please fill out the form as best you can. We will contact you when we get it and will be happy to work with you to ensure we accurately report your signature. GeneSigDB is capable of providing its functionality through a Java RESTful web service.
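
    The over-representation test mentioned above can be written directly as an upper-tail hypergeometric probability. This is a minimal sketch of that calculation, not GeneSigDB's own code; the function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(overlap, list_size, signature_size, universe_size):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe containing signature_size signature genes."""
    max_overlap = min(list_size, signature_size)
    total = comb(universe_size, list_size)
    favorable = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favorable / total

# Hypothetical numbers: 8 of 50 query genes fall in a 200-gene signature,
# out of a universe of 20,000 genes.
p = hypergeom_upper_tail(8, 50, 200, 20_000)
print(f"p = {p:.3e}")
```

    The result depends strongly on the size of the gene universe, so that value should be chosen to match the platform or annotation the gene list came from.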

  10. d

    TWIS meta-analyzed summary statistics

    • search.dataone.org
    • datadryad.org
    Updated Nov 29, 2023
    Cite
    Luke Evans (2023). TWIS meta-analyzed summary statistics [Dataset]. http://doi.org/10.5061/dryad.866t1g1tw
    Dataset provided by
    Dryad Digital Repository
    Authors
    Luke Evans
    Time period covered
    Dec 9, 2022
    Description

    It remains unknown to what extent gene-gene interactions contribute to complex traits. Here, we introduce a new approach using predicted gene expression to perform exhaustive transcriptome-wide interaction studies (TWISs) for multiple traits across all pairs of genes expressed in several tissue types. Using imputed transcriptomes, we simultaneously reduce the computational challenge and improve interpretability and statistical power. We discover and replicate several interaction associations and find several hub genes with numerous interactions. We also demonstrate that TWIS can identify novel associated genes because genes with many or strong interactions have smaller single-locus model effect sizes. Finally, we develop a method to test gene set enrichment of TWIS associations (E-TWIS), finding numerous pathways and networks enriched in interaction associations. Epistasis is likely widespread, and our procedure represents a tractable framework for beginning to explore gene interactions...

    We developed Transcriptome-Wide Interaction Study (TWIS), a new method that comprehensively tests associations of all pairwise gene-gene interactions with complex traits using imputed expression. We applied the method to 12 complex traits in humans across four tissues/cross-tissue expression measures. We applied the method to multiple datasets, then meta-analyzed the results using METAL.

    Files are compressed using gzip. Transcriptome-wide interaction study (TWIS): code available at https://github.com/evanslm/TWIS.
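
    METAL offers both a sample-size-weighted and an inverse-variance-weighted scheme, and the description does not say which was used. As an illustration of the second option only, the sketch below combines per-dataset effect estimates with a fixed-effect inverse-variance meta-analysis. The function name and the effect sizes are hypothetical.

```python
from math import sqrt

def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per dataset.
    Returns the pooled beta, its standard error, and the z-score.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = sqrt(1.0 / total_weight)
    return beta, se, beta / se

# Hypothetical interaction effect estimates from two datasets.
studies = [(0.10, 0.05), (0.20, 0.10)]
beta, se, z = inverse_variance_meta(studies)
print(f"beta = {beta:.3f}, se = {se:.4f}, z = {z:.2f}")
```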

  11. S

    Susceptibility Gene Detection Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jul 4, 2025
    Cite
    Archive Market Research (2025). Susceptibility Gene Detection Report [Dataset]. https://www.archivemarketresearch.com/reports/susceptibility-gene-detection-145029
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Susceptibility Gene Detection market is experiencing robust growth, driven by advancements in genomic technologies, increasing prevalence of genetic disorders, and rising demand for personalized medicine. The market size in 2025 is estimated at $2.5 billion, exhibiting a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This significant expansion is fueled by several key factors. Firstly, the decreasing cost of next-generation sequencing (NGS) and other gene sequencing technologies makes susceptibility gene testing more accessible and affordable. Secondly, an increasing understanding of the genetic basis of numerous diseases is pushing healthcare providers and patients toward proactive risk assessment and preventive strategies. Finally, the development of advanced bioinformatics tools for analyzing complex genomic data facilitates faster and more accurate interpretation of test results, further driving market growth. Leading companies like Premed, Yin Feng Gene, United Gene Group, Geneis, Topgen, Sanvally, and SinoMD are actively shaping the market landscape through innovation and strategic partnerships. The market segmentation reveals a strong focus on specific disease areas with high prevalence and unmet medical needs. While precise segment breakdowns are unavailable, projected growth will likely be driven by cancer susceptibility testing, followed by cardiovascular disease and neurodegenerative disorders. Geographic expansion is another major trend, with North America and Europe currently leading, yet significant growth opportunities exist in rapidly developing economies of Asia-Pacific and Latin America. However, challenges remain, including regulatory hurdles surrounding genetic testing, ethical concerns surrounding data privacy and genetic discrimination, and the need for improved patient education and counseling regarding the interpretation and implications of test results. 
Overcoming these restraints will be critical to unlocking the full potential of this rapidly expanding market.
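
    The growth figures above follow the usual compound-growth formula, value × (1 + CAGR)^years. The sketch below applies it to the stated $2.5 billion base and 15% rate. Treating 2025 as the base year, so that 2033 is eight compounding periods later, is an assumption of this example.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward by a constant annual growth rate."""
    return base_value * (1 + cagr) ** years

base_2025 = 2.5  # USD billions, as stated in the description
cagr = 0.15      # 15% per year, as stated in the description

for year in (2025, 2029, 2033):
    size = project_market_size(base_2025, cagr, year - 2025)
    print(year, round(size, 2))
```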

  12. T

    Talent Genetic Testing Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Apr 20, 2025
    Cite
    Archive Market Research (2025). Talent Genetic Testing Report [Dataset]. https://www.archivemarketresearch.com/reports/talent-genetic-testing-145014
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global talent genetic testing market is experiencing robust growth, driven by advancements in genomic technologies, increasing awareness of personalized medicine, and the rising demand for predictive and preventative healthcare. While the exact market size for 2025 is not provided, considering a conservative estimate based on typical market growth in the genomics sector and assuming a moderate CAGR (let's assume 15% for illustrative purposes), a market size of approximately $2.5 billion in 2025 seems plausible. With a projected CAGR of 15%, the market is poised for significant expansion, potentially reaching $6 billion by 2033. This growth is fueled by several key factors. Firstly, the increasing affordability and accessibility of genetic testing are making it more widely available to individuals and healthcare providers. Secondly, the growing understanding of the role genetics plays in talent identification and development is driving demand among employers, sports organizations, and educational institutions. Thirdly, the development of more sophisticated and accurate genetic tests, coupled with advancements in data analytics, allows for a more nuanced and effective interpretation of genetic information related to talent potential. Furthermore, the segmentations indicate a strong presence across various applications (hospitals, clinics, diagnostic centers) and types of tests (genetic screening, gene carrier tests, reproductive genetic testing), suggesting diversified avenues for market growth. The market, however, faces certain challenges. Data privacy and ethical concerns surrounding the use of genetic information need careful consideration and robust regulatory frameworks. Additionally, the complexity of interpreting genetic data and translating it into actionable insights requires further research and development. 
Despite these challenges, the long-term growth trajectory remains positive, especially with ongoing innovations in gene editing, AI-driven analysis, and personalized talent development programs. The geographical distribution of the market is expected to be widespread, with North America and Europe currently holding substantial market shares, while Asia Pacific is expected to witness significant growth driven by rising disposable incomes and increasing healthcare investments. This presents exciting opportunities for existing players and new entrants to capitalize on this dynamic market landscape.
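
    The description calls its own figures illustrative, so it can help to see which growth rate a given start and end value imply. The sketch below inverts the compound-growth formula for the quoted $2.5 billion (2025) and $6 billion (2033). The eight-year span is an assumption based on those two years.

```python
def implied_cagr(start_value, end_value, years):
    """Annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1

rate = implied_cagr(2.5, 6.0, 2033 - 2025)
print(f"implied CAGR = {rate:.1%}")
```

    Under these assumptions the implied rate is between 11% and 12% per year, lower than the 15% quoted alongside it.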

  13. Clust_100_GE_datasets

    • zenodo.org
    zip
    Updated Jan 24, 2020
    + more versions
    Cite
    Basel Abu-Jamous; Steven Kelly (2020). Clust_100_GE_datasets [Dataset]. http://doi.org/10.5281/zenodo.1298541
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Basel Abu-Jamous; Steven Kelly
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    100 microarray and RNA-seq gene expression datasets from five model species (human, mouse, fruit fly, arabidopsis plants, and baker's yeast). These datasets represent the benchmark set that was used to test our clust clustering method and to compare it with seven widely used clustering methods (Cross-Clustering, k-means, self-organising maps, MCL, hierarchical clustering, CLICK, and WGCNA). This data resource includes raw data files, pre-processed data files, clustering results, clustering results evaluation, and scripts.

    The files are split into eight zipped parts, 100Datasets_0.zip to 100Datasets_7.zip. The contents of all eight zipped files should be extracted to a single folder (e.g. 100Datasets).

    Below is a thorough description of the files and folders in this data resource.

    Scripts

    The scripts used to apply each one of the clustering methods to each one of the 100 datasets and to evaluate their results are all included in the folder (scripts/).

    Datasets and clustering results (folders starting with D)

    The datasets are labelled as D001 to D100. Each dataset has two folders: D###/ and D###_Res/, where ### is the number of the dataset. The first folder only includes the raw dataset while the second folder includes the results of applying the clustering methods to that dataset. The files ending with _B.tsv include clustering results in the form of a partition matrix. The files ending with _E include metrics evaluating the clustering results. The files ending with _go and _go_E respectively include the enriched GO terms in the clustering results and evaluation metrics of these GO terms. The files ending with _REACTOME and _REACTOME_E are similar to the GO term files but for the REACTOME pathway enrichment analysis. Each of these D###_Res/ folders includes a sub-folder "ParamSweepClust" which includes the results of applying clust multiple times to the same dataset while sweeping some parameters.

    Large datasets analysis results

    The folder LargeDatasets/ includes data and results for what we refer to as "large" datasets. These are 19 datasets that have more than 50 samples including replicates and have therefore not been included in the set of 100 datasets. However, they fit all of the other dataset selection criteria. We have compared clust with the other clustering methods over these datasets to demonstrate that clust still outperforms the other methods over larger datasets. This folder includes folders LD001/ to LD019/ and LD001_Res/ to LD019_Res/. These have a similar format and contents to the D###/ and D###_Res/ folders described above.

    Simultaneous analysis of multiple datasets (folders starting with MD)

    As our clust method is designed to be able to extract clusters from multiple datasets simultaneously, we also tested it over multiple datasets. All folders starting with MD_ are related to "multiple datasets (MD)" results. Each MD experiment simultaneously analyses d randomly selected datasets either out of a set of 10 arabidopsis datasets or out of a set of 10 yeast datasets. For each one of the two species, all d values from 2 to 10 were tested, and at each one of these d values, 10 different runs were conducted, where at each run a different subset of d datasets is selected randomly.
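
The run design described above (d from 2 to 10, 10 random runs per d, per species) can be sketched as follows. This is an illustration only and not the authors' script: the function name `plan_md_runs`, the dataset identifiers, and the fixed seed are all hypothetical, and the sketch does not enforce that subsets differ between runs.

```python
import random


def plan_md_runs(dataset_ids, d_values=range(2, 11), runs_per_d=10, seed=0):
    """Return (d, run_number, subset) tuples for one species.

    dataset_ids must be a list; each subset is drawn without replacement.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    plan = []
    for d in d_values:
        for run in range(1, runs_per_d + 1):
            subset = sorted(rng.sample(dataset_ids, d))
            plan.append((d, run, subset))
    return plan


# Hypothetical identifiers for the 10 arabidopsis datasets.
arabidopsis = [f"A{i:02d}" for i in range(1, 11)]
plan = plan_md_runs(arabidopsis)
print(len(plan), "planned runs for one species")
```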

    The folders MD_10A and MD_10Y include the full sets of 10 arabidopsis or 10 yeast datasets, respectively. Each folder with the format MD_10#_d#_Res## includes the results of applying the eight clustering methods at one of the 10 random runs of one of the selected d values. For example, the "MD_10A_d4_Res03/" folder includes the clustering results of the 3rd random selection of 4 arabidopsis datasets (the letter A in the folder's name refers to arabidopsis).
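
The folder naming scheme above lends itself to a small parser. The sketch below is a hypothetical helper, not part of the data resource: the name `parse_md_folder` and the exact regular expression are assumptions based only on the pattern MD_10#_d#_Res## and the example given in the description.

```python
import re

# Pattern inferred from the description: species letter (A or Y),
# number of datasets d, and the run number of the random selection.
MD_PATTERN = re.compile(r"^MD_10(?P<species>[AY])_d(?P<d>\d+)_Res(?P<run>\d+)/?$")
SPECIES = {"A": "arabidopsis", "Y": "yeast"}


def parse_md_folder(name):
    """Split an MD results folder name into species, d, and run number."""
    match = MD_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an MD results folder name: {name!r}")
    return {
        "species": SPECIES[match.group("species")],
        "d": int(match.group("d")),
        "run": int(match.group("run")),
    }


print(parse_md_folder("MD_10A_d4_Res03/"))
```

Names that do not follow the pattern, such as MD_10A itself, raise a ValueError rather than being guessed at.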

    Our clust method is applied directly over multiple datasets where each dataset is in a separate data file. Each "MD_10#_d#_Res##" folder includes these individual files in a sub-folder named "Processed_Data/". However, the other clustering methods only accept a single input data file. Therefore, the datasets are merged first before being submitted to these methods. Each "MD_10#_d#_Res##" folder includes a file "X_merged.tsv" for the merged data.

    Evaluation metrics (folders starting with Metrics)

    Each clustering results folder (D###_Res or MD_10#_d#_Res##) includes some clustering evaluation files ending with _E. This information is combined into tables for all datasets, and these tables appear in the folders starting with "Metrics_".

    Other files and folders

    The GO folder includes the reference GO term annotations for arabidopsis and yeast. Similarly, the REACTOME folder includes the reference REACTOME pathway annotations for arabidopsis and yeast. The Datasets file includes a TAB delimited table describing the 100 datasets. The SearchCriterion file includes the objective methodology of searching the NCBI database to select these 100 datasets. The Specials file includes some special considerations for a couple of datasets that differ a bit from what is described in the SearchCriterion file. The Norm### files and the files in the Reps/ folder describe normalisation codes and replicate structures for the datasets and were fed to the clust method as inputs. The Plots/ folder includes plots of the gene expression profiles of the individual genes in the clusters generated by each one of the eight methods over each one of the 100 datasets. Only up to 14 clusters per method are plotted.

  14. Spinal Muscular Atrophy Genetic Detection Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 10, 2025
    Cite
    Data Insights Market (2025). Spinal Muscular Atrophy Genetic Detection Report [Dataset]. https://www.datainsightsmarket.com/reports/spinal-muscular-atrophy-genetic-detection-1374041
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    May 10, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global market for Spinal Muscular Atrophy (SMA) genetic detection is experiencing robust growth, driven by increasing SMA prevalence, advancements in genetic testing technologies, and rising awareness among healthcare professionals and patients. The market's expansion is fueled by the availability of more accurate and accessible diagnostic tests, enabling earlier diagnosis and intervention, which significantly improves patient outcomes. This market is segmented by application (hospitals, clinics, diagnostic centers) and test type (genetic screening, reproductive genetic testing, diagnostic tests, gene carrier tests, pre-symptomatic testing). The high cost of advanced testing, particularly in resource-limited settings, and potential ethical concerns surrounding genetic information remain significant restraints. However, the ongoing development of less expensive and more accessible technologies is expected to mitigate these challenges. The market shows strong regional variations, with North America and Europe currently holding the largest shares due to well-established healthcare infrastructure and higher adoption rates of advanced genetic testing. However, the Asia-Pacific region is projected to witness the fastest growth due to rising healthcare expenditure, increasing awareness about genetic disorders, and growing adoption of advanced diagnostic technologies in countries like China and India. The forecast period (2025-2033) anticipates continued market expansion, driven by factors such as increasing research and development efforts leading to improved diagnostic techniques, the expansion of newborn screening programs, and the growing adoption of personalized medicine approaches. The strategic initiatives of key market players, including United Gene Group, Berry Genomics, Sanvalley, Microread, and Genecore, further contribute to market growth through product innovation and geographical expansion. 
The development of non-invasive prenatal testing (NIPT) for SMA detection is expected to be a major growth driver, offering a safer and more convenient alternative to traditional invasive procedures. Competitive dynamics, characterized by mergers, acquisitions, and strategic partnerships, are shaping the market landscape and driving innovation. While challenges remain, the overall outlook for the SMA genetic detection market is positive, promising significant growth opportunities over the next decade.

  15. Summary statistics

    • figshare.com
    txt
    Updated Nov 20, 2024
    Cite
    Sabrina Henne (2024). Summary statistics [Dataset]. http://doi.org/10.6084/m9.figshare.27862137.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    figshare
    Authors
    Sabrina Henne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary statistics of: differential expression analysis (mRNA or microRNA); gene set analysis (mRNA or microRNA); pathway-based MPHL PRS association test with gene expression change (mRNA and microRNA combined, split by treatment group due to size).

  16. Scorpio Gene-Taxa Benchmark Dataset

    • zenodo.org
    bin, csv, txt
    Updated Apr 3, 2025
    + more versions
    Cite
    Mohammad Saleh Refahi (2025). Scorpio Gene-Taxa Benchmark Dataset [Dataset]. http://doi.org/10.5281/zenodo.12964684
    Explore at:
    Available download formats: bin, csv, txt
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohammad Saleh Refahi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used the Woltka pipeline to compile the complete Basic genome dataset, consisting of 4634 genomes, with each genus represented by a single genome. After downloading all coding sequences (CDS) from the NCBI database, we extracted 8 million distinct CDS, focusing on bacteria and archaea and excluding viruses and fungi due to inadequate gene information.

    To maintain accuracy, we excluded hypothetical proteins, uncharacterized proteins, and sequences without gene labels. We addressed issues with gene name inconsistencies in NCBI by keeping only genes with more than 1000 samples and ensuring each phylum had at least 350 sequences. This resulted in a curated dataset of 800,318 gene sequences from 497 genes across 2046 genera.
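
The two count thresholds above can be illustrated with a short filtering sketch. It is not the authors' pipeline: the function name `filter_records`, the (gene, phylum) tuple layout, and the order in which the two filters are applied (genes first, then phyla) are all assumptions made for illustration.

```python
from collections import Counter


def filter_records(records, min_per_gene=1000, min_per_phylum=350):
    """Keep (gene, phylum) records that pass both count thresholds.

    Genes need MORE than min_per_gene records; phyla need AT LEAST
    min_per_phylum records among the records that survive the gene filter.
    """
    records = list(records)
    gene_counts = Counter(gene for gene, _phylum in records)
    kept = [(g, p) for g, p in records if gene_counts[g] > min_per_gene]
    phylum_counts = Counter(phylum for _gene, phylum in kept)
    return [(g, p) for g, p in kept if phylum_counts[p] >= min_per_phylum]


# Toy example with small thresholds so the effect is visible.
toy = [("geneA", "P1")] * 3 + [("geneB", "P1")] + [("geneA", "P2")]
print(filter_records(toy, min_per_gene=2, min_per_phylum=2))
```

Applying the phylum filter after the gene filter means a phylum is judged only on the sequences that remain; a real pipeline might choose the opposite order or iterate until both conditions hold.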

    We created four datasets to evaluate our model: a training set (Train_set); a test set (Test_set) with different samples but the same genera and genes as the training set; a Taxa_out_set covering 18 phyla that were excluded from the training set, with genes that are present in it; and a Gene_out_set covering 60 genes that were excluded from the training set, from phyla that are present in it. We ensured each CDS had only one representation per genome, removing genes with multiple representations within the same species.

  17. Prenatal and Newborn Genetic Testing Market Size, Share, Growth and Industry...

    • imarcgroup.com
    pdf,excel,csv,ppt
    Updated Jun 26, 2024
    Cite
    IMARC Group (2024). Prenatal and Newborn Genetic Testing Market Size, Share, Growth and Industry Report [Dataset]. https://www.imarcgroup.com/prenatal-newborn-genetic-testing-market
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 26, 2024
    Dataset authored and provided by
    IMARC Group
    License

    https://www.imarcgroup.com/privacy-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    The global prenatal and newborn genetic testing market size reached USD 6.7 Billion in 2024. Looking forward, IMARC Group expects the market to reach USD 17.3 Billion by 2033, exhibiting a growth rate (CAGR) of 11.04% during 2025-2033. The market share is experiencing steady growth driven by the growing demand for advanced diagnostic and screening devices, the thriving medical industry, and the rising prevalence of congenital malformations and genetic abnormalities in newborn babies.

    Report Attribute: Key Statistics
    Base Year: 2024
    Forecast Years: 2025-2033
    Historical Years: 2019-2024
    Market Size in 2024: USD 6.7 Billion
    Market Forecast in 2033: USD 17.3 Billion
    Market Growth Rate 2025-2033: 11.04%

    IMARC Group provides an analysis of the key trends in each segment of the prenatal and newborn genetic testing market statistics, along with forecasts at the global, regional, and country levels for 2025-2033. Our report has categorized the market based on product type, screening, disease, and end user.
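
The relationship between the 2024 market size, the 2033 forecast, and the growth rate reported above is the standard compound annual growth rate formula. The sketch below is illustrative only: the function name `implied_cagr` is hypothetical, and treating 2024 to 2033 as nine compounding periods is an assumption; the report states its rate for 2025-2033, so a different base year or period count would shift the result slightly.

```python
def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Values from the report summary (USD billions); nine periods is an assumption.
rate = implied_cagr(6.7, 17.3, 2033 - 2024)
print(f"implied CAGR: {rate:.2%}")
```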

  18. Recurrent functional misinterpretation of RNA-seq data caused by...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Shir Mandelboum; Zohar Manber; Orna Elroy-Stein; Ran Elkon (2023). Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias - Table 1 [Dataset]. http://doi.org/10.1371/journal.pbio.3000481.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Shir Mandelboum; Zohar Manber; Orna Elroy-Stein; Ran Elkon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias - Table 1

  19. Additional file 2 of Comprehensive enhancer-target gene assignments improve...

    • springernature.figshare.com
    xlsx
    Updated Feb 15, 2024
    Cite
    Tingting Qin; Christopher Lee; Shiting Li; Raymond G. Cavalcante; Peter Orchard; Heming Yao; Hanrui Zhang; Shuze Wang; Snehal Patil; Alan P. Boyle; Maureen A. Sartor (2024). Additional file 2 of Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data [Dataset]. http://doi.org/10.6084/m9.figshare.19663872.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tingting Qin; Christopher Lee; Shiting Li; Raymond G. Cavalcante; Peter Orchard; Heming Yao; Hanrui Zhang; Shuze Wang; Snehal Patil; Alan P. Boyle; Maureen A. Sartor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2: Table S1: Overview of the top 19 EnTDefs, including the ranks, enhancer/enhancer-gene link methods, and basic summary statistics. Table S2: The 31 ENCODE ChIP-seq datasets from 9 completely different cell lines and 14 completely different transcription factors. Table S3: The nine ChIA-PET datasets used for generating cell-type-specific EnTDefs (CT-EnTDefs) and number of TFs assayed by ENCODE ChIP-seq in each particular cell type, which were used to evaluate the performance of the CT-EnTDefs. Table S4: Overview of the seven independent datasets used for the comparative analysis. Table S5: ChIA-PET datasets used by “ChIA” and “Loop” methods to assign enhancer to target genes in a cell-type independent manner (general EnTDefs). Table S6: The 87 ENCODE ChIP-seq datasets used for EnTDef evaluation (evaluation ChIP-seq) (tab 1) and the TF vs. cell type matrix (tab 2). Table S7: The 13 ENCODE ChIP-seq datasets from 4 different cell lines (testing ChIP-seq).

  20. Additional file 4 of Detect tissue heterogeneity in gene expression data...

    • springernature.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Jitao Zhang; Klas Hatje; Gregor Sturm; Clemens Broger; Martin Ebeling; Martine Burtin; Fabiola Terzi; Silvia Pomposiello; Laura Badi (2023). Additional file 4 of Detect tissue heterogeneity in gene expression data with BioQC [Dataset]. http://doi.org/10.6084/m9.figshare.c.3733999_D4.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jitao Zhang; Klas Hatje; Gregor Sturm; Clemens Broger; Martin Ebeling; Martine Burtin; Fabiola Terzi; Silvia Pomposiello; Laura Badi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary Document 3. This document can also be accessed on the BioQC website [29]. (ZIP 325 kb)
