28 datasets found
  1. DISEASES v2 (dictionary)

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    application/x-gzip
    Updated Jun 2, 2023
    Lars Juhl Jensen; Dhouha Grissa; Alexander Junge; Tudor I. Oprea (2023). DISEASES v2 (dictionary) [Dataset]. http://doi.org/10.6084/m9.figshare.19146044.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lars Juhl Jensen; Dhouha Grissa; Alexander Junge; Tudor I. Oprea
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the human gene and disease names used for text mining in the DISEASES database v2.

  2. Data from: Graphine: A Dataset for Graph-aware Terminology Definition...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 6, 2021
    Zequn Liu; Shukai Wang; Yiyang Gu; Ruiyi Zhang; Ming Zhang; Sheng Wang (2021). Graphine: A Dataset for Graph-aware Terminology Definition Generation [Dataset]. http://doi.org/10.5281/zenodo.5320310
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zequn Liu; Shukai Wang; Yiyang Gu; Ruiyi Zhang; Ming Zhang; Sheng Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset of our EMNLP 2021 paper:

    Graphine: A Dataset for Graph-aware Terminology Definition Generation.

    Please read the included "readme.md" for a description of the dataset format.

  3. BioCreative V.5 TIPS small dictionary

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    application/gzip
    Updated Feb 9, 2017
    Lars Juhl Jensen (2017). BioCreative V.5 TIPS small dictionary [Dataset]. http://doi.org/10.6084/m9.figshare.4635175.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lars Juhl Jensen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reduced dictionary used by PiTagger for participation in the BioCreative V.5 BeCalm TIPS task.

  4. High Quality SNP Database

    • dknet.org
    • scicrunch.org
    • +2 more
    Updated May 11, 2024
    (2024). High Quality SNP Database [Dataset]. http://identifiers.org/RRID:SCR_007230
    Description

    This is the HQSNP DB (high-quality SNP database) developed by the CHG bioinformatics group. A high-quality SNP is defined as a SNP that has allele frequency or genotyping data. Most HQSNPs come from HapMap; the others come from JSNP (the Japanese SNP database), TSC (The SNP Consortium), Affymetrix 120K SNPs, and Perlegen SNPs. Four kinds of SNP search are available:

    • Get SNPs by dbSNP rs#: choose this search if you have already selected a list of SNPs and just want their SNP information. The program generates an Excel file containing the SNP flanking sequence, variation, quality, function, etc. The Excel file has 10 highlighted fields; you can send only this highlighted information to Illumina to obtain a SNP pre-score. (The same fields are presented in the other types of search as well.)
    • Get gene SNPs by gene names: choose this search if you have a list of gene names and want SNP information for those genes. The gene name can be an official gene symbol, Ensembl gene ID, RefSeq accession ID, LocusLink number, etc.
    • Get gene SNPs by genome regions: choose this search if you have a list of genome regions and want all gene SNP information in those regions. The software finds all Ensembl genes in the regions and the SNPs associated with each Ensembl gene.
    • Get genome scan SNPs by genome regions: choose this search if you have a list of genome regions and want evenly spaced SNPs in those regions.

    A SNP selection tool (SNPselector) was built on HQSNP. It took a SNP ID list, a gene name list, or a genome region list as input and searched for SNPs for genome scans or gene association studies. It could take an optional ABI SNP file (exported from the ABI SNP search web page) as input to check whether a candidate SNP is available from ABI, and an optional Illumina SNP pre-score file as input to select SNPs for the Illumina SNP assay. It generated results sorted by tag SNP in LD block, SNP quality, SNP function, SNP regulatory potential, and SNP mutation risk. SNPselector has been retired from public use since September 30, 2010.
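    The "genome scan SNPs by genome regions" search returns evenly spaced SNPs, but the description does not say how the spacing was computed. The Python sketch below shows one simple approach and is not HQSNP's or SNPselector's actual algorithm: split the region into n equal windows and take the SNP closest to each window centre. The function name and the (snp_id, position) input format are assumptions made for illustration.

```python
import bisect


def evenly_spaced_snps(snps, start, end, n):
    """Pick up to n SNPs in [start, end] nearest to n evenly spaced targets.

    snps: list of (snp_id, position) tuples (hypothetical input format).
    A SNP is never picked twice, so sparse regions may yield fewer than n.
    """
    in_region = sorted((pos, sid) for sid, pos in snps if start <= pos <= end)
    if not in_region or n <= 0:
        return []
    positions = [pos for pos, _ in in_region]
    step = (end - start) / n
    chosen = []
    for k in range(n):
        target = start + step * (k + 0.5)  # centre of the k-th window
        i = bisect.bisect_left(positions, target)
        # Compare the nearest SNPs on either side of the target position.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(candidates, key=lambda j: abs(positions[j] - target))
        snp_id = in_region[best][1]
        if snp_id not in chosen:
            chosen.append(snp_id)
    return chosen
```

Taking the nearest SNP to each window centre keeps the selection simple; a real genome-scan design would also weigh SNP quality and LD, as SNPselector's ranking criteria suggest.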

  5. KEGG genomes, networks, diseases and drugs

    • kaggle.com
    zip
    Updated Apr 21, 2023
    Ali Abedi Madiseh (2023). KEGG genomes, networks, diseases and drugs [Dataset]. https://www.kaggle.com/datasets/aliabedimadiseh/kegg-genomes-networks-diseases-and-drugs
    Available download formats
    zip (9,132,230 bytes)
    Authors
    Ali Abedi Madiseh
    License

    Open Data Commons Database Contents License (DbCL) v1.0 http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset was collected from the genome.jp web resource via its FTP site (https://www.genome.jp/ftp/kegg/). It includes bioinformatics and medical databases in the pathway, medical, genome, MEDICUS, drug, and other categories.

    The dataset includes 5 .txt files:

    • dgroup: 'Entry_ID', 'name', 'type', and 'member' information about drugs.
    • disease: 'Entry_ID', 'name', 'subgroup', 'supergroup', 'description', 'genes', and 'category' information about drugs and related diseases.
    • drug: molecular information about drugs.
    • network: networks of gene interactions, with 'class' and 'gene' information.
    • variant: gene variants, with 'gene variant id', 'gene name', 'gene definition', and 'variation type' fields.
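    A minimal Python sketch for loading two of these tables and checking that the documented columns are present. The column names come from the file list above; the tab delimiter, the exact header spelling, and the function and constant names are assumptions, so check them against the actual files before relying on this.

```python
import csv

# Columns as listed in the dataset description; exact header spelling and
# the tab delimiter are assumptions about the actual .txt files.
EXPECTED_COLUMNS = {
    "dgroup": {"Entry_ID", "name", "type", "member"},
    "disease": {"Entry_ID", "name", "subgroup", "supergroup",
                "description", "genes", "category"},
}


def load_kegg_table(handle, table, delimiter="\t"):
    """Read one table into a list of dicts after checking its documented columns."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = EXPECTED_COLUMNS[table] - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"{table}: missing columns {sorted(missing)}")
    return list(reader)
```

Failing early on a missing column makes a wrong delimiter or renamed header obvious instead of producing silently empty fields.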

    Important definitions

    1. Signaling pathways: Describes a series of chemical reactions in which a group of molecules in a cell work together to control a cell function, such as cell division or cell death. A cell receives signals from its environment when a molecule, such as a hormone or growth factor, binds to a specific protein receptor on or in the cell. After the first molecule in the pathway receives a signal, it activates another molecule. This process is repeated through the entire signaling pathway until the last molecule is activated and the cell function is carried out. Abnormal activation of signaling pathways may lead to diseases, such as cancer. Drugs are being developed to target specific molecules involved in these pathways. These drugs may help keep cancer cells from growing. (https://www.cancer.gov/publications/dictionaries/cancer-terms/def/signaling-pathway)

    2. Variants of a gene: An alteration in the most common DNA nucleotide sequence. The term variant can be used to describe an alteration that may be benign, pathogenic, or of unknown significance. The term variant is increasingly being used in place of the term mutation. (https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/variant)

  6. Data from: Veneer Is a Webtool for Rapid, Standardized, and Transparent...

    • acs.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    xlsx
    Updated Feb 27, 2024
    Linda Berg Luecke; Roneldine Mesidor; Jack Littrell; Morgan Carpenter; Melinda Wojtkiewicz; Rebekah L. Gundry (2024). Veneer Is a Webtool for Rapid, Standardized, and Transparent Interpretation, Annotation, and Reporting of Mammalian Cell Surface N‑Glycocapture Data [Dataset]. http://doi.org/10.1021/acs.jproteome.3c00800.s002
    Dataset provided by
    ACS Publications
    Authors
    Linda Berg Luecke; Roneldine Mesidor; Jack Littrell; Morgan Carpenter; Melinda Wojtkiewicz; Rebekah L. Gundry
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Currently, no consensus exists regarding the criteria required to designate a protein within a proteomic data set as a cell surface protein. Most published proteomic studies rely on varied ontology annotations or computational predictions instead of experimental evidence when attributing protein localization. Consequently, standardized approaches for analyzing and reporting cell surface proteome data sets would increase confidence in localization claims and promote data use by other researchers. Recently, we developed Veneer, a web-based bioinformatic tool that analyzes results from cell surface N-glycocapture workflows (the most popular cell surface proteomics method used to date that generates experimental evidence of subcellular location). Veneer assigns protein localization based on defined experimental and bioinformatic evidence. In this study, we updated the criteria and process for assigning protein localization and added new functionality to Veneer. Results of Veneer analysis of 587 cell surface N-glycocapture data sets from 32 published studies demonstrate the importance of applying defined criteria when analyzing cell surface proteomics data sets and exemplify how Veneer can be used to assess experimental quality and facilitate data extraction for informing future biological studies and annotating public repositories.

  7. CWL run of Somatic Variant Calling Workflow (CWLProv 0.5.0 Research Object)

    • data.mendeley.com
    • data.niaid.nih.gov
    • +2 more
    Updated Oct 27, 2018
    Stian Soiland-Reyes (2018). CWL run of Somatic Variant Calling Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/97hj93mkfd.1
    Authors
    Stian Soiland-Reyes
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The somatic variant calling workflow included in this case study was designed by Blue Collar Bioinformatics (bcbio), a community-driven initiative to develop best-practice pipelines for variant calling, RNA-seq, and small RNA analysis. According to its documentation, the project aims to facilitate automated analysis of high-throughput data by making the resources quantifiable, analyzable, scalable, accessible, and reproducible.

    All the underlying tools are containerized, which facilitates their use in the workflow. The somatic variant calling workflow defined in CWL is available on GitHub and comes with a well-defined test dataset.

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language (CWL) execution provenance. See https://w3id.org/cwl/prov/0.5.0 for the format, or use the cwlprov tool (https://pypi.org/project/cwlprov/) to explore it.
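    CWLProv Research Objects are packaged as BagIt folders, so a basic integrity check needs only the standard library. The Python sketch below assumes the folder contains a manifest-sha1.txt file with one "<checksum> <relative path>" entry per line (the usual BagIt layout); it does not handle percent-encoded paths, the function name is hypothetical, and the cwlprov tool remains the proper way to explore the provenance itself.

```python
import hashlib
from pathlib import Path


def verify_bagit_manifest(bag_dir, algorithm="sha1"):
    """Return payload paths that are missing or whose checksum does not match.

    Assumes a BagIt-style manifest-<algorithm>.txt file inside bag_dir.
    """
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only once: the path part may itself contain spaces.
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            problems.append(rel_path)
            continue
        digest = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if digest.lower() != expected.lower():
            problems.append(rel_path)
    return problems
```

An empty result means every file listed in the manifest is present and unchanged; it does not check for extra files that the manifest omits.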

  8. Data from: Comparative Bioinformatics Analysis of Transcription Factor Genes...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 11, 2016
    Alzan, Heba F.; Knowles, Donald P.; Suarez, Carlos E. (2016). Comparative Bioinformatics Analysis of Transcription Factor Genes Indicates Conservation of Key Regulatory Domains among Babesia bovis, Babesia microti, and Theileria equi [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001596026
    Authors
    Alzan, Heba F.; Knowles, Donald P.; Suarez, Carlos E.
    Description

    Apicomplexa tick-borne hemoparasites, including Babesia bovis, Babesia microti, and Theileria equi, are responsible for bovine and human babesiosis and equine theileriosis, respectively. These parasites of vast medical, epidemiological, and economic impact have complex life cycles in their vertebrate and tick hosts. Large gaps in knowledge remain concerning the mechanisms these parasites use for gene regulation. Regulatory genes coding for DNA binding proteins, such as members of the Api-AP2, HMG, and Myb families, are known to play crucial roles as transcription factors. Although the repertoire of Api-AP2 has been defined and an HMG gene was previously identified in the B. bovis genome, these regulatory genes have not been described in detail in B. microti and T. equi. In this study, comparative bioinformatics was used to: (i) identify and map genes encoding these transcription factors in the genomes of the three parasites; (ii) identify a previously unreported HMG gene in B. microti; (iii) define a repertoire of eight conserved Myb genes; and (iv) identify AP2 correlates between B. bovis and the better-studied Plasmodium parasites. Searching the available transcriptome of B. bovis defined patterns of transcription of these three gene families in B. bovis erythrocyte stage parasites. Sequence comparisons show conservation of functional domains and general architecture in the AP2, Myb, and HMG proteins, which may be significant for the regulation of common critical parasite life cycle transitions in B. bovis, B. microti, and T. equi. A detailed understanding of the role of gene families encoding DNA binding proteins will provide new tools for unraveling regulatory mechanisms involved in B. bovis, B. microti, and T. equi life cycles and environmental adaptive responses, and may contribute to the development of novel convergent strategies for improved control of babesiosis and equine piroplasmosis.

  9. Medulloblastoma omics data

    • kaggle.com
    zip
    Updated Feb 22, 2023
    Alexander Chervov (2023). Medulloblastoma omics data [Dataset]. https://www.kaggle.com/alexandervc/medulloblastoma-omics-data
    Available download formats
    zip (2,278,448,493 bytes)
    Authors
    Alexander Chervov
    Description

    A collection of gene expression and similar datasets related to brain tumors, in particular medulloblastoma, the most common malignant brain tumor in childhood. The files are typically CSV tables of genes x samples.

    GSE124814: an integration of many (possibly all) medulloblastoma datasets, with 1641 samples, of which 1350 represent primary medulloblastomas and 291 represent normal brain.

    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE124814
    Weishaupt H, Johansson P, Sundström A, Lubovac-Pilav Z et al. Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes. Bioinformatics 2019 Sep 15;35(18):3357-3364. PMID: 30715209 https://doi.org/10.1093/bioinformatics/btz066

    We downloaded a total of 1796 CEL files from previously published GEO or ArrayExpress records: GSE85217(n=763), GSE25219(n=154), GSE60862(n=130), GSE12992(n=40), GSE67850(n=22), GSE10327(n=62), GSE30074(n=30), E-MTAB-292(n=19), GSE74195(n=30), GSE37418(n=76), GSE4036(n=14), GSE62803(n=52), GSE21140(n=103), GSE37382(n=50), GSE22569(n=24), GSE35974(n=50), GSE73038(n=46), GSE50161(n=24), GSE3526(n=9), GSE50765(n=12), GSE49243(n=58), GSE41842(n=19), GSE44971(n=9). After preprocessing all CEL files, we averaged the expression profiles of samples that mapped to the same patient in a single dataset, producing a final expression array comprising 1641 samples, of which 1350 samples represent primary medulloblastomas and 291 samples represent normal brain (cerebellum/upper rhombic lip).

    Also discussed in the paper "A transcriptome-based classifier to determine molecular subtypes in medulloblastoma": https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008263

    GSE85217 (Cavalli ... Taylor), 768 samples, 2016 (Affymetrix Human Gene 1.1 ST Array). Cavalli FMG, Remke M, Rampasek L, Peacock J et al. Intertumoral Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 2017 Jun 12;31(6):737-754.e6. PMID: 28609654; Ramaswamy V, Taylor MD. Bioinformatic Strategies for the Genomic and Epigenomic Characterization of Brain Tumors. Methods Mol Biol 2019;1869:37-56. PMID: 30324512 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85217

    GSE202043 (Pomeroy) 214 samples, 2011 (Expression profiling by array) Cho YJ, Tsherniak A, Tamayo P, Santagata S et al. Integrative genomic analysis of medulloblastoma identifies a molecular subgroup that drives poor clinical outcome. J Clin Oncol 2011 Apr 10;29(11):1424-30. PMID: 21098324 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE202043

    GSE12992 (Fattet ... Delattre) 72 samples, 2009 (Expression profiling by array) Fattet S, Haberler C, Legoix P, Varlet P et al. Beta-catenin status in paediatric medulloblastomas: correlation of immunohistochemical expression with mutational status, genetic profiles, and clinical characteristics. J Pathol 2009 May;218(1):86-94. PMID: 19197950 A series of 72 pediatric medulloblastoma tumors has been studied at the genomic level (array-CGH), screened for CTNNB1 mutations and beta-catenin expression (immunohistochemistry). A subset of 40 tumor samples has been analyzed at the RNA expression level (Affymetrix HG U133 Plus 2.0). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12992

    GSE37382 (Northcott ... Taylor) 2012 (Expression profiling by array, Affymetrix Human Gene 1.1 ST Array profiling of 285 primary medulloblastoma samples.) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37382

    GSE10327 (M. Kool), 62 samples, 2008 (Expression profiling by array). Note: this series is sometimes referred to as GSE10237 in the original paper and several other references; that accession is an error. Kool M, Koster J, Bunt J, Hasselt NE et al. Integrated genomics identifies five medulloblastoma subtypes with distinct genetic profiles, pathway signatures and clinicopathological features. PLoS One 2008 Aug 28;3(8):e3088. PMID: 18769486; Rack PG, Ni J, Payumo AY, Nguyen V et al. Arhgap36-dependent activation of Gli transcription factors. Proc Natl Acad Sci U S A 2014 Jul 29;111(30):11061-6. PMID: 25024229 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10327

    Other datasets (not yet loaded):

    (47.1 Gb, 2012) (Expression profiling by array, Genome variation profiling by SNP array, SNP genotyping by SNP array ) Northcott PA, Shih DJ, Peacock J, Garzia L et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 2012 Aug 2;488(7409):49-56. PMID: 22832581 Here we report somatic copy number aberrations (SCNAs) in 1087 unique medulloblastomas. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37385
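    The GSE124814 preprocessing averaged the expression profiles of samples that mapped to the same patient within a single dataset. A minimal Python sketch of that step, assuming profiles are already-normalized lists of values in a shared gene order and that a sample-to-(dataset, patient) mapping is available; the function name and input layout are illustrative, not the authors' code.

```python
from collections import defaultdict


def average_by_patient(expression, sample_info):
    """Average expression profiles of samples from the same patient in one dataset.

    expression:  dict sample_id -> list of values (same gene order for all).
    sample_info: dict sample_id -> (dataset_id, patient_id).
    Returns dict (dataset_id, patient_id) -> averaged profile.
    """
    grouped = defaultdict(list)
    for sample, profile in expression.items():
        grouped[sample_info[sample]].append(profile)
    # zip(*profiles) walks gene by gene across all profiles of one patient.
    return {
        key: [sum(values) / len(values) for values in zip(*profiles)]
        for key, profiles in grouped.items()
    }
```

Keying on (dataset, patient) rather than patient alone mirrors the "same patient in a single dataset" rule, so replicates are not merged across studies.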

  10. Mutplot: An easy-to-use online tool for plotting complex mutation data with...

    • plos.figshare.com
    txt
    Updated Jun 1, 2023
    Weiwei Zhang; Cheng Wang; Xuan Zhang (2023). Mutplot: An easy-to-use online tool for plotting complex mutation data with flexibility [Dataset]. http://doi.org/10.1371/journal.pone.0215838
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Weiwei Zhang; Cheng Wang; Xuan Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the development of technology, an enormous amount of sequencing data is being generated rapidly. However, translating these data into patient care is a critical challenge. There are two difficulties: how to integrate functional information into mutation interpretation, and how to make that integration easy to apply. One solution is to visualize amino acid changes together with protein structure and function on a web app platform. Multiple tools for plotting mutations already exist, but most of them require programming skills that clinicians and researchers do not commonly have. Furthermore, recurrent mutations are the focus, and the appropriate recurrence cutoff varies; yet none of the current tools offers a user-defined cutoff. We therefore developed a user-friendly web-based tool, Mutplot (https://bioinformaticstools.shinyapps.io/lollipop/). Mutplot retrieves up-to-date domain information from the protein resource UniProt (https://www.uniprot.org/), integrates the submitted mutation information, and produces lollipop diagrams with annotations and highlighted candidates. It offers flexible output options. For data subject to security standards, the app can also be hosted on web servers behind a firewall, or on computers without internet access that store the UniProt database locally. Altogether, Mutplot is an excellent tool for visualizing protein mutations, especially for clinicians or researchers without any bioinformatics background.
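    A user-defined recurrence cutoff amounts to counting identical amino acid changes and keeping those seen at least a chosen number of times, which are then the candidates to highlight on a lollipop plot. The Python sketch below illustrates that idea only; it is not Mutplot's code, and it assumes changes are given as simple one-letter substitutions such as "R175H".

```python
import re
from collections import Counter

# Reference residue, 1-based position, alternate residue (or * for stop),
# e.g. "R175H" or "G12D". This input format is an assumption.
_AA_CHANGE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def recurrent_mutations(changes, cutoff=2):
    """Return {(position, change): count} for changes seen at least `cutoff` times."""
    counts = Counter()
    for change in changes:
        change = change.strip()
        match = _AA_CHANGE.match(change)
        if match is None:
            continue  # skip entries that are not simple substitutions
        counts[(int(match.group(2)), change)] += 1
    return {key: n for key, n in counts.items() if n >= cutoff}
```

Raising `cutoff` trims the plot to hotspots; lowering it to 1 shows every observed change.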

  11. Data from: RNA splicing programs define tissue compartments and cell types...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated May 31, 2023
    Julia Olivieri; Roozbeh Dehghannasiri; Peter Wang; SoRi Jang; Antoine de Morree; Serena Tan; Jingsi Ming; Angela Wu; Tabula Sapiens Consortium; Stephen Quake; Mark Krasnow; Julia Salzman (2023). RNA splicing programs define tissue compartments and cell types at single cell resolution [Dataset]. http://doi.org/10.6084/m9.figshare.14531721.v3
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Julia Olivieri; Roozbeh Dehghannasiri; Peter Wang; SoRi Jang; Antoine de Morree; Serena Tan; Jingsi Ming; Angela Wu; Tabula Sapiens Consortium; Stephen Quake; Mark Krasnow; Julia Salzman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    spliz_*:

    Separate tables with the SpliZ and SpliZVD scores for each cell and gene in each dataset. The cell, gene, cell type, SpliZ, and SpliZVD values are given in the cell, geneR1A_uniq, ontology, scZ, and svd_z0 columns, respectively.
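    The column mapping above can be applied while reading a spliz_* table. A minimal Python sketch using the standard csv module; the tab delimiter and the treatment of empty or "NA" scores are assumptions about the .txt files, and the function and output key names are illustrative.

```python
import csv

# Documented source column -> readable name.
SPLIZ_COLUMNS = {
    "cell": "cell",
    "geneR1A_uniq": "gene",
    "ontology": "cell_type",
    "scZ": "spliz",
    "svd_z0": "splizvd",
}


def read_spliz(handle, delimiter="\t"):
    """Yield one dict per row with the documented columns renamed.

    The delimiter is an assumption; adjust it if the files are not tab-separated.
    Other columns in the file are ignored.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        record = {new: row[old] for old, new in SPLIZ_COLUMNS.items()}
        for key in ("spliz", "splizvd"):
            value = record[key]
            # Missing scores become None; everything else is parsed as float.
            record[key] = None if value in ("", "NA") else float(value)
        yield record
```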

  12. Additional file 1: of Silhouette Scores for Arbitrary Defined Groups in Gene...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 30, 2023
    Shitao Zhao; Jianqiang Sun; Kentaro Shimizu; Koji Kadota (2023). Additional file 1: of Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results [Dataset]. http://doi.org/10.6084/m9.figshare.5937616.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shitao Zhao; Jianqiang Sun; Kentaro Shimizu; Koji Kadota
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Detailed results for Blekhman’s RNA-seq count data. (a) Silhouette indices (s_i) for each sample i and their average (AS). The sample names (A1, A2, A3, B1, B2, or B3) for i correspond to those shown in Fig. 1b. (b) P_DEG values at various FDR thresholds (1%, 5%, 10%, 20%, 30%, and 40% FDR). The values at 10% FDR are the same as those shown in Fig. 1b. (c) Percentages of true DEGs (P_trueDEG), defined as P_DEG × (1 − FDR threshold), at the corresponding FDR thresholds shown in (b). (XLSX 19 kb)
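    The quantities in (a) and (c) can be computed in a few lines. The Python sketch below uses the standard silhouette definition s_i = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from sample i to the other members of its own group and b_i is the mean distance to the nearest other group, together with P_trueDEG = P_DEG × (1 − FDR threshold) as defined above. Euclidean distance and the function names are assumptions; the distance measure used in the paper is not stated in this description.

```python
import math


def _mean_distance(point, others):
    """Mean Euclidean distance from `point` to every vector in `others`."""
    return sum(math.dist(point, other) for other in others) / len(others)


def silhouette_scores(samples, groups):
    """Return per-sample silhouette indices s_i and their average (AS).

    samples: dict sample name -> expression vector (same gene order).
    groups:  dict sample name -> group label (e.g. "A" or "B").
    Assumes Euclidean distance and at least two samples per group.
    """
    labels = set(groups.values())
    scores = {}
    for name, vector in samples.items():
        own_group = [samples[o] for o in samples
                     if o != name and groups[o] == groups[name]]
        a_i = _mean_distance(vector, own_group)
        # b_i: mean distance to the closest of the other groups.
        b_i = min(
            _mean_distance(vector, [samples[o] for o in samples if groups[o] == label])
            for label in labels if label != groups[name]
        )
        denominator = max(a_i, b_i)
        scores[name] = (b_i - a_i) / denominator if denominator > 0 else 0.0
    return scores, sum(scores.values()) / len(scores)


def ptrue_deg(pdeg, fdr_threshold):
    """Percentage of true DEGs: P_DEG x (1 - FDR threshold), FDR as a fraction."""
    return pdeg * (1 - fdr_threshold)
```

Values of s_i near 1 mean a sample sits well inside its own group; values near 0 or below mean the A/B split is poorly separated for that sample.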

  13. A summary of the criteria that would define a general genomics workbench...

    • figshare.com
    xls
    Updated Jun 1, 2023
    Enis Afgan; Clare Sloggett; Nuwan Goonasekera; Igor Makunin; Derek Benson; Mark Crowe; Simon Gladman; Yousef Kowsar; Michael Pheasant; Ron Horst; Andrew Lonie (2023). A summary of the criteria that would define a general genomics workbench environment, and suggested implications on technical requirements. [Dataset]. http://doi.org/10.1371/journal.pone.0140829.t001
    Dataset provided by
    PLOS ONE
    Authors
    Enis Afgan; Clare Sloggett; Nuwan Goonasekera; Igor Makunin; Derek Benson; Mark Crowe; Simon Gladman; Yousef Kowsar; Michael Pheasant; Ron Horst; Andrew Lonie
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A summary of the criteria that would define a general genomics workbench environment, and suggested implications on technical requirements.

  14. Additional file 10: of Silhouette Scores for Arbitrary Defined Groups in...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 31, 2023
    Shitao Zhao; Jianqiang Sun; Kentaro Shimizu; Koji Kadota (2023). Additional file 10: of Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results [Dataset]. http://doi.org/10.6084/m9.figshare.5937598.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shitao Zhao; Jianqiang Sun; Kentaro Shimizu; Koji Kadota
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code for the analyses. This zipped file includes a total of 23 R script files. Results can be obtained by executing the scripts in the order of the serial number XX in each filename (“rcode_XX_...”). Note that two files (“rcode_08_Add6_pre.R” and “rcode_10_Add7_pre.R”) must be executed using R version 3.1.3 (affy version 1.44.0) instead of R version 3.3.2 (affy version 1.52.0). (ZIP 33 kb)
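    The execution order can be derived from the filenames. A small Python sketch, assuming every script follows the “rcode_XX_” prefix pattern; the two legacy script names come from the description, while the function name is hypothetical. It only plans the order and the R version for each script and does not run R.

```python
import re

# Scripts that, per the description, must run under R 3.1.3 (affy 1.44.0).
LEGACY_R_SCRIPTS = {"rcode_08_Add6_pre.R", "rcode_10_Add7_pre.R"}

_SERIAL = re.compile(r"^rcode_(\d+)_")


def execution_plan(filenames):
    """Return (filename, R version) pairs ordered by the XX serial number."""
    scripts = []
    for name in filenames:
        match = _SERIAL.match(name)
        if match:  # ignore files that are not numbered R scripts
            scripts.append((int(match.group(1)), name))
    scripts.sort()
    return [
        (name, "3.1.3" if name in LEGACY_R_SCRIPTS else "3.3.2")
        for _, name in scripts
    ]
```

Sorting on the integer serial avoids surprises if any number is not zero-padded.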

  15. Comparison between Mutplot and other popular tools for mutation plots.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Weiwei Zhang; Cheng Wang; Xuan Zhang (2023). Comparison between Mutplot and other most popular tools for mutaiotn plots. [Dataset]. http://doi.org/10.1371/journal.pone.0215838.t001
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Weiwei Zhang; Cheng Wang; Xuan Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison between Mutplot and other popular tools for mutation plots.

  16. Supporting data for: "Integrated gene expression and alternative splicing...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    csv
    Updated Jan 13, 2025
    Silvia Gioiosa; Silvia Gasparini; Carlo Presutti; Arianna Rinaldi; Tiziana Castrignanò; Cecilia Mannironi (2025). Supporting data for:"Integrated gene expression and alternative splicing analysis in human and mouse models of Rett Syndrome." [Dataset]. http://doi.org/10.6084/m9.figshare.26946523.v1
    Dataset provided by
    Figshare
    Authors
    Silvia Gioiosa; Silvia Gasparini; Carlo Presutti; Arianna Rinaldi; Tiziana Castrignanò; Cecilia Mannironi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The folder "Homo_sapiens" contains 8 subfolders. Each of them is an individual input BioProject downloaded from SRA and reanalyzed with the same transcriptomic pipeline in order to compute:

    1) differentially alternatively spliced genes (DAS), in .tsv format;
    2) differentially expressed genes (DEGs), in .csv format;
    3) Gene Ontology enrichment analysis of the DEG results. When the analysis produced statistically significant results, three .csv files were added to the folder: one for BP (Biological Process), one for CC (Cellular Component), and one for MF (Molecular Function) results;
    4) DESeq2 inputs, divided into gene count matrices in .csv format and the associated phenodata.csv.

    When a BioProject appears more than once, DEGs were computed over different variables (e.g. RTT vs. WT), or the data were treated as independent studies because multiple source materials were present. To distinguish the studies, a progressive suffix "_0", "_1", "_2" was added to the folder names (e.g. in PRJNA509687_0 the samples were iPSC-derived neural cortical neurons, RTT vs. WT, while in PRJNA509687_1 they were iPSC-derived neural progenitors, RTT vs. WT). To facilitate folder navigation, a file named "parameters" was added to each folder.

    The same logic applies to the main folder "Mus_musculus", which contains 13 subfolders with DAS and DEG results.
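    Because repeated BioProjects are distinguished only by the "_0", "_1", ... suffix, grouping the folders back by BioProject is a simple string parse. A Python sketch, assuming folder names are a BioProject accession (such as PRJNA509687) optionally followed by "_N"; the function name is hypothetical.

```python
import re
from collections import defaultdict

# BioProject accession (e.g. PRJNA509687) with an optional "_N" study suffix.
_FOLDER = re.compile(r"^(PRJ[A-Z]{2}\d+)(?:_(\d+))?$")


def group_studies(folder_names):
    """Map each BioProject to its sorted study indices (None = no suffix)."""
    groups = defaultdict(list)
    for name in folder_names:
        match = _FOLDER.match(name)
        if match is None:
            continue  # skip entries that are not BioProject folders
        suffix = match.group(2)
        groups[match.group(1)].append(int(suffix) if suffix is not None else None)
    # Unsuffixed folders (None) sort before "_0", "_1", ...
    return {
        project: sorted(indices, key=lambda i: -1 if i is None else i)
        for project, indices in groups.items()
    }
```

Each study's own "parameters" file still has to be read to tell what distinguishes, say, PRJNA509687_0 from PRJNA509687_1.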

  17. Dataset for: Structural features of Aspergillus niger β-galactosidase define...

    • wiley.figshare.com
    bin
    Updated May 31, 2023
    Agustín Rico-Díaz; Mercedes Ramírez-Escudero; Angel Vizoso-Vázquez; M. Esperanza Cerdán; Manuel Becerra; Juliana (Julia) Sanz-Aparicio (2023). Dataset for: Structural features of Aspergillus niger β-galactosidase define its activity against glycoside linkages [Dataset]. http://doi.org/10.6084/m9.figshare.5001896.v1
    Explore at:
    binAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Wileyhttps://www.wiley.com/
    Authors
    Agustín Rico-Díaz; Mercedes Ramírez-Escudero; Angel Vizoso-Vázquez; M. Esperanza Cerdán; Manuel Becerra; Juliana (Julia) Sanz-Aparicio
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    β-galactosidases are biotechnologically interesting enzymes that catalyze the hydrolysis or transgalactosylation of β-galactosides. Among them, the Aspergillus niger β-galactosidase (AnβGal) belongs to glycoside hydrolase family 35 (GH35) and is widely used in industry owing to its high hydrolytic activity on lactose. We present here its three-dimensional structure in complex with different oligosaccharides, to illustrate the structural determinants of the enzyme's broad specificity towards different glycoside linkages. Remarkably, residues Phe264, Tyr304 and Trp806 form a dynamic hydrophobic platform that accommodates the sugar at subsite +1, suggesting a major role in the recognition of structurally different substrates. Moreover, complexes with the trisaccharides show two potential +2 subsites, depending on the substrate type. This feature and the peculiar shape of its wide cavity suggest that AnβGal might accommodate branched substrates from the complex network of polysaccharides that make up plant material in its natural environment. Relevant residues were selected and mutagenesis analyses were performed to evaluate their role in the catalytic performance and the hydrolase/transferase ratio of AnβGal. In this way, we generated mutants with improved transgalactosylation activity. In particular, the variant Y304F/Y355H/N357G/W806F produces more galacto-oligosaccharides (GOS) than the Aspergillus oryzae β-galactosidase, which is the preferred industrial enzyme owing to its high transferase activity. Our results provide new knowledge on the determinants modulating the specificity and catalytic performance of fungal GH35 β-galactosidases. In turn, this fundamental background provides novel tools for the future improvement of these enzymes, which are an interesting target for rational design.

  18. DataSheet1_Ferroptosis patterns and tumor microenvironment infiltration...

    • frontiersin.figshare.com
    txt
    Updated May 30, 2023
    Lu-Lu Zhang; Wei-Jie Zhu; Xin-Xin Zhang; Da Feng; Xi-Cheng Wang; Ying Ding; Dong-Xia Wang; Yi-Yang Li (2023). DataSheet1_Ferroptosis patterns and tumor microenvironment infiltration characterization in esophageal squamous cell cancer.CSV [Dataset]. http://doi.org/10.3389/fgene.2022.1047382.s001
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Lu-Lu Zhang; Wei-Jie Zhu; Xin-Xin Zhang; Da Feng; Xi-Cheng Wang; Ying Ding; Dong-Xia Wang; Yi-Yang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Esophageal squamous cell cancer (ESCC) is an aggressive disease associated with a poor prognosis. As a newly defined form of regulated cell death, ferroptosis plays a crucial role in cancer development and treatment and might be a promising therapeutic target. However, the expression patterns of ferroptosis-related genes (FRGs) in ESCC remain to be systematically analyzed.
    Methods: First, we retrieved the transcriptional profiles of ESCC from TCGA and GEO datasets (GSE47404, GSE23400, and GSE53625) and performed unsupervised clustering to identify distinct ferroptosis patterns. Then, we used the ssGSEA algorithm to estimate immune cell infiltration in these patterns and explored differences in immune cell abundance. Finally, the genes shared among the patterns were identified as signature genes of the ferroptosis patterns.
    Results: We depicted the multi-omics landscape of FRGs through integrated bioinformatics analysis and identified three ESCC subtypes with distinct immune characteristics, clusters A-C. Cluster C was rich in infiltrating CD8+ T cells and other immune cells, whereas cluster A was immune-barren. By comparing the differentially expressed genes between clusters across the datasets, we defined a gene signature for each cluster and successfully validated it in the TCGA-ESCC dataset.
    Conclusion: We provide a comprehensive view of the expression patterns of ferroptosis genes and their interaction with immune cell infiltration. Additionally, we established a gene signature that defines the ferroptosis patterns and might be used to predict response to immunotherapy.

  19. Data from: Advances in understanding cis regulation of the plant gene with...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jan 19, 2016
    Diane Burgess; Michael Freeling; Jie Xu (2016). Advances in understanding cis regulation of the plant gene with an emphasis on comparative genomics [Dataset]. http://doi.org/10.6084/m9.figshare.1397562.v1
    Available download formats: xlsx
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Diane Burgess; Michael Freeling; Jie Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a list of Arabidopsis thaliana CNSs merged from the following CNS lists: 1) Haudry et al. (2013) An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45:891-898. 2) PL3.0 (TAIR 10 version): Turco et al. (2013) Automated conserved noncoding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Frontiers in Plant Genetics and Genomics 4:170-180. 3) Van de Velde et al. (2014) Inferences of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. Plant Cell 26:2729-2745. CNSs from the individual lists were concatenated and then merged using merge from the BEDTools suite (Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842). CNSs from the merged list were assigned to an Arabidopsis thaliana gene based on their PL3.0 component. PL3.0 CNSs are defined as syntenic conserved noncoding regions between Arabidopsis thaliana and the early-branching Brassicaceae species Aethionema arabicum. Orthologous Arabidopsis thaliana-Aethionema arabicum genes were identified using a combination of CoGe SynFind (Tang et al. (2011) BMC Bioinformatics 12:102) and the PL3.0 CNS pipeline (Turco et al. 2013). closestBed (BEDTools) was then used to map PL3.0 CNSs to the closest Arabidopsis thaliana gene that had an Aethionema arabicum ortholog; the distance to the nearest gene is included in the closestBed output. Proximal regions were defined as the 1000 bp upstream of the transcription start site (5' proximal) or the 1000 bp downstream of the gene (3' proximal). For intragenic CNSs, a custom Perl script was used to determine whether the CNS lies in an intron or a UTR. Overlap with UTR and CDS regions was calculated with intersectBed (BEDTools), using BED files created from the GFF "UTR" and "CDS" features.
    CNSs overlapping CDSs by 50% or more were given "CDS" designations. CNSs overlapping UTRs by 50% or more were given 5' or 3' UTR designations. CNSs without a PL3.0 component were then assigned to an Arabidopsis thaliana gene if they lay in the genespace of an Arabidopsis gene, the genespace being defined as the region between and including the 5'-most PL3.0 CNS and the 3'-most PL3.0 CNS. Once a CNS was assigned to an Arabidopsis gene, the distance to that gene was calculated using closestBed (BEDTools), and intersectBed was used, as above, to identify the position of intragenic CNSs. An Arabidopsis thaliana genome has been made available on CoGe (dsgid 25725), decorated with 2 sets of CNSs: 1) PL3.0 and 2) the merged set from this datasheet. To see the CNSs, set "Show preannotated CNSs?" to "Yes" under Results Visualization Options. Note: CNS assignments to Arabidopsis thaliana genes are best-guess computational assignments; individual PL3.0 CNSs may in fact regulate genes other than the closest Arabidopsis thaliana gene with an Aethionema arabicum ortholog. This is particularly true for genes with complex regulation. In the GEvo links included in this spreadsheet, such cases can often be seen as clusters of CNSs extending beyond the midpoint between two Arabidopsis thaliana genes. By adding more orthologous genes to the GEvo panels, it is often possible to assign a CNS to an Arabidopsis thaliana gene with greater confidence, if only one of the two Arabidopsis thaliana genes is retained along with the CNS in all genomes.
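The concatenate-then-merge step and the 50% overlap rule can be illustrated with the minimal Python sketch below. It is not the authors' pipeline and not BEDTools code: it assumes intervals are (chromosome, start, end) tuples in 0-based, half-open BED coordinates, imitates the default `bedtools merge` behaviour of joining overlapping or book-ended intervals, and measures overlap as a fraction of the CNS length against one feature at a time. The helper names (`merge_intervals`, `overlap_fraction`, `designate`), the toy coordinates, and the CDS-before-UTR precedence are hypothetical assumptions.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end) in BED convention: 0-based, half-open.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals, similar to default `bedtools merge`."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and touching/overlapping: extend the previous interval.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlap_fraction(cns: Interval, feature: Interval) -> float:
    """Fraction of the CNS length covered by a single feature interval."""
    if cns[0] != feature[0]:
        return 0.0
    overlap = min(cns[2], feature[2]) - max(cns[1], feature[1])
    return max(overlap, 0) / (cns[2] - cns[1])


def designate(cns: Interval, cds: List[Interval], utr5: List[Interval],
              utr3: List[Interval], threshold: float = 0.5) -> Optional[str]:
    """Return "CDS", "5' UTR" or "3' UTR" if one feature covers >= threshold of the CNS.

    Assumption: CDS is tested before the UTRs; the source states no precedence.
    """
    for label, features in (("CDS", cds), ("5' UTR", utr5), ("3' UTR", utr3)):
        if any(overlap_fraction(cns, f) >= threshold for f in features):
            return label
    return None


if __name__ == "__main__":
    # Toy intervals standing in for CNSs concatenated from several lists.
    concatenated = [("Chr1", 100, 200), ("Chr1", 150, 260),
                    ("Chr1", 260, 300), ("Chr1", 500, 550)]
    print(merge_intervals(concatenated))
    # [('Chr1', 100, 300), ('Chr1', 500, 550)]
    print(designate(("Chr1", 100, 200), cds=[("Chr1", 140, 400)], utr5=[], utr3=[]))
    # CDS  (60 of 100 bp covered)
```

Book-ended intervals (one ending exactly where the next starts) are merged here because `start <= merged[-1][2]`; a strict `<` would keep them separate.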

  20. Arabidopsis thaliana CNSs verified in at least 2 CNS lists

    • figshare.com
    xlsx
    Updated Jan 19, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Diane Burgess (2016). Arabidopsis thaliana CNSs verified in at least 2 CNS lists [Dataset]. http://doi.org/10.6084/m9.figshare.1422166.v1
    Available download formats: xlsx
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Diane Burgess
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a list of Arabidopsis thaliana CNSs present in at least two of the three following CNS lists: 1) Haudry et al. (2013) An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45:891-898. 2) PL3.0 (TAIR 10 version): Turco et al. (2013) Automated conserved noncoding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Frontiers in Plant Genetics and Genomics 4:170-180. 3) Van de Velde et al. (2014) Inferences of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. Plant Cell 26:2729-2745. CNSs found in at least 2 of the 3 CNS lists were identified using multiIntersectBed from the BEDTools suite (Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842). CNSs from the verified2 list were assigned to an Arabidopsis thaliana gene based on their PL3.0 component. PL3.0 CNSs are defined as syntenic conserved noncoding regions between Arabidopsis thaliana and the early-branching Brassicaceae species Aethionema arabicum. Orthologous Arabidopsis thaliana-Aethionema arabicum genes were identified using a combination of CoGe SynFind (Tang et al. (2011) BMC Bioinformatics 12:102) and the PL3.0 CNS pipeline (Turco et al. 2013). closestBed (BEDTools) was then used to map PL3.0 CNSs to the closest Arabidopsis thaliana gene with an Aethionema arabicum ortholog; the distance to the nearest gene is included in the closestBed output. Proximal regions were defined as the 1000 bp upstream of the transcription start site (5' proximal) or the 1000 bp downstream of the gene (3' proximal).
    CNSs without a PL3.0 component were also assigned to an Arabidopsis thaliana gene if they were intragenic or if they lay in the genespace of an Arabidopsis gene, the genespace being defined as the region between and including the 5'-most PL3.0 CNS and the 3'-most PL3.0 CNS. For intragenic CNSs, a custom Perl script was used to determine whether the CNS lies in an intron or a UTR. Overlap with UTR and CDS regions was calculated with intersectBed (BEDTools), using BED files created from the GFF "UTR", "gene", and "CDS" features. CNSs overlapping CDSs by 50% or more were given "CDS" designations. CNSs overlapping UTRs by 50% or more were given 5' or 3' UTR designations. Note: CNS assignments to Arabidopsis thaliana genes are best-guess computational assignments; individual PL3.0 CNSs may in fact regulate genes other than the closest Arabidopsis thaliana gene with an Aethionema arabicum ortholog. This is particularly true for genes with complex regulation. In the GEvo links included in this spreadsheet, such cases can often be seen as clusters of CNSs extending beyond the midpoint between two Arabidopsis thaliana genes. By adding more orthologous genes to the GEvo panels, it is often possible to assign a CNS to an Arabidopsis thaliana gene with greater confidence, if only one of the two Arabidopsis thaliana genes is retained along with the CNS in all genomes.
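The "present in at least 2 of the 3 lists" criterion can be pictured as a sweep over interval start and end points that tracks how many distinct lists cover each position. The Python sketch below illustrates that idea under stated assumptions; it is not BEDTools' multiIntersectBed implementation, and the function name `regions_in_at_least` and the toy coordinates are hypothetical. Inputs are assumed to be (chromosome, start, end) tuples in 0-based, half-open BED coordinates. Unlike multiIntersectBed, which reports a separate sub-interval for each combination of covering files, this sketch generally returns each contiguous stretch covered by at least k lists as one interval, and overlapping intervals from the same list count only once.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) in BED convention: 0-based, half-open.
Interval = Tuple[str, int, int]


def regions_in_at_least(lists: List[List[Interval]], k: int = 2) -> List[Interval]:
    """Return stretches covered by intervals from at least k distinct input lists."""
    # Sweep-line events per chromosome: (position, list index, +1 start / -1 end).
    events: Dict[str, List[Tuple[int, int, int]]] = {}
    for idx, intervals in enumerate(lists):
        for chrom, start, end in intervals:
            if end <= start:
                continue  # skip empty or malformed intervals
            events.setdefault(chrom, []).append((start, idx, 1))
            events.setdefault(chrom, []).append((end, idx, -1))

    result: List[Interval] = []
    for chrom in sorted(events):
        depth = [0] * len(lists)  # open intervals per input list
        covered = 0               # number of lists with at least one open interval
        region_start = 0
        # Ends (-1) sort before starts (+1) at the same position, so
        # book-ended intervals are not counted as overlapping.
        for pos, idx, delta in sorted(events[chrom], key=lambda e: (e[0], e[2])):
            before = covered
            depth[idx] += delta
            if delta == 1 and depth[idx] == 1:
                covered += 1      # this list just became active
            elif delta == -1 and depth[idx] == 0:
                covered -= 1      # this list just became inactive
            if before < k <= covered:
                region_start = pos
            elif covered < k <= before:
                result.append((chrom, region_start, pos))
    return result


if __name__ == "__main__":
    toy_lists = [
        [("Chr1", 100, 200)],  # list 1
        [("Chr1", 150, 300)],  # list 2
        [("Chr1", 250, 400)],  # list 3
    ]
    print(regions_in_at_least(toy_lists, k=2))
    # [('Chr1', 150, 200), ('Chr1', 250, 300)]
```

Counting distinct lists (via the per-list `depth` counters) rather than raw intervals is the design choice that matters here: two overlapping CNSs from the same list must not be mistaken for support from two lists.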
