100+ datasets found
  1. Fantastic databases and where to find them: Web applications for researchers...

    • scielo.figshare.com
    jpeg
    Updated Jun 3, 2023
    Cite
    Gerda Cristal Villalba; Ursula Matte (2023). Fantastic databases and where to find them: Web applications for researchers in a rush [Dataset]. http://doi.org/10.6084/m9.figshare.20018091.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Gerda Cristal Villalba; Ursula Matte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Public databases are essential to the development of multi-omics resources. The amount of data created by biological technologies needs a systematic and organized form of storage that can be quickly accessed and managed. This is the objective of a biological database. Here, we present an overview of human databases with web applications. The databases and tools allow the search of biological sequences, genes and genomes, gene expression patterns, epigenetic variation, protein-protein interactions, variant frequency, regulatory elements, and comparative analysis between human and model organisms. Our goal is to provide an opportunity for exploring large datasets and analyzing the data for users with little or no programming skills. Public user-friendly web-based databases facilitate data mining and the search for information applicable to healthcare professionals. In addition, biological databases are essential to improve biomedical search sensitivity and efficiency and to merge multiple datasets needed to share data and build global initiatives for the diagnosis, prognosis, and discovery of new treatments for genetic diseases. To show the databases at work, we present a case study using ACE2 as an example of a gene to be investigated. The analysis and the complete list of databases are available at the following website.

  2. Data from: hfAIM: A reliable bioinformatics approach for in silico...

    • tandf.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Qingjun Xie; Oren Tzfadia; Matan Levy; Efrat Weithorn; Hadas Peled-Zehavi; Thomas Van Parys; Yves Van de Peer; Gad Galili (2023). hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms [Dataset]. http://doi.org/10.6084/m9.figshare.3172519
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Qingjun Xie; Oren Tzfadia; Matan Levy; Efrat Weithorn; Hadas Peled-Zehavi; Thomas Van Parys; Yves Van de Peer; Gad Galili
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements—the presence of acidic amino acids and the absence of positively charged amino acids in certain positions—to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/.
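
    As a rough illustration of the core consensus named in the description (F/W/Y-X-X-L/I/V), the sketch below scans a sequence with a regular expression. It covers only the degenerate consensus; the additional hfAIM requirements (acidic residues present, positively charged residues absent at certain positions) are not specified in enough detail here to encode, so they are left out. The function name and the toy peptide are hypothetical.

```python
import re

# Core degenerate AIM consensus from the description: [F/W/Y]-X-X-[L/I/V].
# The lookahead lets overlapping candidates be reported as well.
# Assumption: this is NOT the hfAIM rule set, only the basic consensus.
CORE_AIM = re.compile(r"(?=([FWY]..[LIV]))")


def find_core_aims(sequence):
    """Return (1-based position, motif) for each core-consensus match."""
    sequence = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CORE_AIM.finditer(sequence)]


# Hypothetical toy peptide, not a real protein sequence.
toy_peptide = "MSDEWEELAAFAAVK"
for position, motif in find_core_aims(toy_peptide):
    print(position, motif)
```

    A scan like this is expected to over-predict, which is the problem the extra hfAIM sequence requirements are meant to address.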

  3. TCR-MHC Germline Interaction Scores Generated Using AIMS

    • zenodo.org
    zip
    Updated Aug 28, 2022
    Cite
    Christopher T. Boughter (2022). TCR-MHC Germline Interaction Scores Generated Using AIMS [Dataset]. http://doi.org/10.5281/zenodo.7023681
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher T. Boughter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data were generated using the AIMS interaction scoring function as outlined in the manuscript "A Systematic Characterization of Germline-Encoded Contacts Identifies the Source of Bias in TCR-MHC Interactions". They accompany the AIMS version 0.7 software available on GitHub: https://github.com/ctboughter/AIMS . These files are meant to be loaded into the mhc_germline_analysis.ipynb file, but are too large to be included on the GitHub page itself.

  4. BIOGRID CURATED DATA FOR PUBLICATION: hfAIM: A reliable bioinformatics...

    • thebiogrid.org
    zip
    Updated Feb 1, 2016
    Cite
    BioGRID Project (2016). BIOGRID CURATED DATA FOR PUBLICATION: hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms. [Dataset]. https://thebiogrid.org/199728/publication/hfaim-a-reliable-bioinformatics-approach-for-in-silico-genome-wide-identification-of-autophagy-associated-atg8-interacting-motifs-in-various-organisms.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 1, 2016
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-protein, genetic, and chemical interactions for Xie Q (2016): hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms, curated by BioGRID (https://thebiogrid.org). ABSTRACT: Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements (the presence of acidic amino acids and the absence of positively charged amino acids in certain positions) to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at http://bioinformatics.psb.ugent.be/hfAIM/.

  5. Data from: Semi-artificial datasets as a resource for validation of...

    • search.dataone.org
    • datadryad.org
    Updated May 21, 2025
    Cite
    Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart (2025). Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection [Dataset]. http://doi.org/10.5061/dryad.0zpc866z8
    Dataset updated
    May 21, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart
    Time period covered
    Jan 1, 2021
    Description

    In the last decade, High-Throughput Sequencing (HTS) has revolutionized biology and medicine. This technology allows the sequencing of huge amounts of DNA and RNA fragments at a very low price. In medicine, HTS tests for disease diagnostics have already been brought into routine practice. However, adoption in plant health diagnostics is still limited. One of the main bottlenecks is the lack of expertise and consensus on the standardization of the data analysis. The Plant Health Bioinformatic Network (PHBN) is a Euphresco project aiming to build a community network of bioinformaticians and computational biologists working in plant health. One of the main goals of the project is to develop reference datasets that can be used for validation of bioinformatics pipelines and for standardization purposes.

    Semi-artificial datasets have been created for this purpose (Datasets 1 to 10). They are composed of a “real” HTS dataset spiked with artificial viral reads. It will allow researchers to adjust ...

  6. Microarray and bioinformatic analysis of conventional ameloblastoma

    • data.scielo.org
    jpeg, txt, xlsx
    Updated Dec 20, 2022
    Cite
    Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez (2022). Microarray and bioinformatic analysis of conventional ameloblastoma [Dataset]. http://doi.org/10.48331/SCIELODATA.Z2S8X9
    Explore at:
    Available download formats: xlsx(10317), jpeg(3415112), xlsx(9969), jpeg(12173968), txt(605), txt(289), txt(3840), xlsx(9964), xlsx(12458), txt(2657), txt(18077), xlsx(10402), jpeg(2313098), txt(406), txt(1023)
    Dataset updated
    Dec 20, 2022
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    National Autonomous University of Mexico
    Description

    Ameloblastoma is a highly aggressive odontogenic tumor, and its pathogenesis is associated with multiple participating genes. Objective: Our aim was to identify and validate new critical genes of conventional ameloblastoma using microarray and bioinformatics analysis. Methods: Gene expression microarray and bioinformatic analysis were performed to use CHIP H10KA and DAVID software for enrichment. Protein-protein interactions (PPI) were visualized using STRING-Cytoscape with MCODE plugin, followed by Kaplan-Meier and GEPIA analysis that were employed for the candidate's postulation. RT-qPCR and IHC assays were performed to validate the bioinformatic approach. Results: 376 upregulated genes were identified. PPI analysis revealed 14 genes that were validated by Kaplan-Meier and GEPIA resulting in PDGFA and IL2RA as candidate genes. The RT-qPCR analysis confirmed their intense expression. Immunohistochemistry analysis showed that PDGFA expression is parenchyma located. Conclusion: With bioinformatics methods, we can identify upregulated genes in conventional ameloblastoma, and with RT-qPCR and immunoexpression analysis validate that PDGFA could be a more specific and localized therapeutic target.

  7. Screening miRNA biomarkers for postmenopausal osteoporosis based on...

    • scidb.cn
    Updated Nov 4, 2024
    Cite
    Bin.WANG; Shunjie.Wu; Bin.XU; Aiguo.Zhu; Hailei.Chen (2024). Screening miRNA biomarkers for postmenopausal osteoporosis based on bioinformatics methods [Dataset]. http://doi.org/10.57760/sciencedb.j00217.00183
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Bin.WANG; Shunjie.Wu; Bin.XU; Aiguo.Zhu; Hailei.Chen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Objective: To use bioinformatics methods to screen potential miRNAs as biomarkers for postmenopausal osteoporosis (PMO). Methods: The expression profile of a PMO peripheral blood miRNA chip was obtained from the GEO public database. First, the chip was re-annotated using the R language, and the clinical typing significance of the data was determined using analysis of similarities (ANOSIM). Then, weighted gene co-expression network analysis (WGCNA), multi-scale embedded gene co-expression network analysis (MEGCNA), and nonnegative matrix factorization (NMF) were used to screen miRNAs related to PMO. Finally, the diagnostic efficacy of the miRNA was evaluated using ROC curves, the target genes of the miRNA were predicted using a database, and functional enrichment of the target genes was performed using Metascape. Results: miR-223-3p has a high predictive diagnostic value for PMO. GO and KEGG enrichment analysis was conducted on 34 target genes potentially regulated by miR-223-3p, and the results showed that multiple pathways were associated with bone development. Conclusion: miR-223-3p has significance in the diagnosis of PMO and may regulate bone development by regulating downstream target genes.
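
    The description evaluates diagnostic efficacy with ROC curves but does not say how they were computed. As a minimal sketch, the area under a ROC curve can be obtained as the fraction of case/control pairs in which the case has the higher score. The function name and the example values are hypothetical, and the sketch assumes that higher scores indicate disease.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case scores higher than
    a randomly chosen control; ties count as one half.
    Assumption: higher scores point towards the case group.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values, for illustration only.
print(roc_auc([3.0, 4.0], [1.0, 2.0]))
```

    An AUC near 1.0 means the marker separates the two groups well, while 0.5 corresponds to no discrimination.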

  8. To explore the co-pathogenesis of obesity and nonalcoholic steatohepatitis...

    • scidb.cn
    Updated Feb 4, 2025
    Cite
    Siwei.Wang; Jiarui.Li; LIkun.DU (2025). To explore the co-pathogenesis of obesity and nonalcoholic steatohepatitis based on bioinformatics analysis [Dataset]. http://doi.org/10.57760/sciencedb.j00217.05747
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Siwei.Wang; Jiarui.Li; LIkun.DU
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Objective: Bioinformatics methods were used to investigate the pathogenesis, disease-characteristic genes, and immune infiltration manifestations of obesity (OB) and nonalcoholic steatohepatitis (NASH), and to explore the correlation between disease-characteristic genes and immune cells. Methods: OB- and NASH-related chips were obtained from the GEO database. The R language was used for differential gene analysis and WGCNA; GO and KEGG enrichment were analyzed by intersection analysis, and a protein-protein interaction network was constructed at the same time. Key genes were selected using 12 cytoHubba methods; ROC curves and sample chips were used to assess the accuracy of the key genes, and the disease-characteristic genes with the best performance were selected. The CIBERSORT algorithm was then used to analyze immune infiltration in OB and NASH, and the correlation between disease-characteristic genes and immune cells was analyzed. Results: A total of 235 differential genes were obtained in the obesity training group (GSE25401 and GSE151839), and 804 differential genes were obtained in the nonalcoholic steatohepatitis training group (GSE63067 and GSE89632). GO analysis mainly involved the significant expression of interleukin 8 regulation. KEGG analysis showed that multiple comb inhibition complex and other pathways were closely related to OB and NASH. The key genes IL6, IL1B, IL1RN, VCAN, and TNFAIP6 were selected by the 12 cytoHubba methods. ROC curves and sample chips were used to detect disease-characteristic genes, and VCAN and IL1RN performed best. Conclusion: The OB and NASH characteristic genes VCAN and IL1RN are significantly correlated with immune cells, which provides a preliminary basis for further research on targeted diagnosis and treatment of OB and MASH.

  9. (high-temp) No 4. Taxonomic: (16S rRNA/ITS) Output

    • search.dataone.org
    • dataone.org
    • +1more
    Updated Aug 16, 2024
    Cite
    Jarrod Scott (2024). (high-temp) No 4. Taxonomic: (16S rRNA/ITS) Output [Dataset]. https://search.dataone.org/view/urn%3Auuid%3A2f7f52c8-0273-40a9-99dd-ebeb9d5239dd
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    Smithsonian Research Data Repository
    Authors
    Jarrod Scott
    Description

    Output files from the No 4. Taxonomic Workflow page of the SWELTR high-temp study. In this workflow we used the microeco package for taxonomic assessment. We first converted each phyloseq object into a microtable object using the file2meco package.

    taxa_wf.rdata : contains all variables and phyloseq objects from the 16S rRNA and ITS ASV taxonomic assessment. To see the objects, in R run load("taxa_wf.rdata", verbose=TRUE)

    Additional files:

    For convenience, we also include individual phyloseq and microtable objects (collected in zip files).

    ITS (its_taxa_objects.zip):
    its18_ps_work_me.rds : microtable object for the FULL (unfiltered) ITS data.
    its18_ps_filt_me.rds : microtable object for the Arbitrary filtered ITS data.
    its18_ps_perfect_me.rds : microtable object for the PERfect ITS data.
    its18_ps_pime_me.rds : microtable object for the PIME ITS data.

    16S rRNA (ssu_taxa_objects.zip):
    ssu18_ps_work_me.rds : microtable object for the FULL (unfiltered) 16S rRNA data.
    ssu18_ps_filt_me.rds : microtable object for the Arbitrary filtered 16S rRNA data.
    ssu18_ps_perfect_me.rds : microtable object for the PERfect 16S rRNA data.
    ssu18_ps_pime_me.rds : microtable object for the PIME 16S rRNA data.

    For one of the 16S rRNA analyses we looked at family-level diversity of major bacterial phyla. For this analysis, we renamed NA ranks by the next highest named rank. For example, ASV13884 was unclassified at family level, so the NA was replaced with the next highest named rank (in this case order). Therefore the family-level classification for this ASV was changed to o_Polyangiales. Doing this allowed us to include unclassified abundance in our analyses. We include the following phyloseq objects containing the modified taxonomies.

    ssu18_ps_work_clean.rds : modified phyloseq object for the FULL (unfiltered) 16S rRNA data.
    ssu18_ps_filt_clean.rds : modified phyloseq object for the Arbitrary filtered 16S rRNA data.
    ssu18_ps_perfect_clean.rds : modified phyloseq object for the PERfect filtered 16S rRNA data.
    ssu18_ps_pime_clean.rds : modified phyloseq object for the PIME filtered 16S rRNA data.
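
    The NA-rank renaming described above was done in R by the dataset author; the Python sketch below only illustrates the idea. The rank list, the function name, and the single-letter prefix convention (generalised from the o_Polyangiales example) are assumptions, not the workflow's actual code.

```python
# Ranks from highest to lowest; assumed for illustration.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def fill_unnamed_rank(taxonomy, rank):
    """Return the name at `rank`, or a prefixed name borrowed from the
    nearest named higher rank when the entry is missing (None)."""
    if taxonomy.get(rank) is not None:
        return taxonomy[rank]
    idx = RANKS.index(rank)
    # Walk upwards: order before class before phylum, and so on.
    for higher in reversed(RANKS[:idx]):
        name = taxonomy.get(higher)
        if name is not None:
            # Prefix with the first letter of the donor rank, e.g. "o_".
            return f"{higher[0]}_{name}"
    return None


# Mirrors the ASV13884 example: family is NA, order is Polyangiales.
asv13884 = {"order": "Polyangiales", "family": None}
print(fill_unnamed_rank(asv13884, "family"))
```

    Keeping the donor rank in the prefix makes it clear in downstream plots that the label is a placeholder and not a true family name.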

    Source code for the workflow can be found here:
    https://github.com/sweltr/high-temp/blob/master/taxa.Rmd

  10. Bioinformatics Analysis, Prokaryotic Expression, and Antiserum Preparation...

    • scidb.cn
    Updated Apr 28, 2025
    Cite
    Jiarong.Liu; Yihao.Wang; Lingbao.Kong (2025). Bioinformatics Analysis, Prokaryotic Expression, and Antiserum Preparation of hnRNP A1 Protein [Dataset]. http://doi.org/10.57760/sciencedb.j00217.07379
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Jiarong.Liu; Yihao.Wang; Lingbao.Kong
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Objective: This study aims to construct a recombinant expression plasmid using methods such as PCR and double enzyme digestion. By conducting single-factor experiments to adjust IPTG concentration, expression time, and expression temperature, the research seeks to optimize the expression conditions of the hnRNP A1 protein in BL21 competent cells. The goal is to obtain high-concentration, high-quality purified protein and to prepare high-titer polyclonal antibodies against hnRNP A1. Methods: Bioinformatics tools were used to analyze the physicochemical properties and structure of hnRNP A1. The pET-28a-hnRNP A1 recombinant plasmid was constructed and transformed into BL21 cells. After optimizing expression conditions, hnRNP A1 protein was purified using nickel column chromatography and identified by Western Blot. The purified protein was used to immunize C57BL/6 mice to produce polyclonal antibodies, and the antibody titer was determined by indirect ELISA. Results: The highest expression of hnRNP A1 was achieved under conditions of 0.4 mM IPTG, induction temperature of 42°C, and induction time of 8 hours. The purified protein concentration reached 2.0563 μg/μl, and Western Blot confirmed the target protein. The antibody titer detected by indirect ELISA was 1:409,600. Conclusion: The physicochemical properties of hnRNP A1 were successfully analyzed, high-efficiency expression of hnRNP A1 protein was achieved, and high-titer mouse-derived polyclonal antibodies against hnRNP A1 were prepared, providing a valuable tool for further research.

  11. Novel and ultra-rare damaging variants in neuropeptide signaling are...

    • plos.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Michael Lutter; Ethan Bahl; Claire Hannah; Dabney Hofammann; Summer Acevedo; Huxing Cui; Carrie J. McAdams; Jacob J. Michaelson (2023). Novel and ultra-rare damaging variants in neuropeptide signaling are associated with disordered eating behaviors [Dataset]. http://doi.org/10.1371/journal.pone.0181556
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael Lutter; Ethan Bahl; Claire Hannah; Dabney Hofammann; Summer Acevedo; Huxing Cui; Carrie J. McAdams; Jacob J. Michaelson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: Eating disorders develop through a combination of genetic vulnerability and environmental stress; however, the genetic basis of this risk is unknown. Methods: To understand the genetic basis of this risk, we performed whole exome sequencing on 93 unrelated individuals with eating disorders (38 restricted-eating and 55 binge-eating) to identify novel damaging variants. Candidate genes with an excessive burden of predicted damaging variants were then prioritized based upon an unbiased, data-driven bioinformatic analysis. One top candidate pathway was empirically tested for therapeutic potential in a mouse model of binge-like eating. Results: An excessive burden of novel damaging variants was identified in 186 genes in the restricted-eating group and 245 genes in the binge-eating group. This list is significantly enriched (OR = 4.6, p

  12. CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)

    • data.mendeley.com
    • data.niaid.nih.gov
    • +3more
    Updated Dec 4, 2018
    Cite
    Farah Zaib Khan (2018). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
    Dataset updated
    Dec 4, 2018
    Authors
    Farah Zaib Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:

    1. Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.
    2. The Genome BAM file is processed using Picard MarkDuplicates producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).
    3. SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.
    4. The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.
    5. In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
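
    The dependencies between the five steps above can be sketched as a small graph. This is only an illustration of the ordering described in the list, not the CWL workflow itself; the step labels and the helper name are assumptions, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs,
# as described in the numbered list (labels are illustrative).
STEP_DEPENDENCIES = {
    "STAR": set(),
    "Picard MarkDuplicates": {"STAR"},
    "SAMtools index": {"Picard MarkDuplicates"},
    "RNA-SeQC": {"SAMtools index"},
    "RSEM": {"STAR"},
}


def execution_batches(dependencies):
    """Group steps into batches; steps in one batch can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in execution_batches(STEP_DEPENDENCIES):
    print(batch)
```

    RSEM becomes ready as soon as STAR finishes, which matches the statement that it depends only on the STAR output and can run alongside the BAM-processing branch.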

    For testing and analysis, the workflow authors provided example data created by down-sampling the read files of a TOPMed public access dataset. Chromosome 12 was extracted from the Homo sapiens Assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and the detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl

  13. CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object)

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +3more
    bin, zip
    Updated Jan 24, 2020
    Cite
    Farah Zaib Khan; Stian Soiland-Reyes (2020). CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object) [Dataset]. http://doi.org/10.17632/6wtpgr3kbj.1
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Farah Zaib Khan; Stian Soiland-Reyes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see CWLProv 0.6.0 or use the cwlprov Python tool to explore.

    The CWL alignment workflow included in this case study was designed by Data Biosphere. It adapts the alignment pipeline originally developed at the Abecasis Lab, University of Michigan. This workflow is part of the NIH Data Commons initiative and comprises four stages.

    First step, Pre-align, accepts a Compressed Alignment Map (CRAM) file (a compressed format for BAM files developed by European Bioinformatics Institute (EBI)) and human genome reference sequence as input and using underlying software utilities of SAMtools such as view, sort and fixmate returns a list of fastq files which can be used as input for the next step.

    The next step Align also accepts the human reference genome as input along with the output files from Pre-align and uses BWA-mem to generate aligned reads as BAM files. SAMBLASTER is used to mark duplicate reads and SAMtools view to convert read files from SAM to BAM format.

    The BAM files generated after Align are sorted with SAMtools sort.

    Finally, these sorted alignment files are merged to produce single sorted BAM file using SAMtools merge in Post-align step.

    Steps to reproduce

    This analysis was run using a 16-core Linux cloud instance with 64GB RAM and pre-installed docker.

    1. Install gsutil:

      export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
      
      echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | \
       sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
      
      curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
       sudo apt-key add -
      
      sudo apt-get update && sudo apt-get install google-cloud-sdk

    2. Get the data and make the analysis environment ready:

      git clone https://github.com/FarahZKhan/topmed-workflows.git
      cd topmed-workflows
      git checkout cwlprov_testing
      cd aligner/sbg-alignment-cwl
      
      # This custom script downloads the Google bucket files listed in the JSON file and creates a local JSON file.
      # It requires gsutil to be installed.
      git clone https://github.com/DailyDreaming/fetch_gs_frm_json.git
      
      # Wait... this should download ~18Gb.
      python2.7 fetch_gs_frm_json/dl_gsfiles_frm_json.py topmed-alignment.sample.json
      

    3. Run the following commands to create the CWLProv Research Object:

      time cwltool --no-match-user --provenance alignmnentwf0.6.0 --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-alignment.cwl topmed-alignment.sample.json.new
      
      zip -r alignment_0.6.0_linux.zip alignment_0.6.0_linux
      
      sha256sum alignment_0.6.0_linux.zip > alignment_0.6.0_linux.zip.sha25
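
    The checksum file written in step 3 can later be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch in Python, assuming the checksum file uses the usual `sha256sum` layout (hex digest first, then the file name). The function names `sha256_of_file` and `verify_checksum` are illustrative and not part of the original workflow.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(archive_path, checksum_path):
    """Compare a file's digest with the first field of a sha256sum-style file.

    Assumption: the checksum file starts with the hex digest, followed by
    whitespace and the file name, as sha256sum writes it.
    """
    expected = Path(checksum_path).read_text().split()[0].lower()
    return sha256_of_file(archive_path) == expected
```

    Calling `verify_checksum` with the path of the zip archive and the path of its checksum file returns `True` when the two digests agree.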

  14. Protein Sequencing Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jul 27, 2025
    Cite
    Pro Market Reports (2025). Protein Sequencing Market Report [Dataset]. https://www.promarketreports.com/reports/protein-sequencing-market-6683
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Jul 27, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Reagents and Consumables: This segment includes consumables such as enzymes, buffers, and columns. Instruments: This segment includes instruments such as sequencers, mass spectrometers, and chromatographs. Others: This segment includes services such as sample preparation and data analysis. Recent developments include: BSI's objective is to make a positive influence in proteomic research, particularly by offering professionally supported software. Bioinformatics Solutions Inc. creates powerful algorithms based on cutting-edge research to solve basic bioinformatics difficulties. This small, agile team is dedicated to meeting the demands of pharmaceutical, biotechnological, and academic scientists, as well as advancing drug discovery research. The firm, started in 2000 in Waterloo, Canada, comprises an award-winning crew of developers, scientists, and salespeople. Charles River Laboratories International, Inc. offers drug discovery and development solutions such as research models and related services, as well as outsourced preclinical services. The business is divided into two divisions. The firm produces and sells research models, mostly genetically and virally specified purpose-bred rats and mice, with roughly 150 distinct strains. It also offers a variety of complementary services to help clients support the use of research models in medication development. Notable trends are: Increased use of digital manufacturing processes to propel market growth.

  15. Biocompute Object

    • bioregistry.io
    Updated Apr 5, 2024
    Cite
    (2024). Biocompute Object [Dataset]. https://bioregistry.io/biocompute
    Explore at:
    Dataset updated
    Apr 5, 2024
    Description

    BioCompute is shorthand for the IEEE 2791-2020 standard for Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) to facilitate communication. This pipeline documentation approach has been adopted by a few FDA centers. The goal is to ease the communication burden between research centers, organizations, and industries. This web portal allows users to build BioCompute Objects through the interface in a human- and machine-readable format.

  16. (high-temp) No 5. Aplha diversity (16S rRNA/ITS) Output

    • search.dataone.org
    • smithsonian.figshare.com
    Updated Aug 15, 2024
    Cite
    Jarrod Scott (2024). (high-temp) No 5. Aplha diversity (16S rRNA/ITS) Output [Dataset]. https://search.dataone.org/view/urn:uuid:044e517a-de37-4aed-9dcf-6ec98ebd8eaa
    Explore at:
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Smithsonian Research Data Repository
    Authors
    Jarrod Scott
    Description

    Output files from the No 5. Aplha diversity Workflow page of the SWELTR high-temp study. In this workflow we used Hill numbers to assess alpha diversity across temperature treatments.

    alpha_wf.rdata : contains all variables and phyloseq objects from the 16S rRNA and ITS ASV alpha diversity assessment. To see the objects, run load("alpha_wf.rdata", verbose=TRUE) in R.

    Additional files:

    For convenience, we also include individual phyloseq objects (collected in zip files) where Hill numbers have been added to the sample data tables.

    ITS (its_alpha_objects.zip):
    its18_ps_work.rds : phyloseq object for the FULL (unfiltered) ITS data.
    its18_ps_filt.rds : phyloseq object for the Arbitrary filtered ITS data.
    its18_ps_perfect.rds : phyloseq object for the PERfect ITS data.
    its18_ps_pime.rds : phyloseq object for the PIME ITS data.

    16S rRNA (ssu_alpha_objects.zip):
    ssu18_ps_work.rds : phyloseq object for the FULL (unfiltered) 16S rRNA data.
    ssu18_ps_filt.rds : phyloseq object for the Arbitrary filtered 16S rRNA data.
    ssu18_ps_perfect.rds : phyloseq object for the PERfect 16S rRNA data.
    ssu18_ps_pime.rds : phyloseq object for the PIME 16S rRNA data.

    Source code for the workflow can be found here:
    https://github.com/sweltr/high-temp/blob/master/alpha.Rmd
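
    The Hill numbers mentioned in this description summarize alpha diversity as an "effective number of taxa" of order q: q = 0 gives richness, q = 1 the exponential of Shannon entropy, and q = 2 the inverse Simpson index. The following is a minimal sketch of that standard formula in Python, not the authors' R workflow. The function name `hill_number` and the two count vectors are made up for illustration.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of taxa) of order q for abundance counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula is undefined at q = 1; its limit is exp(Shannon entropy).
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


# Hypothetical ASV counts for two samples with the same richness.
even_sample = [10, 10, 10, 10]
skewed_sample = [97, 1, 1, 1]

for order in (0, 1, 2):
    print(order, hill_number(even_sample, order), hill_number(skewed_sample, order))
```

    For a perfectly even sample all orders agree with the richness, while for a skewed sample the higher orders drop towards 1 because they weight abundant taxa more heavily.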

  17. Data from: Knowledge-based prediction of protein backbone conformation using...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Oct 23, 2018
    Cite
    Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann (2018). Knowledge-based prediction of protein backbone conformation using a structural alphabet [Dataset]. http://doi.org/10.5061/dryad.3f5q5
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 23, 2018
    Dataset provided by
    University of Reunion Island
    Nantes Université
    Authors
    Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analysis and prediction. One such library, Protein Blocks, is composed of 16 standard 5-residue-long structural prototypes. This form of analysis involves describing a protein's structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of protein blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm, which does not rely on any secondary structure predictions or sequence alignment profiles, to scan this database and predict the most probable backbone conformations for the protein local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3%, depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates the accuracy of the algorithm very well (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
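
    The general idea of a fragment database lookup can be illustrated with a toy sketch in Python. This is a deliberate simplification and not the PB-kPRED algorithm: it assumes each pentapeptide maps to the Protein Block letters observed for it, and predicts the most frequent letter per sliding window. The function names and the tiny observation list are hypothetical.

```python
from collections import Counter, defaultdict


def build_fragment_db(observations):
    """Count which Protein Block letter was observed for each pentapeptide."""
    db = defaultdict(Counter)
    for pentapeptide, block in observations:
        db[pentapeptide][block] += 1
    return db


def predict_blocks(sequence, db, unknown="-"):
    """Predict one block letter per 5-residue window; mark unseen windows."""
    predicted = []
    for start in range(len(sequence) - 4):
        window = sequence[start:start + 5]
        counts = db.get(window)
        predicted.append(counts.most_common(1)[0][0] if counts else unknown)
    return "".join(predicted)


# Made-up observations: (pentapeptide, Protein Block letter).
toy_observations = [
    ("ACDEF", "m"),
    ("ACDEF", "m"),
    ("ACDEF", "d"),
    ("CDEFG", "m"),
]
toy_db = build_fragment_db(toy_observations)
prediction = predict_blocks("ACDEFGH", toy_db)
print(prediction)
```

    A real predictor would also have to handle windows missing from the database and weigh evidence from homologues, which this sketch leaves out.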

  18. Data from: Defining objective clusters for rabies virus sequences using...

    • zenodo.org
    • openagrar.de
    • +1 more
    csv, pdf, tiff
    Updated Jul 16, 2024
    Cite
    Susanne Fischer; Conrad M. Freuling; Thomas Müller; Florian Pfaff; Ulrich Bodenhofer; Dirk Höper; Mareike Fischer; Denise A. Marston; Anthony R. Fooks; Thomas C. Mettenleiter; Franz J. Conraths; Timo Homeier-Bachmann (2024). Defining objective clusters for rabies virus sequences using affinity propagation clustering [Dataset]. http://doi.org/10.5281/zenodo.7115116
    Explore at:
    Available download formats: csv, pdf, tiff
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Susanne Fischer; Conrad M. Freuling; Thomas Müller; Florian Pfaff; Ulrich Bodenhofer; Dirk Höper; Mareike Fischer; Denise A. Marston; Anthony R. Fooks; Thomas C. Mettenleiter; Franz J. Conraths; Timo Homeier-Bachmann
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees, which invites a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data. Cluster analysis of sequences, however, is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform, transparent manner, not only for RABV but also for other comparative sequence analyses.
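
    Affinity propagation works on a matrix of pairwise similarities and exchanges "responsibility" and "availability" messages until a set of exemplars emerges. The following is a minimal pure-Python sketch of the textbook message-passing updates, not the authors' pipeline. As assumptions for illustration, six toy numbers stand in for sequences, similarity is the negative squared distance, the preference is set to the median similarity, and damping and iteration count are arbitrary choices.

```python
from statistics import median


def affinity_propagation(similarity, damping=0.7, iterations=200):
    """Return, for each item, the index of the exemplar it is assigned to."""
    n = len(similarity)
    resp = [[0.0] * n for _ in range(n)]   # responsibilities r(i, k)
    avail = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            scores = [avail[i][k] + similarity[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new_value = similarity[i][k] - best_other
                resp[i][k] = damping * resp[i][k] + (1 - damping) * new_value
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k) for i' not in {i, k})
        # a(k, k) = sum of positive r(i', k) for i' != k
        for k in range(n):
            positive = [max(0.0, resp[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new_value = support
                else:
                    new_value = min(0.0, resp[k][k] + support - positive[i])
                avail[i][k] = damping * avail[i][k] + (1 - damping) * new_value
    return [max(range(n), key=lambda k: avail[i][k] + resp[i][k]) for i in range(n)]


# Toy data: two well-separated groups of points (hypothetical stand-ins for sequences).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
n_points = len(points)
similarity = [[-(a - b) ** 2 for b in points] for a in points]
off_diagonal = [similarity[i][k] for i in range(n_points) for k in range(n_points) if i != k]
preference = median(off_diagonal)  # self-similarity controls how many clusters form
for i in range(n_points):
    similarity[i][i] = preference

labels = affinity_propagation(similarity)
print(labels)
```

    Items that share a label belong to the same cluster, and the label itself is the index of that cluster's exemplar. For sequence data, the similarity matrix would instead come from a pairwise sequence comparison.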

  19. Supplementary Material for: Screening a Prognosis-Related Target Gene in...

    • datasetcatalog.nlm.nih.gov
    • karger.figshare.com
    Updated Jun 16, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    S. Wang; Y. Quan; X. Gao; J. Deng; H. Lv (2021). Supplementary Material for: Screening a Prognosis-Related Target Gene in Patients with HER-2-Positive Breast Cancer by Bioinformatics Analysis [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000901575
    Explore at:
    Dataset updated
    Jun 16, 2021
    Authors
    S. Wang; Y. Quan; X. Gao; J. Deng; H. Lv
    Description

    Objective: The objective of the present study was to determine a target gene and explore the molecular mechanisms involved in the pathogenesis of HER-2-positive breast cancer. Methods: Three RNA expression profiles obtained from the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA) were used to identify differentially expressed genes (DEGs) using the R software. A protein-protein interaction network was then constructed, and hub genes were determined. Subsequently, the relationship between clinical parameters and hub genes was examined to screen for target genes. Next, DNA methylation and genomic alterations of the target gene were evaluated. To further explore potential molecular mechanisms, a functional enrichment analysis of genes coexpressed with the target gene was performed. Results: The differential expression analysis revealed 217 DEGs in HER-2-positive breast cancer samples compared to normal breast tissues. RRM2 was the only hub gene closely associated with lymphatic metastasis and the patients’ prognosis. Additionally, RRM2 was found to be consistently amplified and negatively associated with the level of methylation. Functional enrichment analysis showed that the coexpressed genes were mainly involved in cell cycle regulation. Conclusions: RRM2 was identified as a target gene associated with the initiation, progression, and prognosis of HER-2-positive breast cancer, which may be considered as a new biomarker and therapeutic target.

  20. Table2_Novel insights into the progression and prognosis of the calpain...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 12, 2023
    + more versions
    Cite
    Tian, Zhifeng; Hu, Hanguang; Dai, Dongjun; Shui, Yongjie; Wu, Dehao; Wei, Qichun; Li, Ping; Ni, Runliang (2023). Table2_Novel insights into the progression and prognosis of the calpain family members in hepatocellular carcinoma: a comprehensive integrated analysis.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000968926
    Explore at:
    Dataset updated
    Jul 12, 2023
    Authors
    Tian, Zhifeng; Hu, Hanguang; Dai, Dongjun; Shui, Yongjie; Wu, Dehao; Wei, Qichun; Li, Ping; Ni, Runliang
    Description

    Objectives: The goal of our bioinformatics study was to comprehensively analyze the association between all calpain family members and the progression and prognosis of hepatocellular carcinoma (HCC). Methods: The data were collected from The Cancer Genome Atlas (TCGA). The landscape of gene expression, copy number variation (CNV), mutation, and DNA methylation of calpain members was analyzed. Clustering analysis was performed to stratify the calpain-related groups. The least absolute shrinkage and selection operator (LASSO)-based Cox model was used to select hub survival genes. Results: We found that 14 out of 16 calpain members were expressed differently between tumor and normal tissues of HCC. The clustering analyses revealed high- and low-risk calpain groups with different prognoses. The high-risk calpain group had higher B cell infiltration and higher expression of immune checkpoint genes HAVCR2, PDCD1, and TIGHT. The CMap analysis found that the histone deacetylase (HDAC) inhibitor trichostatin A and the PI3K-AKT-mTOR pathway inhibitors LY-294002 and wortmannin might have a therapeutic effect on the high-risk calpain group. The DEGs between calpain groups were identified. Subsequent univariate Cox analysis of each DEG and a LASSO-based Cox model yielded a calpain-related prognostic signature. The risk score model of this signature showed good ability to predict the overall survival of HCC patients in TCGA datasets and in external validation datasets from the Gene Expression Omnibus database and the International Cancer Genome Consortium database. Conclusion: We found that calpain family members were associated with the progression, prognosis, and drug response of HCC. Our results require further studies to confirm.

Gerda Cristal Villalba; Ursula Matte (2023). Fantastic databases and where to find them: Web applications for researchers in a rush [Dataset]. http://doi.org/10.6084/m9.figshare.20018091.v1
Fantastic databases and where to find them: Web applications for researchers in a rush

Related Article
Explore at:
Available download formats: jpeg
Dataset updated
Jun 3, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Gerda Cristal Villalba; Ursula Matte
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Abstract: Public databases are essential to the development of multi-omics resources. The amount of data created by biological technologies needs a systematic and organized form of storage that can be quickly accessed and managed. This is the objective of a biological database. Here, we present an overview of human databases with web applications. The databases and tools allow the search of biological sequences, genes and genomes, gene expression patterns, epigenetic variation, protein-protein interactions, variant frequency, regulatory elements, and comparative analysis between human and model organisms. Our goal is to provide an opportunity for exploring large datasets and analyzing the data for users with little or no programming skills. Public user-friendly web-based databases facilitate data mining and the search for information applicable to healthcare professionals. In addition, biological databases are essential to improve biomedical search sensitivity and efficiency and to merge the multiple datasets needed to share data and build global initiatives for the diagnosis, prognosis, and discovery of new treatments for genetic diseases. To show the databases at work, we present a case study using ACE2 as an example of a gene to be investigated. The analysis and the complete list of databases are available on the following website.
