100+ datasets found

Fantastic databases and where to find them: Web applications for researchers...
scielo.figshare.com
jpeg
Updated Jun 3, 2023
Cite
Gerda Cristal Villalba; Ursula Matte (2023). Fantastic databases and where to find them: Web applications for researchers in a rush [Dataset]. http://doi.org/10.6084/m9.figshare.20018091.v1
Available download formats
jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.20018091.v1
Dataset updated
Jun 3, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Gerda Cristal Villalba; Ursula Matte
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Public databases are essential to the development of multi-omics resources. The amount of data created by biological technologies needs a systematic and organized form of storage that can be quickly accessed and managed. This is the objective of a biological database. Here, we present an overview of human databases with web applications. The databases and tools allow the search of biological sequences, genes and genomes, gene expression patterns, epigenetic variation, protein-protein interactions, variant frequency, regulatory elements, and comparative analysis between human and model organisms. Our goal is to provide an opportunity for users with little or no programming skills to explore large datasets and analyze the data. Public user-friendly web-based databases facilitate data mining and the search for information applicable to healthcare professionals. In addition, biological databases are essential to improve biomedical search sensitivity and efficiency and to merge the multiple datasets needed to share data and build global initiatives for the diagnosis, prognosis, and discovery of new treatments for genetic diseases. To show the databases at work, we present a case study using ACE2 as an example of a gene to be investigated. The analysis and the complete list of databases are available on the following website.
Data from: hfAIM: A reliable bioinformatics approach for in silico...
tandf.figshare.com
docx
Updated May 31, 2023
Cite
Qingjun Xie; Oren Tzfadia; Matan Levy; Efrat Weithorn; Hadas Peled-Zehavi; Thomas Van Parys; Yves Van de Peer; Gad Galili (2023). hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms [Dataset]. http://doi.org/10.6084/m9.figshare.3172519
Available download formats
docx
Unique identifier
https://doi.org/10.6084/m9.figshare.3172519
Dataset updated
May 31, 2023
Dataset provided by
Taylor & Francis
Authors
Qingjun Xie; Oren Tzfadia; Matan Levy; Efrat Weithorn; Hadas Peled-Zehavi; Thomas Van Parys; Yves Van de Peer; Gad Galili
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements—the presence of acidic amino acids and the absence of positively charged amino acids in certain positions—to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/.
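The degenerate consensus quoted in this description, F/W/Y-X-X-L/I/V, maps directly onto a regular expression. The following is a minimal Python sketch of that consensus scan only; it is not the hfAIM method, whose additional requirements (acidic residues present and positively charged residues absent "in certain positions") are not specified in enough detail here to reproduce. The function name `find_consensus_aims` and the toy peptide are hypothetical.

```python
import re

# Degenerate AIM consensus from the description: F/W/Y-X-X-L/I/V,
# where X is any amino acid. The pattern sits inside a lookahead so
# that overlapping candidate motifs are all reported.
AIM_CONSENSUS = re.compile(r"(?=([FWY][A-Z]{2}[LIV]))")


def find_consensus_aims(sequence):
    """Return (1-based start position, motif) for every consensus match."""
    sequence = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in AIM_CONSENSUS.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical peptide, used only to exercise the pattern.
    toy_peptide = "MSDEWEELAKYFQVA"
    for position, motif in find_consensus_aims(toy_peptide):
        print(position, motif)
```

A plain consensus scan like this is expected to over-predict, which is the reliability problem the description says hfAIM was designed to address.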
TCR-MHC Germline Interaction Scores Generated Using AIMS
zenodo.org
zip
Updated Aug 28, 2022
Cite
Christopher T. Boughter (2022). TCR-MHC Germline Interaction Scores Generated Using AIMS [Dataset]. http://doi.org/10.5281/zenodo.7023681
Available download formats
zip
Unique identifier
https://doi.org/10.5281/zenodo.7023681
Dataset updated
Aug 28, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Christopher T. Boughter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data were generated using the AIMS interaction scoring function as outlined in the manuscript "A Systematic Characterization of Germline-Encoded Contacts Identifies the Source of Bias in TCR-MHC Interactions". They accompany the AIMS version 0.7 software available on GitHub: https://github.com/ctboughter/AIMS . These files are meant to be loaded into the mhc_germline_analysis.ipynb file, but are too large to be included on the GitHub page itself.
BIOGRID CURATED DATA FOR PUBLICATION: hfAIM: A reliable bioinformatics...
thebiogrid.org
zip
Updated Feb 1, 2016
Cite
BioGRID Project (2016). BIOGRID CURATED DATA FOR PUBLICATION: hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms. [Dataset]. https://thebiogrid.org/199728/publication/hfaim-a-reliable-bioinformatics-approach-for-in-silico-genome-wide-identification-of-autophagy-associated-atg8-interacting-motifs-in-various-organisms.html
Available download formats
zip
Dataset updated
Feb 1, 2016
Dataset authored and provided by
BioGRID Project
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Protein-Protein, Genetic, and Chemical Interactions for Xie Q (2016): hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms, curated by BioGRID (https://thebiogrid.org). ABSTRACT: Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements (the presence of acidic amino acids and the absence of positively charged amino acids in certain positions) to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/.
Data from: Semi-artificial datasets as a resource for validation of...
search.dataone.org
datadryad.org
Updated May 21, 2025
Cite
Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart (2025). Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection [Dataset]. http://doi.org/10.5061/dryad.0zpc866z8
Unique identifier
https://doi.org/10.5061/dryad.0zpc866z8
Dataset updated
May 21, 2025
Dataset provided by
Dryad Digital Repository
Authors
Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart
Time period covered
Jan 1, 2021
Description
In the last decade, High-Throughput Sequencing (HTS) has revolutionized biology and medicine. This technology allows the sequencing of huge amounts of DNA and RNA fragments at a very low price. In medicine, HTS tests for disease diagnostics have already been brought into routine practice. However, adoption in plant health diagnostics is still limited. One of the main bottlenecks is the lack of expertise and consensus on the standardization of the data analysis. The Plant Health Bioinformatic Network (PHBN) is a Euphresco project aiming to build a community network of bioinformaticians/computational biologists working in plant health. One of the main goals of the project is to develop reference datasets that can be used for validation of bioinformatics pipelines and for standardization purposes.

Semi-artificial datasets have been created for this purpose (Datasets 1 to 10). They are composed of a "real" HTS dataset spiked with artificial viral reads. It will allow researchers to adjust ...
Microarray and bioinformatic analysis of conventional ameloblastoma
data.scielo.org
jpeg, txt, xlsx
Updated Dec 20, 2022
Cite
Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez (2022). Microarray and bioinformatic analysis of conventional ameloblastoma [Dataset]. http://doi.org/10.48331/SCIELODATA.Z2S8X9
Available download formats
xlsx(10317), jpeg(3415112), xlsx(9969), jpeg(12173968), txt(605), txt(289), txt(3840), xlsx(9964), xlsx(12458), txt(2657), txt(18077), xlsx(10402), jpeg(2313098), txt(406), txt(1023)
Unique identifier
https://doi.org/10.48331/SCIELODATA.Z2S8X9
Dataset updated
Dec 20, 2022
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
National Autonomous University of Mexico
Description
Ameloblastoma is a highly aggressive odontogenic tumor, and its pathogenesis is associated with multiple participating genes. Objective: Our aim was to identify and validate new critical genes of conventional ameloblastoma using microarray and bioinformatics analysis. Methods: Gene expression microarray and bioinformatic analysis were performed using CHIP H10KA and DAVID software for enrichment. Protein-protein interactions (PPI) were visualized using STRING-Cytoscape with the MCODE plugin, followed by Kaplan-Meier and GEPIA analysis to propose candidate genes. RT-qPCR and IHC assays were performed to validate the bioinformatic approach. Results: 376 upregulated genes were identified. PPI analysis revealed 14 genes that were validated by Kaplan-Meier and GEPIA, resulting in PDGFA and IL2RA as candidate genes. The RT-qPCR analysis confirmed their intense expression. Immunohistochemistry analysis showed that PDGFA expression is located in the parenchyma. Conclusion: With bioinformatics methods, we can identify upregulated genes in conventional ameloblastoma, and with RT-qPCR and immunoexpression analysis validate that PDGFA could be a more specific and localized therapeutic target.
Screening miRNA biomarkers for postmenopausal osteoporosis based on...
scidb.cn
Updated Nov 4, 2024
Cite
Bin.WANG; Shunjie.Wu; Bin.XU; Aiguo.Zhu; Hailei.Chen (2024). Screening miRNA biomarkers for postmenopausal osteoporosis based on bioinformatics methods [Dataset]. http://doi.org/10.57760/sciencedb.j00217.00183
Available download formats
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.j00217.00183
Dataset updated
Nov 4, 2024
Dataset provided by
Science Data Bank
Authors
Bin.WANG; Shunjie.Wu; Bin.XU; Aiguo.Zhu; Hailei.Chen
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Objective: To use bioinformatics methods to screen potential miRNAs as biomarkers for postmenopausal osteoporosis (PMO). Methods: The expression profile of a PMO peripheral blood miRNA chip was obtained from the GEO public database. First, the chip was re-annotated using the R language, and the clinical typing significance of the data was determined using analysis of similarities (ANOSIM). Then, weighted gene co-expression network analysis (WGCNA), multi-scale embedded gene co-expression network analysis (MEGCNA), and nonnegative matrix factorization (NMF) were used to screen miRNAs related to PMO. Finally, the diagnostic efficacy of the miRNA was evaluated using ROC curves, the target genes of the miRNA were predicted using a database, and functional enrichment of the target genes was performed using Metascape. Results: miR-223-3p has a high predictive diagnostic value for PMO. GO and KEGG enrichment analysis was conducted on 34 target genes potentially regulated by miR-223-3p, and the results showed that multiple pathways were associated with bone development. Conclusion: miR-223-3p has significance in the diagnosis of PMO and may regulate bone development by regulating downstream target genes.
To explore the co-pathogenesis of obesity and nonalcoholic steatohepatitis...
scidb.cn
Updated Feb 4, 2025
Cite
Siwei.Wang; Jiarui.Li; LIkun.DU (2025). To explore the co-pathogenesis of obesity and nonalcoholic steatohepatitis based on bioinformatics analysis [Dataset]. http://doi.org/10.57760/sciencedb.j00217.05747
Available download formats
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.j00217.05747
Dataset updated
Feb 4, 2025
Dataset provided by
Science Data Bank
Authors
Siwei.Wang; Jiarui.Li; LIkun.DU
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Objective: Bioinformatics methods were used to investigate the pathogenesis, disease-characteristic genes, and immune infiltration of obesity (OB) and nonalcoholic steatohepatitis (NASH), and to explore the correlation between disease-characteristic genes and immune cells. Methods: OB- and NASH-related chips were obtained from the GEO database. The R language was used for differential gene analysis and WGCNA, GO and KEGG enrichment was performed on the intersecting genes, and a protein-protein interaction network was constructed. Key genes were selected using 12 cytoHubba methods, ROC curves and sample chips were used to assess the accuracy of the key genes, and the disease-characteristic genes with the best performance were selected. The CIBERSORT algorithm was then used to analyze the immune infiltration of OB and NASH, and the correlation between disease-characteristic genes and immune cells was analyzed. Results: A total of 235 differential genes were obtained in the obesity training group (GSE25401 and GSE151839), and 804 differential genes were obtained in the nonalcoholic steatohepatitis training group (GSE63067 and GSE89632). GO analysis mainly involved the significant expression of interleukin 8 regulation. KEGG analysis showed that multiple comb inhibition complex and other pathways were closely related to OB and NASH. The key genes IL6, IL1B, IL1RN, VCAN and TNFAIP6 were selected by 12 cytoHubba methods. ROC curves and sample chips were used to test the disease-characteristic genes, and VCAN and IL1RN performed best. Conclusion: The OB and NASH characteristic genes VCAN and IL1RN are significantly correlated with immune cells, which provides a preliminary basis for further research on OB and MASH targeted diagnosis and treatment.
(high-temp) No 4. Taxonomic: (16S rRNA/ITS) Output
search.dataone.org
dataone.org
+1 more
Updated Aug 16, 2024
Cite
Jarrod Scott (2024). (high-temp) No 4. Taxonomic: (16S rRNA/ITS) Output [Dataset]. https://search.dataone.org/view/urn%3Auuid%3A2f7f52c8-0273-40a9-99dd-ebeb9d5239dd
Dataset updated
Aug 16, 2024
Dataset provided by
Smithsonian Research Data Repository
Authors
Jarrod Scott
Description
Output files from the No 4. Taxonomic Workflow page of the SWELTR high-temp study. In this workflow we used the microeco package for taxonomic assessment. We first converted each phyloseq object into a microtable object using the file2meco package.

taxa_wf.rdata : contains all variables and phyloseq objects from 16S rRNA and ITS ASV taxonomic assessment. To see the objects, in R run _load("taxa_wf.rdata", verbose=TRUE)_

Additional files:

For convenience, we also include individual phyloseq and microtable objects (collected in zip files).

_**ITS (its_taxa_objects.zip):**_
its18_ps_work_me.rds : microtable object for the FULL (unfiltered) ITS data.
its18_ps_filt_me.rds : microtable object for the Arbitrary filtered ITS data.
its18_ps_perfect_me.rds : microtable object for the PERfect ITS data.
its18_ps_pime_me.rds : microtable object for the PIME ITS data.

_**16S rRNA (ssu_taxa_objects.zip):**_
ssu18_ps_work_me.rds : microtable object for the FULL (unfiltered) 16S rRNA data.
ssu18_ps_filt_me.rds : microtable object for the Arbitrary filtered 16S rRNA data.
ssu18_ps_perfect_me.rds : microtable object for the PERfect 16S rRNA data.
ssu18_ps_pime_me.rds : microtable object for the PIME 16S rRNA data.

For one of the 16S rRNA analyses we looked at family-level diversity of major bacterial phyla. For this analysis, we renamed NA ranks by the next highest named rank. For example, ASV13884 was unclassified at family level, so the NA was replaced with the next highest named rank (in this case order). Therefore the family-level classification for this ASV was changed to _o_Polyangiales_. Doing this allowed us to include unclassified abundance in our analyses. We include the following phyloseq objects containing the modified taxonomies.

ssu18_ps_work_clean.rds : modified phyloseq object for the FULL (unfiltered) 16S rRNA data.
ssu18_ps_filt_clean.rds : modified phyloseq object for the Arbitrary filtered 16S rRNA data.
ssu18_ps_perfect_clean.rds : modified phyloseq object for the PERfect filtered 16S rRNA data.
ssu18_ps_pime_clean.rds : modified phyloseq object for the PIME filtered 16S rRNA data.
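
The renaming rule described above (replace an NA rank with the next highest named rank, prefixed by that rank's initial) can be sketched as follows. This is an illustration in Python rather than the authors' R code, which is linked below; the dictionary layout, the `RANKS` ordering, and the function name `fill_unclassified` are assumptions made for the sketch, not part of phyloseq or microeco.

```python
# Ranks ordered from highest to lowest (assumed ordering for this sketch).
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family"]


def fill_unclassified(taxonomy, target_rank="Family"):
    """Return the target-rank label, falling back to the next highest named rank.

    `taxonomy` maps rank name -> label, with None standing in for NA.
    A fallback label is prefixed with the first letter of the rank it came
    from, mirroring the o_Polyangiales example in the description.
    """
    if taxonomy.get(target_rank) is not None:
        return taxonomy[target_rank]
    idx = RANKS.index(target_rank)
    # Walk upward through the higher ranks until a named one is found.
    for rank in reversed(RANKS[:idx]):
        label = taxonomy.get(rank)
        if label is not None:
            return f"{rank[0].lower()}_{label}"
    return None  # nothing classified at any higher rank


if __name__ == "__main__":
    # Illustrative record: only the ranks mentioned in the description are filled in.
    asv = {"Order": "Polyangiales", "Family": None}
    print(fill_unclassified(asv))
```

Keeping the rank prefix in the fallback label makes it clear in downstream plots that the entry is an order-level placeholder rather than a true family name.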

Source code for the workflow can be found here:
https://github.com/sweltr/high-temp/blob/master/taxa.Rmd
Bioinformatics Analysis, Prokaryotic Expression, and Antiserum Preparation...
scidb.cn
Updated Apr 28, 2025
Cite
Jiarong.Liu; Yihao.Wang; Lingbao.Kong (2025). Bioinformatics Analysis, Prokaryotic Expression, and Antiserum Preparation of hnRNP A1 Protein [Dataset]. http://doi.org/10.57760/sciencedb.j00217.07379
Available download formats
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.j00217.07379
Dataset updated
Apr 28, 2025
Dataset provided by
Science Data Bank
Authors
Jiarong.Liu; Yihao.Wang; Lingbao.Kong
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Objective: This study aims to construct a recombinant expression plasmid using methods such as PCR and double enzyme digestion. By conducting single-factor experiments to adjust IPTG concentration, expression time, and expression temperature, the research seeks to optimize the expression conditions of the hnRNP A1 protein in BL21 competent cells. The goal is to obtain high-concentration, high-quality purified protein and to prepare high-titer polyclonal antibodies against hnRNP A1. Methods: Bioinformatics tools were used to analyze the physicochemical properties and structure of hnRNP A1. The pET-28a-hnRNP A1 recombinant plasmid was constructed and transformed into BL21 cells. After optimizing expression conditions, hnRNP A1 protein was purified using nickel column chromatography and identified by Western Blot. The purified protein was used to immunize C57BL/6 mice to produce polyclonal antibodies, and the antibody titer was determined by indirect ELISA. Results: The highest expression of hnRNP A1 was achieved under conditions of 0.4 mM IPTG, induction temperature of 42°C, and induction time of 8 hours. The purified protein concentration reached 2.0563 μg/μl, and Western Blot confirmed the target protein. The antibody titer detected by indirect ELISA was 1:409,600. Conclusion: The physicochemical properties of hnRNP A1 were successfully analyzed, high-efficiency expression of hnRNP A1 protein was achieved, and high-titer mouse-derived polyclonal antibodies against hnRNP A1 were prepared, providing a valuable tool for further research.
Novel and ultra-rare damaging variants in neuropeptide signaling are...
plos.figshare.com
xlsx
Updated Jun 4, 2023
Cite
Michael Lutter; Ethan Bahl; Claire Hannah; Dabney Hofammann; Summer Acevedo; Huxing Cui; Carrie J. McAdams; Jacob J. Michaelson (2023). Novel and ultra-rare damaging variants in neuropeptide signaling are associated with disordered eating behaviors [Dataset]. http://doi.org/10.1371/journal.pone.0181556
Available download formats
xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0181556
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Michael Lutter; Ethan Bahl; Claire Hannah; Dabney Hofammann; Summer Acevedo; Huxing Cui; Carrie J. McAdams; Jacob J. Michaelson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objective: Eating disorders develop through a combination of genetic vulnerability and environmental stress; however, the genetic basis of this risk is unknown. Methods: To understand the genetic basis of this risk, we performed whole exome sequencing on 93 unrelated individuals with eating disorders (38 restricted-eating and 55 binge-eating) to identify novel damaging variants. Candidate genes with an excessive burden of predicted damaging variants were then prioritized based upon an unbiased, data-driven bioinformatic analysis. One top candidate pathway was empirically tested for therapeutic potential in a mouse model of binge-like eating. Results: An excessive burden of novel damaging variants was identified in 186 genes in the restricted-eating group and 245 genes in the binge-eating group. This list is significantly enriched (OR = 4.6, p
CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)
data.mendeley.com
data.niaid.nih.gov
+3 more
Updated Dec 4, 2018
Cite
Farah Zaib Khan (2018). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
Unique identifier
https://doi.org/10.17632/xnwncxpw42.1
Dataset updated
Dec 4, 2018
Authors
Farah Zaib Khan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:

Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.

The Genome BAM file is processed using Picard MarkDuplicates producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).

SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.

The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.

In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
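
The dependencies between the five steps above can be written down as a small graph, which makes it easy to see which steps may run concurrently. The sketch below is an illustration in Python using the standard-library `graphlib` module (Python 3.9+); it is not the CWL workflow itself, and the step labels and the function name `execution_batches` are chosen here for readability.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs,
# following the description above.
DEPENDENCIES = {
    "STAR": set(),
    "Picard MarkDuplicates": {"STAR"},
    "SAMtools index": {"Picard MarkDuplicates"},
    "RNA-SeQC": {"SAMtools index"},
    "RSEM": {"STAR"},  # depends only on the STAR output
}


def execution_batches(dependencies):
    """Group steps into batches; steps within one batch have no mutual dependency."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(execution_batches(DEPENDENCIES), start=1):
        print(number, ", ".join(batch))
```

Because RSEM needs only the STAR output, it becomes eligible as soon as alignment finishes and can proceed alongside the MarkDuplicates, index, and RNA-SeQC chain.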

For testing and analysis, the workflow authors provided example data created by down-sampling the read files of a TOPMed public-access dataset. Chromosome 12 was extracted from the Homo sapiens Assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and the detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.

This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object)
zenodo.org
data-staging.niaid.nih.gov
+3 more
bin, zip
Updated Jan 24, 2020
Cite
Farah Zaib Khan; Stian Soiland-Reyes (2020). CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object) [Dataset]. http://doi.org/10.17632/6wtpgr3kbj.1
Available download formats
bin, zip
Unique identifier
https://doi.org/10.17632/6wtpgr3kbj.1
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Farah Zaib Khan; Stian Soiland-Reyes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see CWLProv 0.6.0 or use the cwlprov Python tool to explore.

The CWL alignment workflow included in this case study was designed by Data Biosphere. It adapts the alignment pipeline originally developed at the Abecasis Lab, University of Michigan. This workflow is part of the NIH Data Commons initiative and comprises four stages.

The first step, Pre-align, accepts a Compressed Alignment Map (CRAM) file (a compressed format for BAM files developed by the European Bioinformatics Institute (EBI)) and the human genome reference sequence as input and, using SAMtools utilities such as view, sort and fixmate, returns a list of fastq files which can be used as input for the next step.

The next step, Align, also accepts the human reference genome as input along with the output files from Pre-align and uses BWA-mem to generate aligned reads as BAM files. SAMBLASTER is used to mark duplicate reads and SAMtools view to convert read files from SAM to BAM format.

The BAM files generated after Align are sorted with SAMtools sort.

Finally, these sorted alignment files are merged to produce a single sorted BAM file using SAMtools merge in the Post-align step.

Steps to reproduce

This analysis was run using a 16-core Linux cloud instance with 64GB RAM and pre-installed docker.

Install gsutil

export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | \
  sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
  sudo apt-key add -
sudo apt-get update && sudo apt-get install google-cloud-sdk

Get the data and make the analysis environment ready:

git clone https://github.com/FarahZKhan/topmed-workflows.git
cd topmed-workflows
git checkout cwlprov_testing
cd aligner/sbg-alignment-cwl
# this is a custom script download google bucket files from json files and create a local json
# it needs gsutil to be installed though
git clone https://github.com/DailyDreaming/fetch_gs_frm_json.git
# Wait... this should download ~18Gb.
python2.7 fetch_gs_frm_json/dl_gsfiles_frm_json.py topmed-alignment.sample.json

Run the following commands to create the CWLProv Research Object:

time cwltool --no-match-user --provenance alignmnentwf0.6.0 --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-alignment.cwl topmed-alignment.sample.json.new
zip -r alignment_0.6.0_linux.zip alignment_0.6.0_linux
sha256sum alignment_0.6.0_linux.zip > alignment_0.6.0_linux.zip.sha25
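
Anyone downloading the resulting archive may want to check it against the checksum file written in the last command. The following is a minimal Python sketch of that check, assuming the checksum file has the usual `sha256sum` layout (hex digest first, then the file name); the function names `sha256_of_file` and `matches_checksum_file` are hypothetical, and both paths are supplied by the caller.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(archive_path, checksum_path):
    """Compare an archive against a sha256sum-style checksum file.

    Only the first whitespace-separated field (the digest) is used.
    """
    with open(checksum_path) as handle:
        recorded = handle.read().split()[0]
    return sha256_of_file(archive_path) == recorded.lower()
```

Reading in chunks keeps memory use flat, which matters for research objects that bundle several gigabytes of intermediate data.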
Protein Sequencing Market Report
promarketreports.com
doc, pdf, ppt
Updated Jul 27, 2025
Cite
Pro Market Reports (2025). Protein Sequencing Market Report [Dataset]. https://www.promarketreports.com/reports/protein-sequencing-market-6683
Available download formats
doc, pdf, ppt
Dataset updated
Jul 27, 2025
Dataset authored and provided by
Pro Market Reports
License
https://www.promarketreports.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Reagents and Consumables: This segment includes consumables such as enzymes, buffers, and columns. Instruments: This segment includes instruments such as sequencers, mass spectrometers, and chromatographs. Others: This segment includes services such as sample preparation and data analysis. Recent developments include: BSI's objective is to make a positive influence in proteomic research, particularly by offering professionally supported software. Bioinformatics Solutions Inc. creates powerful algorithms based on cutting-edge research to solve basic bioinformatics difficulties. This small, agile team is dedicated to meeting the demands of pharmaceutical, biotechnological, and academic scientists, as well as advancing drug discovery research. The firm, started in 2000 in Waterloo, Canada, comprises a bright, award-winning crew of developers, scientists, and salespeople. Charles River Laboratories International, Inc. offers drug discovery and development solutions such as research models and related services, as well as outsourced preclinical services. The business is divided into two divisions. The firm produces and sells research models, mostly genetically and virally specified purpose-bred rats and mice, with roughly 150 distinct strains. It also offers a variety of complementary services to help clients support the use of research models in medication development. Notable trends are: Increased use of digital manufacturing processes to propel market growth.
Biocompute Object
bioregistry.io
Updated Apr 5, 2024
Cite
(2024). Biocompute Object [Dataset]. https://bioregistry.io/biocompute
Explore at:
Dataset updated
Apr 5, 2024
Description
BioCompute is shorthand for the IEEE 2791-2020 standard for Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) to facilitate communication. This pipeline documentation approach has been adopted by a few FDA centers. The goal is to ease the communication burden between research centers, organizations, and industries. This web portal allows users to build BioCompute Objects through the interface in a human- and machine-readable format.
(high-temp) No 5. Aplha diversity (16S rRNA/ITS) Output
search.dataone.org
smithsonian.figshare.com
Updated Aug 15, 2024
Cite
Jarrod Scott (2024). (high-temp) No 5. Aplha diversity (16S rRNA/ITS) Output [Dataset]. https://search.dataone.org/view/urn:uuid:044e517a-de37-4aed-9dcf-6ec98ebd8eaa
Explore at:
Dataset updated
Aug 15, 2024
Dataset provided by
Smithsonian Research Data Repository
Authors
Jarrod Scott
Description
Output files from the No 5. Aplha diversity Workflow page of the SWELTR high-temp study. In this workflow we used Hill numbers to assess alpha diversity across temperature treatments.

alpha_wf.rdata : contains all variables and phyloseq objects from the 16S rRNA and ITS ASV alpha diversity assessment. To see the objects, run the following in R: load("alpha_wf.rdata", verbose=TRUE)

Additional files:

For convenience, we also include individual phyloseq objects (collected in zip files) where Hill numbers have been added to the sample data tables.

_ITS (its_alpha_objects.zip)_ :
its18_ps_work.rds : phyloseq object for the FULL (unfiltered) ITS data.
its18_ps_filt.rds : phyloseq object for the Arbitrary filtered ITS data.
its18_ps_perfect.rds : phyloseq object for the PERfect ITS data.
its18_ps_pime.rds : phyloseq object for the PIME ITS data.

_16S rRNA (ssu_alpha_objects.zip)_ :
ssu18_ps_work.rds : phyloseq object for the FULL (unfiltered) 16S rRNA data.
ssu18_ps_filt.rds : phyloseq object for the Arbitrary filtered 16S rRNA data.
ssu18_ps_perfect.rds : phyloseq object for the PERfect 16S rRNA data.
ssu18_ps_pime.rds : phyloseq object for the PIME 16S rRNA data.

Source code for the workflow can be found here:
https://github.com/sweltr/high-temp/blob/master/alpha.Rmd
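
For readers unfamiliar with Hill numbers, the following is a minimal Python sketch of the standard definition. It is not the SWELTR workflow itself, which is written in R and linked above. The function name `hill_number` and the example counts are illustrative assumptions, not taken from the dataset.

```python
import math


def hill_number(counts, q):
    """Return the Hill number (effective number of taxa) of order q.

    counts: per-taxon abundances for one sample (hypothetical input).
    q = 0 gives richness, q = 1 the exponential of Shannon entropy,
    and q = 2 the inverse Simpson index.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Relative abundances; taxa with zero count do not contribute.
    props = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1, so use its limit.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


if __name__ == "__main__":
    sample = [90, 5, 5]  # made-up counts for three taxa
    for order in (0, 1, 2):
        print(order, hill_number(sample, order))
```

For a perfectly even sample, every order returns the number of taxa. For an uneven sample, the value falls as q increases, because higher orders give more weight to abundant taxa.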
Data from: Knowledge-based prediction of protein backbone conformation using...
data.niaid.nih.gov
zenodo.org
+1more
zip
Updated Oct 23, 2018
Cite
Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann (2018). Knowledge-based prediction of protein backbone conformation using a structural alphabet [Dataset]. http://doi.org/10.5061/dryad.3f5q5
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.3f5q5
Dataset updated
Oct 23, 2018
Dataset provided by
University of Reunion Island
Nantes Université
Authors
Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann
License
https://spdx.org/licenses/CC0-1.0.html
Description
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analysis and prediction. One such library, Protein Blocks, is composed of 16 standard structural prototypes, each 5 residues long. This form of analysis involves describing a protein's structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of Protein Blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict the most probable backbone conformations for the protein local structures. Where available, PB-kPRED preferentially uses structural information from homologues. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates the accuracy of the algorithm very well (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.

Data from: Defining objective clusters for rabies virus sequences using...

zenodo.org
openagrar.de
+1more

csv, pdf, tiff

Updated Jul 16, 2024

Cite

Susanne Fischer; Conrad M. Freuling; Thomas Müller; Florian Pfaff; Ulrich Bodenhofer; Dirk Höper; Mareike Fischer; Denise A. Marston; Anthony R. Fooks; Thomas C. Mettenleiter; Franz J. Conraths; Timo Homeier-Bachmann (2024). Defining objective clusters for rabies virus sequences using affinity propagation clustering [Dataset]. http://doi.org/10.5281/zenodo.7115116

Explore at:

Available download formats: csv, pdf, tiff

Unique identifier

https://doi.org/10.5281/zenodo.7115116

Dataset updated

Jul 16, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

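Authors

Susanne Fischer; Conrad M. Freuling; Thomas Müller; Florian Pfaff; Ulrich Bodenhofer; Dirk Höper; Mareike Fischer; Denise A. Marston; Anthony R. Fooks; Thomas C. Mettenleiter; Franz J. Conraths; Timo Homeier-Bachmann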

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses have difficulty defining verifiable criteria for cluster allocation in phylogenetic trees, which invites a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named 'affinity propagation clustering' (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data. Cluster analysis of sequences, however, is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform, transparent manner, not only for RABV but also for other comparative sequence analyses.

Supplementary Material for: Screening a Prognosis-Related Target Gene in...
datasetcatalog.nlm.nih.gov
karger.figshare.com
Updated Jun 16, 2021
Cite
S. Wang; Y. Quan; X. Gao; J. Deng; H. Lv (2021). Supplementary Material for: Screening a Prognosis-Related Target Gene in Patients with HER-2-Positive Breast Cancer by Bioinformatics Analysis [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000901575
Explore at:
Dataset updated
Jun 16, 2021
Authors
S. Wang; Y. Quan; X. Gao; J. Deng; H. Lv
Description
Objective: The objective of the present study was to determine a target gene and explore the molecular mechanisms involved in the pathogenesis of HER-2-positive breast cancer. Methods: Three RNA expression profiles obtained from the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA) were used to identify differentially expressed genes (DEGs) using the R software. A protein-protein interaction network was then constructed, and hub genes were determined. Subsequently, the relationship between clinical parameters and hub genes was examined to screen for target genes. Next, DNA methylation and genomic alterations of the target gene were evaluated. To further explore potential molecular mechanisms, a functional enrichment analysis of genes coexpressed with the target gene was performed. Results: The differential expression analysis revealed 217 DEGs in HER-2-positive breast cancer samples compared to normal breast tissues. RRM2 was the only hub gene closely associated with lymphatic metastasis and the patients’ prognosis. Additionally, RRM2 was found to be consistently amplified and negatively associated with the level of methylation. Functional enrichment analysis showed that the coexpressed genes were mainly involved in cell cycle regulation. Conclusions: RRM2 was identified as a target gene associated with the initiation, progression, and prognosis of HER-2-positive breast cancer, which may be considered as a new biomarker and therapeutic target.
Table2_Novel insights into the progression and prognosis of the calpain...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Jul 12, 2023
Cite
Tian, Zhifeng; Hu, Hanguang; Dai, Dongjun; Shui, Yongjie; Wu, Dehao; Wei, Qichun; Li, Ping; Ni, Runliang (2023). Table2_Novel insights into the progression and prognosis of the calpain family members in hepatocellular carcinoma: a comprehensive integrated analysis.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000968926
Explore at:
Dataset updated
Jul 12, 2023
Authors
Tian, Zhifeng; Hu, Hanguang; Dai, Dongjun; Shui, Yongjie; Wu, Dehao; Wei, Qichun; Li, Ping; Ni, Runliang
Description
Objectives: The goal of our bioinformatics study was to comprehensively analyze the association between the calpain family members and the progression and prognosis of hepatocellular carcinoma (HCC). Methods: The data were collected from The Cancer Genome Atlas (TCGA). The landscape of the gene expression, copy number variation (CNV), mutation, and DNA methylation of calpain members was analyzed. Clustering analysis was performed to stratify the calpain-related groups. The least absolute shrinkage and selection operator (LASSO)-based Cox model was used to select hub survival genes. Results: We found that 14 out of 16 calpain members were differentially expressed between tumor and normal tissues of HCC. The clustering analyses revealed high- and low-risk calpain groups with different prognoses. We found that the high-risk calpain group had higher B cell infiltration and higher expression of the immune checkpoint genes HAVCR2, PDCD1, and TIGIT. The CMap analysis found that the histone deacetylase (HDAC) inhibitor trichostatin A and the PI3K-AKT-mTOR pathway inhibitors LY-294002 and wortmannin might have a therapeutic effect on the high-risk calpain group. The DEGs between calpain groups were identified. Subsequent univariate Cox analysis of each DEG and a LASSO-based Cox model yielded a calpain-related prognostic signature. The risk score model of this signature showed good ability to predict the overall survival of HCC patients in TCGA datasets and in external validation datasets from the Gene Expression Omnibus database and the International Cancer Genome Consortium database. Conclusion: We found that calpain family members were associated with the progression, prognosis, and drug response of HCC. Our results require further studies to confirm.

