100+ datasets found

Table1_Bioinformatics on the Road: Taking Training to Students and...
frontiersin.figshare.com
docx
Updated Jun 8, 2023
Cite
Marcus Braga; Fabrício Araujo; Edian Franco; Kenny Pinheiro; Jakelyne Silva; Denner Maués; Sebastiao Neto; Lucas Pompeu; Luis Guimaraes; Adriana Carneiro; Igor Hamoy; Rommel Ramos (2023). Table1_Bioinformatics on the Road: Taking Training to Students and Researchers Beyond State Capitals.DOCX [Dataset]. http://doi.org/10.3389/feduc.2021.726930.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/feduc.2021.726930.s001
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers
Authors
Marcus Braga; Fabrício Araujo; Edian Franco; Kenny Pinheiro; Jakelyne Silva; Denner Maués; Sebastiao Neto; Lucas Pompeu; Luis Guimaraes; Adriana Carneiro; Igor Hamoy; Rommel Ramos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In Brazil, capable bioinformaticians are trained mostly in graduate programs, sometimes with some exposure during the undergraduate period. However, this training tends to be inefficient at attracting students to the area and, above all, at attracting professionals to support projects in research groups. To address these issues, short courses are important for training students and professionals in the use of tools for specific areas that rely on bioinformatics, as well as in ways to develop solutions tailored to the local needs of academic institutions or research groups. To this end, the project “Bioinformática na Estrada” (Bioinformatics on the Road) proposed improving bioinformatics skills in undergraduate and graduate courses, primarily in the countryside of the State of Pará, in the Amazon region of Brazil. The project consists of practical courses focused on the areas of interest of the place where they are held, to train and encourage students and researchers to work in this field and so reduce the existing gap caused by the lack of qualified bioinformatics professionals. Theoretical and practical workshops took place, such as Introduction to Bioinformatics, Computer Science Basics, Applications of Computational Intelligence applied to Bioinformatics and Biotechnology, Computational Tools for Bioinformatics, Soil Genomics and Research Perspectives and Horizons in the Amazon Region. In the end, 444 undergraduate and graduate students from higher education institutions in the state of Pará and other Brazilian states attended the events of the Bioinformatics on the Road project.
Molecular Biology Databases Published in Nucleic Acids Research between...
databank.illinois.edu
Updated Feb 1, 2024
Cite
Heidi Imker (2024). Molecular Biology Databases Published in Nucleic Acids Research between 1991-2016 [Dataset]. http://doi.org/10.13012/B2IDB-4311325_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-4311325_V1
Dataset updated
Feb 1, 2024
Authors
Heidi Imker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.
🧫 Promoter or not? - Bioinformatics 🗃️ Dataset
kaggle.com
zip
Updated Mar 31, 2024
Cite
Samira Shemirani (2024). 🧫 Promoter or not? - Bioinformatics 🗃️ Dataset [Dataset]. https://www.kaggle.com/datasets/samira1992/promoter-or-not-bioinformatics-dataset
Explore at:
Available download formats: zip (4992691 bytes)
Dataset updated
Mar 31, 2024
Authors
Samira Shemirani
Description
The promoter region is located near the transcription start sites, which regulate the transcription initiation of the gene by controlling the binding of RNA polymerase. Thus, recognition of the promoter region is an important area of interest in the field of bioinformatics. Over the past years, many new promoter prediction programs (PPPs) have emerged. PPPs aim to identify promoter regions in a genome using computational methods. Promoter prediction is a supervised learning problem that consists of three main steps to extract features: 1) CpG islands 2) Structural features 3) Content features
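The first feature family named above, CpG islands, can be illustrated with a short sketch. It computes GC content and the observed/expected CpG ratio for a DNA string and applies commonly cited thresholds (length of at least 200 bp, GC content of at least 50%, ratio of at least 0.6). The function names and the exact thresholds are assumptions chosen for illustration; they are not taken from this dataset.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply illustrative thresholds (assumed, not from the dataset)."""
    return (len(seq) >= min_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_ratio)


if __name__ == "__main__":
    cg_rich = "CG" * 100   # toy sequence, 200 bp, entirely CpG
    at_rich = "AT" * 100   # toy sequence with no G or C
    print(looks_like_cpg_island(cg_rich), looks_like_cpg_island(at_rich))
```

In a real promoter classifier these values would be only a few columns in a larger feature matrix alongside the structural and content features.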
Bioinformatics Protein Dataset - Simulated
kaggle.com
zip
Updated Dec 27, 2024
+ more versions
Cite
Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
Explore at:
Available download formats: zip (12928905 bytes)
Dataset updated
Dec 27, 2024
Authors
Rafael Gallo
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Subtitle

"Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

Description

Introduction

This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

Columns Included

ID_Protein: Unique identifier for each protein.

Sequence: String of amino acids.

Molecular_Weight: Molecular weight calculated from the sequence.

Isoelectric_Point: Estimated isoelectric point based on the sequence composition.

Hydrophobicity: Average hydrophobicity calculated from the sequence.

Total_Charge: Sum of the charges of the amino acids in the sequence.

Polar_Proportion: Percentage of polar amino acids in the sequence.

Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.

Sequence_Length: Total number of amino acids in the sequence.

Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

Inspiration and Sources

While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:
- UniProt: A comprehensive database of protein sequences and annotations.
- Kyte-Doolittle Scale: Calculations of hydrophobicity.
- Biopython: A tool for analyzing biological sequences.

Proposed Uses

This dataset is ideal for:
- Training classification models for proteins.
- Exploratory analysis of physicochemical properties of proteins.
- Building machine learning pipelines in bioinformatics.

How This Dataset Was Created

Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.

Property Calculation: Physicochemical properties were calculated using the Biopython library.

Class Assignment: Classes were randomly assigned for classification purposes.
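The three creation steps above can be sketched roughly as follows. This is not the dataset author's script: it uses only the Python standard library and a hand-written Kyte-Doolittle table instead of Biopython, computes just two of the listed properties, and the `P00000`-style identifier format is a hypothetical choice.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]


def make_protein(rng: random.Random, idx: int) -> dict:
    """Generate one synthetic record with a random sequence and class."""
    length = rng.randint(50, 300)  # inclusive bounds, as in the description
    seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    hydrophobicity = sum(KYTE_DOOLITTLE[aa] for aa in seq) / length
    return {
        "ID_Protein": f"P{idx:05d}",        # hypothetical ID format
        "Sequence": seq,
        "Sequence_Length": length,
        "Hydrophobicity": round(hydrophobicity, 3),
        "Class": rng.choice(CLASSES),       # random label, no biological meaning
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
proteins = [make_protein(rng, i) for i in range(10)]

if __name__ == "__main__":
    for record in proteins[:3]:
        print(record["ID_Protein"], record["Sequence_Length"], record["Class"])
```

Because the class labels are drawn independently of the sequences, a classifier trained on data generated this way should not be expected to beat chance, which is consistent with the limitations listed for the dataset.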

Limitations

The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.

The functional classes are simulated and do not correspond to actual biological characteristics.

Data Split

The dataset is divided into two subsets:
- Training: 16,000 samples (proteinas_train.csv).
- Testing: 4,000 samples (proteinas_test.csv).

Acknowledgment

This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.
Table_1_A Bioinformatics Approach to Explore MicroRNAs as Tools to Bridge...
frontiersin.figshare.com
docx
Updated May 30, 2023
+ more versions
Cite
Massimo Bellato; Davide De Marchi; Carla Gualtieri; Elisabetta Sauta; Paolo Magni; Anca Macovei; Lorenzo Pasotti (2023). Table_1_A Bioinformatics Approach to Explore MicroRNAs as Tools to Bridge Pathways Between Plants and Animals. Is DNA Damage Response (DDR) a Potential Target Process?.docx [Dataset]. http://doi.org/10.3389/fpls.2019.01535.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpls.2019.01535.s001
Dataset updated
May 30, 2023
Dataset provided by
Frontiers
Authors
Massimo Bellato; Davide De Marchi; Carla Gualtieri; Elisabetta Sauta; Paolo Magni; Anca Macovei; Lorenzo Pasotti
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MicroRNAs, highly conserved small RNAs, act as key regulators of many biological functions in both plants and animals by post-transcriptionally regulating gene expression through interactions with their target mRNAs. MicroRNA research is a dynamic field, in which new and unconventional aspects are emerging alongside well-established roles in development and stress adaptation. A recent hypothesis states that miRNAs can be transferred from one species to another and potentially target genes across distant species. Here, we propose to look into the trans-kingdom potential of miRNAs as a tool to bridge conserved pathways between plant and human cells. To this end, a novel multi-faceted bioinformatic analysis pipeline was developed, enabling the investigation of common biological processes and genes targeted in plant and human transcriptomes by a set of publicly available Medicago truncatula miRNAs. Multiple datasets, including miRNA, gene, transcript and protein sequences, expression profiles and genetic interactions, were used. Three different strategies were employed, namely a network-based pipeline, an alignment-based pipeline, and an M. truncatula network reconstruction approach, to study functional modules and to evaluate gene/protein similarities among miRNA targets. The results were compared in order to find common features, e.g., microRNAs targeting similar processes. Biological processes like exocytosis and response to viruses were common denominators in the investigated species. Since the involvement of miRNAs in the regulation of DNA damage response (DDR)-associated pathways is barely explored, especially in the plant kingdom, special attention is given to this aspect. Accordingly, miRNAs predicted to target genes involved in DNA repair, recombination and replication, chromatin remodeling, cell cycle and cell death were identified in both plants and humans, paving the way for future interdisciplinary advancements.
Data_Sheet_1_Validation of a Bioinformatics Workflow for Routine Analysis of...
datasetcatalog.nlm.nih.gov
Updated Mar 6, 2019
Cite
Roosens, Nancy H. C.; Mattheus, Wesley; Fu, Qiang; Ceyssens, Pieter-Jan; Vanneste, Kevin; De Keersmaecker, Sigrid C. J.; Van Braekel, Julien; Bertrand, Sophie; Bogaerts, Bert; Winand, Raf (2019). Data_Sheet_1_Validation of a Bioinformatics Workflow for Routine Analysis of Whole-Genome Sequencing Data and Related Challenges for Pathogen Typing in a European National Reference Center: Neisseria meningitidis as a Proof-of-Concept.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000172206
Explore at:
Dataset updated
Mar 6, 2019
Authors
Roosens, Nancy H. C.; Mattheus, Wesley; Fu, Qiang; Ceyssens, Pieter-Jan; Vanneste, Kevin; De Keersmaecker, Sigrid C. J.; Van Braekel, Julien; Bertrand, Sophie; Bogaerts, Bert; Winand, Raf
Description
Despite being a well-established research method, the use of whole-genome sequencing (WGS) for routine molecular typing and pathogen characterization remains a substantial challenge due to the required bioinformatics resources and/or expertise. Moreover, many national reference laboratories and centers, as well as other laboratories working under a quality system, require extensive validation to demonstrate that employed methods are “fit-for-purpose” and provide high-quality results. A harmonized framework with guidelines for the validation of WGS workflows does not yet exist, however, despite several recent case studies highlighting the urgent need for one. We present a validation strategy focusing specifically on the exhaustive characterization of the bioinformatics analysis of a WGS workflow designed to replace conventionally employed molecular typing methods for microbial isolates in a representative small-scale laboratory, using the pathogen Neisseria meningitidis as a proof-of-concept. We adapted several classically employed performance metrics specifically toward three different bioinformatics assays: resistance gene characterization (based on the ARG-ANNOT, ResFinder, CARD, and NDARO databases), several commonly employed typing schemas (including, among others, core genome multilocus sequence typing), and serogroup determination. We analyzed a core validation dataset of 67 well-characterized samples typed by means of classical genotypic and/or phenotypic methods that were sequenced in-house, allowing us to evaluate repeatability, reproducibility, accuracy, precision, sensitivity, and specificity of the different bioinformatics assays. We also analyzed an extended validation dataset composed of publicly available WGS data for 64 samples by comparing results of the different bioinformatics assays against results obtained from commonly used bioinformatics tools.
We demonstrate high performance, with values for all performance metrics >87%, >97%, and >90% for the resistance gene characterization, sequence typing, and serogroup determination assays, respectively, for both validation datasets. Our WGS workflow has been made publicly available as a “push-button” pipeline for Illumina data at https://galaxy.sciensano.be to showcase its implementation for non-profit and/or academic usage. Our validation strategy can be adapted to other WGS workflows for other pathogens of interest and demonstrates the added value and feasibility of employing WGS with the aim of being integrated into routine use in an applied public health setting.
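For orientation, the textbook definitions of four of the metrics named above can be written down in a few lines. The description says the authors adapted these metrics to each assay, so the sketch below shows only the generic confusion-matrix forms, not their adapted versions; the counts in the usage example are made up for illustration.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics; NaN when a denominator is zero."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


if __name__ == "__main__":
    # Illustrative counts only; they are not results from the study.
    metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Repeatability and reproducibility are not covered here because they compare repeated runs of the same sample rather than predictions against a reference.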
Data from: Semi-artificial datasets as a resource for validation of...
zenodo.org
datadryad.org
zip
Updated Jun 5, 2022
Cite
Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart (2022). Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection [Dataset]. http://doi.org/10.5061/dryad.0zpc866z8
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.0zpc866z8
Dataset updated
Jun 5, 2022
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In the last decade, High-Throughput Sequencing (HTS) has revolutionized biology and medicine. This technology allows the sequencing of huge amounts of DNA and RNA fragments at a very low price. In medicine, HTS tests for disease diagnostics have already been brought into routine practice. However, adoption in plant health diagnostics is still limited. One of the main bottlenecks is the lack of expertise and consensus on the standardization of the data analysis. The Plant Health Bioinformatic Network (PHBN) is a Euphresco project aiming to build a community network of bioinformaticians/computational biologists working in plant health. One of the main goals of the project is to develop reference datasets that can be used for validation of bioinformatics pipelines and for standardization purposes.

Semi-artificial datasets have been created for this purpose (Datasets 1 to 10). Each is composed of a "real" HTS dataset spiked with artificial viral reads. This allows researchers to adjust their pipeline and parameters to approximate the actual viral composition of the semi-artificial datasets as closely as possible. Each semi-artificial dataset makes it possible to test one or several limitations that could prevent virus detection or correct virus identification from HTS data (i.e. low viral concentration, new viral species, incomplete genome).

Eight artificial datasets only composed of viral reads (no background data) have also been created (Datasets 11 to 18). Each dataset consists of a mix of several isolates from the same viral species showing different frequencies. The viral species were selected to be as divergent as possible. These datasets can be used to test haplotype reconstruction software, the goal being to reconstruct all the isolates present in a dataset.

A GitLab repository (https://gitlab.com/ilvo/VIROMOCKchallenge) is available and provides a complete description of the composition of each dataset, the methods used to create them and their goals.
Bioinformatics Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Oct 26, 2025
Cite
Market Research Forecast (2025). Bioinformatics Market Report [Dataset]. https://www.marketresearchforecast.com/reports/bioinformatics-market-10292
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Oct 26, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2026 - 2034
Area covered
Global
Variables measured
Market Size
Description
The Bioinformatics Market was valued at USD 20.72 billion in 2023 and is projected to reach USD 64.45 billion by 2032, with an expected CAGR of 17.6% during the forecast period. Recent developments include: October 2023 – Bionl, Inc., a pioneering company in biomedical and bioinformatics research, launched a no-code biomedical research platform that enables researchers, students, and professionals to investigate biomedicine using natural language queries; October 2023 – BioBam Bioinformatics launched OmicsBox 3.1 to empower researchers, scientists, and bioinformaticians in their pursuit of advanced omics data analysis and interpretation; April 2023 – Absci Corp. collaborated with Aster Insights (formerly named M2GEN) to expedite the development of new cancer medicines; December 2022 – Analytical Biosciences Limited partnered with Mission Bio to co-develop bioinformatics packages for translational and clinical research applications in hematological cancers; April 2022 – ATCC signed an agreement with QIAGEN to provide sequencing data from its collection of biological data. QIAGEN Digital Insights aims to establish a database from this information to develop and deliver high-value digital biology content for the biotechnology and pharmaceutical industries. Key drivers for this market are: Increased Funding for Genomics Research to Surge Demand for Bioinformatic Solutions. Potential restraints include: Increased Funding for Genomics Research to Surge Demand for Bioinformatic Solutions. Notable trends are: Increased Funding for Genomics Research to Surge Demand for Bioinformatic Solutions.
Microarray and bioinformatic analysis of conventional ameloblastoma
data.scielo.org
jpeg, txt, xlsx
Updated Dec 20, 2022
Cite
Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez (2022). Microarray and bioinformatic analysis of conventional ameloblastoma [Dataset]. http://doi.org/10.48331/SCIELODATA.Z2S8X9
Explore at:
Available download formats: xlsx(10317), jpeg(3415112), xlsx(9969), jpeg(12173968), txt(605), txt(289), txt(3840), xlsx(9964), xlsx(12458), txt(2657), txt(18077), xlsx(10402), jpeg(2313098), txt(406), txt(1023)
Unique identifier
https://doi.org/10.48331/SCIELODATA.Z2S8X9
Dataset updated
Dec 20, 2022
Dataset provided by
SciELO: http://www.scielo.org/
Authors
Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
National Autonomous University of Mexico
Description
Ameloblastoma is a highly aggressive odontogenic tumor, and its pathogenesis is associated with multiple participating genes. Objective: Our aim was to identify and validate new critical genes of conventional ameloblastoma using microarray and bioinformatics analysis. Methods: Gene expression microarray and bioinformatic analysis were performed using CHIP H10KA and DAVID software for enrichment. Protein-protein interactions (PPI) were visualized using STRING-Cytoscape with the MCODE plugin, followed by Kaplan-Meier and GEPIA analyses, which were used to postulate candidate genes. RT-qPCR and IHC assays were performed to validate the bioinformatic approach. Results: 376 upregulated genes were identified. PPI analysis revealed 14 genes that were validated by Kaplan-Meier and GEPIA, resulting in PDGFA and IL2RA as candidate genes. The RT-qPCR analysis confirmed their intense expression. Immunohistochemistry analysis showed that PDGFA expression is located in the parenchyma. Conclusion: With bioinformatics methods, we can identify upregulated genes in conventional ameloblastoma, and with RT-qPCR and immunoexpression analysis validate that PDGFA could be a more specific and localized therapeutic target.
Table_6_Identification of molecular subtypes and immune infiltration in...
datasetcatalog.nlm.nih.gov
Updated Aug 18, 2023
+ more versions
Cite
Gan, Lei; Sun, Jing; Lv, Si-ji; Sun, Jia-ni (2023). Table_6_Identification of molecular subtypes and immune infiltration in endometriosis: a novel bioinformatics analysis and In vitro validation.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001005583
Explore at:
Dataset updated
Aug 18, 2023
Authors
Gan, Lei; Sun, Jing; Lv, Si-ji; Sun, Jia-ni
Description
Introduction: Endometriosis is a gynecological disease found worldwide, affecting 6–10% of women of reproductive age. The aim of this study was to investigate the gene network and potential signatures of immune infiltration in endometriosis. Methods: The expression profiles of GSE51981, GSE6364, and GSE7305 were obtained from the Gene Expression Omnibus (GEO) database. Core modules and central genes related to immune characteristics were identified using a weighted gene coexpression network analysis. Bioinformatics analysis was performed to identify central genes in immune infiltration. A protein-protein interaction (PPI) network was used to identify the hub genes. We then constructed subtypes of endometriosis samples and calculated their correlation with hub genes. qRT-PCR and Western blotting were used to verify our findings. Results: We identified 10 candidate hub genes (GZMB, PRF1, KIR2DL1, KIR2DL3, KIR3DL1, KIR2DL4, FGB, IGFBP1, RBP4, and PROK1) that were significantly correlated with immune infiltration. Our study established a detailed immune network and systematically elucidated the molecular mechanism underlying endometriosis from the aspect of immune infiltration. Discussion: Our study provides comprehensive insights into the immunology involved in endometriosis and might contribute to the development of immunotherapy for endometriosis. Furthermore, our study sheds light on the underlying molecular mechanism of endometriosis and might help improve the diagnosis and treatment of this condition.
Data from: Multiple sequence alignment for functional correlation among low...
researchdata.edu.au
bridges.monash.edu
Updated May 5, 2022
Cite
Wei-Yao Chou; Wei-I Chou; Tun-Wen Pai; Shu-Chuan Lin; Fan-Yu Chang; Yuh-Ju Sun; Chuan-Yi Tang; Margaret Dah-Tsyr Chang (2022). Multiple sequence alignment for functional correlation among low similarity sequences [Dataset]. http://doi.org/10.4225/03/5a13722947571
Explore at:
Unique identifier
https://doi.org/10.4225/03/5a13722947571
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Wei-Yao Chou; Wei-I Chou; Tun-Wen Pai; Shu-Chuan Lin; Fan-Yu Chang; Yuh-Ju Sun; Chuan-Yi Tang; Margaret Dah-Tsyr Chang
Description
Multiple sequence alignment is a broadly used methodology in biological applications. It is expected to locate consensus sequence stretches with evolutionary and functional conservation. However, it works poorly when sequence similarity among the queries is low. The aim of this study is to incorporate important biological knowledge and assumptions to improve the quality of a general alignment on low similarity sequences such as carbohydrate binding module (CBM) families. Since the recognition of characteristic patterns in CBMs does not apply to a general model, a more accurate scoring function employing secondary-structure-based and key-residue-weighted algorithms for alignment was designed to approach this goal. Our results indicated that the new method was practically applicable to identify the key residues in terms of three-dimensional structures, while conventional tools could fail. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
EchoBASE
neuinfo.org
dknet.org
+2 more
Updated May 14, 2006
Cite
(2006). EchoBASE [Dataset]. http://identifiers.org/RRID:SCR_002430
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002430
Dataset updated
May 14, 2006
Description
A database that curates new experimental and bioinformatic information about the genes and gene products of the model bacterium Escherichia coli K-12 strain MG1655. It has been created to integrate information from post-genomic experiments into a single resource with the aim of providing functional predictions for the 1500 or so gene products whose physiological function is unknown. While EchoBASE provides a basic annotation of the genome, taken from other databases, its novelty is in the curation of post-genomic experiments and their linkage to genes of unknown function. Experiments published on E. coli are curated to one of two levels. Papers dealing with the determination of function of a single gene are briefly described, while larger datasets are actually included in the database and can be searched and manipulated. This includes data from proteomics studies, protein-protein interaction studies, microarray data, functional genomic approaches (looking at multiple deletion strains for novel phenotypes) and a wide range of predictions that come out of in silico bioinformatic approaches. The aim of the database is to provide hypotheses for the functions of uncharacterized gene products that may be used by the E. coli research community to further our knowledge of this model bacterium.
Application of deep learning in biological networks
resodate.org
Updated Jan 1, 2024
Cite
Amine Abdelmaksoud (2024). Application of deep learning in biological networks [Dataset]. http://doi.org/10.48366/R691834
Explore at:
Unique identifier
https://doi.org/10.48366/R691834
Dataset updated
Jan 1, 2024
Dataset provided by
Open Research Knowledge Graph
Authors
Amine Abdelmaksoud
Description
This comparison shows the use of deep learning techniques in biological networks, focusing on diverse applications and research problems. The analysis includes various properties, such as results, application domains, classification tasks, dataset features, and machine learning methods/algorithms employed. The research problems explored include multiclass disease classification, compound-protein interaction prediction, microRNA-disease association prediction, identifying essential proteins, predicting protein functions, robust phenotype prediction, disease outcome classification, drug-target association prediction, risk stratification modeling for lung cancer, and modeling polypharmacy side effects. Through this comparison, we aim to explain the effectiveness and adaptability of deep learning approaches in addressing complex biological challenges across different domains and applications within biological networks.
CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)
data.mendeley.com
explore.openaire.eu
+3 more
Updated Dec 4, 2018
Cite
Farah Zaib Khan (2018). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
Explore at:
Unique identifier
https://doi.org/10.17632/xnwncxpw42.1
Dataset updated
Dec 4, 2018
Authors
Farah Zaib Khan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:

Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.

The Genome BAM file is processed using Picard MarkDuplicates producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).

SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.

The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.

In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
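
The dependencies among the five steps described above can be sketched as a small directed graph. This is an illustrative Python sketch only: the step labels are hypothetical names chosen here, not identifiers from the CWL workflow, and the edges simply restate the description.

```python
from graphlib import TopologicalSorter

# Hypothetical step labels (not taken from the CWL files). Each step maps to
# the set of steps whose output it needs, as stated in the description.
DEPENDENCIES = {
    "star_align": set(),
    "picard_markduplicates": {"star_align"},
    "samtools_index": {"picard_markduplicates"},
    "rnaseqc": {"samtools_index"},
    "rsem": {"star_align"},  # depends only on the STAR output
}


def execution_order(deps):
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

Because RSEM depends only on STAR, any valid order may place it before, between, or after the MarkDuplicates, index, and RNA-SeQC steps, which is what allows it to run in parallel with them.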

For testing and analysis, the workflow authors provided example data created by down-sampling the read files of a TOPMed public-access dataset. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions describing the steps performed to down-sample the data is provided for transparency. The availability of example input data, the use of containerization for the underlying software, and the detailed documentation were important factors in choosing this specific CWL workflow for CWLProv evaluation.

This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object)
data.mendeley.com
data.niaid.nih.gov
Updated Dec 4, 2018
Farah Zaib Khan (2018). CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object) [Dataset]. http://doi.org/10.17632/6wtpgr3kbj.1
Unique identifier
https://doi.org/10.17632/6wtpgr3kbj.1
Dataset updated
Dec 4, 2018
Authors
Farah Zaib Khan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The CWL alignment workflow included in this case study was designed by Data Biosphere. It adapts the alignment pipeline originally developed at the Abecasis Lab, University of Michigan. This workflow is part of the NIH Data Commons initiative and comprises four stages. The first step, "Pre-align", accepts a Compressed Alignment Map (CRAM) file (a compressed format for BAM files developed by the European Bioinformatics Institute (EBI)) and a human genome reference sequence as input and, using SAMtools utilities such as view, sort, and fixmate, returns a list of FASTQ files that serve as input for the next step. The next step, "Align", also accepts the human reference genome as input, along with the output files from "Pre-align", and uses BWA-mem to generate aligned reads as BAM files. SAMBLASTER is used to mark duplicate reads, and SAMtools view converts read files from SAM to BAM format. The BAM files generated after "Align" are sorted with "SAMtools sort". Finally, in the "Post-align" step, these sorted alignment files are merged into a single sorted BAM file using SAMtools merge.
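
The way the four stages hand files to one another can be sketched as a simple chain. This is a minimal Python illustration, not the Data Biosphere workflow itself: the `Stage` class, the "Sort" stage name, and the format labels are hypothetical, and the tool strings are descriptive labels rather than full command lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str        # stage name ("Sort" is a label chosen here)
    tools: tuple     # descriptive tool labels, not complete invocations
    consumes: str    # main input format
    produces: str    # main output format


# Stages and formats restate the description; the reference genome input
# shared by "Pre-align" and "Align" is omitted for brevity.
STAGES = [
    Stage("Pre-align", ("samtools view", "samtools sort", "samtools fixmate"), "CRAM", "FASTQ"),
    Stage("Align", ("bwa mem", "samblaster", "samtools view"), "FASTQ", "BAM"),
    Stage("Sort", ("samtools sort",), "BAM", "sorted BAM"),
    Stage("Post-align", ("samtools merge",), "sorted BAM", "merged sorted BAM"),
]


def check_chain(stages):
    """Return True if each stage consumes what the previous stage produces."""
    return all(prev.produces == nxt.consumes for prev, nxt in zip(stages, stages[1:]))


chain_ok = check_chain(STAGES)
print(chain_ok)
```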

This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.6.0 or use https://pypi.org/project/cwlprov/ to explore
Supplementary Material for: Integrative Bioinformatics Analysis Provides...
datasetcatalog.nlm.nih.gov
karger.figshare.com
Updated Apr 11, 2018
Qiu, S.; Lv, L.-L.; Zhou, L.-T.; Ma, K.-L.; Liu, B.-C.; Liu, H.; Li, Z.-L.; Tang, R.-N. (2018). Supplementary Material for: Integrative Bioinformatics Analysis Provides Insight into the Molecular Mechanisms of Chronic Kidney Disease [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000631776
Dataset updated
Apr 11, 2018
Authors
Qiu, S.; Lv, L.-L.; Zhou, L.-T.; Ma, K.-L.; Liu, B.-C.; Liu, H.; Li, Z.-L.; Tang, R.-N.
Description
Background/Aims: Chronic kidney disease (CKD) is a worldwide public health problem. Regardless of the underlying primary disease, CKD tends to progress to end-stage kidney disease, resulting in unsatisfactory and costly treatment. Its common pathogenesis, however, remains unclear. The aim of this study was to provide an unbiased catalog of common gene-expression changes of CKD and reveal the underlying molecular mechanism using an integrative bioinformatics approach. Methods: We systematically collected over 250 Affymetrix microarray datasets from the glomerular and tubulointerstitial compartments of healthy renal tissues and those with various types of established CKD (diabetic kidney disease, hypertensive nephropathy, and glomerular nephropathy). Then, using stringent bioinformatics analysis, shared differentially expressed genes (DEGs) of CKD were obtained. These shared DEGs were further analyzed by gene ontology (GO) and pathway enrichment analysis. Finally, protein-protein interaction networks (PINs) were constructed to further refine our results. Results: Our analysis identified 176 and 50 shared DEGs in diseased glomeruli and tubules, respectively, including many transcripts that have not been previously reported to be involved in kidney disease. Enrichment analysis also showed that the glomerular and tubulointerstitial compartments underwent a wide range of unique pathological changes during chronic injury. As revealed by the GO enrichment analysis, shared DEGs in glomeruli were significantly enriched in exosomes. By constructing PINs, we identified several hub genes (e.g. OAS1, JUN, and FOS) and clusters that might play key roles in regulating the development of CKD. Conclusion: Our study not only further reveals the unifying molecular mechanism of CKD pathogenesis but also provides a valuable resource of potential biomarkers and therapeutic targets.
iSkylims demo dataset
zenodo.org
application/gzip
Updated Jun 28, 2023
Sara Monzón; Luis Chapado; Sarai Varona; Isabel Cuesta de la Plaza (2023). iSkylims demo dataset [Dataset]. http://doi.org/10.5281/zenodo.8059345
Unique identifier
https://doi.org/10.5281/zenodo.8059345
Dataset updated
Jun 28, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sara Monzón; Luis Chapado; Sarai Varona; Isabel Cuesta de la Plaza
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
iSkyLIMS was created to help with wet-laboratory tasks and to implement a workflow that guides genomics labs through their activities, from library preparation to data production, reducing potential errors associated with high-throughput technology and facilitating quality control of the sequencing. iSkyLIMS also connects the wet lab with the dry lab, facilitating data analysis by bioinformaticians.

This dataset provides three run folders for NextSeq, NovaSeq, and MiSeq Illumina machines, with stats and log files that iSkyLIMS processes and uses for visualization.
(high-temp) No 4. Taxonomic: (16S rRNA/ITS) Output
search.dataone.org
dataone.org
Updated Aug 16, 2024
Jarrod Scott (2024). (high-temp) No 4. Taxonomic: (16S rRNA/ITS) Output [Dataset]. https://search.dataone.org/view/urn%3Auuid%3A2f7f52c8-0273-40a9-99dd-ebeb9d5239dd
Dataset updated
Aug 16, 2024
Dataset provided by
Smithsonian Research Data Repository
Authors
Jarrod Scott
Description
Output files from the No 4. Taxonomic Workflow page of the SWELTR high-temp study. In this workflow we used the microeco package for taxonomic assessment. We first converted each phyloseq object into a microtable object using the file2meco package.

taxa_wf.rdata : contains all variables and phyloseq objects from the 16S rRNA and ITS ASV taxonomic assessment. To see the objects, in R run _load("taxa_wf.rdata", verbose=TRUE)_

Additional files:

For convenience, we also include individual phyloseq and microtable objects (collected in zip files).

_**ITS (its_taxa_objects.zip):**_
its18_ps_work_me.rds : microtable object for the FULL (unfiltered) ITS data.
its18_ps_filt_me.rds : microtable object for the Arbitrary filtered ITS data.
its18_ps_perfect_me.rds : microtable object for the PERfect ITS data.
its18_ps_pime_me.rds : microtable object for the PIME ITS data.

_**16S rRNA (ssu_taxa_objects.zip):**_
ssu18_ps_work_me.rds : microtable object for the FULL (unfiltered) 16S rRNA data.
ssu18_ps_filt_me.rds : microtable object for the Arbitrary filtered 16S rRNA data.
ssu18_ps_perfect_me.rds : microtable object for the PERfect 16S rRNA data.
ssu18_ps_pime_me.rds : microtable object for the PIME 16S rRNA data.

For one of the 16S rRNA analyses we looked at family-level diversity of major bacterial phyla. For this analysis, we renamed NA ranks by the next highest named rank. For example, ASV13884 was unclassified at family level, so the NA was replaced with the next highest named rank (in this case order). Therefore the family-level classification for this ASV was changed to _o_Polyangiales_. Doing this allowed us to include unclassified abundance in our analyses. We include the following phyloseq objects containing the modified taxonomies.

ssu18_ps_work_clean.rds : modified phyloseq object for the FULL (unfiltered) 16S rRNA data.
ssu18_ps_filt_clean.rds : modified phyloseq object for the Arbitrary filtered 16S rRNA data.
ssu18_ps_perfect_clean.rds : modified phyloseq object for the PERfect filtered 16S rRNA data.
ssu18_ps_pime_clean.rds : modified phyloseq object for the PIME filtered 16S rRNA data.
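
The renaming rule described above (replace an NA rank with the next highest named rank) can be illustrated with a short sketch. The original workflow is written in R; the Python below is not the authors' code, and the rank list, prefix table, and `fill_unclassified` function are hypothetical names introduced only for illustration. The example record contains just the two ranks mentioned in the description.

```python
# Hypothetical rank order and one-letter prefixes (assumed for illustration).
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family"]
PREFIX = {"Kingdom": "k", "Phylum": "p", "Class": "c", "Order": "o", "Family": "f"}


def fill_unclassified(taxonomy, rank="Family"):
    """Replace a missing rank with the nearest named higher rank.

    The replacement is prefixed with the initial of the rank it came from,
    e.g. an order name becomes "o_<name>". Returns a new dict.
    """
    filled = dict(taxonomy)
    if filled.get(rank) is not None:
        return filled  # already classified at this rank
    # Walk upward from the rank just above `rank` toward Kingdom.
    for higher in reversed(RANKS[: RANKS.index(rank)]):
        name = taxonomy.get(higher)
        if name is not None:
            filled[rank] = f"{PREFIX[higher]}_{name}"
            break
    return filled


# Only the ranks stated in the description are included here.
asv13884 = {"Order": "Polyangiales", "Family": None}
result = fill_unclassified(asv13884)
print(result["Family"])
```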

Source code for the workflow can be found here:
https://github.com/sweltr/high-temp/blob/master/taxa.Rmd
To explore the co-pathogenesis of obesity and nonalcoholic steatohepatitis...
scidb.cn
Updated Feb 4, 2025
Siwei.Wang; Jiarui.Li; LIkun.DU (2025). To explore the co-pathogenesis of obesity and nonalcoholic steatohepatitis based on bioinformatics analysis [Dataset]. http://doi.org/10.57760/sciencedb.j00217.05747
Unique identifier
https://doi.org/10.57760/sciencedb.j00217.05747
Dataset updated
Feb 4, 2025
Dataset provided by
Science Data Bank
Authors
Siwei.Wang; Jiarui.Li; LIkun.DU
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Objective: Bioinformatics methods were used to investigate the pathogenesis, disease-characteristic genes, and immune infiltration of obesity (OB) and nonalcoholic steatohepatitis (NASH), and to explore the correlation between disease-characteristic genes and immune cells. Methods: OB- and NASH-related chip datasets were obtained from the GEO database. R was used for differential gene analysis and WGCNA; GO and KEGG enrichment analyses were performed on the intersecting genes, and a protein-protein interaction network was constructed. Key genes were selected using 12 cytoHubba methods; ROC curves and sample chips were used to assess the accuracy of the key genes, and the best-performing disease-characteristic genes were selected. The CIBERSORT algorithm was then used to analyze immune infiltration in OB and NASH, and the correlation between disease-characteristic genes and immune cells was analyzed. Results: A total of 235 differential genes were obtained in the obesity training groups GSE25401 and GSE151839, and 804 differential genes were obtained in the NASH training groups GSE63067 and GSE89632. GO analysis mainly involved the regulation of interleukin 8. KEGG analysis showed that the multiple comb inhibition complex and other pathways were closely related to OB and NASH. The key genes IL6, IL1B, IL1RN, VCAN, and TNFAIP6 were selected by the 12 cytoHubba methods. ROC curves and sample chips were used to test the disease-characteristic genes, and VCAN and IL1RN performed best. Conclusion: The OB and NASH characteristic genes VCAN and IL1RN are significantly correlated with immune cells, which provides a preliminary basis for further research on targeted diagnosis and treatment of OB and NASH.
Data from: Knowledge-based prediction of protein backbone conformation using...
datadryad.org
data.niaid.nih.gov
zip
Updated Oct 23, 2018
Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann (2018). Knowledge-based prediction of protein backbone conformation using a structural alphabet [Dataset]. http://doi.org/10.5061/dryad.3f5q5
Unique identifier
https://doi.org/10.5061/dryad.3f5q5
Dataset updated
Oct 23, 2018
Dataset provided by
Dryad
Authors
Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann
Time period covered
Oct 21, 2017
Description
DATA SET USED IN THE PAPER

Knowledge based prediction of protein backbone conformation using a structural alphabet

Authors : Iyanar Vetrivel, Swapnil Mahajan, Manoj Tyagi, Lionel Hoffmann, Yves-Henri Sanejouand, Narayanaswamy Srinivasan, Alexandre G. de Brevern, Frédéric Cadet, Bernard Offmann

Corresponding author : bernard.offmann@univ-nantes.fr

FILES DESCRIPTIONS :

1) PDB30_aaseq.txt. This file contains the amino acid sequence of the PDB chains from PDB30 dataset described in the manuscript.

2) PDB30_assigned_pb.txt. This file contains the PB sequence assignment of the PDB chains from the same PDB30 dataset. PB assignment is performed according to reference [29] in the manuscript.

3) PDB30_predited_pb.txt. This file contains the predicted PB sequences of the PDB chains from the PDB30 dataset as per the hybrid method described in the manuscript.

DATA_KPRED.tar.gz
