Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Familiarity with genome-scale data and the bioinformatic skills to analyze it have become essential for understanding and advancing modern biology and human health, yet many undergraduate biology majors are never exposed to hands-on bioinformatics. This paper presents a module that introduces students to applied bioinformatic analysis within the context of a research-based microbiology lab course. One of the most commonly used genomic analyses in biology is resequencing: determining the sequence of DNA bases in a derived strain of some organism, and comparing it to the known ancestral genome of that organism to better understand the phenotypic differences between them. Many existing CUREs — Course Based Undergraduate Research Experiences — evolve or select new strains of bacteria and compare them phenotypically to ancestral strains. This paper covers standardized strategies and procedures, accessible to undergraduates, for preparing and analyzing microbial whole-genome resequencing data to examine the genotypic differences between such strains. Wet-lab protocols and computational tutorials are provided, along with additional guidelines for educators, providing instructors without a next-generation sequencing or bioinformatics background the necessary information to incorporate whole-genome sequencing and command-line analysis into their class. This module introduces novice students to running software at the command-line, giving them exposure and familiarity with the types of tools that make up the vast majority of open-source scientific software used in contemporary biology. Completion of the module improves student attitudes toward computing, which may make them more likely to pursue further bioinformatics study.
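The core idea of resequencing, comparing a derived strain against its ancestor, can be illustrated with a toy sketch. It assumes two sequences that are already aligned and of equal length; real resequencing pipelines map short reads and call variants with dedicated command-line tools, which this example does not attempt to reproduce. The function name and the sample sequences are hypothetical.

```python
def point_differences(ancestor: str, derived: str) -> list:
    """Return (1-based position, ancestral base, derived base) for each mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(ancestor) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, a, d)
        for index, (a, d) in enumerate(zip(ancestor.upper(), derived.upper()))
        if a != d
    ]


# Made-up sequences for illustration only.
ancestral_genome = "ACGTACGT"
evolved_genome = "ACGTTCGT"
print(point_differences(ancestral_genome, evolved_genome))
```

Each reported tuple is a candidate point mutation that students could then relate to a phenotypic difference between the strains.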
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Cuprotosis is a newly discovered form of programmed cell death that acts by modulating the tricarboxylic acid cycle. Emerging evidence shows that cuprotosis-related genes (CRGs) are implicated in the occurrence and progression of multiple diseases. However, the mechanism of cuprotosis in heart failure (HF) has not yet been investigated. Methods: The HF microarray datasets GSE16499, GSE26887, GSE42955, GSE57338, GSE76701, and GSE79962 were downloaded from the Gene Expression Omnibus (GEO) database to identify differentially expressed CRGs between HF patients and nonfailing donors (NFDs). Four machine learning models were used to identify key CRG features for HF diagnosis. The expression profiles of key CRGs were further validated in a merged GEO external validation dataset and in human samples through quantitative reverse-transcription polymerase chain reaction (qRT-PCR). In addition, Gene Ontology (GO) function enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, and immune infiltration analysis were used to investigate potential biological functions of key CRGs. Results: We discovered nine CRGs that were differentially expressed between heart tissues from HF patients and NFDs. With the aid of four machine learning algorithms, we identified three indicators of cuprotosis (DLAT, SLC31A1, and DLST) in HF, which showed good diagnostic properties. In addition, their differential expression between HF patients and NFDs was confirmed through qRT-PCR. Moreover, the results of enrichment and immune infiltration analyses showed that these CRG diagnostic markers were strongly correlated with energy metabolism and immune activity. Conclusions: Our study found that cuprotosis was strongly related to the pathogenesis of HF, probably by regulating energy metabolism-associated and immune-associated signaling pathways.
https://www.technavio.com/content/privacy-notice
Bioinformatics Market Size 2025-2029
The bioinformatics market size is projected to increase by USD 15.98 billion, at a CAGR of 17.4%, from 2024 to 2029. A reduction in the cost of genetic sequencing is expected to drive the bioinformatics market.
Market Insights
North America dominated the market and accounted for 43% of the growth during 2025-2029.
By Application - Molecular phylogenetics segment was valued at USD 4.48 billion in 2023
By Product - Platforms segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 309.88 million
Market Future Opportunities 2024: USD 15978.00 million
CAGR from 2024 to 2029 : 17.4%
Market Summary
The market is a dynamic and evolving field that plays a pivotal role in advancing scientific research and innovation in various industries, including healthcare, agriculture, and academia. One of the primary drivers of this market's growth is the rapid reduction in the cost of genetic sequencing, making it increasingly accessible to researchers and organizations worldwide. This affordability has led to an influx of large-scale genomic data, necessitating the development of sophisticated bioinformatics tools for Next-Generation Sequencing (NGS) data analysis. Another significant trend in the market is the shortage of trained laboratory professionals capable of handling and interpreting complex genomic data. This skills gap creates a demand for user-friendly bioinformatics software and services that can streamline data analysis and interpretation, enabling researchers to focus on scientific discovery rather than data processing. For instance, a leading pharmaceutical company could leverage bioinformatics tools to optimize its drug discovery pipeline by analyzing large genomic datasets to identify potential drug targets and predict their efficacy. By integrating these tools into its workflow, the company can reduce the time and cost associated with traditional drug discovery methods, ultimately bringing new therapies to market more efficiently. Despite its numerous benefits, the market faces challenges such as data security and privacy concerns, data standardization, and the need for interoperability between different software platforms. Addressing these challenges will require collaboration between industry stakeholders, regulatory bodies, and academic institutions to establish best practices and develop standardized protocols for data sharing and analysis.
What will be the size of the Bioinformatics Market during the forecast period?
Bioinformatics, a dynamic and evolving market, is witnessing significant growth as businesses increasingly rely on high-performance computing, gene annotation, and bioinformatics software to decipher regulatory elements, gene expression regulation, and genomic variation. Machine learning algorithms, phylogenetic trees, and ontology development are integral tools for disease modeling and protein interactions. Cloud computing platforms facilitate the storage and analysis of vast biological databases and sequence data, enabling data mining techniques and statistical modeling for sequence assembly and drug discovery pipelines. Proteomic analysis, protein folding, and computational biology are crucial components of this domain, with biomedical ontologies and data integration platforms enhancing research efficiency. The integration of gene annotation and machine learning algorithms, for instance, has led to a 25% increase in accurate disease diagnosis within leading healthcare organizations. This trend underscores the importance of investing in advanced bioinformatics solutions for improved regulatory compliance, budgeting, and product strategy.
Unpacking the Bioinformatics Market Landscape
Bioinformatics, an essential discipline at the intersection of biology and computer science, continues to revolutionize the scientific landscape. Evolutionary bioinformatics, with its molecular dynamics simulation and systems biology approaches, enables a deeper understanding of biological processes, leading to improved ROI in research and development. For instance, next-generation sequencing technologies have reduced sequencing costs by a factor of ten, enabling genome-wide association studies and transcriptome sequencing on a previously unimaginable scale. In clinical bioinformatics, homology modeling techniques and protein-protein interaction analysis facilitate drug target identification, enhancing compliance with regulatory requirements. Phylogenetic analysis tools and comparative genomics studies contribute to the discovery of novel biomarkers and the development of personalized treatments. Bioimage informatics and proteomic data integration employ advanced sequence alignment algorithms and functional genomics tools to unlock new insights from complex
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustal W genetic phylogenetic analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A small bit of my thesis. Why are BMC PDFs so significantly larger on average than PLOS or Zootaxa PDFs?
data sources:
A) 'Zootaxa': the entire set of articles published in the journal Zootaxa from 2001 to 2012 inclusive, consisting of 11563 PDF files downloaded directly from the publisher website: http://mapress.com/zootaxa/
B) 'PLOS': the entire set of articles published across 7 different PLOS journals (PLOS ONE, PLOS Biology, PLOS Computational Biology, PLOS Genetics, PLOS Medicine, PLOS Neglected Tropical Diseases, and PLOS Pathogens) from 2003 to 2010-06-04, consisting of 20694 articles obtained via BioTorrents (Langille & Eisen, 2010).
C) 'BMC': a subsample of 7948 open access articles containing the stemword 'phylogen*' at least once in the full text, from the wide range of journals that BioMedCentral publishes (the OA subset of this selection of papers: http://www.citeulike.org/user/testtest87)
Background: Systemic sclerosis (scleroderma; SSc) is a rare and heterogeneous connective tissue disease whose underlying causative genes and effective therapeutic approaches remain unclear. The purpose of the present study was to identify hub genes and diagnostic markers and to explore potential small-molecule drugs for SSc. Methods: The data cohorts used in this study were downloaded from the Gene Expression Omnibus (GEO) database. Integrated bioinformatic tools were used for exploration, including Weighted Gene Co-Expression Network Analysis (WGCNA), least absolute shrinkage and selection operator (LASSO) regression, gene set enrichment analysis (GSEA), Connectivity Map (CMap) analysis, molecular docking, and pharmacokinetic/toxicity property exploration. Results: Seven hub genes (THY1, SULF1, PRSS23, COL5A2, NNMT, SLCO2B1, and TIMP1) were obtained from the merged gene expression profiles of GSE45485 and GSE76885. GSEA results showed that they are associated with autoimmune diseases, microorganism infections, inflammation-related pathways, immune responses, and the fibrosis process. Among them, THY1 and SULF1 were identified as diagnostic markers and validated in skin samples from GSE32413, GSE95065, GSE58095, and GSE125362. Finally, ten small-molecule drugs with potential therapeutic effects were identified, mainly including phosphodiesterase (PDE) inhibitors (BRL-50481, dipyridamole) and a TGF-β receptor inhibitor (SB-525334). Conclusion: This study provides new insights into the molecular mechanisms underlying the pathogenesis of SSc. More importantly, the results may offer promising clues for further experimental studies and novel treatment strategies.
https://wemarketresearch.com/privacy-policy
The Bioinformatics Services Market will grow from $4.3B in 2025 to $15.7B by 2035, at a CAGR of 12.6%, driven by rising demand for biologics and biosimilars.
| Report Attribute | Description |
|---|---|
| Market Size in 2025 | USD 4.3 Billion |
| Market Forecast in 2035 | USD 15.7 Billion |
| CAGR % 2025-2035 | 12.6% |
| Base Year | 2024 |
| Historic Data | 2020-2024 |
| Forecast Period | 2025-2035 |
| Report USP | Production, Consumption, company share, company heatmap, company production capacity, growth factors and more |
| Segments Covered | By Service Type, By Application, By End-user |
| Regional Scope | North America, Europe, APAC, Latin America, Middle East and Africa |
| Country Scope | U.S., Canada, U.K., Germany, France, Italy, Spain, Benelux, Nordic Countries, Russia, China, India, Japan, South Korea, Australia, Indonesia, Thailand, Mexico, Brazil, Argentina, Saudi Arabia, UAE, Egypt, South Africa, Nigeria |
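The relationship between the start value, end value, and CAGR in the table can be checked with a short calculation. The sketch below uses the standard compound annual growth rate formula; the function names are hypothetical, and the number of compounding periods is left as a parameter because the table does not say whether growth is counted from the 2024 base year or from 2025, so the implied rate depends on that assumption.

```python
def cagr(start_value: float, end_value: float, periods: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


def project(start_value: float, rate: float, periods: float) -> float:
    """Value reached after compounding `rate` for `periods` years."""
    return start_value * (1.0 + rate) ** periods


if __name__ == "__main__":
    # Figures from the table (USD billion); the period count is an assumption.
    for periods in (10, 11):
        implied = cagr(4.3, 15.7, periods)
        print(f"{periods} periods: implied CAGR = {implied:.1%}")
```

Running it for both period counts shows how sensitive a quoted CAGR is to the choice of compounding window.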
Introduction: Lupus nephritis (LN) is a major risk factor for morbidity and mortality. Glomerular injury is associated with different pathogeneses and clinical presentations in LN patients. However, the molecular mechanisms involved are not well understood. This study aimed to explore the molecular characteristics and mechanisms of this disease using bioinformatics analysis. Methods: To characterize glomeruli in LN, microarray datasets GSE113342 and GSE32591 were downloaded from the Gene Expression Omnibus database and analyzed to determine the differentially expressed genes (DEGs) between LN glomeruli and normal glomeruli. Functional enrichment analyses and protein–protein interaction network analyses were then performed. Module analysis was performed using the Search Tool for the Retrieval of Interacting Genes/Proteins and Cytoscape software. Immunofluorescence staining was performed to identify the glomerular expression of S100A8 in LN patients of various International Society of Nephrology/Renal Pathology Society (ISN/RPS) classes. The image of each glomerulus was acquired using a digital imaging system, and the green fluorescence intensity was quantified using Image-Pro Plus software. Results: A total of 13 DEGs, consisting of 12 downregulated genes and one upregulated gene (S100A8), were identified in the microarray datasets. The functions and pathways associated with the DEGs mainly include inflammatory response, innate immune response, neutrophil chemotaxis, leukocyte migration, cell adhesion, cell–cell signaling, and infection. We also found that monocytes and activated natural killer cells were upregulated in both GSE113342 and GSE32591. Glomerular S100A8 staining was significantly enhanced compared to that in the controls, especially in class IV. Conclusions: The DEGs identified in the present study help us understand the underlying molecular mechanisms of LN. Our results show that glomerular S100A8 expression varies across pathological types; however, further research is required to confirm the role of S100A8 in LN.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern research is increasingly data-driven and reliant on bioinformatics software. Publication is a common way of introducing new software, but not all bioinformatics tools get published. Given that there are competing tools, it is important not merely to find the appropriate software, but also to have a metric for judging its usefulness. A journal's impact factor has been shown to be a poor predictor of software popularity; consequently, focusing on publications in high-impact journals limits users' choices in finding useful bioinformatics tools. Free and open source software repositories on popular code sharing platforms such as GitHub provide another venue to follow the latest bioinformatics trends. The open source component of GitHub allows users to bookmark and copy repositories that are most useful to them. This Perspective aims to demonstrate the utility of GitHub “stars,” “watchers,” and “forks” (GitHub statistics) as a measure of software impact. We compiled lists of impactful bioinformatics software and analyzed commonly used impact metrics and GitHub statistics of 50 genomics-oriented bioinformatics tools. We present examples of community-selected best bioinformatics resources and show that GitHub statistics are distinct from the journal's impact factor (JIF), citation counts, and alternative metrics (Altmetrics, CiteScore) in capturing the level of community attention. We suggest the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software, complementing the traditional impact metrics.
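As a rough illustration of how such GitHub statistics might be tabulated, the sketch below ranks tools by one of the three counts. It assumes each record is a dictionary shaped like the repository object returned by the GitHub REST API, where `stargazers_count`, `subscribers_count` (the "watchers" shown in the web interface), and `forks_count` hold the relevant numbers. The tool names and counts are made up, and this is not the procedure used in the Perspective itself.

```python
def github_statistics(repo: dict) -> dict:
    """Pick stars, watchers, and forks out of one repository record."""
    return {
        "stars": repo["stargazers_count"],
        "watchers": repo["subscribers_count"],
        "forks": repo["forks_count"],
    }


def rank_by(records: dict, metric: str) -> list:
    """Return tool names sorted from highest to lowest value of `metric`."""
    stats = {name: github_statistics(record) for name, record in records.items()}
    return sorted(stats, key=lambda name: stats[name][metric], reverse=True)


# Hypothetical records for illustration only; not real measurements.
example = {
    "tool_a": {"stargazers_count": 120, "subscribers_count": 15, "forks_count": 40},
    "tool_b": {"stargazers_count": 950, "subscribers_count": 60, "forks_count": 30},
}
print(rank_by(example, "stars"))
print(rank_by(example, "forks"))
```

Ranking by different metrics can give different orderings, which is one reason to report all three counts side by side.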
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Databases used for MyCodentifier, a Nextflow pipeline to identify Mycobacterium tuberculosis complex (MTBC) and nontuberculous mycobacteria (NTM) species from next-generation sequencing (NGS) data.
Short description:
The pipeline is built with Nextflow as the workflow manager and runs in a Docker container. It identifies MTBC/NTM species from positive Mycobacterial Growth Indicator Tube (MGIT) cultures. To do so, it uses an hsp65 database for fast identification, coupled with a metagenomic method that uses Centrifuge for genome-level identification. For TB, it can also identify subspecies. Results are presented in automated PDF and HTML reports.
| Name | Short Description |
|---|---|
| 20220726_ref.tar.gz | 7 major mycobacterial genomes as Centrifuge classification database, used for reference-based mapping and genotype resistance prediction |
| 20220726_wgs_centrifuge_db_Radboudumc_MB.tar.gz | Centrifuge classification database using Tortoli et al 2017 Mycobacterium strains + additional strains |
| genomes.tar.gz | 7 major mycobacterial genomes, annotation and GenBank files. Files are paired with 20220726_ref.tar.gz |
| snpEff.tar.gz | 7 major mycobacterial genomes annotation models for snpEff. |
| Tortoli_etal_hsp65.tar.gz | KMA database of hsp65 gene extractions of the Tortoli et al 2017 Mycobacterium strains. |

Used in the study:
Databases available via ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data or https://ccb.jhu.edu/software/centrifuge/manual.shtml#custom-database
MyCodentifier Github:
https://jordycoolen.github.io/MyCodentifier/
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The text file contains the original DNA sequence data used in the phylogenetic analyses of Krishnankutty et al. (2016: Systematic Entomology 41: 580–595). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The file contains five separate data blocks, one for each character partition (28S, histone H3, 12S, indels, and morphology) for 53 taxa (species). Gaps inserted into the DNA sequence alignment are indicated by a dash, and missing data are indicated by a question mark. The separate "indels1" block includes 40 indels (insertions/deletions) from the 28S sequence alignment re-coded using the modified complex indel coding scheme, as described in the "Materials and methods" of the original publication. The DIMENSIONS statements near the beginning of each block indicate the numbers of taxa (NTax) and characters (NChar). The file contains aligned nucleotide sequence data for 3 gene regions and 40 morphological characters. The file is configured for use with the maximum likelihood-based phylogenetic program GARLI but can also be parsed by any other bioinformatics software that supports the NEXUS format. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supporting pdf file. More details on individual analyses are provided in the original publication.
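For readers who want to inspect the block sizes without a phylogenetics package, the DIMENSIONS statements can be pulled out with a small script. This is a minimal sketch that assumes statements of the form `DIMENSIONS NTax=... NChar=...;` and does not implement a full NEXUS parser; the function name and the sample block below are illustrative and are not copied from the dataset file.

```python
import re

# A DIMENSIONS statement runs from the keyword to the next semicolon.
DIMENSIONS_RE = re.compile(r"dimensions\b([^;]*);", re.IGNORECASE)
FIELD_RE = re.compile(r"\b(ntax|nchar)\s*=\s*(\d+)", re.IGNORECASE)


def read_dimensions(nexus_text: str) -> list:
    """Return one {'ntax': ..., 'nchar': ...} dict per DIMENSIONS statement."""
    blocks = []
    for statement in DIMENSIONS_RE.finditer(nexus_text):
        fields = {
            key.lower(): int(value)
            for key, value in FIELD_RE.findall(statement.group(1))
        }
        blocks.append(fields)
    return blocks


# Made-up sample block for illustration only.
sample = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTax=53 NChar=40;
  FORMAT DATATYPE=Standard MISSING=? GAP=-;
END;
"""
print(read_dimensions(sample))
```

Applied to the full file, the function would return one entry per data block, which makes it easy to confirm the number of taxa and characters in each partition.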
Background: UC patients suffer more from colorectal cancer (CRC) than the general population, and the risk increases with disease duration. Early colonoscopy is difficult because ulcerative colitis-associated colorectal cancer (UCAC) lesions are flat and multifocal. Our study aimed to identify promising UCAC biomarkers that can complement endoscopy strategies in the early stages. Methods: The datasets were obtained from the Gene Expression Omnibus and The Cancer Genome Atlas databases. The co-expressed modules of UC and CRC were determined via weighted co-expression network analysis (WGCNA). The biological mechanisms of the shared genes were analyzed using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. To identify protein interactions and hub genes, a protein-protein interaction network and CytoHubba analysis were conducted. To evaluate gene expression, external datasets and experimental validation of human colon tissues were used. The diagnostic value of core genes was examined through receiver operating characteristic (ROC) curves. Immune infiltration analysis was employed to investigate the associations between immune cell populations and hub genes. Results: Three crucial modules were identified from the WGCNA of UC and CRC tissues, and 33 coexpressed genes that were predominantly enriched in the NF-κB pathway were identified. Two biomarkers (CXCL1 and BCL6) were identified via Cytoscape and validated in external datasets and human colon tissues. CRC patients expressed CXCL1 at the highest level, and both UC and CRC patients showed higher levels than the controls. The UC cohort expressed BCL6 at the highest level, and both the UC and CRC cohorts expressed it more highly than the controls. The hub genes exhibited significant diagnostic potential (area under the ROC curve > 0.7). The immune infiltration results revealed a correlation between the hub genes and macrophages, neutrophils, and B cells. Conclusions: The findings of our research suggest that BCL6 and CXCL1 could serve as effective biomarkers for UCAC surveillance. Additionally, they demonstrated a robust correlation with immune cell populations within the CRC tumour microenvironment (TME). Our findings provide valuable insight into the diagnosis and therapy of UCAC.
Background: Type 2 diabetes (T2DM) combined with fatty liver is a subtype of metabolic fatty liver disease (MAFLD), and the relationship between T2DM and MAFLD is close and mutually influential. However, the connection and mechanisms between the two are still unclear. Therefore, we aimed to identify potential biomarkers for diagnosing both conditions. Methods: We performed differential expression analysis and weighted gene correlation network analysis (WGCNA) on publicly available data on the two diseases in the Gene Expression Omnibus database to find genes related to both conditions. We used protein–protein interactions (PPIs), Gene Ontology, and the Kyoto Encyclopedia of Genes and Genomes to identify T2DM-associated MAFLD genes and potential mechanisms. Candidate biomarkers were screened using machine learning algorithms combined with 12 cytoHubba algorithms, and a diagnostic model for T2DM-related MAFLD was constructed and evaluated. The CIBERSORT method was used to investigate immune cell infiltration in MAFLD and the immunological significance of central genes. Finally, we collected whole blood from patients with T2DM-related MAFLD, MAFLD patients, and healthy individuals, and used high-fat and high-glucose combined with high-fat cell models to verify the expression of hub genes. Results: Differential expression analysis and WGCNA identified 354 genes in the MAFLD dataset. The differential expression analysis of the T2DM-peripheral blood mononuclear cells/liver dataset screened 91 T2DM-associated secreted proteins. PPI analysis revealed two important modules of T2DM-related pathogenic genes in MAFLD, which contained 49 nodes, suggesting their involvement in cell interaction, inflammation, and other processes. TNFSF10, SERPINB2, and TNFRSF1A were the only genes shared between MAFLD key genes and T2DM-related secreted proteins, enabling the construction of highly accurate diagnostic models for both disorders. Additionally, high-fat and high-glucose combined with high-fat cell models were successfully produced. The expression patterns of TNFRSF1A and SERPINB2 were verified in patient blood and our cellular model. Immune dysregulation was observed in MAFLD, with TNFRSF1A and SERPINB2 strongly linked to immune regulation. Conclusion: The sensitivity and accuracy of diagnosing and predicting T2DM-associated MAFLD can be greatly improved using SERPINB2 and TNFRSF1A. These genes may significantly influence the development of T2DM-associated MAFLD, offering new diagnostic options for patients with T2DM combined with MAFLD.
Assay for transposase-accessible chromatin using sequencing data (ATAC-seq) is an efficient and precise method for revealing chromatin accessibility across the genome. Most of the current ATAC-seq tools follow chromatin immunoprecipitation sequencing (ChIP-seq) strategies that do not consider ATAC-seq-specific properties. To incorporate specific ATAC-seq quality control and the underlying biology of chromatin accessibility, we developed bioinformatics software named ATACgraph for analyzing and visualizing ATAC-seq data. ATACgraph profiles accessible chromatin regions and provides ATAC-seq-specific information, including definitions of nucleosome-free regions (NFRs) and nucleosome-occupied regions. ATACgraph also allows identification of differentially accessible regions between two ATAC-seq datasets. ATACgraph incorporates a Docker image with the Galaxy platform to provide an intuitive user experience via the graphical interface. Without tedious installation processes on a local machine or cloud, users can analyze data through activated websites using pre-designed workflows or customized pipelines composed of ATACgraph modules. Overall, ATACgraph is an effective tool designed for ATAC-seq that lets biologists with minimal bioinformatics knowledge analyze chromatin accessibility. ATACgraph can be run on any ATAC-seq data and is not limited to specific genomes. As validation, we demonstrated ATACgraph on the human genome to showcase its functions for ATAC-seq interpretation. This software is publicly accessible and can be downloaded at https://github.com/RitataLU/ATACgraph.
Hepatocellular carcinoma (HCC) accounts for approximately 85–90% of all liver cancer cases and has poor relapse-free survival. Many gene expression studies have been performed to elucidate the genetic landscape and driver pathways leading to HCC. However, existing studies have been limited by sample size, and thus the pathogenesis of HCC is still unclear. In this study, we performed an integrated characterization using four independent datasets including 320 HCC samples and 270 normal liver tissues to identify the candidate genes and pathways in the progression of HCC. A total of 89 consistent differentially expressed genes (DEGs) were identified. Gene-set enrichment analysis revealed that these genes were significantly enriched for cellular response to zinc ion in the biological process group, collagen trimer in the cellular component group, extracellular matrix (ECM) structural constituent conferring tensile strength in the molecular function group, and the protein digestion and absorption, mineral absorption, and ECM-receptor interaction pathways. Network systems biology based on the protein–protein interaction (PPI) network was also performed to identify the most connected and important genes among our DEGs. The top five hub genes, including osteopontin (SPP1), Collagen alpha-2(I) chain (COL1A2), Insulin-like growth factor I (IGF1), lipoprotein A (LPA), and Galectin-3 (LGALS3), were identified. Western blot and immunohistochemistry analyses were employed to verify the differential protein expression of hub genes in HCC patients. More importantly, we found that these five hub genes were significantly associated with poor disease-free survival and overall survival. In summary, we have identified the potential clinical significance of these genes as prognostic biomarkers for HCC patients who would benefit from experimental approaches to obtain an optimal outcome.
Despite being a well-established research method, the use of whole-genome sequencing (WGS) for routine molecular typing and pathogen characterization remains a substantial challenge due to the required bioinformatics resources and/or expertise. Moreover, many national reference laboratories and centers, as well as other laboratories working under a quality system, require extensive validation to demonstrate that employed methods are “fit-for-purpose” and provide high-quality results. A harmonized framework with guidelines for the validation of WGS workflows does not yet exist, however, despite several recent case studies highlighting the urgent need for one. We present a validation strategy focusing specifically on the exhaustive characterization of the bioinformatics analysis of a WGS workflow designed to replace conventionally employed molecular typing methods for microbial isolates in a representative small-scale laboratory, using the pathogen Neisseria meningitidis as a proof-of-concept. We adapted several classically employed performance metrics specifically toward three different bioinformatics assays: resistance gene characterization (based on the ARG-ANNOT, ResFinder, CARD, and NDARO databases), several commonly employed typing schemas (including, among others, core genome multilocus sequence typing), and serogroup determination. We analyzed a core validation dataset of 67 well-characterized samples typed by means of classical genotypic and/or phenotypic methods that were sequenced in-house, allowing us to evaluate the repeatability, reproducibility, accuracy, precision, sensitivity, and specificity of the different bioinformatics assays. We also analyzed an extended validation dataset composed of publicly available WGS data for 64 samples by comparing results of the different bioinformatics assays against results obtained from commonly used bioinformatics tools. We demonstrate high performance, with values for all performance metrics >87%, >97%, and >90% for the resistance gene characterization, sequence typing, and serogroup determination assays, respectively, for both validation datasets. Our WGS workflow has been made publicly available as a “push-button” pipeline for Illumina data at https://galaxy.sciensano.be to showcase its implementation for non-profit and/or academic usage. Our validation strategy can be adapted to other WGS workflows for other pathogens of interest and demonstrates the added value and feasibility of employing WGS with the aim of integrating it into routine use in an applied public health setting.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bioinformatics skills are increasingly relevant to research in most areas of the life sciences. The availability of genome sequences and large data sets provides unique opportunities to incorporate bioinformatics exercises into undergraduate microbiology courses. The goal of this project was to develop a teaching module to investigate the abundance and phylogenetic relationships amongst bacteriophages using a set of freely available bioinformatics tools. Computational identification and examination of bacteriophage genomes, followed by phylogenetic analyses, provides opportunities to incorporate core bioinformatics competencies in microbiology courses and enhance students’ bioinformatics skills. The first activity consisted of using PHASTER (PHAge Search Tool Enhanced Release), a bioinformatics tool that identifies bacteriophage sequences within bacterial chromosomes. Further computational analyses were conducted to align bacteriophage proteins and genomes and to determine phylogenetic relationships amongst these viruses. This part of the project was carried out using the Clustal Omega, MAFFT (Multiple Alignment using Fast Fourier Transform), and Interactive Tree of Life (iTOL) programs for sequence alignments and phylogenetic analyses. The laboratory activities were field tested in undergraduate directed research and microbiology classes. The learning objectives were assessed by comparing pre-test and post-test scores and by grading final presentations. Post-test scores were significantly higher than pre-test scores (p ≤ 0.002). The data suggest that in silico phage hunting improves students’ ability to search databases, interpret phylogenetic trees, and use bioinformatics tools to examine genome structure. This activity allows instructors to integrate key bioinformatic concepts in their curriculums and gives students the opportunity to participate in a research-directed learning environment in the classroom.
http://www.gnu.org/licenses/gpl-3.0.en.html
In support of the manuscript by Bagley et al. (2018), this accession provides scripts and information that were used to conduct MTML-msBayes analyses included in the paper. To meet PeerJ requirements, we also provide files containing the raw data and input files (DNA sequence alignments) that we analyzed in the paper. See the README file provided in Markdown and PDF formats for additional information on the files contained within this accession, as well as how they were strung together in a UNIX/LINUX pipeline workflow to conduct hierarchical approximate Bayesian computation (hABC) analyses reported in the corresponding manuscript (Bagley et al. 2018). Licensing information is discussed in the README and provided in full in the "LICENSE.md" file.
REFERENCES
Bagley, J.C., Hickerson, M.J. & Johnson, J.B. (2018) Testing hypotheses of diversification in Panamanian frogs and freshwater fishes using hierarchical approximate Bayesian computation with model averaging. Diversity.
The molecular mechanisms underlying obesity-related cardiomyopathy (ORCM) progression involve multiple signaling pathways, and pharmacological treatment for ORCM is still limited. Thus, it is necessary to explore new targets and develop novel therapies. Microarray analysis of gene expression profiles using different bioinformatics tools has been an effective strategy for identifying novel targets for various diseases. In this study, we aimed to explore potential genes related to ORCM using integrated bioinformatics analysis. The GSE18897 (whole blood expression profiling of obese diet-sensitive, obese diet-resistant, and lean human subjects) and GSE47022 (regular weight C57BL/6 and diet-induced obese C57BL/6 mice) datasets were used for bioinformatics analysis. Weighted gene co-expression network analysis (WGCNA) of GSE18897 was employed to investigate gene modules that were strongly correlated with clinical phenotypes. Gene Ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on the co-expression genes. The expression levels of the hub genes were validated in the clinical samples. The yellow co-expression module of WGCNA in GSE18897 was found to be significantly related to the caloric restriction treatment. In addition, GO functional enrichment analysis and KEGG pathway analysis of the genes in the yellow co-expression module showed an association with oxygen transport and the porphyrins pathway. Overlap analysis of yellow co-expression module genes from GSE18897 and GSE47022 revealed six upregulated genes, and further experimental validation results showed that elongation of very-long-chain fatty acids protein 4 (ELOVL4), matrix metalloproteinase-8 (MMP-8), and interleukin-33 (IL-33) were upregulated in the peripheral blood from patients with ORCM compared to that in the controls.
The bioinformatics analysis revealed that ELOVL4 expression levels are positively correlated with those of IL-33. Collectively, WGCNA combined with integrated bioinformatics analysis suggests that the hub genes ELOVL4 and IL-33 might serve as potential biomarkers for diagnosis and/or therapeutic targets for ORCM. The detailed roles of ELOVL4 and IL-33 in the pathophysiology of ORCM still require further investigation.
Thoracic aortic aneurysm and dissection (TAAD) is a high-risk aortic disease. Mouse models are usually used to explore the pathological progression of TAAD. In this study, we performed bioinformatics analysis on a microarray dataset (GSE36778) and verification experiments to define the integrated hub genes of TAAD in three different mouse models. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein–protein interaction (PPI) network analyses, together with histological and quantitative reverse transcription-PCR (qRT–PCR) experiments, were used. First, differentially expressed genes (DEGs) were identified, and twelve common differentially expressed genes were found. Second, genes related to the cell cycle and inflammation were enriched by using GO and PPI. We focused on filtering and validating eighteen hub genes that were upregulated. Then, expression data from human ascending aortic tissues in the GSE153434 dataset were also used to verify our findings. These results indicated that cell cycle-related genes participate in the pathological mechanism of TAAD and provide new insight into the molecular mechanisms of TAAD.