100+ datasets found
  1. Data_Sheet_2_Resequencing of Microbial Isolates: A Lab Module to Introduce...

    • frontiersin.figshare.com
    pdf
    Updated Jun 5, 2023
    Cite
    Katherine Lynn Petrie; Rujia Xie (2023). Data_Sheet_2_Resequencing of Microbial Isolates: A Lab Module to Introduce Novices to Command-Line Bioinformatics.PDF [Dataset]. http://doi.org/10.3389/fmicb.2021.578859.s002
    Dataset provided by
    Frontiers
    Authors
    Katherine Lynn Petrie; Rujia Xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Familiarity with genome-scale data and the bioinformatic skills to analyze it have become essential for understanding and advancing modern biology and human health, yet many undergraduate biology majors are never exposed to hands-on bioinformatics. This paper presents a module that introduces students to applied bioinformatic analysis within the context of a research-based microbiology lab course. One of the most commonly used genomic analyses in biology is resequencing: determining the sequence of DNA bases in a derived strain of some organism, and comparing it to the known ancestral genome of that organism to better understand the phenotypic differences between them. Many existing CUREs — Course Based Undergraduate Research Experiences — evolve or select new strains of bacteria and compare them phenotypically to ancestral strains. This paper covers standardized strategies and procedures, accessible to undergraduates, for preparing and analyzing microbial whole-genome resequencing data to examine the genotypic differences between such strains. Wet-lab protocols and computational tutorials are provided, along with additional guidelines for educators, providing instructors without a next-generation sequencing or bioinformatics background the necessary information to incorporate whole-genome sequencing and command-line analysis into their class. This module introduces novice students to running software at the command-line, giving them exposure and familiarity with the types of tools that make up the vast majority of open-source scientific software used in contemporary biology. Completion of the module improves student attitudes toward computing, which may make them more likely to pursue further bioinformatics study.

  2. Article PDF Filesizes

    • figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Ross Mounce (2023). Article PDF Filesizes [Dataset]. http://doi.org/10.6084/m9.figshare.748784.v2
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ross Mounce
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A small bit of my thesis. Why are BMC PDFs so significantly larger on average than PLOS or Zootaxa PDFs?

    data sources:

    A) 'Zootaxa': the entire set of articles published in the journal Zootaxa from 2001 to 2012 inclusive, consisting of 11563 pdf files downloaded direct from the publisher website: http://mapress.com/zootaxa/
    B) 'PLOS': the entire set of articles published across 7 different PLOS journals: PLOS ONE, PLOS Biology, PLOS Computational Biology, PLOS Genetics, PLOS Medicine, PLOS Neglected Tropical Diseases, and PLOS Pathogens from 2003 to 2010-06-04, consisting of 20694 articles obtained via BioTorrents (Langille & Eisen, 2010).
    C) 'BMC': a subsample of 7948 open access articles containing the stemword 'phylogen*' at least once in the fulltext from the wide range of journals that BioMedCentral publish (the OA subset of this selection of papers: http://www.citeulike.org/user/testtest87)

  3. Bioinformatics Market Analysis, Size, and Forecast 2025-2029: North America...

    • technavio.com
    pdf
    Updated Jun 18, 2025
    Cite
    Technavio (2025). Bioinformatics Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Italy, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/bioinformatics-market-industry-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Europe, France, Germany, North America, Canada, United Kingdom, United States
    Description

    Bioinformatics Market Size 2025-2029

    The bioinformatics market size is valued to increase by USD 15.98 billion, at a CAGR of 17.4% from 2024 to 2029. Reduction in cost of genetic sequencing will drive the bioinformatics market.

    Market Insights

    North America dominated the market and accounted for 43% of the growth during 2025-2029.
    By Application - Molecular phylogenetics segment was valued at USD 4.48 billion in 2023
    By Product - Platforms segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 309.88 million 
    Market Future Opportunities 2024: USD 15978.00 million
    CAGR from 2024 to 2029: 17.4%
    

    Market Summary

    The market is a dynamic and evolving field that plays a pivotal role in advancing scientific research and innovation in various industries, including healthcare, agriculture, and academia. One of the primary drivers of this market's growth is the rapid reduction in the cost of genetic sequencing, making it increasingly accessible to researchers and organizations worldwide. This affordability has led to an influx of large-scale genomic data, necessitating the development of sophisticated bioinformatics tools for Next-Generation Sequencing (NGS) data analysis. Another significant trend in the market is the shortage of trained laboratory professionals capable of handling and interpreting complex genomic data. This skills gap creates a demand for user-friendly bioinformatics software and services that can streamline data analysis and interpretation, enabling researchers to focus on scientific discovery rather than data processing. For instance, a leading pharmaceutical company could leverage bioinformatics tools to optimize its drug discovery pipeline by analyzing large genomic datasets to identify potential drug targets and predict their efficacy. By integrating these tools into its workflow, the company can reduce the time and cost associated with traditional drug discovery methods, ultimately bringing new therapies to market more efficiently. Despite its numerous benefits, the market faces challenges such as data security and privacy concerns, data standardization, and the need for interoperability between different software platforms. Addressing these challenges will require collaboration between industry stakeholders, regulatory bodies, and academic institutions to establish best practices and develop standardized protocols for data sharing and analysis.

    What will be the size of the Bioinformatics Market during the forecast period?

    Bioinformatics, a dynamic and evolving market, is witnessing significant growth as businesses increasingly rely on high-performance computing, gene annotation, and bioinformatics software to decipher regulatory elements, gene expression regulation, and genomic variation. Machine learning algorithms, phylogenetic trees, and ontology development are integral tools for disease modeling and protein interactions. Cloud computing platforms facilitate the storage and analysis of vast biological databases and sequence data, enabling data mining techniques and statistical modeling for sequence assembly and drug discovery pipelines. Proteomic analysis, protein folding, and computational biology are crucial components of this domain, with biomedical ontologies and data integration platforms enhancing research efficiency. The integration of gene annotation and machine learning algorithms, for instance, has led to a 25% increase in accurate disease diagnosis within leading healthcare organizations. This trend underscores the importance of investing in advanced bioinformatics solutions for improved regulatory compliance, budgeting, and product strategy.

    Unpacking the Bioinformatics Market Landscape

    Bioinformatics, an essential discipline at the intersection of biology and computer science, continues to revolutionize the scientific landscape. Evolutionary bioinformatics, with its molecular dynamics simulation and systems biology approaches, enables a deeper understanding of biological processes, leading to improved ROI in research and development. For instance, next-generation sequencing technologies have reduced sequencing costs by a factor of ten, enabling genome-wide association studies and transcriptome sequencing on a previously unimaginable scale. In clinical bioinformatics, homology modeling techniques and protein-protein interaction analysis facilitate drug target identification, enhancing compliance with regulatory requirements. Phylogenetic analysis tools and comparative genomics studies contribute to the discovery of novel biomarkers and the development of personalized treatments. Bioimage informatics and proteomic data integration employ advanced sequence alignment algorithms and functional genomics tools to unlock new insights from complex

  4. Datasheet2_Integrative analysis of bioinformatics and machine learning to...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Mar 18, 2024
    Cite
    Ma, Chaoqun; Sun, Jie; Zuo, Xiaoli; Tu, Dingyuan; Xu, Qiang; Luan, Yanmin (2024). Datasheet2_Integrative analysis of bioinformatics and machine learning to identify cuprotosis-related biomarkers and immunological characteristics in heart failure.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001355558
    Authors
    Ma, Chaoqun; Sun, Jie; Zuo, Xiaoli; Tu, Dingyuan; Xu, Qiang; Luan, Yanmin
    Description

    Background: Cuprotosis is a newly discovered programmed cell death by modulating tricarboxylic acid cycle. Emerging evidence showed that cuprotosis-related genes (CRGs) are implicated in the occurrence and progression of multiple diseases. However, the mechanism of cuprotosis in heart failure (HF) has not been investigated yet. Methods: The HF microarray datasets GSE16499, GSE26887, GSE42955, GSE57338, GSE76701, and GSE79962 were downloaded from the Gene Expression Omnibus (GEO) database to identify differentially expressed CRGs between HF patients and nonfailing donors (NFDs). Four machine learning models were used to identify key CRGs features for HF diagnosis. The expression profiles of key CRGs were further validated in a merged GEO external validation dataset and human samples through quantitative reverse-transcription polymerase chain reaction (qRT-PCR). In addition, Gene Ontology (GO) function enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, and immune infiltration analysis were used to investigate potential biological functions of key CRGs. Results: We discovered nine differentially expressed CRGs in heart tissues from HF patients and NFDs. With the aid of four machine learning algorithms, we identified three indicators of cuprotosis (DLAT, SLC31A1, and DLST) in HF, which showed good diagnostic properties. In addition, their differential expression between HF patients and NFDs was confirmed through qRT-PCR. Moreover, the results of enrichment analyses and immune infiltration exhibited that these diagnostic markers of CRGs were strongly correlated to energy metabolism and immune activity. Conclusions: Our study discovered that cuprotosis was strongly related to the pathogenesis of HF, probably by regulating energy metabolism-associated and immune-associated signaling pathways.

  5. DataSheet1_Hub genes, diagnostic model, and predicted drugs in systemic...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jul 12, 2023
    Cite
    Shen, Chen; Wu, Yun; Yan, Yue-Mei; Hu, Fei-Fei; Jin, Meng-Zhu; Wang, Qiang; Li, Sheng-Hua; Yin, Wen-Hao (2023). DataSheet1_Hub genes, diagnostic model, and predicted drugs in systemic sclerosis by integrated bioinformatics analysis.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000953790
    Authors
    Shen, Chen; Wu, Yun; Yan, Yue-Mei; Hu, Fei-Fei; Jin, Meng-Zhu; Wang, Qiang; Li, Sheng-Hua; Yin, Wen-Hao
    Description

    Background: Systemic sclerosis (scleroderma; SSc), a rare and heterogeneous connective tissue disease, remains unclear in terms of its underlying causative genes and effective therapeutic approaches. The purpose of the present study was to identify hub genes and diagnostic markers and to explore potential small-molecule drugs for SSc. Methods: The cohorts of data used in this study were downloaded from the Gene Expression Omnibus (GEO) database. Integrated bioinformatic tools were utilized for exploration, including Weighted Gene Co-Expression Network Analysis (WGCNA), least absolute shrinkage and selection operator (LASSO) regression, gene set enrichment analysis (GSEA), Connectivity Map (CMap) analysis, molecular docking, and pharmacokinetic/toxicity properties exploration. Results: Seven hub genes (THY1, SULF1, PRSS23, COL5A2, NNMT, SLCO2B1, and TIMP1) were obtained in the merged gene expression profiles of GSE45485 and GSE76885. GSEA results have shown that they are associated with autoimmune diseases, microorganism infections, inflammatory related pathways, immune responses, and fibrosis process. Among them, THY1 and SULF1 were identified as diagnostic markers and validated in skin samples from GSE32413, GSE95065, GSE58095 and GSE125362. Finally, ten small-molecule drugs with potential therapeutic effects were identified, mainly including phosphodiesterase (PDE) inhibitors (BRL-50481, dipyridamole), TGF-β receptor inhibitor (SB-525334), and so on. Conclusion: This study provides new insights toward a deeper understanding of the molecular mechanisms in the pathogenesis of SSc. More importantly, the results may offer promising clues for further experimental studies and novel treatment strategies.

  6. Data_Sheet_1_GitHub Statistics as a Measure of the Impact of Open-Source...

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Mikhail G. Dozmorov (2023). Data_Sheet_1_GitHub Statistics as a Measure of the Impact of Open-Source Bioinformatics Software.PDF [Dataset]. http://doi.org/10.3389/fbioe.2018.00198.s001
    Dataset provided by
    Frontiers
    Authors
    Mikhail G. Dozmorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern research is increasingly data-driven and reliant on bioinformatics software. Publication is a common way of introducing new software, but not all bioinformatics tools get published. Given that there are competing tools, it is important not merely to find the appropriate software, but to have a metric for judging its usefulness. A journal's impact factor has been shown to be a poor predictor of software popularity; consequently, focusing on publications in high-impact journals limits users' choices in finding useful bioinformatics tools. Free and open source software repositories on popular code sharing platforms such as GitHub provide another venue to follow the latest bioinformatics trends. The open source component of GitHub allows users to bookmark and copy repositories that are most useful to them. This Perspective aims to demonstrate the utility of GitHub “stars,” “watchers,” and “forks” (GitHub statistics) as a measure of software impact. We compiled lists of impactful bioinformatics software and analyzed commonly used impact metrics and GitHub statistics of 50 genomics-oriented bioinformatics tools. We present examples of community-selected best bioinformatics resources and show that GitHub statistics are distinct from the journal's impact factor (JIF), citation counts, and alternative metrics (Altmetrics, CiteScore) in capturing the level of community attention. We suggest the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software complementing the traditional impact metrics.

  7. Table_1_The Development of a Sustainable Bioinformatics Training Environment...

    • datasetcatalog.nlm.nih.gov
    Updated Sep 23, 2021
    Cite
    Ras, Verena; Mulder, Nicola; Panji, Sumir; Chauke, Paballo Abel; Johnston, Katherine; Aron, Shaun (2021). Table_1_The Development of a Sustainable Bioinformatics Training Environment Within the H3Africa Bioinformatics Network (H3ABioNet).pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000881136
    Authors
    Ras, Verena; Mulder, Nicola; Panji, Sumir; Chauke, Paballo Abel; Johnston, Katherine; Aron, Shaun
    Description

    Bioinformatics training programs have been developed independently around the world based on the perceived needs of the local and global academic communities. The field of bioinformatics is complicated by the need to train audiences from diverse backgrounds in a variety of topics to various levels of competencies. While there have been several attempts to develop standardised approaches to provide bioinformatics training globally, the challenges encountered in resource limited settings hinder the adaptation of these global approaches. H3ABioNet, a Pan-African Bioinformatics Network with 27 nodes in 16 African countries, has realised that there is no single simple solution to this challenge and has rather, over the years, evolved and adapted training approaches to create a sustainable training environment, with several components that allow for the successful dissemination of bioinformatics knowledge to diverse audiences. This has been achieved through the implementation of a combination of training modalities and sharing of high quality training material and experiences. The results highlight the success of implementing this multi-pronged approach to training, to reach audiences from different backgrounds and provide training in a variety of different areas of expertise. While face-to-face training was initially required and successful, the mixed-model teaching approach allowed for an increased reach, providing training in advanced analysis topics to reach large audiences across the continent with minimal teaching resources. The transition to hackathons provided an environment to allow the progression of skills, once basic skills had been developed, together with the development of real-world solutions to bioinformatics problems. Ensuring our training materials are FAIR, and through synergistic collaborations with global training partners, the reach of our training materials extends beyond H3ABioNet. 
    Coupled with the opportunity to develop additional career building soft skills, such as scientific communication, H3ABioNet has created a flexible, sustainable and high quality bioinformatics training environment that has successfully been implemented to train several highly skilled African bioinformaticians on the continent.

  8. Bioinformatics Services Market Size and Forecast (2025 - 2035), Global and...

    • wemarketresearch.com
    csv, pdf
    Updated May 20, 2025
    Cite
    We Market Research (2025). Bioinformatics Services Market Size and Forecast (2025 - 2035), Global and Regional Growth, Trend, Share and Industry Analysis Report Coverage: By Service Type (Data Analysis & Interpretation, Sequencing Services, Data Management Services, Software & Tool Development, Consulting Services, Outsourcing Services, Others); Application (Genomics, Proteomics, Transcriptomics, Pharmacogenomics, Clinical Diagnostics, Personalized Medicine and Others) End-user (Pharmaceutical & Biotechnology Companies, Academic & Research Institutes, Hospitals & Healthcare Institutions, Contract Research Organizations (CROs) and Others) and Geography. [Dataset]. https://wemarketresearch.com/reports/bioinformatics-services-market/1735
    Dataset authored and provided by
    We Market Research
    License

    https://wemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2035
    Area covered
    Worldwide
    Description

    The Bioinformatics Services Market will grow from $4.3B in 2025 to $15.7B by 2035, at a CAGR of 12.6%, driven by rising demand for biologics and biosimilars.

    Report Attribute: Description
    Market Size in 2025: USD 4.3 Billion
    Market Forecast in 2035: USD 15.7 Billion
    CAGR % 2025-2035: 12.6%
    Base Year: 2024
    Historic Data: 2020-2024
    Forecast Period: 2025-2035
    Report USP: Production, Consumption, company share, company heatmap, company production capacity, growth factors and more
    Segments Covered: By Service Type, By Application, By End-user
    Regional Scope: North America, Europe, APAC, Latin America, Middle East and Africa
    Country Scope: U.S., Canada, U.K., Germany, France, Italy, Spain, Benelux, Nordic Countries, Russia, China, India, Japan, South Korea, Australia, Indonesia, Thailand, Mexico, Brazil, Argentina, Saudi Arabia, UAE, Egypt, South Africa, Nigeria
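
    The growth rate quoted for this report can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / periods) - 1. The sketch below is illustrative only: the report does not say how its 12.6% figure was computed, the function name cagr is hypothetical, and the eleven-period reading is an assumption based on the 2024 base year.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Figures quoted in the listing (USD billions).
SIZE_2025 = 4.3
SIZE_2035 = 15.7

# Reading 1: ten compounding periods, 2025 -> 2035.
rate_10 = cagr(SIZE_2025, SIZE_2035, 10)

# Reading 2 (assumption): eleven periods, counting from the 2024 base year.
# The listing gives no 2024 market size, so the 2025 figure is reused here.
rate_11 = cagr(SIZE_2025, SIZE_2035, 11)

print(f"10 periods: {rate_10:.1%}")
print(f"11 periods: {rate_11:.1%}")
```

    Under these assumptions the ten-period reading gives roughly 13.8% and the eleven-period reading roughly 12.5%, so the quoted 12.6% is closer to the second. This may reflect rounding or a different base value in the underlying report.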

  9. Text S1 - Teaching Bioinformatics in Concert

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Nov 20, 2014
    Cite
    Dekhtyar, Alex; Goodman, Anya L. (2014). Text S1 - Teaching Bioinformatics in Concert [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001172135
    Authors
    Dekhtyar, Alex; Goodman, Anya L.
    Description

    Syllabi from the courses taught in 2013. (PDF)

  10. Data_Sheet_1_Comparison of Bioinformatics Pipelines and Operating Systems...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jun 17, 2020
    Cite
    Gurry, Thomas; Marizzoni, Moira; Provasi, Stefania; Salvatore, Marco; Mazzelli, Monica; Soricelli, Andrea; Mirabelli, Peppino; Lopizzo, Nicola; Cattaneo, Annamaria; Festari, Cristina; Ribaldi, Federica; Frisoni, Giovanni B.; Franzese, Monica; Greub, Gilbert; Mombelli, Elisa (2020). Data_Sheet_1_Comparison of Bioinformatics Pipelines and Operating Systems for the Analyses of 16S rRNA Gene Amplicon Sequences in Human Fecal Samples.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000509436
    Authors
    Gurry, Thomas; Marizzoni, Moira; Provasi, Stefania; Salvatore, Marco; Mazzelli, Monica; Soricelli, Andrea; Mirabelli, Peppino; Lopizzo, Nicola; Cattaneo, Annamaria; Festari, Cristina; Ribaldi, Federica; Frisoni, Giovanni B.; Franzese, Monica; Greub, Gilbert; Mombelli, Elisa
    Description

    Amplicon high-throughput sequencing of 16S ribosomal RNA (rRNA) gene is currently the most widely used technique to investigate complex gut microbial communities. Microbial identification might be influenced by several factors, including the choice of bioinformatic pipelines, making comparisons across studies difficult. Here, we compared four commonly used pipelines (QIIME2, Bioconductor, UPARSE and mothur) run on two operating systems (OS) (Linux and Mac), to evaluate the impact of bioinformatic pipeline and OS on the taxonomic classification of 40 human stool samples. We applied the SILVA 132 reference database for all the pipelines. We compared phyla and genera identification and relative abundances across the four pipelines using the Friedman rank sum test. QIIME2 and Bioconductor provided identical outputs on Linux and Mac OS, while UPARSE and mothur reported only minimal differences between OS. Taxa assignments were consistent at both phylum and genus level across all the pipelines. However, a difference in terms of relative abundance was identified for all phyla (p < 0.013) and for the majority of the most abundant genera (p < 0.028), such as Bacteroides (QIIME2: 24.5%, Bioconductor: 24.6%, UPARSE-linux: 23.6%, UPARSE-mac: 20.6%, mothur-linux: 22.2%, mothur-mac: 21.6%, p < 0.001). The use of different bioinformatic pipelines affects the estimation of the relative abundance of gut microbial community, indicating that studies using different pipelines cannot be directly compared. A harmonization procedure is needed to move the field forward.

  11. Data_Sheet_1_Validation of a Bioinformatics Workflow for Routine Analysis of...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Mar 6, 2019
    Cite
    Roosens, Nancy H. C.; Mattheus, Wesley; Fu, Qiang; Ceyssens, Pieter-Jan; Vanneste, Kevin; De Keersmaecker, Sigrid C. J.; Van Braekel, Julien; Bertrand, Sophie; Bogaerts, Bert; Winand, Raf (2019). Data_Sheet_1_Validation of a Bioinformatics Workflow for Routine Analysis of Whole-Genome Sequencing Data and Related Challenges for Pathogen Typing in a European National Reference Center: Neisseria meningitidis as a Proof-of-Concept.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000172206
    Authors
    Roosens, Nancy H. C.; Mattheus, Wesley; Fu, Qiang; Ceyssens, Pieter-Jan; Vanneste, Kevin; De Keersmaecker, Sigrid C. J.; Van Braekel, Julien; Bertrand, Sophie; Bogaerts, Bert; Winand, Raf
    Description

    Despite being a well-established research method, the use of whole-genome sequencing (WGS) for routine molecular typing and pathogen characterization remains a substantial challenge due to the required bioinformatics resources and/or expertise. Moreover, many national reference laboratories and centers, as well as other laboratories working under a quality system, require extensive validation to demonstrate that employed methods are “fit-for-purpose” and provide high-quality results. A harmonized framework with guidelines for the validation of WGS workflows does currently, however, not exist yet, despite several recent case studies highlighting the urgent need thereof. We present a validation strategy focusing specifically on the exhaustive characterization of the bioinformatics analysis of a WGS workflow designed to replace conventionally employed molecular typing methods for microbial isolates in a representative small-scale laboratory, using the pathogen Neisseria meningitidis as a proof-of-concept. We adapted several classically employed performance metrics specifically toward three different bioinformatics assays: resistance gene characterization (based on the ARG-ANNOT, ResFinder, CARD, and NDARO databases), several commonly employed typing schemas (including, among others, core genome multilocus sequence typing), and serogroup determination. We analyzed a core validation dataset of 67 well-characterized samples typed by means of classical genotypic and/or phenotypic methods that were sequenced in-house, allowing to evaluate repeatability, reproducibility, accuracy, precision, sensitivity, and specificity of the different bioinformatics assays. We also analyzed an extended validation dataset composed of publicly available WGS data for 64 samples by comparing results of the different bioinformatics assays against results obtained from commonly used bioinformatics tools. 
    We demonstrate high performance, with values for all performance metrics >87%, >97%, and >90% for the resistance gene characterization, sequence typing, and serogroup determination assays, respectively, for both validation datasets. Our WGS workflow has been made publicly available as a “push-button” pipeline for Illumina data at https://galaxy.sciensano.be to showcase its implementation for non-profit and/or academic usage. Our validation strategy can be adapted to other WGS workflows for other pathogens of interest and demonstrates the added value and feasibility of employing WGS with the aim of being integrated into routine use in an applied public health setting.

  12. Databases for MyCodentifier: A tool for routine identification of...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Dec 9, 2022
    Cite
    Jodie A. Schildkraut; Jordy P.M. Coolen; Heleen Severin; Ellen Koenraad; Nicole Aalders; Willem J.G. Melchers; Wouter Hoefsloot; Heiman F.L. Wertheim; Jakko van Ingen (2022). Databases for MyCodentifier: A tool for routine identification of nontuberculous mycobacteria using MGIT enriched shotgun metagenomics. [Dataset]. http://doi.org/10.5281/zenodo.7396289
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jodie A. Schildkraut; Jordy P.M. Coolen; Heleen Severin; Ellen Koenraad; Nicole Aalders; Willem J.G. Melchers; Wouter Hoefsloot; Heiman F.L. Wertheim; Jakko van Ingen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Databases used for MyCodentifier, a Nextflow pipeline to identify Mycobacterium tuberculosis complex (MTBC) and nontuberculous mycobacteria (NTM) species from next-generation sequencing (NGS) data.

    Short description:
    The pipeline is built with Nextflow as the workflow manager and runs in a Docker container. It identifies MTBC/NTM species from positive Mycobacterial Growth Indicator Tube (MGIT) cultures. To do so, it uses an hsp65 database for fast identification, coupled with a metagenomic method that uses Centrifuge for genome-level identification. For TB it can also identify subspecies. Results are presented in automated PDF and HTML reports.

    Databases
    Name: Short description
    20220726_ref.tar.gz: 7 major mycobacterial genomes as centrifuge classification database, used for reference-based mapping and genotype resistance prediction
    20220726_wgs_centrifuge_db_Radboudumc_MB.tar.gz: centrifuge classification database using Tortoli et al 2017 Mycobacterium strains + additional strains
    genomes.tar.gz: 7 major mycobacterial genomes, annotation and Genbank files. Files are paired with 20220726_ref.tar.gz
    snpEff.tar.gz: 7 major mycobacterial genomes annotation models for snpEff.
    Tortoli_etal_hsp65.tar.gz: KMA database of hsp65 gene extractions of the Tortoli et al 2017 Mycobacterium strains.

    Used in the study:
    p_compressed+h+v.tar.gz (12/06/2016)

    Databases available via ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data or https://ccb.jhu.edu/software/centrifuge/manual.shtml#custom-database

    MyCodentifier Github:

    https://jordycoolen.github.io/MyCodentifier/

  13. S2b File.pdf

    • figshare.com
    pdf
    Updated May 19, 2020
    Ömür Baysal (2020). S2b File.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.12326405.v1
    Available download formats: pdf
    Dataset updated
    May 19, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ömür Baysal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Clustal W genetic phylogenetic analysis.

  14. Data Sheet 1_Raman spectroscopy and bioinformatics-based identification of...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Feb 25, 2025
    Zhou, Yuan; Li, Yihan; Ren, Yansong; Wang, Haoyu; Cao, Zhijie; Xue, Mei; Zhu, Guoqing; Sun, Fanfan; Liang, Haoyue (2025). Data Sheet 1_Raman spectroscopy and bioinformatics-based identification of key genes and pathways capable of distinguishing between diffuse large B cell lymphoma and chronic lymphocytic leukemia.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001324276
    Dataset updated
    Feb 25, 2025
    Authors
    Zhou, Yuan; Li, Yihan; Ren, Yansong; Wang, Haoyu; Cao, Zhijie; Xue, Mei; Zhu, Guoqing; Sun, Fanfan; Liang, Haoyue
    Description

    Diffuse large B-cell lymphoma (DLBCL) and chronic lymphocytic leukemia (CLL) are subtypes of non-Hodgkin lymphoma (NHL) that are generally distinct from one another, but the transformation of one of these diseases into the other is possible. Some patients with CLL, for instance, have the potential to develop Richter transformation such that they are diagnosed with a rare, invasive DLBCL subtype. In this study, bioinformatics analyses of these two NHL subtypes were conducted, identifying key patterns of gene expression and then experimentally validating the results. Disease-related gene expression datasets from the GEO database were used to identify differentially expressed genes (DEGs), and DEG functions were examined using GO analysis and protein-protein interaction (PPI) network construction. This strategy revealed many up- and down-regulated DEGs, with functional enrichment analyses identifying these genes as being closely associated with inflammatory and immune response activity. PPI network analyses and the evaluation of clustered network modules indicated the top 10 up- and down-regulated genes involved in disease onset and development. Serological analyses revealed significantly higher ALB, TT, and WBC levels in CLL patients relative to DLBCL patients, whereas the opposite was true with respect to TG, HDL, GGT, ALP, ALT, and NEUT% levels. In comparison to the CLL and DLBCL groups, the healthy control samples demonstrated higher signals at protein peak positions (621, 643, 848, 853, 869, 935, 1003, 1031, 1221, 1230, 1260, 1344, 1443, 1446, 1548, 1579, 1603, 1647 cm⁻¹), nucleic acid peak positions (726, 781, 786, 1078, 1190, 1415, 1573, 1579 cm⁻¹), beta carotene peak positions (957, 1155, 1162 cm⁻¹), carbohydrate peak positions (842 cm⁻¹), collagen peak positions (1345 cm⁻¹), and lipid peak positions (957, 1078, 1119, 1285, 1299, 1437, 1443, 1446 cm⁻¹). Verification of these key genes in patient samples yielded results consistent with findings derived from bioinformatics analyses, highlighting their relevance to diagnosing and treating these forms of NHL. Together, these analyses identified genes and pathways involved in both DLBCL and CLL. The set of molecular markers established herein can aid in patient diagnosis and prognostic evaluation, providing a valuable foundation for their therapeutic application.

  15. NEXUS data file for phylogenetic analysis of Iassinae (Hemiptera:...

    • databank.illinois.edu
    Updated Dec 6, 2018
    Sindhu Krishnankutty; Christopher Dietrich; Wu Dai; Madhura Siddappaji (2018). NEXUS data file for phylogenetic analysis of Iassinae (Hemiptera: Cicadellidae) [Dataset]. http://doi.org/10.13012/B2IDB-9500981_V1
    Dataset updated
    Dec 6, 2018
    Authors
    Sindhu Krishnankutty; Christopher Dietrich; Wu Dai; Madhura Siddappaji
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Science Foundation (NSF)
    National Natural Science Foundation of China
    Description

    The text file contains the original DNA sequence data used in the phylogenetic analyses of Krishnankutty et al. (2016: Systematic Entomology 41: 580–595). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The file contains five separate data blocks, one for each character partition (28S, histone H3, 12S, indels, and morphology) for 53 taxa (species). Gaps inserted into the DNA sequence alignment are indicated by a dash, and missing data are indicated by a question mark. The separate "indels1" block includes 40 indels (insertions/deletions) from the 28S sequence alignment re-coded using the modified complex indel coding scheme, as described in the "Materials and methods" of the original publication. The DIMENSIONS statements near the beginning of each block indicate the numbers of taxa (NTax) and characters (NChar). The file contains aligned nucleotide sequence data for 3 gene regions and 40 morphological characters. The file is configured for use with the maximum likelihood-based phylogenetic program GARLI but can also be parsed by any other bioinformatics software that supports the NEXUS format. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supporting pdf file. More details on individual analyses are provided in the original publication.
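
    As a rough illustration of the conventions described above (DIMENSIONS statements giving NTax and NChar, a dash for gaps, a question mark for missing data), the following minimal Python sketch reads the dimensions from NEXUS-style text and counts gap and missing-data symbols in one sequence. The sample text, taxon names, and function names are hypothetical and are not taken from the dataset; real NEXUS files may need a dedicated parser.

```python
import re

# Hypothetical miniature NEXUS text; taxon names and sizes are invented for illustration.
SAMPLE = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=8;
  FORMAT DATATYPE=DNA GAP=- MISSING=?;
  MATRIX
    taxon_A ACGT-CGT
    taxon_B ACGTACG?
    taxon_C AC-TACGT
  ;
END;
"""

# A DIMENSIONS statement runs from the keyword to the next semicolon.
DIMENSIONS_RE = re.compile(r"DIMENSIONS\s+([^;]*);", re.IGNORECASE)
KEYVAL_RE = re.compile(r"(NTAX|NCHAR)\s*=\s*(\d+)", re.IGNORECASE)


def read_dimensions(nexus_text):
    """Return one {'ntax': int, 'nchar': int} dict per DIMENSIONS statement found."""
    results = []
    for stmt in DIMENSIONS_RE.finditer(nexus_text):
        dims = {key.lower(): int(value) for key, value in KEYVAL_RE.findall(stmt.group(1))}
        results.append(dims)
    return results


def count_gaps_and_missing(sequence):
    """Count gap ('-') and missing-data ('?') symbols in one aligned sequence."""
    return sequence.count("-"), sequence.count("?")


if __name__ == "__main__":
    print(read_dimensions(SAMPLE))
    print(count_gaps_and_missing("ACGT-CGT"))
```

    A file with several data blocks, as described for this dataset, would yield one dictionary per DIMENSIONS statement.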

  16. Data Sheet 1_Identification of biomarkers for the diagnosis of type 2...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jan 28, 2025
    + more versions
    Yao, You; Wu, Sihui; Liang, Xinghuan; Chen, Cuihong; Qin, Yingfen; Wu, Guiling; Yang, Xi; Qiu, Yu; Xiong, Tian; Meng, Liheng (2025). Data Sheet 1_Identification of biomarkers for the diagnosis of type 2 diabetes mellitus with metabolic associated fatty liver disease by bioinformatics analysis and experimental validation.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001498598
    Dataset updated
    Jan 28, 2025
    Authors
    Yao, You; Wu, Sihui; Liang, Xinghuan; Chen, Cuihong; Qin, Yingfen; Wu, Guiling; Yang, Xi; Qiu, Yu; Xiong, Tian; Meng, Liheng
    Description

    Background: Type 2 diabetes (T2DM) combined with fatty liver is a subtype of metabolic fatty liver disease (MAFLD), and the relationship between T2DM and MAFLD is close and mutually influential. However, the connection and mechanisms between the two are still unclear. Therefore, we aimed to identify potential biomarkers for diagnosing both conditions.

    Methods: We performed differential expression analysis and weighted gene correlation network analysis (WGCNA) on publicly available data on the two diseases in the Gene Expression Omnibus database to find genes related to both conditions. We utilised protein–protein interactions (PPIs), Gene Ontology, and the Kyoto Encyclopedia of Genes and Genomes to identify T2DM-associated MAFLD genes and potential mechanisms. Candidate biomarkers were screened using machine learning algorithms combined with 12 cytoHubba algorithms, and a diagnostic model for T2DM-related MAFLD was constructed and evaluated. The CIBERSORT method was used to investigate immune cell infiltration in MAFLD and the immunological significance of central genes. Finally, we collected whole blood from patients with T2DM-related MAFLD, MAFLD patients and healthy individuals, and used high-fat, high-glucose combined with high-fat cell models to verify the expression of hub genes.

    Results: Differential expression analysis and WGCNA identified 354 genes in the MAFLD dataset. The differential expression analysis of the T2DM-peripheral blood mononuclear cells/liver dataset screened 91 T2DM-associated secreted proteins. PPI analysis revealed two important modules of T2DM-related pathogenic genes in MAFLD, which contained 49 nodes, suggesting their involvement in cell interaction, inflammation, and other processes. TNFSF10, SERPINB2, and TNFRSF1A were the only coexisting genes shared between MAFLD key genes and T2DM-related secreted proteins, enabling the construction of highly accurate diagnostic models for both disorders. Additionally, high-fat, high-glucose combined with high-fat cell models were successfully produced. The expression patterns of TNFRSF1A and SERPINB2 were verified in patient blood and our cellular model. Immune dysregulation was observed in MAFLD, with TNFRSF1A and SERPINB2 strongly linked to immune regulation.

    Conclusion: The sensitivity and accuracy in diagnosing and predicting T2DM-associated MAFLD can be greatly improved using SERPINB2 and TNFRSF1A. These genes may significantly influence the development of T2DM-associated MAFLD, offering new diagnostic options for patients with T2DM combined with MAFLD.

  17. Data for: Testing hypotheses of diversification in Panamanian frogs and...

    • data.mendeley.com
    Updated Nov 1, 2018
    + more versions
    Justin Bagley (2018). Data for: Testing hypotheses of diversification in Panamanian frogs and freshwater fishes using hierarchical approximate Bayesian computation with model averaging [Dataset]. http://doi.org/10.17632/f94kxmwf2n.3
    Dataset updated
    Nov 1, 2018
    Authors
    Justin Bagley
    License

    http://www.gnu.org/licenses/gpl-3.0.en.html

    Description

    In support of the manuscript by Bagley et al. (2018), this accession provides scripts and information that were used to conduct MTML-msBayes analyses included in the paper. To meet PeerJ requirements, we also provide files containing the raw data and input files (DNA sequence alignments) that we analyzed in the paper. See the README file provided in Markdown and PDF formats for additional information on the files contained within this accession, as well as how they were strung together in a UNIX/LINUX pipeline workflow to conduct hierarchical approximate Bayesian computation (hABC) analyses reported in the corresponding manuscript (Bagley et al. 2018). Licensing information is discussed in the README and provided in full in the "LICENSE.md" file.

    REFERENCES

    Bagley, J.C., Hickerson, M.J. & Johnson, J.B. (2018) Testing hypotheses of diversification in Panamanian frogs and freshwater fishes using hierarchical approximate Bayesian computation with model averaging. Diversity.

  18. Data_Sheet_3_Critical Roles of ELVOL4 and IL-33 in the Progression of...

    • datasetcatalog.nlm.nih.gov
    Updated Jun 5, 2020
    + more versions
    Wang, Yajing; Liang, Shi; Tao, Jun; Zheng, Junmeng; Li, Ling (2020). Data_Sheet_3_Critical Roles of ELVOL4 and IL-33 in the Progression of Obesity-Related Cardiomyopathy via Integrated Bioinformatics Analysis.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000529786
    Dataset updated
    Jun 5, 2020
    Authors
    Wang, Yajing; Liang, Shi; Tao, Jun; Zheng, Junmeng; Li, Ling
    Description

    The molecular mechanisms underlying obesity-related cardiomyopathy (ORCM) progression involve multiple signaling pathways, and the pharmacological treatment for ORCM is still limited. Thus, it is necessary to explore new targets and develop novel therapies. Microarray analysis of gene expression profiles using different bioinformatics tools has been an effective strategy for identifying novel targets for various diseases. In this study, we aimed to explore the potential genes related to ORCM using integrated bioinformatics analysis. The GSE18897 (whole blood expression profiling of obese diet-sensitive, obese diet-resistant, and lean human subjects) and GSE47022 (regular weight C57BL/6 and diet-induced obese C57BL/6 mice) datasets were used for bioinformatics analysis. Weighted gene co-expression network analysis (WGCNA) of GSE18897 was employed to investigate gene modules that were strongly correlated with clinical phenotypes. Gene Ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on the co-expression genes. The expression levels of the hub genes were validated in the clinical samples. The yellow co-expression module of WGCNA in GSE18897 was found to be significantly related to the caloric restriction treatment. In addition, GO functional enrichment analysis and KEGG pathway analysis of the co-expression genes in the yellow module showed an association with oxygen transport and the porphyrins pathway. Overlap analysis of yellow co-expression module genes from GSE18897 and GSE47022 revealed six upregulated genes, and further experimental validation showed that elongation of very-long-chain fatty acids protein 4 (ELOVL4), matrix metalloproteinase-8 (MMP-8), and interleukin-33 (IL-33) were upregulated in the peripheral blood of patients with ORCM compared to controls. The bioinformatics analysis revealed that ELOVL4 expression levels are positively correlated with those of IL-33. Collectively, using WGCNA in combination with integrated bioinformatics analysis, the hub genes ELOVL4 and IL-33 might serve as potential biomarkers for diagnosis and/or therapeutic targets for ORCM. The detailed roles of ELOVL4 and IL-33 in the pathophysiology of ORCM still require further investigation.

  19. Data Sheet 1_Identification of immune-associated genes for the diagnosis of...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Nov 8, 2024
    + more versions
    Tian, Yingying; Jin, Shizhu; Gao, Feiyang; Huang, Shiling; Wang, Chunjing; Liu, Lina; Qiu, Jiawei; Qi, Jihan; Chen, Hongliang; Chaulagain, Ram Prasad; Cang, Xueyu; Ullah, Ubaid; Li, Ning; Deng, Pengchao; Xing, Hui (2024). Data Sheet 1_Identification of immune-associated genes for the diagnosis of ulcerative colitis-associated carcinogenesis via integrated bioinformatics analysis.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001434099
    Dataset updated
    Nov 8, 2024
    Authors
    Tian, Yingying; Jin, Shizhu; Gao, Feiyang; Huang, Shiling; Wang, Chunjing; Liu, Lina; Qiu, Jiawei; Qi, Jihan; Chen, Hongliang; Chaulagain, Ram Prasad; Cang, Xueyu; Ullah, Ubaid; Li, Ning; Deng, Pengchao; Xing, Hui
    Description

    Background: Patients with ulcerative colitis (UC) have a higher incidence of colorectal cancer (CRC) than the general population, and the risk increases with disease duration. Early colonoscopy is difficult because ulcerative colitis-associated colorectal cancer (UCAC) lesions are flat and multifocal. Our study aimed to identify promising UCAC biomarkers that complement endoscopy strategies in the early stages.

    Methods: The datasets may be accessed from the Gene Expression Omnibus and The Cancer Genome Atlas databases. The co-expressed modules of UC and CRC were determined via weighted co-expression network analysis (WGCNA). The biological mechanisms of the shared genes were analyzed using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. To identify protein interactions and hub genes, a protein-protein interaction network and CytoHubba analysis were conducted. To evaluate gene expression, external datasets and experimental validation of human colon tissues were utilized. The diagnostic value of core genes was examined through receiver operating characteristic (ROC) curves. Immune infiltration analysis was employed to investigate the associations between immune cell populations and hub genes.

    Results: Three crucial modules were identified from the WGCNA of UC and CRC tissues, and 33 coexpressed genes that were predominantly enriched in the NF-κB pathway were identified. Two biomarkers (CXCL1 and BCL6) were identified via Cytoscape and validated in external datasets and human colon tissues. CRC patients expressed CXCL1 at the highest level, whereas UC and CRC patients showed higher levels than the controls. The UC cohort expressed BCL6 at the highest level, whereas the UC and CRC cohorts expressed it more highly than the controls. The hub genes exhibited significant diagnostic potential (area under the ROC curve > 0.7). The immune infiltration results revealed a correlation among the hub genes and macrophages, neutrophils and B cells.

    Conclusions: The findings of our research suggest that BCL6 and CXCL1 could serve as effective biomarkers for UCAC surveillance. Additionally, they demonstrated a robust correlation with immune cell populations within the CRC tumour microenvironment (TME). Our findings provide valuable insight into the diagnosis and therapy of UCAC.

  20. Data_Sheet_2_Bioinformatics-Based Activities in High School: Fostering...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    + more versions
    Ana Martins; Maria João Fonseca; Marina Lemos; Leonor Lencastre; Fernando Tavares (2023). Data_Sheet_2_Bioinformatics-Based Activities in High School: Fostering Students’ Literacy, Interest, and Attitudes on Gene Regulation, Genomics, and Evolution.pdf [Dataset]. http://doi.org/10.3389/fmicb.2020.578099.s002
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Ana Martins; Maria João Fonseca; Marina Lemos; Leonor Lencastre; Fernando Tavares
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The key role of bioinformatics in explaining biological phenomena calls for rethinking didactic approaches in high school so that they align with a new scientific reality. Despite several initiatives to introduce bioinformatics in the classroom, there is still a lack of knowledge on their impact on students’ learning gains, engagement, and motivation. In this study, we detail the effects of four bioinformatics laboratories tailored for high school biology classes, named “Mining the Genome: Using Bioinformatics Tools in the Classroom to Support Student Discovery of Genes”, on the literacy, interest, and attitudes of 387 high school students. By exploring these laboratories, students become acquainted with bioinformatics and acknowledge that many bioinformatics tools can be intuitive for beginners. Furthermore, introducing comparative genomics in their learning practices contributed to a better understanding of curricular content regarding the identification of genes, their regulation, and how to make evolutionary assumptions. Following the intervention, students were able to pinpoint the bioinformatics tools required to identify genes in a genomic sequence and, most importantly, they were able to resolve genomics-related misconceptions. Overall, students showed a positive attitude toward the integration of bioinformatics-based approaches in their learning practices, reinforcing the added value of these approaches in education.
