100+ datasets found
  1. Data from: hfAIM: A reliable bioinformatics approach for in silico...

    • tandf.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Qingjun Xie; Oren Tzfadia; Matan Levy; Efrat Weithorn; Hadas Peled-Zehavi; Thomas Van Parys; Yves Van de Peer; Gad Galili (2023). hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms [Dataset]. http://doi.org/10.6084/m9.figshare.3172519
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Qingjun Xie; Oren Tzfadia; Matan Levy; Efrat Weithorn; Hadas Peled-Zehavi; Thomas Van Parys; Yves Van de Peer; Gad Galili
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements—the presence of acidic amino acids and the absence of positively charged amino acids in certain positions—to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/.
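
    The core consensus quoted above, F/W/Y-X-X-L/I/V, can be sketched as a regular-expression scan. This is only a minimal illustration of that four-residue pattern; it does not implement hfAIM itself, because the description does not say at which positions acidic residues are required or positively charged residues are excluded. The function name `find_core_aims` is hypothetical.

```python
import re

# Core AIM consensus from the description: [F/W/Y]-X-X-[L/I/V],
# where X is any amino acid (approximated here as any uppercase letter).
# The lookahead lets overlapping candidates be reported as well.
CORE_AIM = re.compile(r"(?=([FWY][A-Z]{2}[LIV]))")


def find_core_aims(sequence):
    """Return (0-based start, motif) for every core-consensus match."""
    return [(m.start(), m.group(1)) for m in CORE_AIM.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_core_aims("AAFTTLAA"))
```

    For the toy sequence "AAFTTLAA" the scan reports a single candidate, FTTL, starting at index 2. A scan this permissive matches very often in real proteomes, which is the reliability problem the description says the additional hfAIM requirements are meant to address.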

  2. Bioinformatics Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Bioinformatics Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/bioinformatics-software-market
    Available download formats: csv, pptx, pdf
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Bioinformatics Software Market Outlook



    The global bioinformatics software market size was valued at approximately USD 10 billion in 2023, and it is projected to reach around USD 25 billion by 2032, growing at a robust CAGR of 11% during the forecast period. This remarkable growth is fueled by the increased application of bioinformatics in drug discovery and development, the rising demand for personalized medicine, and the ongoing advancements in sequencing technologies. The convergence of biology and information technology has led to the optimization of biological data management, propelling the market's expansion as it transforms the landscape of biotechnology and pharmaceutical research. The rapid integration of artificial intelligence and machine learning techniques to process complex biological data further accentuates the growth trajectory of this market.
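
    The figures in this paragraph can be cross-checked with the standard compound-growth formula. The sketch below assumes nine compounding years (2023 to 2032), which the report does not state explicitly; the function names are illustrative.

```python
def project(value, rate, years):
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # USD 10 billion in 2023 compounded at 11% for an assumed 9 years.
    print(round(project(10.0, 0.11, 9), 1))
    # Rate implied by the quoted endpoints, 10 -> 25 over the same span.
    print(round(implied_cagr(10.0, 25.0, 9), 3))
```

    Under that assumption, compounding USD 10 billion at 11% gives roughly USD 25.6 billion, which is consistent with the "around USD 25 billion" projection.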



    An essential growth factor for the bioinformatics software market is the burgeoning demand for sequencing technologies. The decreasing cost of sequencing has led to a massive increase in the volume of genomic data generated, necessitating advanced software solutions to manage and interpret this data efficiently. This demand is particularly evident in genomics and proteomics, where bioinformatics software plays a critical role in analyzing and visualizing large datasets. Additionally, the adoption of cloud computing in bioinformatics offers scalable resources and cost-effective solutions for data storage and processing, further fueling market growth. The increasing collaboration between research institutions and software companies to develop innovative bioinformatics tools is also contributing positively to market expansion.



    Another significant driver is the growth of personalized medicine, which relies heavily on bioinformatics for the analysis of individual genetic information to tailor therapeutic strategies. As healthcare systems worldwide move towards precision medicine, the demand for bioinformatics software that can integrate genetic, phenotypic, and environmental data becomes more pronounced. This trend is not only transforming patient care but also significantly impacting drug development processes, as pharmaceutical companies aim to create more effective and targeted therapies. The strategic partnerships and collaborations between biotech firms and bioinformatics software providers are critical in advancing personalized medicine and enhancing patient outcomes.



    The increasing prevalence of complex diseases such as cancer and neurological disorders necessitates comprehensive research efforts, driving the need for robust bioinformatics software. These diseases require multi-omics approaches for better understanding, diagnosis, and treatment, where bioinformatics tools are indispensable. The ongoing research and development activities in this area, supported by government funding and private investments, are fostering innovation in bioinformatics solutions. Furthermore, the development of user-friendly and intuitive software interfaces is expanding the market beyond specialized research labs to include clinical settings and hospitals, broadening the potential user base and enhancing market penetration.



    From a regional perspective, North America currently leads the bioinformatics software market, thanks to its advanced technological infrastructure, significant investment in healthcare R&D, and the presence of numerous key market players. The region accounted for the largest market share in 2023 and is expected to maintain its dominance throughout the forecast period. Meanwhile, the Asia Pacific region is anticipated to exhibit the highest CAGR, driven by increasing investments in biotechnology and pharmaceutical research, expanding healthcare infrastructure, and the rising adoption of bioinformatics in emerging economies like China and India. Europe's market growth is also significant, supported by substantial funding for genomic research and a strong focus on precision medicine initiatives.



    Lifesciences Data Mining and Visualization are becoming increasingly vital in the bioinformatics software market. As the volume of biological data continues to grow exponentially, the need for sophisticated tools to mine and visualize this data is paramount. These tools enable researchers to uncover hidden patterns and insights from complex datasets, facilitating breakthroughs in genomics, proteomics, and other life sciences fields. The integration of advanced data mining techniques with visualization capabilities allows for a more intuitive

  3. Molecular Biology Databases Published in Nucleic Acids Research between...

    • databank.illinois.edu
    Updated Feb 1, 2024
    Cite
    Heidi Imker (2024). Molecular Biology Databases Published in Nucleic Acids Research between 1991-2016 [Dataset]. http://doi.org/10.13012/B2IDB-4311325_V1
    Dataset updated
    Feb 1, 2024
    Authors
    Heidi Imker
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.

  4. Table_1_A Bioinformatics Approach to Explore MicroRNAs as Tools to Bridge...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    + more versions
    Cite
    Massimo Bellato; Davide De Marchi; Carla Gualtieri; Elisabetta Sauta; Paolo Magni; Anca Macovei; Lorenzo Pasotti (2023). Table_1_A Bioinformatics Approach to Explore MicroRNAs as Tools to Bridge Pathways Between Plants and Animals. Is DNA Damage Response (DDR) a Potential Target Process?.docx [Dataset]. http://doi.org/10.3389/fpls.2019.01535.s001
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Massimo Bellato; Davide De Marchi; Carla Gualtieri; Elisabetta Sauta; Paolo Magni; Anca Macovei; Lorenzo Pasotti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MicroRNAs, highly conserved small RNAs, act as key regulators of many biological functions in both plants and animals by post-transcriptionally regulating gene expression through interactions with their target mRNAs. MicroRNA research is a dynamic field, in which new and unconventional aspects are emerging alongside well-established roles in development and stress adaptation. A recent hypothesis states that miRNAs can be transferred from one species to another and potentially target genes across distant species. Here, we propose to look into the trans-kingdom potential of miRNAs as a tool to bridge conserved pathways between plant and human cells. To this aim, a novel multi-faceted bioinformatic analysis pipeline was developed, enabling the investigation of common biological processes and genes targeted in plant and human transcriptomes by a set of publicly available Medicago truncatula miRNAs. Multiple datasets, including miRNA, gene, transcript and protein sequences, expression profiles and genetic interactions, were used. Three different strategies were employed, namely a network-based pipeline, an alignment-based pipeline, and an M. truncatula network reconstruction approach, to study functional modules and to evaluate gene/protein similarities among miRNA targets. The results were compared in order to find common features, e.g., microRNAs targeting similar processes. Biological processes like exocytosis and response to viruses were common denominators in the investigated species. Since the involvement of miRNAs in the regulation of DNA damage response (DDR)-associated pathways is barely explored, especially in the plant kingdom, special attention is given to this aspect. Accordingly, miRNAs predicted to target genes involved in DNA repair, recombination and replication, chromatin remodeling, cell cycle and cell death were identified in both plants and humans, paving the way for future interdisciplinary advancements.

  5. Bioinformatics Protein Dataset - Simulated

    • kaggle.com
    zip
    Updated Dec 27, 2024
    + more versions
    Cite
    Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
    Available download formats: zip (12928905 bytes)
    Dataset updated
    Dec 27, 2024
    Authors
    Rafael Gallo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Subtitle

    "Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

    Description

    Introduction

    This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

    Columns Included

    • ID_Protein: Unique identifier for each protein.
    • Sequence: String of amino acids.
    • Molecular_Weight: Molecular weight calculated from the sequence.
    • Isoelectric_Point: Estimated isoelectric point based on the sequence composition.
    • Hydrophobicity: Average hydrophobicity calculated from the sequence.
    • Total_Charge: Sum of the charges of the amino acids in the sequence.
    • Polar_Proportion: Percentage of polar amino acids in the sequence.
    • Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.
    • Sequence_Length: Total number of amino acids in the sequence.
    • Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

    Inspiration and Sources

    While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:

    • UniProt: A comprehensive database of protein sequences and annotations.
    • Kyte-Doolittle Scale: Calculations of hydrophobicity.
    • Biopython: A tool for analyzing biological sequences.

    Proposed Uses

    This dataset is ideal for:

    • Training classification models for proteins.
    • Exploratory analysis of physicochemical properties of proteins.
    • Building machine learning pipelines in bioinformatics.

    How This Dataset Was Created

    1. Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.
    2. Property Calculation: Physicochemical properties were calculated using the Biopython library.
    3. Class Assignment: Classes were randomly assigned for classification purposes.
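
    The three steps above can be sketched with the Python standard library alone. This is not the dataset author's script: the description says Biopython was used for the property calculations, whereas this sketch substitutes a hard-coded Kyte-Doolittle table so that it runs without dependencies. Uniform sampling of residues and classes is an assumption, and the names `make_protein` and `mean_hydrophobicity` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]

# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(sequence):
    """Average Kyte-Doolittle score over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def make_protein(rng, min_len=50, max_len=300):
    """Build one synthetic record: random sequence, property, random class."""
    length = rng.randint(min_len, max_len)  # step 1: 50 to 300 residues
    sequence = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    return {
        "Sequence": sequence,
        "Sequence_Length": length,
        "Hydrophobicity": mean_hydrophobicity(sequence),  # step 2
        "Class": rng.choice(CLASSES),  # step 3: label independent of sequence
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    for record in (make_protein(rng) for _ in range(3)):
        print(record["Sequence_Length"], record["Class"])
```

    Because the class label is drawn independently of the sequence, a classifier trained on data generated this way has no real signal to learn, which matches the limitation stated in the description.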

    Limitations

    • The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.
    • The functional classes are simulated and do not correspond to actual biological characteristics.

    Data Split

    The dataset is divided into two subsets:

    • Training: 16,000 samples (proteinas_train.csv).
    • Testing: 4,000 samples (proteinas_test.csv).

    Acknowledgment

    This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.

  6. BIOGRID CURATED DATA FOR PUBLICATION: hfAIM: A reliable bioinformatics...

    • thebiogrid.org
    zip
    Updated Feb 1, 2016
    Cite
    BioGRID Project (2016). BIOGRID CURATED DATA FOR PUBLICATION: hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms. [Dataset]. https://thebiogrid.org/199728/publication/hfaim-a-reliable-bioinformatics-approach-for-in-silico-genome-wide-identification-of-autophagy-associated-atg8-interacting-motifs-in-various-organisms.html
    Available download formats: zip
    Dataset updated
    Feb 1, 2016
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for Xie Q (2016): hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms, curated by BioGRID (https://thebiogrid.org). ABSTRACT: Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements (the presence of acidic amino acids and the absence of positively charged amino acids in certain positions) to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/.

  7. Microarray and bioinformatic analysis of conventional ameloblastoma

    • data.scielo.org
    jpeg, txt, xlsx
    Updated Dec 20, 2022
    Cite
    Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez (2022). Microarray and bioinformatic analysis of conventional ameloblastoma [Dataset]. http://doi.org/10.48331/SCIELODATA.Z2S8X9
    Available download formats: xlsx(10317), jpeg(3415112), xlsx(9969), jpeg(12173968), txt(605), txt(289), txt(3840), xlsx(9964), xlsx(12458), txt(2657), txt(18077), xlsx(10402), jpeg(2313098), txt(406), txt(1023)
    Dataset updated
    Dec 20, 2022
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    National Autonomous University of Mexico
    Description

    Ameloblastoma is a highly aggressive odontogenic tumor, and its pathogenesis is associated with multiple participating genes. Objective: Our aim was to identify and validate new critical genes of conventional ameloblastoma using microarray and bioinformatics analysis. Methods: Gene expression microarray and bioinformatic analysis were performed using CHIP H10KA and DAVID software for enrichment. Protein-protein interactions (PPI) were visualized using STRING-Cytoscape with the MCODE plugin, followed by Kaplan-Meier and GEPIA analysis to propose candidate genes. RT-qPCR and IHC assays were performed to validate the bioinformatic approach. Results: 376 upregulated genes were identified. PPI analysis revealed 14 genes that were validated by Kaplan-Meier and GEPIA, resulting in PDGFA and IL2RA as candidate genes. The RT-qPCR analysis confirmed their intense expression. Immunohistochemistry analysis showed that PDGFA expression is located in the parenchyma. Conclusion: With bioinformatics methods, we can identify upregulated genes in conventional ameloblastoma, and with RT-qPCR and immunoexpression analysis validate that PDGFA could be a more specific and localized therapeutic target.

  8. Using Bioinformatics and Molecular Visualization to Develop Student...

    • qubeshub.org
    Updated Jan 16, 2022
    Cite
    Kevin Callahan; Tamara Mans; Jing Zhang; Ellis Bell; Jessica Bell* (2022). Using Bioinformatics and Molecular Visualization to Develop Student Hypotheses in a Malate Dehydrogenase Oriented CURE [Dataset]. http://doi.org/10.24918/cs.2021.43
    Dataset updated
    Jan 16, 2022
    Dataset provided by
    QUBES
    Authors
    Kevin Callahan; Tamara Mans; Jing Zhang; Ellis Bell; Jessica Bell*
    Description

    Developing student creativity and the ability to develop a testable hypothesis represents a significant challenge in most laboratory courses. This lesson demonstrates how students use facets of molecular evolution and bioinformatics approaches involving protein sequence alignments (Clustal Omega, Uniprot) and 3D structure visualization (Pymol, JMol, Chimera), along with an analysis of pertinent background literature, to construct a novel hypothesis and develop a research proposal to explore their hypothesis. We have used this approach in a variety of institutional contexts (community college, research-intensive university, and primarily undergraduate institutions, PUIs) as the first component in a protein-centric course-embedded undergraduate research experience (CURE) sequence. Built around the enzyme malate dehydrogenase, the sequence illustrates a variety of foundational concepts from the learning framework for Biochemistry and Molecular Biology. The lesson has three specific learning goals: i) find, use and present relevant primary literature, protein sequences, structures, and analyses resulting from the use of bioinformatics tools, ii) understand the various roles that non-covalent interactions may play in the structure and function of an enzyme, and iii) create/develop a testable and falsifiable hypothesis and propose appropriate experiments to interrogate the hypothesis. For each learning goal, we have developed specific assessment rubrics. Depending on the needs of the course, this approach builds to an in-class student presentation and/or a written research proposal. The module can be extended over several lecture and lab periods. Furthermore, the module lends itself to additional assessments including oral presentation, research proposal writing and the validated pre-post Experimental Design Ability Test (EDAT). Although presented in the context of course-based research on malate dehydrogenase, the approach and materials presented are readily adaptable to any protein of interest.

    Primary image: Mind map of the hypothesis development.

  9. Translational Bioinformatics Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 29, 2025
    Cite
    Growth Market Reports (2025). Translational Bioinformatics Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/translational-bioinformatics-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Translational Bioinformatics Market Outlook



    According to our latest research, the global translational bioinformatics market size reached USD 4.2 billion in 2024, driven by the increasing integration of computational technologies in biomedical research and healthcare. The market is exhibiting robust growth with a compound annual growth rate (CAGR) of 11.6% from 2025 to 2033. By 2033, the market is forecasted to reach USD 11.4 billion, reflecting the rising demand for data-driven solutions in drug discovery, clinical diagnostics, and personalized medicine. This surge is primarily fueled by the growing adoption of genomics and proteomics in clinical settings, the expansion of precision medicine initiatives, and the escalating need for advanced bioinformatics platforms to handle complex biological datasets.




    One of the primary growth factors for the translational bioinformatics market is the exponential increase in biomedical data generated from next-generation sequencing (NGS), genomics, and proteomics research. The need to analyze, interpret, and translate this vast amount of data into clinically actionable insights has made translational bioinformatics solutions indispensable. Healthcare providers and research institutions are increasingly leveraging sophisticated bioinformatics software and platforms to accelerate drug discovery, identify novel biomarkers, and develop targeted therapies. The integration of artificial intelligence (AI) and machine learning (ML) algorithms into bioinformatics tools further enhances the ability to extract meaningful patterns from multidimensional datasets, thereby supporting the precision medicine paradigm and improving patient outcomes.




    Another critical driver for the translational bioinformatics market is the growing emphasis on personalized medicine and tailored therapeutics. With the advent of genomics and proteomics, there is a heightened focus on individualized treatment strategies that consider a patient's genetic makeup, lifestyle, and environmental factors. Translational bioinformatics bridges the gap between basic research and clinical application by providing the computational infrastructure necessary to translate omics data into personalized diagnostics and therapies. The market is also benefiting from increased investments in biomedical research, government initiatives promoting precision healthcare, and strategic collaborations between pharmaceutical companies, academic institutions, and technology providers. These collaborations are fostering innovation and accelerating the adoption of translational bioinformatics solutions across the healthcare ecosystem.




    The translational bioinformatics market is also witnessing significant growth due to the rising prevalence of chronic diseases and the urgent need for innovative diagnostic and therapeutic approaches. Chronic conditions such as cancer, cardiovascular diseases, and neurological disorders require comprehensive molecular profiling to inform treatment decisions. Translational bioinformatics enables the integration of diverse data sources, including genomics, proteomics, clinical records, and imaging data, to facilitate a holistic understanding of disease mechanisms. This integrative approach supports the development of novel biomarkers, enhances the efficiency of clinical trials, and expedites the translation of research findings into clinical practice. As a result, healthcare organizations are increasingly adopting translational bioinformatics solutions to improve disease management and patient care.



    As the translational bioinformatics market continues to evolve, the concept of Bioinformatics Pipelines as a Service is gaining traction. These pipelines provide a comprehensive framework for processing and analyzing biological data, offering a seamless integration of various bioinformatics tools and resources. By leveraging cloud-based infrastructures, these services enable researchers to automate complex workflows, enhance data reproducibility, and scale their analyses according to project needs. The flexibility and efficiency of Bioinformatics Pipelines as a Service are particularly beneficial for organizations with limited in-house bioinformatics expertise, allowing them to focus on their core research objectives while accessing cutting-edge computational resources. This approach not only accelerates the pace of discovery but also democratizes access to advanced bioinformatics capabilities

  10. Data from: Semi-artificial datasets as a resource for validation of...

    • search.dataone.org
    • datadryad.org
    Updated May 21, 2025
    Cite
    Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart (2025). Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection [Dataset]. http://doi.org/10.5061/dryad.0zpc866z8
    Dataset updated
    May 21, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Lucie Tamisier; Annelies Haegeman; Yoika Foucart; Nicolas Fouillien; Maher Al Rwahnih; Nihal Buzkan; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Marie Lefebvre; Paolo Margaria; Jean Sébastien Reynard; Kristian Stevens; Denis Kutnjak; Sébastien Massart
    Time period covered
    Jan 1, 2021
    Description

    In the last decade, High-Throughput Sequencing (HTS) has revolutionized biology and medicine. This technology allows the sequencing of huge amounts of DNA and RNA fragments at a very low price. In medicine, HTS tests for disease diagnostics have already been brought into routine practice. However, adoption in plant health diagnostics is still limited. One of the main bottlenecks is the lack of expertise and consensus on the standardization of the data analysis. The Plant Health Bioinformatic Network (PHBN) is a Euphresco project aiming to build a community network of bioinformaticians/computational biologists working in plant health. One of the main goals of the project is to develop reference datasets that can be used for validation of bioinformatics pipelines and for standardization purposes.

    Semi-artificial datasets have been created for this purpose (Datasets 1 to 10). They are composed of a “real” HTS dataset spiked with artificial viral reads. It will allow researchers to adjust ...

  11. Data_Sheet_1_Validation of a Bioinformatics Workflow for Routine Analysis of...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Mar 6, 2019
    Roosens, Nancy H. C.; Mattheus, Wesley; Fu, Qiang; Ceyssens, Pieter-Jan; Vanneste, Kevin; De Keersmaecker, Sigrid C. J.; Van Braekel, Julien; Bertrand, Sophie; Bogaerts, Bert; Winand, Raf (2019). Data_Sheet_1_Validation of a Bioinformatics Workflow for Routine Analysis of Whole-Genome Sequencing Data and Related Challenges for Pathogen Typing in a European National Reference Center: Neisseria meningitidis as a Proof-of-Concept.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000172206
    Dataset updated
    Mar 6, 2019
    Authors
    Roosens, Nancy H. C.; Mattheus, Wesley; Fu, Qiang; Ceyssens, Pieter-Jan; Vanneste, Kevin; De Keersmaecker, Sigrid C. J.; Van Braekel, Julien; Bertrand, Sophie; Bogaerts, Bert; Winand, Raf
    Description

    Despite being a well-established research method, the use of whole-genome sequencing (WGS) for routine molecular typing and pathogen characterization remains a substantial challenge due to the required bioinformatics resources and/or expertise. Moreover, many national reference laboratories and centers, as well as other laboratories working under a quality system, require extensive validation to demonstrate that employed methods are “fit-for-purpose” and provide high-quality results. A harmonized framework with guidelines for the validation of WGS workflows does not, however, currently exist, despite several recent case studies highlighting the urgent need thereof. We present a validation strategy focusing specifically on the exhaustive characterization of the bioinformatics analysis of a WGS workflow designed to replace conventionally employed molecular typing methods for microbial isolates in a representative small-scale laboratory, using the pathogen Neisseria meningitidis as a proof-of-concept. We adapted several classically employed performance metrics specifically toward three different bioinformatics assays: resistance gene characterization (based on the ARG-ANNOT, ResFinder, CARD, and NDARO databases), several commonly employed typing schemas (including, among others, core genome multilocus sequence typing), and serogroup determination. We analyzed a core validation dataset of 67 well-characterized samples typed by means of classical genotypic and/or phenotypic methods that were sequenced in-house, allowing us to evaluate repeatability, reproducibility, accuracy, precision, sensitivity, and specificity of the different bioinformatics assays. We also analyzed an extended validation dataset composed of publicly available WGS data for 64 samples by comparing results of the different bioinformatics assays against results obtained from commonly used bioinformatics tools. We demonstrate high performance, with values for all performance metrics >87%, >97%, and >90% for the resistance gene characterization, sequence typing, and serogroup determination assays, respectively, for both validation datasets. Our WGS workflow has been made publicly available as a “push-button” pipeline for Illumina data at https://galaxy.sciensano.be to showcase its implementation for non-profit and/or academic usage. Our validation strategy can be adapted to other WGS workflows for other pathogens of interest and demonstrates the added value and feasibility of employing WGS with the aim of being integrated into routine use in an applied public health setting.
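
    As a rough illustration of the performance metrics named in this description, the sketch below computes accuracy, precision, sensitivity, and specificity from confusion-matrix counts using their textbook definitions. The description says the authors adapted these metrics to each assay, so this is not their implementation; the names `ConfusionCounts` and `performance_metrics` and the example counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of assay calls compared against a reference result (hypothetical type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else None


def performance_metrics(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Textbook definitions; the source's assay-specific adaptations may differ."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


if __name__ == "__main__":
    # Made-up counts purely for illustration; not taken from the dataset.
    example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
    for name, value in performance_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

    Returning `None` for an undefined ratio is a design choice made here so that an assay with no positive reference samples does not abort the whole evaluation.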

  12. TCR-MHC Germline Interaction Scores Generated Using AIMS

    • zenodo.org
    zip
    Updated Aug 28, 2022
    Christopher T. Boughter (2022). TCR-MHC Germline Interaction Scores Generated Using AIMS [Dataset]. http://doi.org/10.5281/zenodo.7023681
    Available download formats: zip
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher T. Boughter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data were generated using the AIMS interaction scoring function as outlined in the manuscript "A Systematic Characterization of Germline-Encoded Contacts Identifies the Source of Bias in TCR-MHC Interactions". They accompany the AIMS version 0.7 software available on GitHub: https://github.com/ctboughter/AIMS . These files are meant to be loaded into the mhc_germline_analysis.ipynb file, but are too large to be included on the GitHub page itself.

  13. Bioinformatics and Systems Biology

    • data.wu.ac.at
    Updated Mar 8, 2017
    Federal Laboratory Consortium (2017). Bioinformatics and Systems Biology [Dataset]. https://data.wu.ac.at/schema/data_gov/NWQzYzc3OWQtMTM2Zi00MDI0LTg2ZDMtOTZiOWQzMzIwNjcy
    Dataset updated
    Mar 8, 2017
    Dataset provided by
    Federal Laboratory Consortium
    Description

    The Bioinformatics and Systems Biology (BISB) Core aims to assist investigators in overcoming the technical challenges in utilizing bioinformatics and systems biology techniques. The core will collaborate with principal investigators to incorporate systems biology approaches synergistically into their laboratory studies in order to speed the tempo of their research and develop transformative and translational results.

  14. Viral Bioinformatics Resource Center

    • bioregistry.io
    Updated Apr 15, 2023
    (2023). Viral Bioinformatics Resource Center [Dataset]. https://bioregistry.io/vbrc
    Dataset updated
    Apr 15, 2023
    Description

    The VBRC provides bioinformatics resources to support scientific research directed at viruses belonging to the Arenaviridae, Bunyaviridae, Filoviridae, Flaviviridae, Paramyxoviridae, Poxviridae, and Togaviridae families. The Center consists of a relational database and web application that support the data storage, annotation, analysis, and information exchange goals of this work. Each data release contains the complete genomic sequences for all viral pathogens and related strains that are available for species in the above-named families. In addition to sequence data, the VBRC provides a curation for each virus species, resulting in a searchable, comprehensive mini-review of gene function relating genotype to biological phenotype, with special emphasis on pathogenesis.

  15. Raw motif mapping bedfile data and model training set class probabilities

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated May 6, 2025
    Phillip Davis (2025). Raw motif mapping bedfile data and model training set class probabilities [Dataset]. http://doi.org/10.5061/dryad.tdz08kq3w
    Dataset updated
    May 6, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Phillip Davis
    Time period covered
    Jan 1, 2023
    Description

    Leveraging prior viral genome sequencing data to make predictions on whether an unknown, emergent virus harbors a ‘phenotype-of-concern’ has been a long-sought goal of genomic epidemiology. A predictive phenotype model built from nucleotide-level information alone is challenging with respect to RNA viruses due to the ultra-high intra-sequence variance of their genomes, even within closely related clades. We developed a degenerate k-mer method to accommodate this high intra-sequence variation of RNA virus genomes for modeling frameworks. By leveraging a taxonomy-guided ‘group-shuffle-split’ cross validation paradigm on complete coronavirus assemblies from prior to October 2018, we trained multiple regularized logistic regression classifiers at the nucleotide k-mer level. We demonstrate the feasibility of this method by finding models accurately predicting withheld SARS-CoV-2 genome sequences as human pathogens and accurately predicting withheld Swine Acute Diarrhea Syndrome coronavirus (...

  16. Data from: Multiple sequence alignment for functional correlation among low...

    • bridges.monash.edu
    • researchdata.edu.au
    pdf
    Updated Nov 21, 2017
    Chou, Wei-Yao; Chou, Wei-I; Pai, Tun-Wen; Lin, Shu-Chuan; Chang, Fan-Yu; Sun, Yuh-Ju; Tang, Chuan-Yi; Chang, Margaret Dah-Tsyr (2017). Multiple sequence alignment for functional correlation among low similarity sequences [Dataset]. http://doi.org/10.4225/03/5a13722947571
    Available download formats: pdf
    Dataset updated
    Nov 21, 2017
    Dataset provided by
    Monash University
    Authors
    Chou, Wei-Yao; Chou, Wei-I; Pai, Tun-Wen; Lin, Shu-Chuan; Chang, Fan-Yu; Sun, Yuh-Ju; Tang, Chuan-Yi; Chang, Margaret Dah-Tsyr
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    Multiple sequence alignment is a broadly used methodology in biological applications. It is expected to locate consensus sequence stretches with evolutionary and functional conservation. However, when sequence similarity among the queries becomes low, it works poorly. The aim of this study is to incorporate important biological knowledge and assumptions to improve the quality of a general alignment on low similarity sequences such as carbohydrate binding module (CBM) families. Since the recognition of characteristic patterns in CBMs does not apply to a general model, a more accurate scoring function employing secondary-structure-based and key-residue-weighted algorithms for alignment was designed to approach this goal. Our results indicated that the new method was practically applicable to identify the key residues in terms of three-dimensional structures, while conventional tools could fail. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  17. CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)

    • data.mendeley.com
    • data.niaid.nih.gov
    • +3 more
    Updated Dec 4, 2018
    Farah Zaib Khan (2018). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
    Dataset updated
    Dec 4, 2018
    Authors
    Farah Zaib Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:

    1. Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.
    2. The Genome BAM file is processed using Picard MarkDuplicates producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).
    3. SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.
    4. The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.
    5. In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
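
    The dependencies among the five steps above can be sketched as a small directed graph. This is only an illustration built on Python's standard `graphlib` module, not the CWL workflow itself; the step identifiers (`star_align`, `rsem_quantify`, and so on) and the helper names are hypothetical labels chosen here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the set of steps whose output it consumes,
# following the description above. Step names are illustrative labels.
DEPENDENCIES: Dict[str, Set[str]] = {
    "star_align": set(),                          # step 1
    "picard_markduplicates": {"star_align"},      # step 2 (Genome BAM)
    "samtools_index": {"picard_markduplicates"},  # step 3
    "rna_seqc": {"samtools_index"},               # step 4
    "rsem_quantify": {"star_align"},              # step 5 (needs only STAR output)
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid linear order in which the steps could run."""
    return list(TopologicalSorter(deps).static_order())


def parallel_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into batches; every step in a batch has all inputs ready."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(DEPENDENCIES), start=1):
        print(number, batch)
```

    Because the RSEM step depends only on the STAR output, the sketch reports it as runnable as soon as alignment finishes, which is consistent with the description of it running in parallel with the RNA-SeQC branch.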

    For testing and analysis, the workflow authors provided example data created by down-sampling the read files of a TOPMed public-access dataset. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, use of containerization for underlying software and detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl

  18. Renewable Bioinformatics Chemicals Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Jul 24, 2025
    Research Intelo (2025). Renewable Bioinformatics Chemicals Market Research Report 2033 [Dataset]. https://researchintelo.com/report/renewable-bioinformatics-chemicals-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    Renewable Bioinformatics Chemicals Market Outlook

    As per our latest research, the global renewable bioinformatics chemicals market size in 2024 stands at USD 4.82 billion, reflecting robust momentum driven by the convergence of sustainable chemistry and advanced bioinformatics. The market is expanding at a compelling CAGR of 8.1% and is forecasted to reach USD 9.12 billion by 2033. The primary growth factor fueling this surge is the increasing demand for environmentally friendly and sustainable chemicals in life sciences research and healthcare, where bioinformatics tools are pivotal in data-driven discovery and innovation.

    A significant growth driver for the renewable bioinformatics chemicals market is the escalating adoption of green chemistry practices across pharmaceutical and biotechnology sectors. As regulatory bodies and global organizations push for reduced environmental footprints, companies are actively transitioning from traditional petrochemical-based reagents and solvents to renewable alternatives. This shift not only aligns with corporate sustainability goals but also reduces hazardous waste generation and improves laboratory safety. Moreover, the integration of bioinformatics in chemical screening, synthesis, and data analysis has greatly enhanced the efficiency and precision of research processes, further accelerating the uptake of renewable chemicals.

    The rapid advancements in genomics, proteomics, and metabolomics are also fueling the demand for renewable bioinformatics chemicals. High-throughput sequencing and omics technologies generate vast datasets, necessitating specialized chemicals that are both high-quality and sustainable. Bioinformatics platforms rely on these chemicals for accurate sample preparation, data acquisition, and analysis. The growing number of collaborative research projects, increased funding for life sciences, and a surge in personalized medicine initiatives are collectively propelling the market forward. This trend is particularly evident in academic and research institutions, where adherence to green laboratory standards is becoming a norm.

    Another critical factor driving market expansion is the ongoing innovation in renewable chemical production methods. Advances in synthetic biology, enzyme engineering, and fermentation technologies have enabled the scalable and cost-effective production of bio-based reagents, enzymes, and solvents. These innovations are not only reducing the dependency on fossil resources but are also resulting in chemicals with improved purity and performance. The synergy between bioinformatics algorithms and renewable chemical development allows for rapid optimization and customization, meeting the specific needs of drug discovery, diagnostics, and molecular biology applications.

    From a regional perspective, North America currently dominates the renewable bioinformatics chemicals market, accounting for over 38% of the global share in 2024. Europe closely follows, driven by stringent environmental regulations and strong government support for green technologies. The Asia Pacific region is emerging as a high-growth market, with a projected CAGR of 10.4% through 2033, fueled by expanding biotechnology sectors in China, India, and Southeast Asia. Latin America and the Middle East & Africa are also witnessing steady adoption, albeit at a slower pace, as local industries gradually embrace sustainable laboratory practices.

    Product Type Analysis

    The product type segment of the renewable bioinformatics chemicals market encompasses enzymes, reagents, buffers, solvents, and other specialized chemicals. Enzymes hold a significant share owing to their indispensable role in genomics, proteomics, and molecular diagnostics. The demand for renewable enzymes is particularly high due to their application in DNA amplification, sequencing, and protein analysis. These enzymes, often produced through recombinant technology or microbial fermentation, offer enhanced specificity and reduced contamination risks. The market for renewable enzymes is expected to continue its upward trajectory as researchers seek alternatives to animal-derived or synthetic enzymes, aligning with both ethical and sustainability considerations.

    Reagents form the backbone of most laboratory workflows, and the shift toward renewable reagents is reshaping procurement strategies across the life sciences industry. Renewable reagents, synthesized from plant-based or m

  19. EchoBASE

    • neuinfo.org
    • dknet.org
    • +2 more
    Updated May 14, 2006
    (2006). EchoBASE [Dataset]. http://identifiers.org/RRID:SCR_002430
    Dataset updated
    May 14, 2006
    Description

    A database that curates new experimental and bioinformatic information about the genes and gene products of the model bacterium Escherichia coli K-12 strain MG1655. It has been created to integrate information from post-genomic experiments into a single resource with the aim of providing functional predictions for the 1500 or so gene products for which we have no knowledge of their physiological function. While EchoBASE provides a basic annotation of the genome, taken from other databases, its novelty is in the curation of post-genomic experiments and their linkage to genes of unknown function. Experiments published on E. coli are curated to one of two levels. Papers dealing with the determination of function of a single gene are briefly described, while larger datasets are actually included in the database and can be searched and manipulated. This includes data for proteomics studies, protein-protein interaction studies, microarray data, functional genomic approaches (looking at multiple deletion strains for novel phenotypes) and a wide range of predictions that come out of in silico bioinformatic approaches. The aim of the database is to provide hypotheses for the functions of uncharacterized gene products that may be used by the E. coli research community to further our knowledge of this model bacterium.

  20. Supplementary Material for: Integrative Bioinformatics Analysis Provides...

    • datasetcatalog.nlm.nih.gov
    • karger.figshare.com
    Updated Apr 11, 2018
    Zhou, L.-T.; Liu, H.; Li, Z.-L.; Ma, K.-L.; Tang, R.-N.; Lv, L.-L.; Qiu, S.; Liu, B.-C. (2018). Supplementary Material for: Integrative Bioinformatics Analysis Provides Insight into the Molecular Mechanisms of Chronic Kidney Disease [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000631776
    Dataset updated
    Apr 11, 2018
    Authors
    Zhou, L.-T.; Liu, H.; Li, Z.-L.; Ma, K.-L.; Tang, R.-N.; Lv, L.-L.; Qiu, S.; Liu, B.-C.
    Description

    Background/Aims: Chronic kidney disease (CKD) is a worldwide public health problem. Regardless of the underlying primary disease, CKD tends to progress to end-stage kidney disease, resulting in unsatisfactory and costly treatment. Its common pathogenesis, however, remains unclear. The aim of this study was to provide an unbiased catalog of common gene-expression changes of CKD and reveal the underlying molecular mechanism using an integrative bioinformatics approach. Methods: We systematically collected over 250 Affymetrix microarray datasets from the glomerular and tubulointerstitial compartments of healthy renal tissues and those with various types of established CKD (diabetic kidney disease, hypertensive nephropathy, and glomerular nephropathy). Then, using stringent bioinformatics analysis, shared differentially expressed genes (DEGs) of CKD were obtained. These shared DEGs were further analyzed by the gene ontology (GO) and pathway enrichment analysis. Finally, the protein-protein interaction networks (PINs) were constructed to further refine our results. Results: Our analysis identified 176 and 50 shared DEGs in diseased glomeruli and tubules, respectively, including many transcripts that have not been previously reported to be involved in kidney disease. Enrichment analysis also showed that the glomerular and tubulointerstitial compartments underwent a wide range of unique pathological changes during chronic injury. As revealed by the GO enrichment analysis, shared DEGs in glomeruli were significantly enriched in exosomes. By constructing PINs, we identified several hub genes (e.g. OAS1, JUN, and FOS) and clusters that might play key roles in regulating the development of CKD. Conclusion: Our study not only further reveals the unifying molecular mechanism of CKD pathogenesis but also provides a valuable resource of potential biomarkers and therapeutic targets.
