35 datasets found
  1. Pneumonia Drug Exp Data

    • data.mendeley.com
    Updated Sep 29, 2023
    Cite
    OCHIN SHARMA (2023). Pneumonia Drug Exp Data [Dataset]. http://doi.org/10.17632/8bmpx4zvs8.1
    Dataset updated
    Sep 29, 2023
    Authors
    OCHIN SHARMA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the result of experiments conducted using Python and the RDKit library.

  2. Data from: PeTMbase: A database of plant endogenous target mimics (eTMs)

    • data.mendeley.com
    Updated Nov 23, 2016
    Cite
    Gökhan Karakülah (2016). PeTMbase: A database of plant endogenous target mimics (eTMs) [Dataset]. http://doi.org/10.17632/htgxryrcv2.1
    Dataset updated
    Nov 23, 2016
    Authors
    Gökhan Karakülah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate target gene expression at the post-transcriptional level. miRNA activity can itself be controlled by a recently discovered regulatory mechanism called endogenous target mimicry (eTM). In target mimicry, eTMs bind to the corresponding miRNAs and block their binding to specific transcripts, which leads to increased mRNA expression. Because miRNA-eTM-target-mRNA regulation modules are involved in a wide range of biological processes, an increasing need for a comprehensive eTM database has arisen. Except for miRSponge, which holds a limited number of Arabidopsis eTM records, no database or repository has yet been developed and released for plant eTMs. Here, we present an online plant eTM database, called PeTMbase (http://petmbase.org), with a highly efficient search tool. To establish the repository, a number of identified eTMs were obtained from high-throughput RNA-sequencing data of 11 plant species. Each transcriptome library was first mapped to the corresponding plant genome, and long non-coding RNA (lncRNA) transcripts were then characterized. Additional lncRNAs retrieved from GREENC and PNRD were incorporated into the lncRNA catalog. Using the lncRNA and miRNA sources, a total of 2,728 eTMs were predicted. Our regularly updated database, PeTMbase, provides high-quality information on miRNA:eTM modules and will aid functional genomics studies, particularly on miRNA regulatory networks.

  3. Bioinformatics Market Growth Analysis - Size and Forecast 2025-2029 |...

    • technavio.com
    pdf
    Updated Jun 18, 2025
    Cite
    Technavio (2025). Bioinformatics Market Growth Analysis - Size and Forecast 2025-2029 | Technavio | Technavio [Dataset]. https://www.technavio.com/report/bioinformatics-market-industry-analysis
    Available download formats: pdf
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Description

    Bioinformatics Market Size 2025-2029

    The bioinformatics market size is valued to increase by USD 15.98 billion, at a CAGR of 17.4% from 2024 to 2029. Reduction in cost of genetic sequencing will drive the bioinformatics market.

    Market Insights

    North America dominated the market and accounted for a 43% growth during 2025-2029.
    By Application - Molecular phylogenetics segment was valued at USD 4.48 billion in 2023
    By Product - Platforms segment accounted for the largest market revenue share in 2023

    Market Size & Forecast

    Market Opportunities: USD 309.88 million
    Market Future Opportunities 2024: USD 15978.00 million
    CAGR from 2024 to 2029: 17.4%

    Market Summary

    The market is a dynamic and evolving field that plays a pivotal role in advancing scientific research and innovation in various industries, including healthcare, agriculture, and academia. One of the primary drivers of this market's growth is the rapid reduction in the cost of genetic sequencing, making it increasingly accessible to researchers and organizations worldwide. This affordability has led to an influx of large-scale genomic data, necessitating the development of sophisticated bioinformatics tools for Next-Generation Sequencing (NGS) data analysis. Another significant trend in the market is the shortage of trained laboratory professionals capable of handling and interpreting complex genomic data.

    This skills gap creates a demand for user-friendly bioinformatics software and services that can streamline data analysis and interpretation, enabling researchers to focus on scientific discovery rather than data processing. For instance, a leading pharmaceutical company could leverage bioinformatics tools to optimize its drug discovery pipeline by analyzing large genomic datasets to identify potential drug targets and predict their efficacy. By integrating these tools into its workflow, the company can reduce the time and cost associated with traditional drug discovery methods, ultimately bringing new therapies to market more efficiently. Despite its numerous benefits, the market faces challenges such as data security and privacy concerns, data standardization, and the need for interoperability between different software platforms.

    Addressing these challenges will require collaboration between industry stakeholders, regulatory bodies, and academic institutions to establish best practices and develop standardized protocols for data sharing and analysis.

    What will be the size of the Bioinformatics Market during the forecast period?

    Bioinformatics, a dynamic and evolving market, is witnessing significant growth as businesses increasingly rely on high-performance computing, gene annotation, and bioinformatics software to decipher regulatory elements, gene expression regulation, and genomic variation. Machine learning algorithms, phylogenetic trees, and ontology development are integral tools for disease modeling and protein interactions. Cloud computing platforms facilitate the storage and analysis of vast biological databases and sequence data, enabling data mining techniques and statistical modeling for sequence assembly and drug discovery pipelines. Proteomic analysis, protein folding, and computational biology are crucial components of this domain, with biomedical ontologies and data integration platforms enhancing research efficiency.

    The integration of gene annotation and machine learning algorithms, for instance, has led to a 25% increase in accurate disease diagnosis within leading healthcare organizations. This trend underscores the importance of investing in advanced bioinformatics solutions for improved regulatory compliance, budgeting, and product strategy.

    Unpacking the Bioinformatics Market Landscape

    Bioinformatics, an essential discipline at the intersection of biology and computer science, continues to revolutionize the scientific landscape. Evolutionary bioinformatics, with its molecular dynamics simulation and systems biology approaches, enables a deeper understanding of biological processes, leading to improved ROI in research and development. For instance, next-generation sequencing technologies have reduced sequencing costs by a factor of ten, enabling genome-wide association studies and transcriptome sequencing on a previously unimaginable scale. In clinical bioinformatics, homology modeling techniques and protein-protein interaction analysis facilitate drug target identification, enhancing compliance with regulatory requirements. Phylogenetic analysis tools and comparative genomics studies contribute to the discovery of novel biomarkers and the development of personalized treatments. Bioimage informatics and proteomic data integration employ advanced sequence alignment algorithms and fun

  4. UniProtKB accession numbers for 29 homologous proteins using data from...

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Yasmin Alam-Faruque; David P. Hill; Emily C. Dimmer; Midori A. Harris; Rebecca E. Foulger; Susan Tweedie; Helen Attrill; Douglas G. Howe; Stephen Randall Thomas; Duncan Davidson; Adrian S. Woolf; Judith A. Blake; Christopher J. Mungall; Claire O’Donovan; Rolf Apweiler; Rachael P. Huntley (2023). UniProtKB accession numbers for 29 homologous proteins using data from in-situ hybridisation expression in murine loop of Henle. [Dataset]. http://doi.org/10.1371/journal.pone.0099864.t001
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Yasmin Alam-Faruque; David P. Hill; Emily C. Dimmer; Midori A. Harris; Rebecca E. Foulger; Susan Tweedie; Helen Attrill; Douglas G. Howe; Stephen Randall Thomas; Duncan Davidson; Adrian S. Woolf; Judith A. Blake; Christopher J. Mungall; Claire O’Donovan; Rolf Apweiler; Rachael P. Huntley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Uniprot accession numbers are listed for homologues of the 29 proteins expressed in the murine loop of Henle structure (data provided by the GUDMAP Consortium via www.gudmap.org) as determined by BLAST (run via the uniprot.org website). The Drosophila proteins in parentheses are homologous to multiple mammalian proteins. (n/a = not applicable).

  5. Sequence Similarity: Introducing Biological Databases to Community College...

    • qubeshub.org
    Updated Jun 1, 2021
    Cite
    Jennifer Katcher (2021). Sequence Similarity: Introducing Biological Databases to Community College Biology Students [Dataset]. http://doi.org/10.25334/1EAH-3E24
    Dataset updated
    Jun 1, 2021
    Dataset provided by
    QUBES
    Authors
    Jennifer Katcher
    Description

    This laboratory module, published on CourseSource, leads introductory biology students in the exploration of a basic set of bioinformatics concepts and tools.

  6. BioCreative

    • dknet.org
    Updated Sep 3, 2024
    Cite
    (2024). BioCreative [Dataset]. http://identifiers.org/RRID:SCR_006311
    Dataset updated
    Sep 3, 2024
    Description

    Community-wide effort (Challenge) for evaluating text mining and information extraction systems applied to the biological domain. It focuses on the comparison of methods and the community assessment of scientific progress, rather than on the purely competitive aspects. Constructing suitable gold standard data for training and testing new information extraction systems that handle life science literature is considerably difficult. Thus the data sets derived from the BioCreAtIvE challenge, because they have been examined by biological database curators and domain experts, serve as useful resources for the development of new applications as well as helping to improve existing ones. Two main issues are addressed at BioCreAtIvE, both concerned with the extraction of biologically relevant and useful information from the literature. The first is the detection of biologically significant entities (names), such as gene and protein names, and their association with existing database entries. The second is the detection of entity-fact associations (e.g. protein-functional term associations).

  7. Summary of significantly enriched GO terms from the Ontologizer and GO-Elite...

    • figshare.com
    xls
    Updated Dec 2, 2015
    Cite
    Yasmin Alam-Faruque; David P. Hill; Emily C. Dimmer; Midori A. Harris; Rebecca E. Foulger; Susan Tweedie; Helen Attrill; Douglas G. Howe; Stephen Randall Thomas; Duncan Davidson; Adrian S. Woolf; Judith A. Blake; Christopher J. Mungall; Claire O’Donovan; Rolf Apweiler; Rachael P. Huntley (2015). Summary of significantly enriched GO terms from the Ontologizer and GO-Elite analyses that are relevant to kidney development. [Dataset]. http://doi.org/10.1371/journal.pone.0099864.t003
    Available download formats: xls
    Dataset updated
    Dec 2, 2015
    Dataset provided by
    PLOS ONE
    Authors
    Yasmin Alam-Faruque; David P. Hill; Emily C. Dimmer; Midori A. Harris; Rebecca E. Foulger; Susan Tweedie; Helen Attrill; Douglas G. Howe; Stephen Randall Thomas; Duncan Davidson; Adrian S. Woolf; Judith A. Blake; Christopher J. Mungall; Claire O’Donovan; Rolf Apweiler; Rachael P. Huntley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A summary of the significantly enriched GO terms from the Ontologizer [28] and GO-Elite [27] analyses, which are relevant to kidney development, using the pre-annotation (2009; Tables S2–S5 in File S1) and post-annotation datasets (2012; Tables S6–S9, in File S1). Terms in italics indicate parent terms where the descendants are indicated directly underneath as follows: > descendant of term above in italics. Rank refers to the position of the term in the results of the enrichment analyses (see Tables S2–S9 in File S1) where significance of the enriched term has a p-value of

  8. Mycobacterial Homology Database with 75% Identity Cutoff

    • search.datacite.org
    Updated Feb 12, 2019
    Cite
    William M. Matern; Joel S. Bader; Petros C. Karakousis (2019). Mycobacterial Homology Database with 75% Identity Cutoff [Dataset]. http://doi.org/10.6084/m9.figshare.6969899.v1
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Figshare: http://figshare.com/
    DataCite
    Authors
    William M. Matern; Joel S. Bader; Petros C. Karakousis
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a table of results from comparison of the annotated proteins from nine different mycobacterial species. The goal of its creation was to suggest which proteins are likely to have identical functions between species. This table reports only those protein comparisons with greater than 75% amino acid identity. Each row is a different gene used to search for close matches, each column is the genome used for searching. In parentheses next to the name of each match is the percent identity between the sequences (query vs each match).
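
    The cell layout described above (a match name followed by its percent identity in parentheses) could be split apart along these lines. This is a minimal sketch in Python: the exact cell text in the published table is not shown here, so the pattern, the function name `parse_match_cell`, and the example gene names are all assumptions made for illustration.

```python
import re

# Assumed cell layout: "<match name> (<percent identity>)", e.g. "geneA (92.4)".
# An optional "%" inside the parentheses is tolerated. The real table may differ.
CELL_PATTERN = re.compile(
    r"^\s*(?P<name>.+?)\s*\((?P<identity>\d+(?:\.\d+)?)\s*%?\)\s*$"
)


def parse_match_cell(cell):
    """Return (match_name, percent_identity), or None if the cell does not fit."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        return None
    return match.group("name"), float(match.group("identity"))


# Hypothetical cells, not taken from the dataset.
cells = ["geneA (92.4)", "geneB (80%)", ""]
parsed = [parse_match_cell(c) for c in cells]
```

    Returning `None` for cells that do not fit the pattern keeps empty cells (no match above the 75% cutoff) distinguishable from parsed ones.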

  9. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

  10. Overview of machine learning approaches for drug-target interaction...

    • resodate.org
    Updated Jan 1, 2024
    Cite
    Amine Abdelmaksoud (2024). Overview of machine learning approaches for drug-target interaction prediction [Dataset]. http://doi.org/10.48366/R700875
    Dataset updated
    Jan 1, 2024
    Dataset provided by
    Open Research Knowledge Graph
    Authors
    Amine Abdelmaksoud
    Description

    This comparison highlights the key aspects of several research studies focused on predicting drug interactions and drug-target associations using various machine learning techniques. The studies use datasets such as DrugBank, LINCS signatures, and biological databases, and employ algorithms such as convolutional neural networks, graph convolutional networks, and other deep learning methods. Evaluation metrics include accuracy, F-score, area under the curve (AUC), and precision-recall metrics, showing advancements in computational methods for pharmacological research.

  11. Data from: EukProt: a database of genome-scale predicted proteins across the...

    • explore.openaire.eu
    Updated Jan 1, 2022
    Cite
    Daniel Richter; Cédric Berney; Jürgen Strassert; Yu-Ping Poh; Emily K. Herman; Sergio A. Muñoz-Gómez; Jeremy G. Wideman; Fabien Burki; Colomban de Vargas (2022). EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotes [Dataset]. http://doi.org/10.6084/m9.figshare.12417881
    Dataset updated
    Jan 1, 2022
    Authors
    Daniel Richter; Cédric Berney; Jürgen Strassert; Yu-Ping Poh; Emily K. Herman; Sergio A. Muñoz-Gómez; Jeremy G. Wideman; Fabien Burki; Colomban de Vargas
    Description

    Version 3 (22 November, 2021)

    See https://doi.org/10.24072/pcjournal.173 for a detailed description of the database. See http://evocellbio.com/eukprot/ for a BLAST database, interactive plots of BUSCO scores and ‘The Comparative Set’ (TCS): A selected subset of EukProt for comparative genomics investigations. Protein sequence FASTA files of the TCS are available at https://doi.org/10.6084/m9.figshare.21586065. See https://github.com/beaplab/EukProt for utility scripts, annotations, and all the files necessary to build the tree in Figures 1 and 3 (from the DOI above). Scroll to the end of this page for changes since version 2. Are we missing anything? Please let us know!

    EukProt is a database of published and publicly available predicted protein sets selected to represent the breadth of eukaryotic diversity, currently including 993 species from all major supergroups as well as orphan taxa. The goal of the database is to provide a single, convenient resource for gene-based research across the spectrum of eukaryotic life, such as phylogenomics and gene family evolution. Each species is placed within the UniEuk taxonomic framework in order to facilitate downstream analyses, and each data set is associated with a unique, persistent identifier to facilitate comparison and replication among analyses. The database is regularly updated, and all versions will be permanently stored and made available via FigShare. The current version has a number of updates, notably ‘The Comparative Set’ (TCS), a reduced taxonomic set with high estimated completeness while maintaining a substantial phylogenetic breadth, which comprises 196 predicted proteomes. A BLAST web server and graphical displays of data set completeness are available at http://evocellbio.com/eukprot/. We invite the community to provide suggestions for new data sets and new annotation features to be included in subsequent versions, with the goal of building a collaborative resource that will promote research to understand eukaryotic diversity and diversification.

    This release contains 5 files:

    EukProt_proteins.v03.2021_11_22.tgz: 993 protein data sets, for species with either a genome (375) or single-cell genome (56), a transcriptome (498), a single-cell transcriptome (47), or an EST assembly (17).
    EukProt_genome_annotations.v03.2021_11_22.tgz: gene annotations, in GFF format, as produced by EukMetaSanity (https://github.com/cjneely10/EukMetaSanity) for 40 genomes lacking publicly available protein annotations. The proteins predicted from these annotations are included in the proteins file.
    EukProt_included_data_sets.v03.2021_11_22.txt and EukProt_not_included_data_sets.v03.2021_11_22.txt: tables of information on data sets either included (993 data sets) or not included (163) in the database. Tab-delimited; multiple entries in the same cell are comma-delimited; missing data is represented with the “N/A” value.

    With the following columns:

    EukProt_ID: the unique identifier associated with the data set. This will not change among versions. If a new data set becomes available for the species, it will be assigned a new unique identifier.
    Name_to_Use: the name of the species for protein/genome annotation/assembled transcriptome files.
    Strain: the strain(s) of the species sequenced.
    Previous_Names: any previous names that this species was known by.
    Replaces_EukProt_ID/Replaced_by_EukProt_ID: if the data set changes with respect to an earlier version, the EukProt ID of the data set that it replaces (in the included table) or that it is replaced by (in the not_included table).
    Genus_UniEuk, Epithet_UniEuk, Supergroup_UniEuk, Taxogroup1_UniEuk, Taxogroup2_UniEuk: taxonomic identifiers at different levels of the UniEuk taxonomy (Berney et al. 2017, DOI: 10.1111/jeu.12414, based on Adl et al. 2019, DOI: 10.1111/jeu.12691).
    Taxonomy_UniEuk: the full lineage of the species in the UniEuk taxonomy (semicolon-delimited).
    Merged_Strains: whether multiple strains of the same species were merged to create the data set.
    Data_Source_URL: the URL(s) from which the data were downloaded.
    Data_Source_Name: the name of the data set (as assigned by the data source).
    Paper_DOI: the DOI(s) of the paper(s) that published the data set.
    Actions_Prior_to_Use: the action(s) that were taken to process the publicly available files in order to produce the data set in this database. Actions taken (see our manuscript for more details):
      ‘assemble mRNA’: Trinity v. 2.8.4, http://trinityrnaseq.github.io/
      ‘CD-HIT’: v. 4.6, http://weizhongli-lab.org/cd-hit/
      ‘extractfeat’, ‘seqret’, ‘transeq’, ‘trimseq’: from EMBOSS package v. 6.6.0.0, http://emboss.sourceforge.net/
      ‘translate mRNA’: Transdecoder v. 5.3.0, http://transdecoder.github.io/
      ‘gffread’: v.0.12.3 https://github.com/gpertea/gffread
      ‘predict genes’: EukMetaSanity https://github.com/cjneely10/EukMetaSanity (cloned on 21 September, 2021)
      All parameter values were default, unless otherwise specified.
    Data_Source_Type: the type o...
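
    The table conventions stated in this description (tab-delimited rows, comma-delimited multi-value cells, and "N/A" for missing data) could be handled along these lines. This is a minimal sketch in Python using only the standard library: the helper names `parse_cell` and `read_table`, the sample row, and the identifier `EP99999` are invented for illustration, and only four of the documented columns are used.

```python
import csv
import io


def parse_cell(value):
    """Map "N/A" to None; split comma-delimited cells into a list of entries."""
    if value == "N/A":
        return None
    return [part.strip() for part in value.split(",")]


def read_table(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{col: parse_cell(val) for col, val in row.items()} for row in reader]


# Made-up sample standing in for the included-data-sets table (not real data).
sample = (
    "EukProt_ID\tName_to_Use\tStrain\tPrevious_Names\n"
    "EP99999\tGenus_species\tstrainA, strainB\tN/A\n"
)
rows = read_table(io.StringIO(sample))
```

    Every non-missing cell becomes a list, even when it holds a single entry, so downstream code can treat all columns the same way. Taxonomy_UniEuk is described as semicolon-delimited and would need its own split.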

  12. Data from: Prophage-DB: A comprehensive database to explore diversity,...

    • search.dataone.org
    Updated Jul 19, 2024
    Cite
    Etan Dieppa-Colón; Cody Martin; Karthik Anantharaman (2024). Prophage-DB: A comprehensive database to explore diversity, distribution, and ecology of prophages [Dataset]. http://doi.org/10.5061/dryad.3n5tb2rs5
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Etan Dieppa-Colón; Cody Martin; Karthik Anantharaman
    Time period covered
    Jun 27, 2024
    Description

    Background: Viruses that infect prokaryotes (phages) constitute the most abundant group of biological agents, playing pivotal roles in microbial systems. They are known to impact microbial community dynamics, microbial ecology, and evolution. Efforts to document the diversity, host range, infection dynamics, and effects of bacteriophage infection on host cell metabolism are still at the surface level. Among phages, some adopt the lysogenic mode of infection, where the genome integrates into the host cell genome, forming a prophage. Prophages enable viral genome replication without host cell lysis and often contribute novel and beneficial traits to the host genome. Despite their importance, research on prophages is limited. Current phage research predominantly focuses on lytic phages, leaving a significant gap in knowledge regarding prophages, including their biology, diversity, and ecological roles. Results: To bridge this gap, the creation of Prophage-DB, a prophage database, aims to a...

    Prophage-DB: A comprehensive database to explore diversity, distribution, and ecology of prophages

    https://doi.org/10.5061/dryad.3n5tb2rs5

    This dataset contains prophage sequences (available as .fna files) identified from prokaryotic genomes from three public databases (Genome Taxonomy Database (GTDB) (release 207), National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (accessed March 2023), and Searchable Planetary-scale mIcrobiome REsource (SPIRE)). The downloaded prokaryotic genomes from these databases contained both archaeal and bacterial representative genomes (SPIRE also included data from unknown hosts).

    Methods

    Prophage identification from downloaded representative genomes was carried out using VIBRANT (v1.2.1). We used the default arguments when using VIBRANT (minimum scaffold length requirement = 1000 base pairs, minimum number of open reading frames (ORFs, or proteins) per scaffold requi...

  13. Australian Nucleotide (DNA/RNA) and Protein sequences from Australian...

    • researchdata.edu.au
    Updated Jul 20, 2012
    Cite
    QFAB Bioinformatics (2012). Australian Nucleotide (DNA/RNA) and Protein sequences from Australian organisms in the species Amphibolis antarctica [Dataset]. https://researchdata.edu.au/australian-nucleotide-dnarna-amphibolis-antarctica/80398
    Dataset updated
    Jul 20, 2012
    Dataset provided by
    QFAB
    Authors
    QFAB Bioinformatics
    Area covered
    Australia
    Description

    This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from Australian Amphibolis antarctica, commonly known as Sea Nymph. Other information about this group:

    The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the International Research Community.

    The identification of species in Amphibolis antarctica as Australian dwelling organisms has been achieved by accessing the Australian Plant Census (APC) or Australian Faunal Directory (AFD) through the Atlas of Living Australia.

  14. NCBIFAM

    • ebi.ac.uk
    Updated Aug 6, 2025
    Cite
    (2025). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Aug 6, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).

  15. Data from: Saccharomyces genome database informs human biology

    • ckan.grassroots.tools
    pdf
    Updated Aug 7, 2019
    Cite
    European Bioinformatics Institute (2019). Saccharomyces genome database informs human biology [Dataset]. https://ckan.grassroots.tools/ar/dataset/a474c44c-efd7-48cc-98b2-fe0f0c209bd5
    Available download formats: pdf
    Dataset updated
    Aug 7, 2019
    Dataset provided by
    European Bioinformatics Institute: http://www.ebi.ac.uk/
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD.

  16. Manipulated Animal Community Database

    • search.datacite.org
    Updated Mar 21, 2014
    Cite
    Sarah Supp (2014). Manipulated Animal Community Database [Dataset]. http://doi.org/10.6084/m9.figshare.969831
    Dataset updated
    Mar 21, 2014
    Dataset provided by
    DataCite
    Figshare: http://figshare.com/
    Authors
    Sarah Supp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    6,698 records indicated the presence and abundance of animal species, including representatives across trophic groups and size classes documented at 254 sites throughout the world, encompassing a variety of habitats. We accessed peer-reviewed articles, government publications, and theses that were freely available with the Utah State University library subscription and were published in English. We extracted data from articles that reported species-level abundance for a control community and at least one manipulated community. The data here represent a single data point each for the control treatment and the manipulated treatment(s) in each study. Data came from a wide variety of sites including artificial experiments (i.e., caged exclosures, habitat modules, nutrient addition) and human-mediated “natural” experiments (e.g., wildfire or controlled burn, logging, grazed plots, pollution). Sites represent all continents except Antarctica, and widely varying terrestrial animal groups (arachnid, insect, herpetofauna [reptiles and amphibians], mammal, and bird).

  17. Table1_A Comprehensive Database for DNA Adductomics.xlsx

    • datasetcatalog.nlm.nih.gov
    Updated May 27, 2022
    Cite
    Dragsted, Lars Ove; La Barbera, Giorgia; Stanstrup, Jan; Cuparencu, Catalina; Nommesen, Katrine Dalmo (2022). Table1_A Comprehensive Database for DNA Adductomics.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000400911
    Explore at:
    Dataset updated
    May 27, 2022
    Authors
    Dragsted, Lars Ove; La Barbera, Giorgia; Stanstrup, Jan; Cuparencu, Catalina; Nommesen, Katrine Dalmo
    Description

    The exposure of human DNA to genotoxic compounds induces the formation of covalent DNA adducts, which may contribute to the initiation of carcinogenesis. Liquid chromatography (LC) coupled with high-resolution mass spectrometry (HRMS) is a powerful tool for DNA adductomics, a new research field aiming at screening known and unknown DNA adducts in biological samples. The lack of databases and bioinformatics tools in this field limits the applicability of DNA adductomics. Establishing a comprehensive database will make the identification process faster and more efficient and will provide new insight into the occurrence of DNA modification from a wide range of genotoxicants. In this paper, we present a four-step approach used to compile and curate a database for the annotation of DNA adducts in biological samples. The first step included a literature search, selecting only DNA adducts that were unequivocally identified by either comparison with reference standards or with nuclear magnetic resonance (NMR), and tentatively identified by tandem HRMS/MS. The second step consisted of harmonizing structures, molecular formulas, and names to build a systematic database of 279 DNA adducts. The source, the study design, and the technique used for DNA adduct identification were reported. The third step consisted of supplementing the database with 303 new potential DNA adducts coming from different combinations of genotoxicants with nucleobases, and reporting monoisotopic masses, chemical formulas, .cdxml files, .mol files, SMILES, InChI, InChIKey, and IUPAC nomenclature. In the fourth step, a preliminary spectral library was built by acquiring experimental MS/MS spectra of 15 reference standards, generating in silico MS/MS fragments for all the adducts, and reporting both experimental and predicted fragments in interactive web data tables. The database, including 582 entries, is publicly available (https://gitlab.com/nexs-metabolomics/projects/dna_adductomics_database).
    This database is a powerful tool for the annotation of DNA adducts measured in (HR)MS. The inclusion of metadata indicating the source of DNA adducts, the study design, and the technique used allows for prioritization of the DNA adducts of interest and/or enhancement of the annotation confidence. DNA adduct identification can be further improved by integrating the present database with the generation of authentic MS/MS spectra and with user-friendly bioinformatics tools.

  18. Data from: KEGGscape: a Cytoscape app for pathway data integration

    • figshare.com
    png
    Updated Jun 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kozo Nishida; Keiichiro Ono; Shigehiko Kanaya; Koichi Takahashi (2023). KEGGscape: a Cytoscape app for pathway data integration [Dataset]. http://doi.org/10.6084/m9.figshare.1111757.v5
    Explore at:
    Available download formats: png
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kozo Nishida; Keiichiro Ono; Shigehiko Kanaya; Koichi Takahashi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper, we present KEGGscape, a pathway data integration and visualization app for Cytoscape (http://apps.cytoscape.org/apps/keggscape). KEGG is a comprehensive public biological database that contains a large collection of human-curated pathways. KEGGscape utilizes the database to reproduce the corresponding hand-drawn pathway diagrams with as much detail as possible in Cytoscape. Further, it allows users to import pathway data sets, to visualize biologist-friendly diagrams using the Cytoscape core visualization function (Visual Style), and to perform pathway analysis with a variety of Cytoscape apps. From the analyzed data, users can create complex and interactive visualizations that cannot be created in the KEGG PATHWAY web application. Experimental data with Affymetrix E. coli chips are used as an example to demonstrate how users can integrate pathways, annotations, and experimental data sets to create complex visualizations that clarify biological systems using KEGGscape and other Cytoscape apps.

  19. e

    HAMAP

    • ebi.ac.uk
    Updated Feb 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
    Explore at:
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.

  20. Bioinformatics Protein Dataset - Simulated

    • kaggle.com
    zip
    Updated Dec 27, 2024
    + more versions
    Cite
    Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
    Explore at:
    Available download formats: zip (12928905 bytes)
    Dataset updated
    Dec 27, 2024
    Authors
    Rafael Gallo
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Subtitle

    "Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

    Description

    Introduction

    This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

    Columns Included

    • ID_Protein: Unique identifier for each protein.
    • Sequence: String of amino acids.
    • Molecular_Weight: Molecular weight calculated from the sequence.
    • Isoelectric_Point: Estimated isoelectric point based on the sequence composition.
    • Hydrophobicity: Average hydrophobicity calculated from the sequence.
    • Total_Charge: Sum of the charges of the amino acids in the sequence.
    • Polar_Proportion: Percentage of polar amino acids in the sequence.
    • Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.
    • Sequence_Length: Total number of amino acids in the sequence.
    • Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

    Inspiration and Sources

    While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:

    • UniProt: A comprehensive database of protein sequences and annotations.
    • Kyte-Doolittle Scale: Calculations of hydrophobicity.
    • Biopython: A tool for analyzing biological sequences.

    Proposed Uses

    This dataset is ideal for:

    • Training classification models for proteins.
    • Exploratory analysis of physicochemical properties of proteins.
    • Building machine learning pipelines in bioinformatics.

    How This Dataset Was Created

    1. Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.
    2. Property Calculation: Physicochemical properties were calculated using the Biopython library.
    3. Class Assignment: Classes were randomly assigned for classification purposes.
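
The three steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the author's actual script: the description states that Biopython was used for the property calculations, whereas this sketch substitutes a hard-coded Kyte-Doolittle table for the hydrophobicity column only. The uniform sampling of residues, the `PROT_00000` identifier format, and all function names are assumptions made for illustration.

```python
import random

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "".join(sorted(KYTE_DOOLITTLE))
CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]


def random_sequence(rng, min_len=50, max_len=300):
    """Step 1: draw a random chain of 50 to 300 residues.

    Uniform sampling over residues is an assumption of this sketch.
    """
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def mean_hydrophobicity(sequence):
    """Step 2 (one property only): average Kyte-Doolittle score."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def make_record(rng, index):
    """Build one row using a subset of the column names listed above."""
    sequence = random_sequence(rng)
    return {
        "ID_Protein": f"PROT_{index:05d}",  # hypothetical ID format
        "Sequence": sequence,
        "Hydrophobicity": mean_hydrophobicity(sequence),
        "Sequence_Length": len(sequence),
        "Class": rng.choice(CLASSES),  # Step 3: random class label
    }


def make_dataset(n_records, seed=0):
    """Generate n_records rows reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [make_record(rng, i) for i in range(n_records)]


if __name__ == "__main__":
    for row in make_dataset(3):
        print(row["ID_Protein"], row["Sequence_Length"], row["Class"])
```

Because the class labels in this sketch are drawn independently of the sequences, a classifier trained on such data should not be expected to beat chance, which is consistent with the limitations noted for the dataset.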

    Limitations

    • The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.
    • The functional classes are simulated and do not correspond to actual biological characteristics.

    Data Split

    The dataset is divided into two subsets:

    • Training: 16,000 samples (proteinas_train.csv).
    • Testing: 4,000 samples (proteinas_test.csv).

    Acknowledgment

    This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.
