100+ datasets found
  1. Bioinformatic databases survey

    • zenodo.org
    csv
    Updated Aug 17, 2024
    Cite
    Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott (2024). Bioinformatic databases survey [Dataset]. http://doi.org/10.5281/zenodo.12790448
    Available download formats: csv
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bioinformatic databases survey

    The dataset surveys bioinformatic databases published in the NAR database issue from 1995 to 2022. It evaluates the current number of citations and the availability of each resource.

    Data content

    The dataset is composed of two tables:

    A. Databases table : Contains information on each database published in the NAR database issue.

    • db_id : Database ID in the dataset
    • resource_name : Name(s) of the database
    • current_access : Latest known web address of the database
    • is_a_pun : The database name is a play on words
    • available_2022 : The database was accessible online during the 2022 survey
    • last_accessible_year : If not accessible, the latest year in which the database was found online (using Internet Archive snapshots)
    • unavailable_message : If not accessible, the message/error returned when trying to access the resource
    • year_first_publication : Year of first publication of the database
    • year_last_publication : Year of latest publication of the database (including database update publications)
    • total_citations_2022 : Cumulative number of citations for all articles of the database
    • nb_authors_max : Maximum number of authors associated with any article published for that database
    • nb_articles_2022 : Number of articles published for that database in 2022
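As a rough illustration of how the Databases table could be queried, the sketch below computes the share of databases still online in 2022, grouped by year of first publication. Only the column names come from the description above; the sample rows, the TRUE/FALSE encoding of available_2022, and the function name are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows that mimic the documented columns; every value is invented.
SAMPLE_DATABASES_CSV = """\
db_id,resource_name,available_2022,last_accessible_year,year_first_publication,total_citations_2022
1,ExampleDB,TRUE,,1998,120
2,SampleBase,FALSE,2015,1998,40
3,DemoBank,TRUE,,2010,75
"""


def availability_by_first_year(csv_text):
    """Map year_first_publication to the fraction of databases available in 2022."""
    totals = defaultdict(int)
    online = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["year_first_publication"])
        totals[year] += 1
        # Assumption: availability is stored as a TRUE/FALSE string.
        if row["available_2022"].strip().upper() == "TRUE":
            online[year] += 1
    return {year: online[year] / totals[year] for year in totals}


rates = availability_by_first_year(SAMPLE_DATABASES_CSV)
for year in sorted(rates):
    print(year, rates[year])
```

With the real file, the inline string would be replaced by the contents of the downloaded CSV, after checking how available_2022 is actually encoded.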

    B. Articles table : Contains the information collected for the NAR articles

    • collector : Person who added this database to the dataset
    • article_global_id : DOI of the article surveyed
    • db_id : Database ID of the resource described in the article
    • article_id : Article unique ID
    • article_year : Article publication year
    • Authors : List of authors of the article, separated by ";"
    • Author.ID : List of ORCIDs of the authors of the article, separated by ";"
    • Title : Title of the article
    • Source.title : Journal name
    • Volume : Volume number
    • Issue : Issue number
    • Funding.Details : Funding information of the article
    • Funding.Text : Funding text provided by the authors
    • PubMed.ID : PubMed ID of the article
    • citations_2016 : Number of citations of the article in 2016 (if published)
    • citations_2022 : Number of citations of the article in 2022
    • nb_authors : Number of authors of the article
    • Index.Keywords : Keywords associated with the publication
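The two tables are linked through db_id, so per-database figures such as total citations or the largest author list can in principle be recomputed from the Articles table. The sketch below shows one way to do that, splitting the Authors field on ";". The sample rows and the function name are invented for illustration, and the aggregation is an assumption about how the summary columns relate to the article rows, not a description of how the dataset authors computed them.

```python
import csv
import io

# Placeholder article rows; only the column names follow the description above.
SAMPLE_ARTICLES_CSV = """\
article_id,db_id,article_year,Authors,citations_2022
a1,1,1998,"Doe J.; Roe R.",100
a2,1,2005,"Doe J.; Poe E.; Roe R.",20
a3,2,1998,"Loe M.",40
"""


def summarise_articles(csv_text):
    """Aggregate article rows per db_id: citations, largest author list, article count."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Authors are stored in one field, separated by ";".
        authors = [name.strip() for name in row["Authors"].split(";") if name.strip()]
        entry = summary.setdefault(
            row["db_id"],
            {"total_citations": 0, "nb_authors_max": 0, "article_count": 0},
        )
        entry["total_citations"] += int(row["citations_2022"])
        entry["nb_authors_max"] = max(entry["nb_authors_max"], len(authors))
        entry["article_count"] += 1
    return summary


summary = summarise_articles(SAMPLE_ARTICLES_CSV)
for db_id, stats in sorted(summary.items()):
    print(db_id, stats)
```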

    Data sources

    Note that the presented dataset leverages and expands on the dataset gathered and published in Imker, H.J., 2020. Who Bears the Burden of Long-Lived Molecular Biology Databases? Data Science Journal, 19(1), p.8. The original dataset collected by Dr. Imker is available at: https://doi.org/10.13012/B2IDB-4311325_V1

    The dataset was collected and is maintained by undergraduate students of a CURE class (Course-based Undergraduate Research Experience) held at the University of Arizona. All students of the class participated in the collection, update, and curation of the dataset, which is available as a database and a web portal at https://hurwitzlab.shinyapps.io/DS_Heroes/. Students could choose whether to be listed as authors of this Zenodo repository.

    The CURE class BAT102 "Data Science Heroes: An undergraduate research experience in Open Data Science Practices" gives the students an opportunity to learn about open science and investigate open data practices in bioinformatics through a survey of the databases published in the NAR database issue.

  2. Molecular Biology Databases Published in Nucleic Acids Research between...

    • databank.illinois.edu
    Updated Feb 1, 2024
    Cite
    Heidi Imker (2024). Molecular Biology Databases Published in Nucleic Acids Research between 1991-2016 [Dataset]. http://doi.org/10.13012/B2IDB-4311325_V1
    Dataset updated
    Feb 1, 2024
    Authors
    Heidi Imker
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.

  3. ASURAT knowledge-based databases

    • figshare.com
    application/gzip
    Updated May 9, 2022
    Cite
    Keita Iida (2022). ASURAT knowledge-based databases [Dataset]. http://doi.org/10.6084/m9.figshare.19102598.v5
    Available download formats: application/gzip
    Dataset updated
    May 9, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Keita Iida
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record stores knowledge-based databases and the code used to collect them.

  4. Funding and Operating Organizations for Long-Lived Molecular Biology...

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Cite
    Heidi Imker, Funding and Operating Organizations for Long-Lived Molecular Biology Databases [Dataset]. http://doi.org/10.13012/B2IDB-3993338_V1
    Authors
    Heidi Imker
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991-2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article (published prior to 2017), and the organizations operating the databases were determined through review of database websites.

  5. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

  6. Bioinformatics Links Directory

    • neuinfo.org
    • scicrunch.org
    • +3 more
    Updated Jan 29, 2022
    Cite
    (2022). Bioinformatics Links Directory [Dataset]. http://identifiers.org/RRID:SCR_008018
    Dataset updated
    Jan 29, 2022
    Description

    Database of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Starting in 2003, it has also started listing all links contained in the NAR Webserver issue. The different types of information available in this portal:

    • Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here.
    • Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources.
    • DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here.
    • Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops.
    • Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data.
    • Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome.
    • Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed.
    • Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses.
    • Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category will include resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites.
    • Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here.
    • RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.

  7. List of bioinformatics tools and databases students used.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    João Carlos Sousa; Manuel João Costa; Joana Almeida Palha (2023). List of bioinformatics tools and databases students used. [Dataset]. http://doi.org/10.1371/journal.pone.0000481.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    João Carlos Sousa; Manuel João Costa; Joana Almeida Palha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    List of bioinformatics tools and databases students used.

  8. Bioinformatics Protein Dataset - Simulated

    • kaggle.com
    zip
    Updated Dec 27, 2024
    + more versions
    Cite
    Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
    Available download formats: zip (12928905 bytes)
    Dataset updated
    Dec 27, 2024
    Authors
    Rafael Gallo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Subtitle

    "Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

    Description

    Introduction

    This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

    Columns Included

    • ID_Protein: Unique identifier for each protein.
    • Sequence: String of amino acids.
    • Molecular_Weight: Molecular weight calculated from the sequence.
    • Isoelectric_Point: Estimated isoelectric point based on the sequence composition.
    • Hydrophobicity: Average hydrophobicity calculated from the sequence.
    • Total_Charge: Sum of the charges of the amino acids in the sequence.
    • Polar_Proportion: Percentage of polar amino acids in the sequence.
    • Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.
    • Sequence_Length: Total number of amino acids in the sequence.
    • Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

    Inspiration and Sources

    While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:

    • UniProt: A comprehensive database of protein sequences and annotations.
    • Kyte-Doolittle Scale: Calculations of hydrophobicity.
    • Biopython: A tool for analyzing biological sequences.

    Proposed Uses

    This dataset is ideal for:

    • Training classification models for proteins.
    • Exploratory analysis of physicochemical properties of proteins.
    • Building machine learning pipelines in bioinformatics.

    How This Dataset Was Created

    1. Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.
    2. Property Calculation: Physicochemical properties were calculated using the Biopython library.
    3. Class Assignment: Classes were randomly assigned for classification purposes.
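The three steps above can be sketched with the standard library alone. This is not the dataset author's script: the description says Biopython was used for the property calculations, whereas the example below hard-codes the Kyte-Doolittle scale to stay dependency-free. Uniform sampling of the 20 standard amino acids, the ID format, the seed, and all function names are assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def random_sequence(rng, min_len=50, max_len=300):
    """Step 1: draw a random chain of 50 to 300 residues (uniform sampling assumed)."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def mean_hydrophobicity(sequence):
    """Step 2 (one property only): average Kyte-Doolittle value over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def make_record(rng, index):
    """Build one row; step 3 assigns the class at random, as the description states."""
    sequence = random_sequence(rng)
    return {
        "ID_Protein": f"P{index:05d}",  # hypothetical ID format
        "Sequence": sequence,
        "Sequence_Length": len(sequence),
        "Hydrophobicity": mean_hydrophobicity(sequence),
        "Class": rng.choice(CLASSES),
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
records = [make_record(rng, i) for i in range(5)]
for record in records:
    print(record["ID_Protein"], record["Sequence_Length"], record["Class"])
```

Because the class label is drawn independently of the sequence, a classifier trained on such data should not be expected to beat chance, which is consistent with the limitations listed in the description.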

    Limitations

    • The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.
    • The functional classes are simulated and do not correspond to actual biological characteristics.

    Data Split

    The dataset is divided into two subsets:

    • Training: 16,000 samples (proteinas_train.csv).
    • Testing: 4,000 samples (proteinas_test.csv).
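The stated sizes correspond to an 80/20 split of the 20,000 samples. The description does not say how the split was made, so the sketch below assumes a simple shuffled split; the function name, seed, and fraction parameter are illustrative.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle row indices and cut them into a training part and a testing part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(20_000)
print(len(train_idx), len(test_idx))
```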

    Acknowledgment

    This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.

  9. NCBIFAM

    • ebi.ac.uk
    Updated Aug 6, 2025
    Cite
    (2025). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Aug 6, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).

  10. Indexed NCBI nt database - original

    • figshare.unimelb.edu.au
    bin
    Updated Feb 28, 2024
    Cite
    VANESSA ROSSETTO MARCELINO (2024). Indexed NCBI nt database - original [Dataset]. http://doi.org/10.26188/25222610.v1
    Available download formats: bin
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    The University of Melbourne
    Authors
    VANESSA ROSSETTO MARCELINO
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Indexed NCBI nucleotide database, used to benchmark CCMetagen in its original publication. To download from the command line, use:

    curl "https://mediaflux.researchsoftware.unimelb.edu.au:443/mflux/share.mfjp?_token=i8yedNiYfdjrBfGJ8Y5z1128247857&browser=true&filename=ncbi_nt_kma.zip" -d browser=false -o ncbi_nt_kma.zip

  11. Data from: PeTMbase: A database of plant endogenous target mimics (eTMs)

    • data.mendeley.com
    Updated Nov 23, 2016
    Cite
    Gökhan Karakülah (2016). PeTMbase: A database of plant endogenous target mimics (eTMs) [Dataset]. http://doi.org/10.17632/htgxryrcv2.1
    Dataset updated
    Nov 23, 2016
    Authors
    Gökhan Karakülah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate target gene expression at the post-transcriptional level. In addition, miRNA activity can be controlled by a newly discovered regulatory mechanism called endogenous target mimicry (eTM). In target mimicry, eTMs bind to the corresponding miRNAs and block their binding to specific transcripts, leading to increased mRNA expression. Because miRNA-eTM-target-mRNA regulation modules are involved in a wide range of biological processes, the need for a comprehensive eTM database has grown. Apart from miRSponge, which holds a limited amount of Arabidopsis eTM data, no database or repository has yet been developed and released for plant eTMs. Here, we present an online plant eTM database, called PeTMbase (http://petmbase.org), with a highly efficient search tool. To establish the repository, a number of eTMs were identified from high-throughput RNA-sequencing data of 11 plant species. Each transcriptome library was first mapped to the corresponding plant genome, and long non-coding RNA (lncRNA) transcripts were then characterized. Additional lncRNAs retrieved from GREENC and PNRD were incorporated into the lncRNA catalog. Then, using the lncRNA and miRNA sources, a total of 2,728 eTMs were successfully predicted. Our regularly updated database, PeTMbase, provides high-quality information on miRNA:eTM modules and will aid functional genomics studies, particularly those on miRNA regulatory networks.

  12. HAMAP

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.

  13. CourseSource Bioinformatics Teaching Materials

    • qubeshub.org
    Updated Aug 3, 2016
    Cite
    Hayley Orndorf (2016). CourseSource Bioinformatics Teaching Materials [Dataset]. https://qubeshub.org/publications/20
    Dataset updated
    Aug 3, 2016
    Dataset provided by
    QUBES
    Authors
    Hayley Orndorf
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    CourseSource Teaching Materials tagged "Bioinformatics"

  14. zol: prepTG Databases for ESKAPE Pathogens

    • zenodo.org
    • nde-dev.biothings.io
    • +1 more
    application/gzip
    Updated Oct 26, 2023
    Cite
    Rauf Salamzade; Lindsay Kalan (2023). zol: prepTG Databases for ESKAPE Pathogens [Dataset]. http://doi.org/10.5281/zenodo.10042148
    Available download formats: application/gzip
    Dataset updated
    Oct 26, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rauf Salamzade; Lindsay Kalan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each of the tar.gz compressed directories corresponds to prepTG databases (for the zol suite) featuring distinct, representative genomes for one of the six genera containing ESKAPE pathogens. Representative genomes for each genus/taxon were selected using skDER v1.0.7 in greedy mode with 99% ANI and 90% AF cutoffs.

    The compressed folders also contain an extra file, corresponding to a species tree of the representative genomes constructed using GToTree with Universal markers (ribosomal proteins) from Hug et al. 2016 and in best-hits mode. Note, GToTree was modified to always use -super5 mode for SCG alignments for computational efficiency. Also, note, because genomes can be dropped by GToTree prior to phylogeny inference (e.g. if they lack enough SCGs), not all genomes in the database might be represented in the phylogenies.
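To make the idea of greedy representative selection with ANI and AF cutoffs more concrete, here is a generic sketch. It is not skDER's implementation and does not reproduce its genome ranking or its handling of directional aligned fraction: the input ordering, the symmetric similarity lookup, the genome names, and the function name are all assumptions for illustration. Only the 99% ANI and 90% AF thresholds come from the description.

```python
def greedy_dereplicate(genomes, similarity, ani_cutoff=99.0, af_cutoff=90.0):
    """Keep a genome only if no already-chosen representative covers it.

    genomes: list ordered from most to least preferred (ordering is assumed given).
    similarity: dict mapping a genome pair to (ANI percent, aligned fraction percent).
    """
    def lookup(a, b):
        # Simplification: treat the comparison as symmetric; unknown pairs count as dissimilar.
        if (a, b) in similarity:
            return similarity[(a, b)]
        return similarity.get((b, a), (0.0, 0.0))

    representatives = []
    for genome in genomes:
        redundant = False
        for rep in representatives:
            ani, af = lookup(rep, genome)
            if ani >= ani_cutoff and af >= af_cutoff:
                redundant = True
                break
        if not redundant:
            representatives.append(genome)
    return representatives


# Hypothetical genomes and pairwise values, invented for the example.
example_similarity = {
    ("g1", "g2"): (99.5, 95.0),  # passes both cutoffs, so g2 is redundant with g1
    ("g1", "g3"): (97.0, 95.0),  # ANI below cutoff
    ("g2", "g3"): (99.9, 80.0),  # AF below cutoff
}
reps = greedy_dereplicate(["g1", "g2", "g3"], example_similarity)
print(reps)
```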

  15. 3D-Genomics Database

    • dknet.org
    • scicrunch.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). 3D-Genomics Database [Dataset]. http://identifiers.org/RRID:SCR_007430
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies."

    The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG.

    The following eukaryotic genomes are incorporated:

    • Anopheles gambiae, protein sequences from the mosquito genome
    • Arabidopsis thaliana, protein sequences from the Arabidopsis genome
    • Caenorhabditis briggsae, protein sequences from the C.briggsae genome
    • Caenorhabditis elegans, protein sequences from the worm genome
    • Ciona intestinalis, protein sequences from the sea squirt genome
    • Danio rerio, protein sequences from the zebrafish genome
    • Drosophila melanogaster, protein sequences from the fruitfly genome
    • Encephalitozoon cuniculi, protein sequences from the E.cuniculi genome
    • Fugu rubripes, protein sequences from the pufferfish genome
    • Guillardia theta, protein sequences from the G.theta genome
    • Homo sapiens, protein sequences from the human genome
    • Mus musculus, protein sequences from the mouse genome
    • Neurospora crassa, protein sequences from the N.crassa genome
    • Oryza sativa, protein sequences from the rice genome
    • Plasmodium falciparum, protein sequences from the P.falciparum genome
    • Rattus norvegicus, protein sequences from the rat genome
    • Saccharomyces cerevisiae, protein sequences from the yeast genome
    • Schizosaccharomyces pombe, protein sequences from the yeast genome

  16. SFLD

    • ebi.ac.uk
    Updated Sep 7, 2018
    + more versions
    Cite
    (2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Sep 7, 2018
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.

  17. Pneumonia Drug Exp Data

    • data.mendeley.com
    Updated Sep 29, 2023
    + more versions
    Cite
    OCHIN SHARMA (2023). Pneumonia Drug Exp Data [Dataset]. http://doi.org/10.17632/8bmpx4zvs8.1
    Dataset updated
    Sep 29, 2023
    Authors
    OCHIN SHARMA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the result of experiments conducted using Python and the RDKit library.

  18. DRCAT Resource Catalogue

    • neuinfo.org
    • rrid.site
    • +2 more
    Updated Jul 29, 2011
    + more versions
    Cite
    (2011). DRCAT Resource Catalogue [Dataset]. http://identifiers.org/RRID:SCR_005931
    Dataset updated
    Jul 29, 2011
    Description

    Data resource catalog that collates metadata on bioinformatics Web-based data resources including databases, ontologies, taxonomies and catalogues. An entry includes information such as resource identifier(s), name, description and URL. "Query" lines are defined for each resource that describe what type(s) of data are available, in what format, how (by what identifier) the data can be retrieved and from where (URL). DRCAT was developed to provide more extensive data integration for EMBOSS, but it has many applications beyond EMBOSS. DRCAT entries (including "Query" lines) are annotated with terms from the EDAM ontology of common bioinformatics concepts.

  19. expam RefSeq Database

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Jun 28, 2022
    Cite
    Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster (2022). expam RefSeq Database [Dataset]. http://doi.org/10.26180/19653840.v2
    Dataset updated
    Jun 28, 2022
    Dataset provided by
    Monash University
    Authors
    Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    expam reference database used for benchmarking and comparison against metagenome profilers.

  20. Database of Peptides with Potential for Pharmacological Intervention in...

    • data.mendeley.com
    Updated Jun 6, 2023
    Cite
    Micael da Silva Pirazoli Gonzalez (2023). Database of Peptides with Potential for Pharmacological Intervention in Human Pathogen Molecular Targets [Dataset]. http://doi.org/10.17632/2zhgy9ggdv.1
    Dataset updated
    Jun 6, 2023
    Authors
    Micael da Silva Pirazoli Gonzalez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Peptides are polymeric chains used as research objects in the search for new drugs with greater efficacy and fewer side effects. Therefore, we created three databases of antimicrobial peptides using PubChem and ChEMBL. First, we acquired the Simplified Molecular-Input Line-Entry System (SMILES) strings of several peptides active against different types of pathogens, namely bacteria, viruses, parasites, and fungi. Using the OpenBabel software, these SMILES had their file formats and structures converted to create one database in the one-dimensional SMI format and two in the three-dimensional MOL2 and PDB file formats. In total, the three databases consist of 718 peptides that have been shown to possess inhibitory activity on molecular targets of clinically important pathogens.
