100+ datasets found

Bioinformatic databases survey
zenodo.org
csv
Updated Aug 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alise Ponsero; Alise Ponsero; Bonnie Hurwitz; Bonnie Hurwitz; Kiran Smelser; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Lucas Jimenez Miranda; Abby McDermott; Karen Valencia; Abby McDermott (2024). Bioinformatic databases survey [Dataset]. http://doi.org/10.5281/zenodo.12790448
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.12790448
Dataset updated
Aug 17, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Alise Ponsero; Alise Ponsero; Bonnie Hurwitz; Bonnie Hurwitz; Kiran Smelser; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Lucas Jimenez Miranda; Abby McDermott; Karen Valencia; Abby McDermott
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Bioinformatic databases survey

The dataset surveys bioinformatic databases published in the NAR database issue from 1995 to 2022. It evaluates the current number of citations and availability of each ressources.

Data content

The dataset is composed of two tables :

A. Databases table : Contains the information of each database published in the NAR database issue.

db_id : Database ID in the dataset

resource_name : Name(s) of the database

current_access : Latest known web address of the database

is_a_pun : The database name is a play on word

available_2022 : The database was accessible online during the 2022 survey

last_accessible_year : If not accessible, latest point in time where the database was found online (using the Internet web archive snapshots)

unavailable_message : If not accessible, the message/error when trying to access the ressource

year_first_publication : Year of first publication of the database

year_last_publication : Year of latest publication of the database (including database update publications)

total_citations_2022 : Cumulative number of citation for all articles of the database

nb_authors_max : Maximum number of authors associated to any articles published for that database

nb_articles_2022 : Number of articles published for that database in 2022

B. Articles table : Contains the information collected for the NAR articles

collector : Person who contributed to add this database in the dataset

article_global_id : DOI of the article surveyed

db_id : Database ID of the ressource described in the article

article_id : Article unique ID

article_year : Article publication year

Authors : list of authors of the article. Separated by ";"

Author.ID : list of ORCID of the authors of the article. Separated by ";"

Title : Title of the atricle

Source.title : Journal name

Volume : Volume number

Issue : Issue number

Funding.Details : Funding information of the article

Funding.Text : Funding text provided by the authors

PubMed.ID : Pubmed ID of the article

citations_2016 : Number of citations of the article in 2016 (if published)

citations_2022 : Number of citations of the article in 2022

nb_authors : Number of authors in the article

Index.Keywords : Keywords associated to the publication

Data sources

Note that the presented dataset leverage and expand on the dataset gathered and published in Imker, H.J., 2020. Who Bears the Burden of Long-Lived Molecular Biology Databases?. Data Science Journal, 19(1), p.8. The original dataset collected by Dr. Imker is available at : https://doi.org/10.13012/B2IDB-4311325_V1

The dataset was collected and is maintained by undergraduate students of a CURE class (Course-based Undergraduate Research Experience) held at the University of Arizona. All students of the class have participated to the collection, update and curation the dataset that is available as a database and a web-portal at https://hurwitzlab.shinyapps.io/DS_Heroes/. Students could elect to be added or not as author to this Zenodo repository.

The CURE class BAT102 "Data Science Heroes: An undergraduate research experience in Open Data Science Practices" gives the students an opportunity to learn about open science and investigate open data practices in bioinformatics through a survey of the databases published in the NAR database issue.
I
Molecular Biology Databases Published in Nucleic Acids Research between...
databank.illinois.edu
Updated Feb 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heidi Imker (2024). Molecular Biology Databases Published in Nucleic Acids Research between 1991-2016 [Dataset]. http://doi.org/10.13012/B2IDB-4311325_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-4311325_V1
Dataset updated
Feb 1, 2024
Authors
Heidi Imker
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.
ASURAT knowledge-based databases
figshare.com
application/gzip
Updated May 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Keita Iida (2022). ASURAT knowledge-based databases [Dataset]. http://doi.org/10.6084/m9.figshare.19102598.v5
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19102598.v5
Dataset updated
May 9, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Keita Iida
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Knowledge-based databases and the codes for collecting these databases are stored.
I
Funding and Operating Organizations for Long-Lived Molecular Biology...
databank.illinois.edu
aws-databank-alb.library.illinois.edu
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heidi Imker, Funding and Operating Organizations for Long-Lived Molecular Biology Databases [Dataset]. http://doi.org/10.13012/B2IDB-3993338_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-3993338_V1
Authors
Heidi Imker
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991-2016 were identified to address two research questions 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article published (prior to 2017) and organizations operating the databases were determine through review of database websites.
e
PROSITE profiles
ebi.ac.uk
Updated Feb 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
n
Bioinformatics Links Directory
neuinfo.org
scicrunch.org
+3more
Updated Jan 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Bioinformatics Links Directory [Dataset]. http://identifiers.org/RRID:SCR_008018
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008018
Dataset updated
Jan 29, 2022
Description
Database of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Starting in 2003, it has also started listing all links contained in the NAR Webserver issue. The different types of information available in this portal: * Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here. * Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources. * DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here. * Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops. * Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data. * Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome. * Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed. * Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses. * Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category will include resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites. * Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here. * RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.
List of bioinformatics tools and databases students used.
plos.figshare.com
xls
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
João Carlos Sousa; Manuel João Costa; Joana Almeida Palha (2023). List of bioinformatics tools and databases students used. [Dataset]. http://doi.org/10.1371/journal.pone.0000481.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0000481.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
João Carlos Sousa; Manuel João Costa; Joana Almeida Palha
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of bioinformatics tools and databases students used.
Bioinformatics Protein Dataset - Simulated
kaggle.com
zip
Updated Dec 27, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
Explore at:
zip(12928905 bytes)Available download formats
Dataset updated
Dec 27, 2024
Authors
Rafael Gallo
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Subtitle

"Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

Description

Introduction

This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

Columns Included

ID_Protein: Unique identifier for each protein.

Sequence: String of amino acids.

Molecular_Weight: Molecular weight calculated from the sequence.

Isoelectric_Point: Estimated isoelectric point based on the sequence composition.

Hydrophobicity: Average hydrophobicity calculated from the sequence.

Total_Charge: Sum of the charges of the amino acids in the sequence.

Polar_Proportion: Percentage of polar amino acids in the sequence.

Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.

Sequence_Length: Total number of amino acids in the sequence.

Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

Inspiration and Sources

While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as: - UniProt: A comprehensive database of protein sequences and annotations. - Kyte-Doolittle Scale: Calculations of hydrophobicity. - Biopython: A tool for analyzing biological sequences.

Proposed Uses

This dataset is ideal for: - Training classification models for proteins. - Exploratory analysis of physicochemical properties of proteins. - Building machine learning pipelines in bioinformatics.

How This Dataset Was Created

Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.

Property Calculation: Physicochemical properties were calculated using the Biopython library.

Class Assignment: Classes were randomly assigned for classification purposes.

Limitations

The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.

The functional classes are simulated and do not correspond to actual biological characteristics.

Data Split

The dataset is divided into two subsets: - Training: 16,000 samples (proteinas_train.csv). - Testing: 4,000 samples (proteinas_test.csv).

Acknowledgment

This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.
e
NCBIFAM
ebi.ac.uk
Updated Aug 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Aug 6, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).
u
Indexed NCBI nt database - original
figshare.unimelb.edu.au
bin
Updated Feb 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
VANESSA ROSSETTO MARCELINO (2024). Indexed NCBI nt database - original [Dataset]. http://doi.org/10.26188/25222610.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.26188/25222610.v1
Dataset updated
Feb 28, 2024
Dataset provided by
The University of Melbourne
Authors
VANESSA ROSSETTO MARCELINO
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Indexed NCBI nucleotide database, used to benchmark CCMetagen in its original publication.To download from the command line, use:curl "https://mediaflux.researchsoftware.unimelb.edu.au:443/mflux/share.mfjp?_token=i8yedNiYfdjrBfGJ8Y5z1128247857&browser=true&filename=ncbi_nt_kma.zip" -d browser=false -o ncbi_nt_kma.zip
m
Data from: PeTMbase: A database of plant endogenous target mimics (eTMs)
data.mendeley.com
Updated Nov 23, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gökhan Karakülah (2016). PeTMbase: A database of plant endogenous target mimics (eTMs) [Dataset]. http://doi.org/10.17632/htgxryrcv2.1
Explore at:
Unique identifier
https://doi.org/10.17632/htgxryrcv2.1
Dataset updated
Nov 23, 2016
Authors
Gökhan Karakülah
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MicroRNAs (miRNA) are small endogenous RNA molecules, which regulate target gene expression at post-transcriptional level. Besides, miRNA activity can be controlled by a newly discovered regulatory mechanism called endogenous target mimicry (eTM). In target mimicry, eTMs bind to the corresponding miRNAs to block the binding of specific transcript leading to increase mRNA expression. Thus, miRNA-eTM-target-mRNA regulation modules involving a wide range of biological processes; an increasing need for a comprehensive eTM database arose. Except miRSponge with limited number of Arabidopsis eTM data no available database and/or repository was developed and released for plant eTMs yet. Here, we present an online plant eTM database, called PeTMbase (http://petmbase.org), with a highly efficient search tool. To establish the repository a number of identified eTMs was obtained utilizing from high-throughput RNA-sequencing data of 11 plant species. Each transcriptome libraries is first mapped to corresponding plant genome, then long non-coding RNA (lncRNA) transcripts are characterized. Furthermore, additional lncRNAs retrieved from GREENC and PNRD were incorporated into the lncRNA catalog. Then, utilizing the lncRNA and miRNA sources a total of 2,728 eTMs were successfully predicted. Our regularly updated database, PeTMbase, provides high quality information regarding miRNA:eTM modules and will aid functional genomics studies particularly, on miRNA regulatory networks.
e
HAMAP
ebi.ac.uk
Updated Feb 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.
q
CourseSource Bioinformatics Teaching Materials
qubeshub.org
Updated Aug 3, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hayley Orndorf (2016). CourseSource Bioinformatics Teaching Materials [Dataset]. https://qubeshub.org/publications/20
Explore at:
Dataset updated
Aug 3, 2016
Dataset provided by
QUBES
Authors
Hayley Orndorf
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
CourseSource Teaching Materials tagged "Bioinformatics"
zol: prepTG Databases for ESKAPE Pathogens
zenodo.org
nde-dev.biothings.io
+1more
application/gzip
Updated Oct 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rauf Salamzade; Rauf Salamzade; Lindsay Kalan; Lindsay Kalan (2023). zol: prepTG Databases for ESKAPE Pathogens [Dataset]. http://doi.org/10.5281/zenodo.10042148
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10042148
Dataset updated
Oct 26, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Rauf Salamzade; Rauf Salamzade; Lindsay Kalan; Lindsay Kalan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Each of the tar.gz compressed directories corresponds to prepTG databases (for the zol suite) featuring distinct, representative genomes for one of the six genera containing ESKAPE pathogens. Representative genomes for each genus/taxon were selected using skDER v1.0.7 in greedy mode with 99% ANI and 90% AF cutoffs.
The compressed folders also contain an extra file, corresponding to a species tree of the representative genomes constructed using GToTree with Universal markers (ribosomal proteins) from Hug et al. 2016 and in best-hits mode. Note, GToTree was modified to always use -super5 mode for SCG alignments for computational efficiency. Also, note, because genomes can be dropped by GToTree prior to phylogeny inference (e.g. if they lack enough SCGs), not all genomes in the database might be represented in the phylogenies.
d
3D-Genomics Database
dknet.org
scicrunch.org
+2more
Updated Jan 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). 3D-Genomics Database [Dataset]. http://identifiers.org/RRID:SCR_007430
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007430
Dataset updated
Jan 29, 2022
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London , University College London and the European Bioinformatics Institute . e-Protein''s mission statement is To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies. The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C.briggsae genome; Caenorhabditis elegans protein sequences from the worm genome; Ciona intestinalis protein sequences from the sea squirt genome; Danio rerio protein sequences from the zebrafish genome; Drosophila melanogaster protein sequences from the fruitfly genome; Encephalitozoon cuniculi protein sequences from the E.cuniculi genome; Fugu rubripes protein sequences from the pufferfish genome; Guillardia theta protein sequences from the G.theta genome; Homo sapiens protein sequences from the human genome; Mus musculus protein sequences from the mouse genome; Neurospora crassa protein sequences from the N.crassa genome; Oryza sativa protein sequences from the rice genome; Plasmodium falciparum protein sequences from the P.falciparum genome; Rattus norvegicus protein sequences from the rat genome; Saccharomyces cerevisiae protein sequences from the yeast genome; Schizosaccharomyces pombe protein sequences from the yeast genome
e
SFLD
ebi.ac.uk
Updated Sep 7, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Sep 7, 2018
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
m
Pneumonia Drug Exp Data
data.mendeley.com
Updated Sep 29, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
OCHIN SHARMA (2023). Pneumonia Drug Exp Data [Dataset]. http://doi.org/10.17632/8bmpx4zvs8.1
Explore at:
Unique identifier
https://doi.org/10.17632/8bmpx4zvs8.1
Dataset updated
Sep 29, 2023
Authors
OCHIN SHARMA
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is the result of experiments conducted using Python and rdkit library.
n
DRCAT Resource Catalogue
neuinfo.org
rrid.site
+2more
Updated Jul 29, 2011
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2011). DRCAT Resource Catalogue [Dataset]. http://identifiers.org/RRID:SCR_005931
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_005931
Dataset updated
Jul 29, 2011
Description
Data resource catalog that collates metadata on bioinformatics Web-based data resources including databases, ontologies, taxonomies and catalogues. An entry includes information such as resource identifier(s), name, description and URL. ''''Query'''' lines are defined for each resource that describe what type(s) of data are available, in what format, how (by what identifier) the data can be retrieved and from where (URL). DRCAT was developed to provide more extensive data integration for EMBOSS, but it has many applications beyond EMBOSS. DRCAT entries (including ''''Query'''' lines) are annotated with terms from the EDAM ontology of common bioinformatics concepts.
r
expam RefSeq Database
researchdata.edu.au
bridges.monash.edu
Updated Jun 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster (2022). expam RefSeq Database [Dataset]. http://doi.org/10.26180/19653840.v2
Explore at:
Unique identifier
https://doi.org/10.26180/19653840.v2
Dataset updated
Jun 28, 2022
Dataset provided by
Monash University
Authors
Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
expam reference database used for benchmarking and comparison against metagenome profilers.
m
Database of Peptides with Potential for Pharmacological Intervention in...
data.mendeley.com
Updated Jun 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Micael da Silva Pirazoli Gonzalez (2023). Database of Peptides with Potential for Pharmacological Intervention in Human Pathogen Molecular Targets [Dataset]. http://doi.org/10.17632/2zhgy9ggdv.1
Explore at:
Unique identifier
https://doi.org/10.17632/2zhgy9ggdv.1
Dataset updated
Jun 6, 2023
Authors
Micael da Silva Pirazoli Gonzalez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Peptides are polymeric chains used as research objects in the search for new drugs with greater efficacy and fewer side effects. Therefore, we created three databases of antimicrobial peptides using PubChem and ChEMBL. First we acquired the Simplified Molecular-Input Line-Entry System (SMILES) of several peptides belonging to different types of pathogens, namely bacteria, viruses, parasites, and fungi. Using the OpenBabel software, these SMILES had their file formats and structures converted to create: one database in one dimension SMI format, and two with three-dimensional MOL2 and PDB file formats. In total the three databases consists of 718 peptides that have been shown to possess inhibitory activity on molecular targets of clinically important pathogens.

Facebook

Twitter

Click to copy link

Link copied

Cite

Alise Ponsero; Alise Ponsero; Bonnie Hurwitz; Bonnie Hurwitz; Kiran Smelser; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Lucas Jimenez Miranda; Abby McDermott; Karen Valencia; Abby McDermott (2024). Bioinformatic databases survey [Dataset]. http://doi.org/10.5281/zenodo.12790448

Bioinformatic databases survey

Explore at:

csvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.12790448

Dataset updated

Aug 17, 2024

Dataset provided by

Zenodohttp://zenodo.org/

Authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Bioinformatic databases survey

The dataset surveys bioinformatic databases published in the NAR database issue from 1995 to 2022. It evaluates the current number of citations and availability of each ressources.

Data content

The dataset is composed of two tables :

A. Databases table : Contains the information of each database published in the NAR database issue.

db_id : Database ID in the dataset
resource_name : Name(s) of the database
current_access : Latest known web address of the database
is_a_pun : The database name is a play on word
available_2022 : The database was accessible online during the 2022 survey
last_accessible_year : If not accessible, latest point in time where the database was found online (using the Internet web archive snapshots)
unavailable_message : If not accessible, the message/error when trying to access the ressource
year_first_publication : Year of first publication of the database
year_last_publication : Year of latest publication of the database (including database update publications)
total_citations_2022 : Cumulative number of citation for all articles of the database
nb_authors_max : Maximum number of authors associated to any articles published for that database
nb_articles_2022 : Number of articles published for that database in 2022

B. Articles table : Contains the information collected for the NAR articles

collector : Person who contributed to add this database in the dataset
article_global_id : DOI of the article surveyed
db_id : Database ID of the ressource described in the article
article_id : Article unique ID
article_year : Article publication year
Authors : list of authors of the article. Separated by ";"
Author.ID : list of ORCID of the authors of the article. Separated by ";"
Title : Title of the atricle
Source.title : Journal name
Volume : Volume number
Issue : Issue number
Funding.Details : Funding information of the article
Funding.Text : Funding text provided by the authors
PubMed.ID : Pubmed ID of the article
citations_2016 : Number of citations of the article in 2016 (if published)
citations_2022 : Number of citations of the article in 2022
nb_authors : Number of authors in the article
Index.Keywords : Keywords associated to the publication

Data sources

Note that the presented dataset leverage and expand on the dataset gathered and published in Imker, H.J., 2020. Who Bears the Burden of Long-Lived Molecular Biology Databases?. Data Science Journal, 19(1), p.8. The original dataset collected by Dr. Imker is available at : https://doi.org/10.13012/B2IDB-4311325_V1

The dataset was collected and is maintained by undergraduate students of a CURE class (Course-based Undergraduate Research Experience) held at the University of Arizona. All students of the class have participated to the collection, update and curation the dataset that is available as a database and a web-portal at https://hurwitzlab.shinyapps.io/DS_Heroes/. Students could elect to be added or not as author to this Zenodo repository.

The CURE class BAT102 "Data Science Heroes: An undergraduate research experience in Open Data Science Practices" gives the students an opportunity to learn about open science and investigate open data practices in bioinformatics through a survey of the databases published in the NAR database issue.

Clear search

Close search

Google apps

Main menu

Bioinformatic databases survey

Bioinformatic databases survey

Data content

Data sources

Molecular Biology Databases Published in Nucleic Acids Research between...

ASURAT knowledge-based databases

Funding and Operating Organizations for Long-Lived Molecular Biology...

PROSITE profiles

Bioinformatics Links Directory

List of bioinformatics tools and databases students used.

Bioinformatics Protein Dataset - Simulated

Subtitle

Description

Introduction

Columns Included

Inspiration and Sources

Proposed Uses

How This Dataset Was Created

Limitations

Data Split

Acknowledgment

NCBIFAM

Indexed NCBI nt database - original

Data from: PeTMbase: A database of plant endogenous target mimics (eTMs)

HAMAP

CourseSource Bioinformatics Teaching Materials

zol: prepTG Databases for ESKAPE Pathogens

3D-Genomics Database

SFLD

Pneumonia Drug Exp Data

DRCAT Resource Catalogue

expam RefSeq Database

Database of Peptides with Potential for Pharmacological Intervention in...

Bioinformatic databases survey

Bioinformatic databases survey

Data content

Data sources