100+ datasets found

Molecular Biology Databases Published in Nucleic Acids Research between...
databank.illinois.edu
Updated Feb 1, 2024
Cite
Heidi Imker (2024). Molecular Biology Databases Published in Nucleic Acids Research between 1991-2016 [Dataset]. http://doi.org/10.13012/B2IDB-4311325_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-4311325_V1
Dataset updated
Feb 1, 2024
Authors
Heidi Imker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.
Fantastic databases and where to find them: Web applications for researchers...
scielo.figshare.com
jpeg
Updated Jun 3, 2023
Cite
Gerda Cristal Villalba; Ursula Matte (2023). Fantastic databases and where to find them: Web applications for researchers in a rush [Dataset]. http://doi.org/10.6084/m9.figshare.20018091.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.20018091.v1
Dataset updated
Jun 3, 2023
Dataset provided by
SciELO: http://www.scielo.org/
Authors
Gerda Cristal Villalba; Ursula Matte
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Public databases are essential to the development of multi-omics resources. The amount of data created by biological technologies needs a systematic and organized form of storage that can be quickly accessed and managed. This is the objective of a biological database. Here, we present an overview of human databases with web applications. The databases and tools allow the search of biological sequences, genes and genomes, gene expression patterns, epigenetic variation, protein-protein interactions, variant frequency, regulatory elements, and comparative analysis between human and model organisms. Our goal is to provide an opportunity for exploring large datasets and analyzing the data for users with little or no programming skills. Public user-friendly web-based databases facilitate data mining and the search for information applicable to healthcare professionals. Besides, biological databases are essential to improve biomedical search sensitivity and efficiency and to merge multiple datasets needed to share data and build global initiatives for the diagnosis, prognosis, and discovery of new treatments for genetic diseases. To show the databases at work, we present a case study using ACE2 as an example of a gene to be investigated. The analysis and the complete list of databases are available in the following website .
Bioinformatics Links Directory
neuinfo.org
scicrunch.org
+3more
Updated Jan 29, 2022
+ more versions
Cite
(2022). Bioinformatics Links Directory [Dataset]. http://identifiers.org/RRID:SCR_008018
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008018
Dataset updated
Jan 29, 2022
Description
Database of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Since 2003, it has also listed all links contained in the NAR Webserver issue. The following types of information are available in this portal:
* Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here.
* Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources.
* DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here.
* Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops.
* Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data.
* Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome.
* Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed.
* Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses.
* Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category includes resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites.
* Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here.
* RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.
Alternative Splicing Annotation Project II Database
dknet.org
scicrunch.org
+2more
Updated Jan 29, 2022
Cite
(2022). Alternative Splicing Annotation Project II Database [Dataset]. http://identifiers.org/RRID:SCR_000322
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_000322
Dataset updated
Jan 29, 2022
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented on 8/12/13. An expanded version of the Alternative Splicing Annotation Project (ASAP) database with a new interface and integration of comparative features using UCSC BLASTZ multiple alignments. It supports 9 vertebrate species, 4 insects, and nematodes, and provides extensive alternative splicing analysis and splicing variants. For human alternative splicing data, newly added EST libraries were classified and included in the previous tissue and cancer classification, and lists of tissue- and cancer- (vs. normal) specific alternatively spliced genes were recalculated and updated. The authors created novel orthologous exon and intron databases, with their splice variants, based on multiple alignment among several species. These orthologous exon and intron databases can give more comprehensive homologous gene information than protein-similarity-based methods. Furthermore, splice junction and exon identity among species can be valuable resources for elucidating species-specific genes. The ASAP II database can be easily integrated with pygr (unpublished, the Python Graph Database Framework for Bioinformatics) and its powerful features such as graph query and multi-genome alignment query. ASAP II can be searched by several different criteria such as gene symbol, gene name and ID (UniGene, GenBank etc.). The web interface provides 7 different kinds of views: (I) user query, UniGene annotation, orthologous genes and genome browsers; (II) genome alignment; (III) exons and orthologous exons; (IV) introns and orthologous introns; (V) alternative splicing; (VI) isoform and protein sequences; (VII) tissue and cancer vs. normal specificity. ASAP II shows genome alignments of isoforms, exons, and introns in a UCSC-like genome browser. All alternative splicing relationships with supporting evidence information, types of alternative splicing patterns, and inclusion rate for skipped exons are listed in separate tables. Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page. The p-values for tissue specificity are reported as log-odds (LOD) scores, and results with LOD >= 3 and at least 3 EST sequences are highlighted.
Data_Sheet_1_riceExplorer: Uncovering the Hidden Potential of a National...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
zip
Updated Jun 6, 2023
Cite
Clive T. Darwell; Samart Wanchana; Vinitchan Ruanjaichon; Meechai Siangliw; Burin Thunnom; Wanchana Aesomnuk; Theerayut Toojinda (2023). Data_Sheet_1_riceExplorer: Uncovering the Hidden Potential of a National Genomic Resource Against a Global Database.zip [Dataset]. http://doi.org/10.3389/fpls.2022.781153.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.3389/fpls.2022.781153.s001
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers Media: http://www.frontiersin.org/
Authors
Clive T. Darwell; Samart Wanchana; Vinitchan Ruanjaichon; Meechai Siangliw; Burin Thunnom; Wanchana Aesomnuk; Theerayut Toojinda
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Agricultural crop breeding programs, particularly at the national level, typically consist of a core panel of elite breeding cultivars alongside a number of local landrace varieties (or other endemic cultivars) that provide additional sources of phenotypic and genomic variation or contribute as experimental materials (e.g., in GWAS studies). Three issues commonly arise. First, focusing primarily on core development accessions may mean that the potential contributions of landraces or other secondary accessions may be overlooked. Second, elite cultivars may accumulate deleterious alleles away from nontarget loci due to the strong effects of artificial selection. Finally, a tendency to focus solely on SNP-based methods may cause incomplete or erroneous identification of functional variants. In practice, integration of local breeding programs with findings from global database projects may be challenging. First, local GWAS experiments may only indicate useful functional variants according to the diversity of the experimental panel, while other potentially useful loci—identifiable at a global level—may remain undiscovered. Second, large-scale experiments such as GWAS may prove prohibitively costly or logistically challenging for some agencies. Here, we present a fully automated bioinformatics pipeline (riceExplorer) that can easily integrate local breeding program sequence data with international database resources, without relying on any phenotypic experimental procedure. It identifies associated functional haplotypes that may prove more robust in determining the genotypic determinants of desirable crop phenotypes. In brief, riceExplorer evaluates a global crop database (IRRI 3000 Rice Genomes) to identify haplotypes that are associated with extreme phenotypic variation at the global level and recorded in the database. 
It then examines which potentially useful variants are present in the local crop panel, before distinguishing between those that are already incorporated into the elite breeding accessions and those only found among secondary varieties (e.g., landraces). Results highlight the effectiveness of our pipeline, identifying potentially useful functional haplotypes across the genome that are absent from elite cultivars and found among landraces and other secondary varieties in our breeding program. riceExplorer can automatically conduct a full genome analysis and produces annotated graphical output of chromosomal maps, potential global diversity sources, and summary tables.
Funding and Operating Organizations for Long-Lived Molecular Biology...
databank.illinois.edu
aws-databank-alb.library.illinois.edu
Cite
Heidi Imker, Funding and Operating Organizations for Long-Lived Molecular Biology Databases [Dataset]. http://doi.org/10.13012/B2IDB-3993338_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-3993338_V1
Authors
Heidi Imker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991-2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article (published prior to 2017), and organizations operating the databases were determined through review of database websites.
NCBI Nt (Nucleotide) database FASTA file from 2017-10-26
zenodo.org
application/gzip
Updated Dec 23, 2020
Cite
James Fellows Yates (2020). NCBI Nt (Nucleotide) database FASTA file from 2017-10-26 [Dataset]. http://doi.org/10.5281/zenodo.4382154
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.4382154
Dataset updated
Dec 23, 2020
Dataset provided by
Zenodo: http://zenodo.org/
Authors
James Fellows Yates
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
This FASTA file is the NCBI Nt (Nucleotide) database (public domain) used for holistic metagenomic screening of ancient DNA data at the Department of Archaeogenetics at the Max Planck Institute for the Science of Human History. We offer here the FASTA file used to construct MALT databases (https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/malt/), which are generally too large for uploading. Please see the relevant publications that use the database for MALT database construction commands.

NCBI does not retain older versions of this database which is why this has been uploaded here. It was downloaded on 2017-10-26 12:39 from: ftp://ftp-trace.ncbi.nih.gov/blast/db/FASTA/nt.gz. The NCBI Nt database is released into the public domain as per https://www.ncbi.nlm.nih.gov/home/about/policies/.
DAVID
neuinfo.org
dknet.org
+1more
Updated Aug 17, 2024
Cite
(2024). DAVID [Dataset]. http://identifiers.org/RRID:SCR_001881
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_001881
Dataset updated
Aug 17, 2024
Description
Bioinformatics resource system including web server and web service for functional annotation and enrichment analyses of gene lists. Consists of a comprehensive knowledgebase and a set of functional analysis tools. Includes a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Bioinformatic Harvester IV (beta) at Karlsruhe Institute of Technology
neuinfo.org
Updated Jan 29, 2022
Cite
(2022). Bioinformatic Harvester IV (beta) at Karlsruhe Institute of Technology [Dataset]. http://identifiers.org/RRID:SCR_008017
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008017
Dataset updated
Jan 29, 2022
Description
Harvester is a Web-based tool that bulk-collects bioinformatic data on human proteins from various databases and prediction servers. It is a meta search engine for gene and protein information. It searches 16 major databases and prediction servers and combines the results on pregenerated HTML pages. In this way Harvester can provide comprehensive gene-protein information from different servers in a convenient and fast manner. As a full-text meta search engine, similar to Google™, Harvester allows screening of the whole genome proteome for current protein functions and predictions in a few seconds. With Harvester it is now possible to compare and check the quality of different database entries and prediction algorithms on a single page. Sponsors: This work has been supported by the BMBF with grants 01GR0101 and 01KW0013.
3D-Genomics Database
dknet.org
scicrunch.org
+2more
Updated Jan 29, 2022
Cite
(2022). 3D-Genomics Database [Dataset]. http://identifiers.org/RRID:SCR_007430
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007430
Dataset updated
Jan 29, 2022
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is: "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies."
The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG.
The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C. briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E. cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G. theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N. crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P. falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome
List of bioinformatics tools and databases students used.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
João Carlos Sousa; Manuel João Costa; Joana Almeida Palha (2023). List of bioinformatics tools and databases students used. [Dataset]. http://doi.org/10.1371/journal.pone.0000481.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0000481.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
João Carlos Sousa; Manuel João Costa; Joana Almeida Palha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of bioinformatics tools and databases students used.
Bioinformatics Protein Dataset - Simulated
kaggle.com
zip
Updated Dec 27, 2024
+ more versions
Cite
Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
Explore at:
Available download formats: zip (12928905 bytes)
Dataset updated
Dec 27, 2024
Authors
Rafael Gallo
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Subtitle

"Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

Description

Introduction

This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

Columns Included

ID_Protein: Unique identifier for each protein.

Sequence: String of amino acids.

Molecular_Weight: Molecular weight calculated from the sequence.

Isoelectric_Point: Estimated isoelectric point based on the sequence composition.

Hydrophobicity: Average hydrophobicity calculated from the sequence.

Total_Charge: Sum of the charges of the amino acids in the sequence.

Polar_Proportion: Percentage of polar amino acids in the sequence.

Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.

Sequence_Length: Total number of amino acids in the sequence.

Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

Inspiration and Sources

While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:
- UniProt: A comprehensive database of protein sequences and annotations.
- Kyte-Doolittle Scale: Calculations of hydrophobicity.
- Biopython: A tool for analyzing biological sequences.

Proposed Uses

This dataset is ideal for:
- Training classification models for proteins.
- Exploratory analysis of physicochemical properties of proteins.
- Building machine learning pipelines in bioinformatics.

How This Dataset Was Created

Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.

Property Calculation: Physicochemical properties were calculated using the Biopython library.

Class Assignment: Classes were randomly assigned for classification purposes.
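
The generation steps above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the dataset description says Biopython was used, whereas this sketch uses only the Python standard library and the published Kyte-Doolittle hydropathy values. Uniform sampling over the 20 standard amino acids, the `P00000`-style identifiers, and the helper names are all assumptions made for the example.

```python
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def random_sequence(rng, min_len=50, max_len=300):
    """Draw a random chain of 50 to 300 residues (uniform sampling is an assumption)."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def average_hydrophobicity(sequence):
    """Mean Kyte-Doolittle value over all residues in the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def make_record(protein_id, rng):
    """Build one row using a subset of the dataset's column names."""
    sequence = random_sequence(rng)
    return {
        "ID_Protein": protein_id,  # hypothetical ID format
        "Sequence": sequence,
        "Sequence_Length": len(sequence),
        "Hydrophobicity": average_hydrophobicity(sequence),
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
records = [make_record(f"P{i:05d}", rng) for i in range(5)]
```

The remaining columns (molecular weight, isoelectric point, charge, polar and nonpolar proportions) would need per-residue tables or a library such as Biopython and are left out here.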

Limitations

The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.

The functional classes are simulated and do not correspond to actual biological characteristics.

Data Split

The dataset is divided into two subsets:
- Training: 16,000 samples (proteinas_train.csv).
- Testing: 4,000 samples (proteinas_test.csv).
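
An 80/20 split of 20,000 samples into 16,000 and 4,000 could be produced along these lines. This is only a sketch under assumptions: the description does not say how the split was made, so the shuffling, the seed, and the function name `split_ids` are hypothetical.

```python
import random


def split_ids(ids, train_fraction=0.8, seed=42):
    """Shuffle the identifiers and cut them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    # round() guards against floating-point error in the product.
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 20,000 placeholder sample indices standing in for the protein rows.
train_ids, test_ids = split_ids(range(20000))
```

Because the classes were assigned at random, a plain shuffle is likely sufficient; with real, imbalanced labels a stratified split would usually be preferable.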

Acknowledgment

This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.
ASURAT knowledge-based databases
figshare.com
application/gzip
Updated May 9, 2022
Cite
Keita Iida (2022). ASURAT knowledge-based databases [Dataset]. http://doi.org/10.6084/m9.figshare.19102598.v5
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.19102598.v5
Dataset updated
May 9, 2022
Dataset provided by
Figshare: http://figshare.com/
Authors
Keita Iida
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Knowledge-based databases and the codes for collecting these databases are stored.
Data from: PseudoResistance DB: A new Database of antibiotics related to...
data.mendeley.com
Updated Nov 8, 2024
Cite
Caio Cheohen (2024). PseudoResistance DB: A new Database of antibiotics related to Pseudomonas aeruginosa antibiotic resistance [Dataset]. http://doi.org/10.17632/bxdn3p33z2.1
Explore at:
Unique identifier
https://doi.org/10.17632/bxdn3p33z2.1
Dataset updated
Nov 8, 2024
Authors
Caio Cheohen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This research addresses the pressing issue of antibiotic resistance, a global health challenge that undermines the efficacy of treatments against infectious diseases. Focusing on Pseudomonas aeruginosa—a Gram-negative bacterium known for causing opportunistic infections—this study emphasizes its prioritization by the World Health Organization (WHO) as a critical-level pathogen requiring new therapeutic approaches.

To identify antibiotics associated with P. aeruginosa, the study employed text mining techniques on the Scielo database. The resulting dataset comprises 98 antibiotics, each documented with detailed textual information and referencing data. Additionally, the dataset includes structural files of the antibiotics in several formats suitable for computational modeling and simulations. These formats encompass Protein Data Bank, Partial Charge & Atom Type (PDBQT), Simplified Molecular Input Line Entry System (SMI), IUPAC International Chemical Identifier (INCHI), Molecular Design Limited Molfile (MOL2), Structure-Data File (SDF), Chemical Markup Language (CML), Cartesian Coordinates File (XYZ), Scalable Vector Graphics (SVG), Molecular File (MOL) and Protein Data Bank (PDB) files, with molecular models generated via OpenBabel to facilitate advanced studies in drug development and resistance mechanisms.

Databases for MyCodentifier: A tool for routine identification of...

zenodo.org
data.niaid.nih.gov

application/gzip

Updated Dec 9, 2022

Cite

Jodie A. Schildkraut; Jordy P.M. Coolen; Heleen Severin; Ellen Koenraad; Nicole Aalders; Willem J.G. Melchers; Wouter Hoefsloot; Heiman F.L. Wertheim; Jakko van Ingen (2022). Databases for MyCodentifier: A tool for routine identification of nontuberculous mycobacteria using MGIT enriched shotgun metagenomics. [Dataset]. http://doi.org/10.5281/zenodo.7396289

Explore at:

Available download formats: application/gzip

Unique identifier

https://doi.org/10.5281/zenodo.7396289

Dataset updated

Dec 9, 2022

Dataset provided by

Zenodo: http://zenodo.org/

Authors

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Databases used for MyCodentifier, a Nextflow pipeline to identify Mycobacterium tuberculosis complex (MTBC) and nontuberculous mycobacteria (NTM) species from next-generation sequencing (NGS) data.

Short description:
The pipeline is constructed using Nextflow as the workflow manager, running in a Docker container. It is able to identify MTBC/NTM species from positive Mycobacterial Growth Indicator Tube (MGIT) cultures. To do so, it uses an hsp65 database for fast identification, coupled with a metagenomic method using Centrifuge for identification at the genome level. For TB, it is also able to identify subspecies. Results are presented in automated PDF and HTML reports.

**Databases**
Name	Short Description
20220726_ref.tar.gz	7 major mycobacterial genomes as centrifuge classification database, used for reference-based mapping and genotype resistance prediction
20220726_wgs_centrifuge_db_Radboudumc_MB.tar.gz	centrifuge classification database using Tortoli et al 2017 Mycobacterium strains + additional strains
genomes.tar.gz	7 major mycobacterial genomes, annotation and Genbank files. Files are paired with 20220726_ref.tar.gz
snpEff.tar.gz	7 major mycobacterial genomes annotation models for snpEff.
Tortoli_etal_hsp65.tar.gz	KMA database of hsp65 gene extractions of the Tortoli et al 2017 Mycobacterium strains.
Used in the study: p_compressed+h+v.tar.gz (12/06/2016)	Databases available via ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data or https://ccb.jhu.edu/software/centrifuge/manual.shtml#custom-database

MyCodentifier Github:

https://jordycoolen.github.io/MyCodentifier/

Pneumonia Drug Exp Data
data.mendeley.com
Updated Sep 29, 2023
+ more versions
Cite
OCHIN SHARMA (2023). Pneumonia Drug Exp Data [Dataset]. http://doi.org/10.17632/8bmpx4zvs8.1
Explore at:
Unique identifier
https://doi.org/10.17632/8bmpx4zvs8.1
Dataset updated
Sep 29, 2023
Authors
OCHIN SHARMA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is the result of experiments conducted using Python and rdkit library.
Bacannot database
zenodo.org
data-staging.niaid.nih.gov
+1more
application/gzip
Updated Oct 16, 2023
+ more versions
Cite
Felipe Marques de Almeida (2023). Bacannot database [Dataset]. http://doi.org/10.5281/zenodo.7615812
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.7615812
Dataset updated
Oct 16, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Felipe Marques de Almeida
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This zipped tarball (.tar.gz) contains a pre-built database for Bacannot (https://github.com/fmalmeida/bacannot).
Files are in the naming convention YEAR_MONTH_DAY.
expam RefSeq Database
bridges.monash.edu
researchdata.edu.au
bin
Updated May 23, 2022
Cite
Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster (2022). expam RefSeq Database [Dataset]. http://doi.org/10.26180/19653840.v2
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.26180/19653840.v2
Dataset updated
May 23, 2022
Dataset provided by
Monash University
Authors
Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
expam reference database used for benchmarking and comparison against metagenome profilers.
High Quality SNP Database
dknet.org
scicrunch.org
+2more
Updated May 11, 2024
Cite
(2024). High Quality SNP Database [Dataset]. http://identifiers.org/RRID:SCR_007230
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007230
Dataset updated
May 11, 2024
Description
This is the HQSNP DB (high-quality SNP database) developed by the CHG bioinformatics group. A high-quality SNP is defined as a SNP that has allele frequency or genotyping data. The majority of the HQSNPs come from HapMap; the others come from JSNP (Japanese SNP database), TSC (The SNP Consortium), Affymetrix 120K SNP, and Perlegen SNP. There are four kinds of SNP search you can do:
* Get SNPs by dbSNP rs#: Choose this search if you have already selected a list of SNPs and you just want to get the SNP information. The program will generate an Excel file containing the SNP flanking sequence, variation, quality, function, etc. In the Excel file, there are 10 highlighted fields. You can send only the highlighted fields to Illumina to get a SNP pre-score. (The same fields are presented in the other types of searches as well.)
* Get gene SNPs by gene names: Choose this search if you have a list of gene names and you want to get the SNP information in these genes. The gene name can be an official gene symbol, Ensembl gene ID, RefSeq accession ID, LocusLink number, etc.
* Get gene SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get all gene SNP information in these regions. The software will find all the Ensembl genes in the regions and find the SNPs associated with each Ensembl gene.
* Get genome scan SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get evenly spaced SNPs in these regions.
A SNP selection tool (SNPselector) was built upon HQSNP. It took a SNP ID list, gene name list, or genome region list as input and searched SNPs for a genome scan or gene association study. It could take an optional ABI SNP file (exported from the ABI SNP search web page) as input for checking whether a candidate SNP was available from ABI. It could also take an optional Illumina SNP pre-score file as input to select SNPs for the Illumina SNP assay. It generated results sorted by tag SNP in LD block, SNP quality, SNP function, SNP regulatory potential, and SNP mutation risk. SNPselector is now retired from public use (as of September 30, 2010).
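The "evenly spaced SNPs" idea behind the genome scan search can be illustrated with a small sketch. This is not SNPselector's actual method, which the description does not specify; it is a simple greedy minimum-spacing thinning, and the function name `thin_snps`, the SNP identifiers, and the positions are all hypothetical.

```python
from typing import List, Tuple

def thin_snps(snps: List[Tuple[str, int]], min_spacing: int) -> List[Tuple[str, int]]:
    """Greedy thinning: walk the SNPs in position order and keep one only if
    it lies at least min_spacing base pairs after the last SNP kept."""
    kept: List[Tuple[str, int]] = []
    for snp_id, pos in sorted(snps, key=lambda s: s[1]):
        if not kept or pos - kept[-1][1] >= min_spacing:
            kept.append((snp_id, pos))
    return kept

# Hypothetical SNPs in one genome region: (identifier, position in bp).
region_snps = [("snpA", 100), ("snpB", 450), ("snpC", 1200), ("snpD", 1300), ("snpE", 2500)]
selected = thin_snps(region_snps, min_spacing=1000)
print(selected)
```

A real selection tool would presumably also weigh SNP quality and function when choosing among nearby candidates; the sketch considers position only.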
Bakta database
zenodo.org
application/gzip +1
Updated Feb 23, 2023
Cite
Oliver Schwengers; Oliver Schwengers (2023). Bakta database [Dataset]. http://doi.org/10.5281/zenodo.4662588
Explore at:
Available download formats: application/gzip, json
Unique identifier
https://doi.org/10.5281/zenodo.4662588
Dataset updated
Feb 23, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Oliver Schwengers
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data repository contains the mandatory DB for Bakta (db.tar.gz).

Bakta is a tool for the rapid & standardized local annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis: https://github.com/oschwengers/bakta

This db provides protein sequence hash digests and lengths of UniProt's UniRef100/UniRef90 clusters for ultra-fast identification & lookups. It has been pre-annotated with several specialized db and enriched with Dbxrefs. All conducted pre-annotations are logged and provided in the db.log.gz file.
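As a rough illustration of why storing hash digests and lengths allows fast exact-match lookups, the sketch below indexes a toy set of protein sequences by digest. It is not Bakta's code or database schema: the use of MD5, the function names, the sequences, and the cluster identifiers are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional, Tuple

def digest(seq: str) -> str:
    """Hex digest of a protein sequence (MD5 is an assumption here)."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()

# Hypothetical reference sequences mapped to made-up cluster identifiers.
reference = {"MKTAYIAKQR": "cluster_1", "MSLNEQ": "cluster_2"}

# Index: digest -> (sequence length, cluster identifier).
index: Dict[str, Tuple[int, str]] = {
    digest(seq): (len(seq), cluster_id) for seq, cluster_id in reference.items()
}

def lookup(seq: str) -> Optional[str]:
    """Return the cluster for an exact sequence match, or None.
    The stored length is compared as a cheap guard against digest collisions."""
    hit = index.get(digest(seq))
    if hit is not None and hit[0] == len(seq):
        return hit[1]
    return None

print(lookup("MKTAYIAKQR"))  # exact match in the toy index
print(lookup("MKTAYIAKQK"))  # differs in one residue, so no match
```

Because the lookup is a single dictionary access on a fixed-size key, its cost does not depend on how long the sequence is or how many reference sequences are indexed, which is the property that makes this kind of identification fast.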

External DB versions:

NCBI AMRFinderPlus: 2021-03-01

COG: 2020

DoriC: 10

ISFinder: 2019-09-25

Mob-suite: 2.0

Pfam: 34

RefSeq: r205

Rfam: 14.5

UniProtKB/Swiss-Prot: 2021_01

VFDB: 2021-04-05
