100+ datasets found
  1. DBETH - Database for Bacterial ExoToxins for Humans

    • dknet.org
    • scicrunch.org
    • +1more
    Updated Oct 16, 2019
    Cite
    (2019). DBETH - Database for Bacterial ExoToxins for Humans [Dataset]. http://identifiers.org/RRID:SCR_005908
    Dataset updated
    Oct 16, 2019
    Description

    The Database of Bacterial ExoToxins for Humans (DBETH) is a database of sequences, structures, interaction networks and analytical results for 229 exotoxins from 26 different human pathogenic bacterial genera. All toxins are classified into 24 different toxin classes. The aim of DBETH is to provide a comprehensive database for human pathogenic bacterial exotoxins. DBETH also provides a platform for users to identify potential exotoxin-like sequences through homology-based as well as non-homology-based methods. In the homology-based approach, users can identify potential exotoxin-like sequences either by running BLASTp against the toxin sequences or by running HMMER against toxin domains identified by DBETH from human pathogenic bacterial exotoxins. In the non-homology-based part, DBETH uses a machine learning approach to identify potential exotoxins (toxin prediction by a support vector machine based approach).

  2. SAFPredDB: Bacterial synteny database

    • data.4tu.nl
    zip
    Updated Apr 5, 2024
    Cite
    Aysun Urhan; Bianca-Maria Cosma; Ashlee M. Earl; Abigail L. Manson; Thomas Abeel (2024). SAFPredDB: Bacterial synteny database [Dataset]. http://doi.org/10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557.v2
    Available download formats: zip
    Dataset updated
    Apr 5, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Aysun Urhan; Bianca-Maria Cosma; Ashlee M. Earl; Abigail L. Manson; Thomas Abeel
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    SAFPredDB is a bacterial synteny database built for the gene function prediction tool SAFPred, Synteny Aware Function Predictor. The database is a collection of conserved synteny and operons found across the bacterial kingdom. First, we formulated a synteny model based on experimentally known operons and the genomic features common in bacteria. We designed a bottom-up, purely computational approach to build our database based on the proposed synteny model using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB).


    Although we initially built SAFPredDB for our prediction tool only, it can be used for other purposes where such a catalog is needed. As a standalone database, it can be queried to mine information about conserved genomic patterns in bacteria. In addition, it can be updated as newer assemblies are added to GTDB.

  3. MARMICRODB database for taxonomic classification of (marine) metagenomes

    • zenodo.org
    • explore.openaire.eu
    application/gzip, bin +3
    Updated Mar 20, 2020
    Cite
    Shane L Hogle (2020). MARMICRODB database for taxonomic classification of (marine) metagenomes [Dataset]. http://doi.org/10.5281/zenodo.3520509
    Available download formats: bin, application/gzip, tsv, html, bz2
    Dataset updated
    Mar 20, 2020
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Shane L Hogle
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction:
    This sequence database (MARMICRODB) was introduced in the publication JW Becker, SL Hogle, K Rosendo, and SW Chisholm. 2019. Co-culture and biogeography of Prochlorococcus and SAR11. ISME J. doi:10.1038/s41396-019-0365-4. Please see the original publication and its associated supplementary material for the original description of this resource.

    Motivation:
    We needed a reference database to annotate shotgun metagenomes from the Tara Oceans project [1], the GEOTRACES cruises GA02, GA03, GA10, and GP13, and the HOT and BATS time series [2]. Our interests are primarily in quantifying and annotating the free-living, oligotrophic bacterial groups Prochlorococcus, Pelagibacterales/SAR11, SAR116, and SAR86 from these samples using the protein classifier tool Kaiju [3]. Kaiju’s sensitivity and classification accuracy depend on the composition of the reference database, and the highest sensitivity is achieved when the reference database contains a comprehensive representation of expected taxa from an environment/sample of interest. However, the speed of the algorithm decreases as database size increases. Therefore, we aimed to create a reference database that maximized the representation of sequences from marine bacteria, archaea, and microbial eukaryotes, while minimizing (but not excluding) the sequences from clinical, industrial, and terrestrial host-associated samples.

    Results/Description:
    MARMICRODB consists of 56 million non-redundant protein sequences from 18769 bacterial/archaeal/eukaryote genome and transcriptome bins and 7492 viral genomes, optimized for use with the protein homology classifier Kaiju [3]. To ensure maximum representation of marine bacteria, archaea, and microbial eukaryotes, we included translated genes/transcripts from 5397 representative “specI” species clusters from the proGenomes database [4]; 113 transcriptomes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [5]; 10509 metagenome assembled genomes from the Tara Oceans expedition [6,7], the Red Sea [8], the Baltic Sea [9], and other aquatic and terrestrial sources [10]; 994 isolate genomes from the Genomic Encyclopedia of Bacteria and Archaea [11]; 7492 viral genomes from NCBI RefSeq [12]; 786 bacterial and archaeal genomes from MarRef [13]; and 677 marine single cell genomes [14]. In order to annotate metagenomic reads at the clade/ecotype level (subspecies) for the focal taxa Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116, we generated custom MARMICRODB taxonomies based on curated genome phylogenies for each group. The curated phylogenies, Kaiju formatted Burrows-Wheeler index, translated genes, the custom taxonomy hierarchy, an interactive kronaplot of the taxonomic composition, and scripts and instructions for how to use or rebuild the resource are available from 10.5281/zenodo.3520509.

    Methods:
    The curation and quality control of MARMICRODB single cell, metagenome assembled, and isolate genomes was performed as described in [15]. Briefly, we downloaded all MARMICRODB genomes as raw nucleotide assemblies from NCBI. We determined an initial genome taxonomy for these assemblies using CheckM with the default lineage workflow [16]. All genome bins met the completion/contamination thresholds outlined in prior studies [7,17]. For single cell and metagenome assembled genomes, especially those from Tara Oceans Mediterranean Sea samples [18], we used the GTDB-Tk classification workflow [19] to verify the taxonomic fidelity of each genome bin. We then selected genomes with a CheckM taxonomic assignment of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 for further analysis and confirmed taxonomic assignment using BLAST matches to known Prochlorococcus/Synechococcus ITS sequences and by matching 16S sequences to the SILVA database [20]. To refine our estimates of completeness/contamination of Prochlorococcus genome bins, we created a custom set of 730 single copy protein families (available from 10.5281/zenodo.3719132) from closed, isolate Prochlorococcus genomes [21] for quality assessments with CheckM. For Synechococcus we used the CheckM taxonomic-specific workflow with the genus Synechococcus. After the custom CheckM quality control, we excluded from downstream analysis any genome bins that had an estimated quality < 30, defined as %completeness – 5x %contamination, resulting in 18769 genome/transcriptome bins. We predicted genes in the resulting genome bins using prodigal [22], excluded protein sequences shorter than 20 or longer than 20000 amino acids, removed non-standard amino acid residues, and condensed redundant protein sequences to a single representative sequence to which we assigned a lowest common ancestor (LCA) taxonomy identifier from the NCBI taxonomy database [23]. The resulting protein sequences were compiled and used to build a Kaiju [3] search database.
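
    The two numeric filters described in the Methods (bin quality of at least 30, protein length between 20 and 20000 amino acids) can be sketched as follows. This is a minimal illustration, not the authors' scripts: the GenomeBin class and the function names are hypothetical, and completeness and contamination are assumed to be CheckM-style percentages.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    # Hypothetical container; the field names are assumptions for illustration.
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent, 0-100


def quality(genome_bin: GenomeBin) -> float:
    """Quality score as defined in the text: %completeness - 5 x %contamination."""
    return genome_bin.completeness - 5.0 * genome_bin.contamination


def keep_bin(genome_bin: GenomeBin, min_quality: float = 30.0) -> bool:
    """Bins with quality < 30 are excluded, so a bin is kept when quality >= 30."""
    return quality(genome_bin) >= min_quality


def keep_protein(sequence: str, min_len: int = 20, max_len: int = 20000) -> bool:
    """Proteins shorter than 20 or longer than 20000 residues are excluded."""
    return min_len <= len(sequence) <= max_len


if __name__ == "__main__":
    bins = [GenomeBin("bin_a", 95.0, 2.0), GenomeBin("bin_b", 50.0, 5.0)]
    for b in bins:
        print(b.name, quality(b), keep_bin(b))
```

    The factor of 5 penalizes contamination much more heavily than missing sequence, so a bin that is half complete but 5% contaminated (score 25) is dropped while a 40% complete bin with 2% contamination (score 30) is retained.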

    The above filtering criteria resulted in 605 Prochlorococcus, 96 Synechococcus, 186 SAR11/Pelagibacterales, 60 SAR86, and 59 SAR116 high-quality genome bins. We constructed a high-quality fixed reference phylogenetic tree for each taxonomic group based on genomes manually selected for completeness and phylogenetic diversity. For example, the Prochlorococcus and Synechococcus genomes for the fixed reference phylogeny are estimated to be > 90% complete, and SAR11 genomes are estimated to be > 70% complete. We created multiple sequence alignments of phylogenetically conserved genes from these genomes using the GTDB-Tk pipeline [19] with default settings. The pipeline identifies conserved proteins (120 bacterial proteins) and generates concatenated multi-protein alignments [17] from the genome assemblies using hmmalign from the HMMER software suite. We further filtered the resulting alignment columns using the bacterial and archaeal alignment masks from [17] (http://gtdb.ecogenomic.org/downloads). We removed columns represented by fewer than 50% of all taxa and/or columns with no single amino acid residue occurring at a frequency greater than 25%. We trimmed the alignments using trimAl [24] with the automated -gappyout option to trim columns based on their gap distribution. We inferred reference phylogenies using multithreaded RAxML [25] with the GAMMA model of rate heterogeneity, empirically determined base frequencies, and the LG substitution model [26] (PROTGAMMALGF). Branch support is based on 250 resampled bootstrap trees. This tree was then pruned to only allow a maximum average distance to the closest leaf (ADCL) of 0.003 to reduce the phylogenetic redundancy in the tree [27]. We then “placed” genomes that either did not pass the completeness threshold or were considered phylogenetically redundant by ADCL within the fixed reference phylogeny for each group using pplacer [28], representing each placed genome as a pendant edge in the final tree. We then examined the resulting tree and manually selected clade/ecotype cutoffs to be as consistent as possible with clade definitions previously outlined for these groups [29–32]. We then gave clades from each taxonomic group custom taxonomic identifiers and we added these identifiers to the MARMICRODB Kaiju taxonomic hierarchy.
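
    The column filter mentioned above (drop columns covered by fewer than 50% of taxa, or with no residue above 25% frequency) can be sketched in a few lines. This is an illustrative reimplementation, not the GTDB-Tk code. It assumes that "-" and "." mark gaps and that residue frequency is computed over non-gap characters only; the source does not state which denominator was used.

```python
from collections import Counter

GAP_CHARS = {"-", "."}  # assumed gap symbols


def keep_column(column, min_taxa_frac=0.5, min_consensus=0.25):
    """Return True if an alignment column passes both filters."""
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return False
    # Filter 1: column must be represented by at least 50% of all taxa.
    if len(residues) < min_taxa_frac * len(column):
        return False
    # Filter 2: the most common residue must exceed 25% frequency
    # (assumption: frequency is taken over non-gap residues).
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(residues) > min_consensus


def filter_alignment(seqs):
    """Apply keep_column to every column of an equal-length alignment."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    kept = [i for i in range(length) if keep_column([s[i] for s in seqs])]
    return ["".join(s[i] for i in kept) for s in seqs]


if __name__ == "__main__":
    toy = ["MKA-", "MRC-", "MKD-", "-KE-"]
    print(filter_alignment(toy))
```

    In the toy alignment the third column is dropped because each of its four residues occurs at exactly 25%, which does not exceed the threshold, and the fourth column is dropped because it is all gaps.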

    Software/databases used:
    CheckM v1.0.11 [16]
    HMMER v3.1b2 (http://hmmer.org/)
    prodigal v2.6.3 [22]
    trimAl v1.4.rev22 [24]
    AliView v1.18.1 [33] [34]
    Phyx v0.1 [35]
    RAxML v8.2.12 [36]
    Pplacer v1.1alpha [28]
    GTDB-Tk v0.1.3 [19]
    Kaiju v1.6.0 [34]
    GTDB RS83 (https://data.ace.uq.edu.au/public/gtdb/data/releases/release83/83.0/)
    NCBI Taxonomy (accessed 2018-07-02) [23]
    TIGRFAM v14.0 [37]
    PFAM v31.0 [38]

    Discussion/Caveats:
    MARMICRODB is optimized for metagenomic samples from the marine environment, in particular planktonic microbes from the pelagic euphotic zone. We expect this database may also be useful for classifying other types of marine metagenomic samples (for example mesopelagic, bathypelagic, or even benthic or marine host-associated), but it has not been tested as such. The original purpose of this database was to quantify clades/ecotypes of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 in metagenomes from the Tara Oceans Expedition and the GEOTRACES project. We carefully annotated and quality controlled genomes from these five groups, but the processing of the other marine taxa was largely automated and unsupervised. Taxonomy for other groups was copied over from the Genome Taxonomy Database (GTDB) [19,39] and NCBI Taxonomy [23], so any inconsistencies in those databases will be propagated to MARMICRODB. For most use cases MARMICRODB can probably be used unmodified, but if the user’s goal is to focus on a particular organism/clade that we did not curate in the database, then the user may wish to spend some time curating those genomes (i.e., checking for contamination, dereplicating, building a genome phylogeny for custom taxonomy node assignment). Currently the custom taxonomy is hardcoded in the MARMICRODB.fmi index, but if users wish to modify MARMICRODB by adding or removing genomes, or reconfiguring taxonomic ranks, the names.dmp and nodes.dmp files can easily be modified, as can the fasta file of protein sequences. However, the Kaiju index will need to be rebuilt, and the user will require a high

  4. Archaeal and Bacterial ABC Transporter Database

    • dknet.org
    • neuinfo.org
    • +1more
    Updated Jan 29, 2022
    Cite
    (2022). Archaeal and Bacterial ABC Transporter Database [Dataset]. http://identifiers.org/RRID:SCR_001692
    Dataset updated
    Jan 29, 2022
    Description

    ABCdb is a public resource devoted to the ATP-binding Cassette (ABC) transporters encoded by completely sequenced prokaryotic genomes. In order to establish the repertory of ABC systems in a complete genome, we have to: i) identify the different partners, ii) assemble the partners in putative systems, and iii) classify the system into the correct functional subfamily (Quentin et al., 2002). The main pitfalls were the identification of loosely conserved domains and the assembly of partners encoded by genes dispersed over the chromosome. In order to face the avalanche of newly sequenced genomes, we decided to also feed into the database the raw predictions issued by this automatic procedure, before the time-consuming review by an expert occurs. Therefore, the database comprises two sections: CleanDb, for data checked by an expert, and AutoDb, for raw data.

    The ABC proteins are involved in a wide variety of physiological processes in Archaea, Bacteria and Eucaryota, where they are encoded by large families of paralogous genes. The majority of ABC domains energize the transport of compounds across membranes. In bacteria, ABC transporters are involved in the uptake of a wide variety of molecules, as well as in mechanisms of virulence and antibiotic resistance. In eukaryotes, most of them are involved in drug resistance, and in human cells many are associated with diseases. Sequence analysis reveals that members of the ABC superfamily can be organized into sub-families, and suggests that they have diverged from common ancestral forms. A typical ABC transporter system is composed of an assembly of protein domains that serve different functions: i) two Nucleotide Binding Domains (NBD) that energize transport via ATP hydrolysis, ii) two Membrane Spanning Domains (MSD) that act as a membrane channel for the substrate, and iii) for the importer, a Solute Binding Protein (SBP) that confers substrate specificity on the transporter. The different partners of an ABC system are generally encoded by neighboring genes.

    The database includes information on:
    • ABC transporters
    • Protein partners
    • Protein domains (NBD, MSD and SBP)
    • Classification of ABC transporters and their protein partners
    • Taxonomy of the species

    Each model Protein includes a link to the Peptide sequence, general information extracted from EMBL files, and specific tags to store results of predictions. The results of the annotation procedure are reachable through the class Prediction. The origin of the proteins is modeled as a path through the classes Chromosome, Strain, Species, and Taxon. Assembly and protein compilation tables are also provided for each of the chromosomes (Assembly and Protein).

  5. ARS Microbial Genomic Sequence Database Server

    • s.cnmilf.com
    • datadiscoverystudio.org
    • +2more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). ARS Microbial Genomic Sequence Database Server [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/ars-microbial-genomic-sequence-database-server-1b81c
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service https://www.ars.usda.gov/
    Description

    This database server is supported in fulfilment of the research mission of the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The linked website provides access to gene sequence databases for various groups of microorganisms, such as Streptomyces species or Aspergillus species and their relatives, that are the product of ARS research programs. The sequence databases are organized in the BIGSdb (Bacterial Isolate Genomic Sequence Database) software package developed by Keith Jolley and Martin Maiden at Oxford University.

    Resources in this dataset: Resource Title: ARS Microbial Genomic Sequence Database Server. File Name: Web Page, url: http://199.133.98.43

  6. Data from: Mass Spectrometry-Based Proteomics Combined with Bioinformatic...

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Jacek P. Dworzanski; Samir V. Deshpande; Rui Chen; Rabih E. Jabbour; A. Peter Snyder; Charles H. Wick; Liang Li (2023). Mass Spectrometry-Based Proteomics Combined with Bioinformatic Tools for Bacterial Classification [Dataset]. http://doi.org/10.1021/pr050294t.s002
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Jacek P. Dworzanski; Samir V. Deshpande; Rui Chen; Rabih E. Jabbour; A. Peter Snyder; Charles H. Wick; Liang Li
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Timely classification and identification of bacteria is of vital importance in many areas of public health. We present a mass spectrometry (MS)-based proteomics approach for bacterial classification. In this method, a bacterial proteome database is derived from all potential protein coding open reading frames (ORFs) found in 170 fully sequenced bacterial genomes. Amino acid sequences of tryptic peptides obtained by LC−ESI MS/MS analysis of the digest of bacterial cell extracts are assigned to individual bacterial proteomes in the database. Phylogenetic profiles of these peptides are used to create a matrix of sequence-to-bacterium assignments. These matrixes, viewed as specific assignment bitmaps, are analyzed using statistical tools to reveal the relatedness between a test bacterial sample and the microorganism database. It is shown that, if a sufficient amount of sequence information is obtained from the MS/MS experiments, a bacterial sample can be classified to a strain level by using this proteomics method, leading to its positive identification. Keywords: classification of bacteria • proteomics • tandem mass spectrometry • LC−MS/MS • bioinformatics
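
    The sequence-to-bacterium assignment matrix described above can be illustrated with a toy example. This is a sketch under stated assumptions, not the published method: the strain names and sequences are invented, a peptide is "assigned" to a bacterium when it occurs as a substring of any protein in that proteome, and simple column sums stand in for the statistical analysis, which the description does not specify.

```python
def assignment_matrix(peptides, proteomes):
    """Build a binary matrix: rows are peptides, columns are bacteria.

    proteomes maps a bacterium name to a list of protein sequences.
    A cell is 1 when the peptide occurs in at least one protein.
    """
    names = sorted(proteomes)
    matrix = [
        [1 if any(pep in protein for protein in proteomes[name]) else 0
         for name in names]
        for pep in peptides
    ]
    return names, matrix


def match_counts(names, matrix):
    """Column sums: number of observed peptides assigned to each bacterium."""
    return {name: sum(row[j] for row in matrix) for j, name in enumerate(names)}


if __name__ == "__main__":
    # Hypothetical toy proteomes and peptides, for illustration only.
    proteomes = {
        "strain_A": ["MKTAYIAKQR", "GLSDGEWQLV"],
        "strain_B": ["MKTAYIAKQR", "PPPLLLKKK"],
    }
    peptides = ["TAYIAK", "GEWQ", "DGEW"]
    names, matrix = assignment_matrix(peptides, proteomes)
    print(names)
    print(matrix)
    print(match_counts(names, matrix))
```

    Each row of the matrix is the phylogenetic profile of one peptide; a peptide shared by many proteomes carries little discriminating information, while one that matches a single column is strain-specific.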

  7. MiST - Microbial Signal Transduction database

    • neuinfo.org
    • scicrunch.org
    • +1more
    Updated Jan 29, 2022
    Cite
    (2022). MiST - Microbial Signal Transduction database [Dataset]. http://identifiers.org/RRID:SCR_003166
    Dataset updated
    Jan 29, 2022
    Description

    Database which contains the signal transduction proteins for complete and draft bacterial and archaeal genomes. The MiST2 database identifies and catalogs the repertoire of signal transduction proteins in microbial genomes.

  8. Ensembl Bacteria

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Oct 16, 2019
    Cite
    (2019). Ensembl Bacteria [Dataset]. http://identifiers.org/RRID:SCR_008679/resolver/mentions
    Dataset updated
    Oct 16, 2019
    Description

    The Ensembl Genomes project produces genome databases for important species from across the taxonomic range, using the Ensembl software system. Five sites are now available, one of which is Ensembl Bacteria, which houses bacterial species. All bacterial collections in Ensembl Bacteria have been updated with the latest data from ENA and UniProtKB. New genomes have been added to Escherichia/Shigella (3 additional genomes) and Staphylococcus (3 additional genomes). The mapping of array probes has been expanded to all genomes in the Escherichia/Shigella and Staphylococcus collections. Ensembl Bacteria also now features improved interfaces for selecting regions of circular molecules and a new visualisation allowing the large-scale comparison of multiple genomes. In multi-synteny view, users can select multiple genomes and observe the syntenic relationships between them. Sponsors: Ensembl Bacteria is a project run by EMBL-EBI to maintain annotation on selected genomes, based on the software developed in the Ensembl project, which is developed jointly by the EBI and the Wellcome Trust Sanger Institute.

  9. zol: prepTG Databases for Other Bacterial Taxa

    • data.niaid.nih.gov
    Updated Oct 26, 2023
    Cite
    Salamzade, Rauf (2023). zol: prepTG Databases for Other Bacterial Taxa [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8273155
    Dataset updated
    Oct 26, 2023
    Dataset provided by
    Salamzade, Rauf
    Kalan, Lindsay
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each of the tar.gz compressed directories corresponds to prepTG databases (for the zol suite) featuring distinct, representative genomes for some well studied taxa/genera. Representative genomes for each genus/taxon were selected using skDER v1.0.7 in greedy mode with 99% ANI and 90% AF cutoffs. The compressed folders also contain an extra file, corresponding to a species tree of the representative genomes constructed using GToTree with Universal markers (ribosomal proteins) from Hug et al. 2016 and in best-hits mode. Note, GToTree was modified to always use -super5 mode for SCG alignments for computational efficiency. Also, note, because genomes can be dropped by GToTree prior to phylogeny inference (e.g. if they lack enough SCGs), not all genomes in the database might be represented in the phylogenies.

  10. mOTUs database for MetaMeta pipeline - Archaea and Bacteria - version 1

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Piro, Vitor C. (2020). mOTUs database for MetaMeta pipeline - Archaea and Bacteria - version 1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_819364
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Piro, Vitor C.
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    mOTUs database for MetaMeta pipeline version 1. The database was downloaded from http://www.bork.embl.de/software/mOTU/share/mOTUs.Linux64bits.tar.gz and it is based on marker genes from 1,753 bacterial reference genomes + marker genes from 263 metagenomes and 3,496 bacterial genomes dating from February 2012

  11. Microbial Protein Interaction Database

    • bioregistry.io
    • registry.identifiers.org
    Updated Dec 18, 2021
    Cite
    (2021). Microbial Protein Interaction Database [Dataset]. https://bioregistry.io/registry/mpid
    Dataset updated
    Dec 18, 2021
    Description

    The microbial protein interaction database (MPIDB) provides physical microbial interaction data. The interactions are manually curated from the literature or imported from other databases, and are linked to supporting experimental evidence, as well as evidence based on interaction conservation, protein complex membership, and 3D domain contacts.

  12. DOLOP: A Database of Bacterial Lipoproteins

    • dknet.org
    • neuinfo.org
    Updated Jan 29, 2022
    Cite
    (2022). DOLOP: A Database of Bacterial Lipoproteins [Dataset]. http://identifiers.org/RRID:SCR_013487
    Dataset updated
    Jan 29, 2022
    Description

    DOLOP is a knowledge base devoted exclusively to bacterial lipoproteins. It processes information from 510 entries to provide a list of 199 distinct lipoproteins with relevant links to molecular details. Features include functional classification, a predictive algorithm for query sequences, primary sequence analysis and lists of predicted lipoproteins from 43 completed bacterial genomes, along with an interactive information exchange facility. The website will also have additional information on the biosynthetic pathway, supplementary material and other related figures. DOLOP also contains information and links to molecular details for about 278 distinct lipoproteins and predicted lipoproteins from 234 completely sequenced bacterial genomes. Additionally, the website features a tool that applies a predictive algorithm to identify the presence or absence of the lipoprotein signal sequence in a user-given sequence. The experimentally verified lipoproteins have been classified into different functional classes and, more importantly, functional domain assignments using hidden Markov models from the SUPERFAMILY database have been provided for the predicted lipoproteins. Other features include primary sequence analysis, signal sequence analysis, a search facility, and an information exchange facility that allows researchers to exchange results on newly characterized lipoproteins.

  13. NCBI Refseq database as of May 2023 part 2

    • data.niaid.nih.gov
    Updated Feb 27, 2024
    Cite
    Robert, Nichols (2024). NCBI Refseq database as of May 2023 part 2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10452278
    Dataset updated
    Feb 27, 2024
    Dataset authored and provided by
    Robert, Nichols
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the second part of the NCBI Refseq bacterial database originally downloaded in May of 2023. This was used to create the bacterial 16S and gyrB databases used in gyrB primer development. The first part can be found at 10.5281/zenodo.10452184.

    To recombine the database parts use the code

    "cat Bacteria.refseq.tar.gz.part* > Bacteria.refseq.tar.gz"
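
    Where a shell is not available, the same concatenation can be sketched in Python. This is a minimal equivalent of the cat command above, assuming that the part files sort correctly by name (as a shell glob would order them); the function name and the directory argument are illustrative, and the exact part suffixes are not stated in the description.

```python
import shutil
from pathlib import Path


def recombine(directory, prefix="Bacteria.refseq.tar.gz.part",
              output="Bacteria.refseq.tar.gz"):
    """Concatenate all files named <prefix>* in name order into <output>."""
    directory = Path(directory)
    parts = sorted(directory.glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}* in {directory}")
    out_path = directory / output
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so the 88 GB archive never sits in memory.
                shutil.copyfileobj(src, out)
    return out_path


if __name__ == "__main__":
    print(recombine("."))
```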

    The total file size of the downloaded RefSeq database is 88 GB.

    The gyrB and 16S databases can be found at 10.5281/zenodo.10451935

  14. TWIW database dump

    • data.dtu.dk
    txt
    Updated Jul 10, 2023
    Cite
    Sidsel Nag; Gunhild Larsen; Judit Szarvas; Laura Elmlund Kohl Birkedahl; Gábor Máté Gulyás; Wojciech Jakub Ciok; Timmie M. R. Lagermann; Silva Tafaj; Susan Bradbury; Peter Collignon; Denise Daley; Victorien Dougnon; Kafayath Fabiyi; Boubacar Coulibaly; René Dembélé; Georgette Nikiema; Natama Magloire; Isidore Juste Ouindgueta; Zenat Zebin Hossain; Anowara Begum; Deyan Donchev; Mathew Diggle; LeeAnn Turnbull; Simon Lévesque; Livia Berlinger; Kirstine Kobberoe Søgaard; Paula Diaz Guevara; Carolina Duarte Valderrama; Panagiota Maikanti; Jana Amlerova; Pavel Drevinek; Jan Tkadlec; Milica Dilas; Achim J. Kaasch; HenrikTorkil Westh; Mohamed Azzedine Bachtarzi; Wahiba Amhis; Carolina Elizabeth Satán Salazar; José Eduardo Villacis; Mária Angeles Dominguez Lúzon; Dàmaris Berbel Palau; Claire Duployez; Maxime Paluch; Solomon Asante-Sefa; Mie Møller; Margaret Ip; Ivana Marecović; Agnes Pál-Sonnevend; Clementiza Elvezia Cocuzza; Asta Dambrauskiene; Alexandre Macanze; Anelsio Cossa; Inácio Mandomando; Philip Nwajiobi-Princewill; Iruka N. Okeke; Aderemi O. Kehinde; Ini Adebiyi; Ifeoluwa Akintayo; Oluwafemi Popoola; Anthony Onipede; Anita Blomfeldt; Nora Elisabeth Nyquist; Kiri Bocker; James Ussher; Amjad Ali; Nimat Ullah; Habibullah Khan; Natalie Weiler Gustafson; Ikhlas Jarrar; Arif Al-Hamad; Viravarn Luvira; Wantana Paveenkittiporn; Irmak Baran; James C. L. Mwansa; Linda Sikakwa; Kaunda Yamba; Rene Sjøgren Hendriksen; Frank Møller Aarestrup (2023). TWIW database dump [Dataset]. http://doi.org/10.11583/DTU.21758456.v2
    Available download formats: txt
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Sidsel Nag; Gunhild Larsen; Judit Szarvas; Laura Elmlund Kohl Birkedahl; Gábor Máté Gulyás; Wojciech Jakub Ciok; Timmie M. R. Lagermann; Silva Tafaj; Susan Bradbury; Peter Collignon; Denise Daley; Victorien Dougnon; Kafayath Fabiyi; Boubacar Coulibaly; René Dembélé; Georgette Nikiema; Natama Magloire; Isidore Juste Ouindgueta; Zenat Zebin Hossain; Anowara Begum; Deyan Donchev; Mathew Diggle; LeeAnn Turnbull; Simon Lévesque; Livia Berlinger; Kirstine Kobberoe Søgaard; Paula Diaz Guevara; Carolina Duarte Valderrama; Panagiota Maikanti; Jana Amlerova; Pavel Drevinek; Jan Tkadlec; Milica Dilas; Achim J. Kaasch; HenrikTorkil Westh; Mohamed Azzedine Bachtarzi; Wahiba Amhis; Carolina Elizabeth Satán Salazar; José Eduardo Villacis; Mária Angeles Dominguez Lúzon; Dàmaris Berbel Palau; Claire Duployez; Maxime Paluch; Solomon Asante-Sefa; Mie Møller; Margaret Ip; Ivana Marecović; Agnes Pál-Sonnevend; Clementiza Elvezia Cocuzza; Asta Dambrauskiene; Alexandre Macanze; Anelsio Cossa; Inácio Mandomando; Philip Nwajiobi-Princewill; Iruka N. Okeke; Aderemi O. Kehinde; Ini Adebiyi; Ifeoluwa Akintayo; Oluwafemi Popoola; Anthony Onipede; Anita Blomfeldt; Nora Elisabeth Nyquist; Kiri Bocker; James Ussher; Amjad Ali; Nimat Ullah; Habibullah Khan; Natalie Weiler Gustafson; Ikhlas Jarrar; Arif Al-Hamad; Viravarn Luvira; Wantana Paveenkittiporn; Irmak Baran; James C. L. Mwansa; Linda Sikakwa; Kaunda Yamba; Rene Sjøgren Hendriksen; Frank Møller Aarestrup
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Two Weeks in the World is a global research collaboration which seeks to shed light on various aspects of antimicrobial resistance. The research project has resulted in a dataset of 3100 clinically relevant bacterial genomes with associated metadata. “Clinically relevant” means that the bacteria from which the genomes were obtained were all judged to be a cause of clinical manifestations of infection. The metadata describes the infection from which the bacteria were obtained, such as geographic origin and approximate collection date. The bacteria were collected from 59 microbiological diagnostic units in 35 countries around the world during 2020. The data from the project consists of tabular data and genomic sequence data. The tabular data is available as a mysql dump (relational database) and as csv files. The tabular data includes the infection metadata, the results from bioinformatic analyses (species prediction, identification of acquired resistance genes and phylogenetic analysis) as well as the accession numbers of the individual genomic sequence data, which are available through the European Nucleotide Archive (ENA). At time of submission, the project also has a dedicated web app, from which data can be browsed and downloaded: https://twiw.genomicepidemiology.org/ This complete dataset is created and shared according to the FAIR principles and has large reuse potential within the research fields of antimicrobial resistance, clinical microbiology and global health.

    .v2: The author list and readme have been updated, and a file containing column descriptions for the database dump has been added: TWIW_dbcolumns_explained.csv.

  15. o

    Bakta database

    • explore.openaire.eu
    • zenodo.org
    Updated Nov 20, 2020
    Cite
    Oliver Schwengers (2020). Bakta database [Dataset]. http://doi.org/10.5281/zenodo.5215743
    Dataset updated
    Nov 20, 2020
    Authors
    Oliver Schwengers
    Description

    This data repository contains the mandatory DB for Bakta (db.tar.gz). Bakta is a tool for the rapid & standardized local annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis: https://github.com/oschwengers/bakta

    This db provides protein sequence hash digests and lengths of UniProt's UniRef100 clusters, UniParc and NCBI RefSeq sequences for ultra-fast identification & lookups. It has been pre-annotated with several specialized db and enriched with Dbxrefs. Furthermore, seed sequences of UniProt's UniRef90 clusters are stored for fallback homology searches via Diamond sequence alignments. All conducted pre-annotations are logged and provided in the db.log.gz file.

    External DB versions:

    • NCBI AMRFinderPlus: 2021-08-05
    • COG: 2020
    • DoriC: 10
    • ISFinder: 2019-09-25
    • Mob-suite: 2.0
    • Pfam: 34
    • RefSeq: r207
    • Rfam: 14.6
    • UniProtKB/Swiss-Prot: 2021_03
    • VFDB: 2021-08-05
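
    The Bakta description above mentions identifying proteins by sequence hash digest and length. A minimal Python sketch of that general idea follows. It is an illustration only, not Bakta's code: the function names, the toy identifiers and the choice of MD5 are assumptions, since the description does not say which hash function or schema the database uses.

```python
import hashlib


def normalize(seq: str) -> str:
    """Strip whitespace, upper-case, and drop a trailing stop symbol."""
    return seq.strip().upper().rstrip("*")


def digest(seq: str) -> str:
    """Hex digest of the normalized sequence (MD5 is an assumption here)."""
    return hashlib.md5(normalize(seq).encode("ascii")).hexdigest()


def build_index(records):
    """Map digest -> (sequence length, identifier) for (identifier, sequence) pairs."""
    return {digest(seq): (len(normalize(seq)), ident) for ident, seq in records}


def lookup(index, query: str):
    """Return the identifier of an exact match, or None if there is no match."""
    hit = index.get(digest(query))
    if hit is None:
        return None
    length, ident = hit
    # Comparing lengths is a cheap extra check against a digest collision.
    return ident if length == len(normalize(query)) else None


# Toy data; the identifiers are hypothetical and not real cluster IDs.
toy_index = build_index([
    ("toy_cluster_1", "MKTAYIAKQR"),
    ("toy_cluster_2", "MSLNEQW"),
])
```

    Because the lookup is a single dictionary access on a fixed-size digest, it needs no alignment. A query that misses the index would fall through to a slower homology search, as the description outlines for the UniRef90 seed sequences.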

  16. u

    National Microbial Germplasm Program

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +1more
    bin
    Updated Nov 30, 2023
    Cite
    USDA ARS National Germplasm Resources Laboratory (2023). National Microbial Germplasm Program [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/National_Microbial_Germplasm_Program/24661746
    Available download formats: bin
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    National Germplasm Resources Laboratory
    Authors
    USDA ARS National Germplasm Resources Laboratory
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The goal of the National Microbial Germplasm Program is to ensure that the genetic diversity of agriculturally important microorganisms is maintained to enhance and increase agricultural efficiency and profitability. The program collects, authenticates, and characterizes potentially useful microbial germplasm; preserves microbial genetic diversity; and facilitates distribution and utilization of microbial germplasm for research and industry. The Agricultural Research Service maintains several microbial germplasm collections including:

    • USDA ARS Culture Collection
    • USDA ARS Collection of Entomopathogenic Fungal Cultures (ARSEF)
    • Query or Download the Rhizobium Database
    • US National Fungus Collections

    Resources in this dataset:

    Resource Title: National Microbial Germplasm Program. File Name: Web Page, url: https://www.ars-grin.gov/nmg/ Main web site for the National Microbial Germplasm Program with links to component databases/collections.

  17. r

    Data from: Indexed reference databases for KMA and CCMetagen

    • researchdata.edu.au
    Updated Apr 30, 2019
    Cite
    Dr Vanessa Rossetto Marcelino; Dr Vanessa Rossetto Marcelino; Dr Jan Buchmann; Clausen Philip (2019). Indexed reference databases for KMA and CCMetagen [Dataset]. http://doi.org/10.25910/5CC7CD40FCA8E
    Dataset updated
    Apr 30, 2019
    Dataset provided by
    The University of Sydney
    Authors
    Dr Vanessa Rossetto Marcelino; Dr Vanessa Rossetto Marcelino; Dr Jan Buchmann; Clausen Philip
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Apr 9, 2019 - Apr 30, 2019
    Description

    This database was built to identify taxa in metagenome samples using the CCMetagen pipeline. The whole NCBI nt collection allows a complete taxonomic overview, including of microbial eukaryotes that may be present in the dataset. This database is already indexed and ready to use with KMA and CCMetagen.

    A manual describing how to use this dataset can be found at: https://github.com/vrmarcelino/CCMetagen

    Additionally, a tutorial on the whole analysis of a set of metatranscriptome samples can be found at: https://github.com/vrmarcelino/CCMetagen/tree/master/tutorial

    The database was built as follows:

    The partially non-redundant nucleotide database was downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz) in January 2018. This database was formatted to include taxids in sequence headers.

    Indexing was then performed with KMA using the command:

    kma_index -i nt_taxid.fas -o ncbi_nt -NI -Sparse TG
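
    The build steps above mention formatting the database so that sequence headers include taxids. A minimal Python sketch of such a step follows. It is an assumption-laden illustration, not the authors' script: the `taxid|header` layout, the `unk_taxid` placeholder and the function name `add_taxids` are hypothetical, and the accession-to-taxid mapping is passed in as a plain dictionary.

```python
def add_taxids(fasta_lines, acc2taxid, unknown="unk_taxid"):
    """Yield FASTA lines with a taxid prepended to every header line.

    fasta_lines: iterable of lines, each ending with a newline.
    acc2taxid: dict mapping accession (first header token) to a taxid string.
    The '>taxid|original header' layout is a hypothetical choice.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # The accession is taken to be the first whitespace-separated token.
            accession = header.split()[0] if header.split() else ""
            taxid = acc2taxid.get(accession, unknown)
            yield f">{taxid}|{header}\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


# Toy input; the accession, description and taxid are illustrative values only.
toy_fasta = [">AB000001.1 toy record\n", "ACGTACGT\n"]
toy_output = list(add_taxids(toy_fasta, {"AB000001.1": "562"}))
```

    Writing the step as a generator keeps memory use flat, which matters for a file the size of nt. For real use, the lines would be streamed from the input file into an output file rather than collected in a list.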

    Three indexed databases are provided:

    1. NCBI nucleotide collection
    2. RefSeq database of bacterial and fungal genomes
  18. Additional file 4 of A large-scale genomically predicted protein mass...

    • figshare.com
    • springernature.figshare.com
    xlsx
    Updated Aug 16, 2024
    Cite
    Yuji Sekiguchi; Kanae Teramoto; Dieter M. Tourlousse; Akiko Ohashi; Mayu Hamajima; Daisuke Miura; Yoshihiro Yamada; Shinichi Iwamoto; Koichi Tanaka (2024). Additional file 4 of A large-scale genomically predicted protein mass database enables rapid and broad-spectrum identification of bacterial and archaeal isolates by mass spectrometry [Dataset]. http://doi.org/10.6084/m9.figshare.26637792.v1
    Available download formats: xlsx
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yuji Sekiguchi; Kanae Teramoto; Dieter M. Tourlousse; Akiko Ohashi; Mayu Hamajima; Daisuke Miura; Yoshihiro Yamada; Shinichi Iwamoto; Koichi Tanaka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4: Table S3. GPMsDB-tk results for identification of 81 bacterial and archaeal reference strains.

  19. Resilience of Microbial Communities Sequence Data Set

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Resilience of Microbial Communities Sequence Data Set [Dataset]. https://catalog.data.gov/dataset/resilience-of-microbial-communities-sequence-data-set
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The EX_Genome_Assemblies.zip file contains the contig sequences (i.e. assemblies) of fifteen isolates used for genomic and antibiotic resistance gene (ARG) analysis. The EX_OTU.fasta file contains the sequences of the V4 region of the bacterial 16S rRNA-encoding gene (≈250 nt) for each Operational Taxonomic Unit (OTU). This dataset is associated with the following publication: Gomez-Alvarez, V., S. Pfaller, J. Pressman, D. Wahman, and R. Revetta. Resilience of microbial communities in a simulated drinking water distribution system subjected to disturbances: role of conditionally rare taxa and potential implications for antibiotic-resistant bacteria. Environmental Science: Water Research & Technology. Royal Society of Chemistry, Cambridge, UK, 2: 645-657, (2016).

  20. Bacterial 23S Ribosomal RNA RefSeq Targeted Loci Project

    • gbif.org
    Updated Nov 29, 2021
    + more versions
    Cite
    GBIF (2021). Bacterial 23S Ribosomal RNA RefSeq Targeted Loci Project [Dataset]. http://doi.org/10.15468/5cedfd
    Dataset updated
    Nov 29, 2021
    Dataset provided by
    National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
    Global Biodiversity Information Facility (https://www.gbif.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 23S ribosomal RNA targeted loci project is the result of an international collaboration between a number of ribosomal RNA databases and NCBI to provide a curated and comprehensive set of complete and near full length Reference Sequence records for phylogenetic and evolutionary analyses. Sequences that represent the consensus of all contributing databases in both sequence content and taxonomic assignment are promoted to RefSeqs. All sequences will have the same project ID and can be found as such. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA188943.

Cite
(2019). DBETH - Database for Bacterial ExoToxins for Humans [Dataset]. http://identifiers.org/RRID:SCR_005908

DBETH - Database for Bacterial ExoToxins for Humans

RRID:SCR_005908, nlx_149481, biotools:dbeth, DBETH - Database for Bacterial ExoToxins for Humans (RRID:SCR_005908), DBETH, Database for Bacterial ExoToxins for Humans

