Database of three-dimensional structures of macromolecules that allows the user to retrieve structures for specific molecule types as well as structures for genes and proteins of interest. Three main databases comprise Structure: the Molecular Modeling Database; Conserved Domains and Protein Classification; and the BioSystems Database. Structure also links to the PubChem databases to connect biological activity data to the macromolecular structures. Users can locate structural templates for proteins and interactively view structures and sequence data to closely examine sequence-structure relationships.
* Macromolecular structures: The three-dimensional structures of biomolecules provide a wealth of information on their biological function and evolutionary relationships. The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more. It is possible, for example, to find 3D structures for homologs of a protein of interest by following the Related Structure link in an Entrez Protein sequence record.
* Conserved domains and protein classification: Conserved domains are functional units within a protein that act as building blocks in molecular evolution and recombine in various arrangements to make proteins with different functions. The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, in addition to NCBI-curated domains that use 3D-structure information explicitly to define domain boundaries and provide insights into sequence/structure/function relationships.
* Small molecules and their biological activity: The PubChem project provides information on the biological activities of small molecules and is a component of NIH's Molecular Libraries Roadmap Initiative. PubChem includes three databases: PCSubstance, PCBioAssay, and PCCompound. The PubChem data are linked to other data types (illustrated example) in the Entrez system, making it possible, for example, to retrieve information about a compound and then link to its biological activity data, retrieve 3D protein structures bound to the compound and interactively view their active sites, and find biosystems that include the compound as a component.
* Biological Systems: A biosystem, or biological system, is a group of molecules that interact directly or indirectly, where the grouping is relevant to the characterization of living matter. The NCBI BioSystems Database provides centralized access to biological pathways from several source databases and connects the biosystem records with associated literature, molecular, and chemical data throughout the Entrez system. BioSystem records list and categorize components (illustrated example), such as the genes, proteins, and small molecules involved in a biological system. The companion FLink tool, in turn, allows you to input a list of proteins, genes, or small molecules and retrieve a ranked list of biosystems.
The Molecular Modeling Database (MMDB), also known as Entrez Structure, is a database of experimentally determined structures obtained from the RCSB Protein Data Bank (PDB). MMDB is developed by the Structure Group of the NCBI Computational Biology Branch. The data processing procedure at NCBI results in the addition of a number of useful features that facilitate computation on the data and link them to many other data types in the Entrez system. The structure database is considerably smaller than Entrez's Protein or Nucleotide databases, but a large fraction of all known protein sequences have homologs in this set, and one may often learn more about a protein by examining 3-D structures of its homologs. These are accessible as Related Structures in the Links menu of Entrez Protein sequence records (illustrated example). It is then possible to align the query protein to the structure-based sequence, as shown in the illustration on this page. Additional resources can be used along with MMDB to interactively view the structures, find similar 3D structures, learn about the types of interactions and bound chemicals that have been found to exist among the similar 3D structures, and more.
The Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution.
Databases of protein sequences and 3D structures of proteins. Collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
The dbRBC database provides an open, publicly accessible platform for DNA and clinical data related to human red blood cells (RBCs). A new bioinformatics resource, dbRBC, has been installed at the National Center for Biotechnology Information (NCBI). This resource combines the well-established Blood Group Antigen Gene Mutation Database (BGMUT) with tools and interlinked resources developed at the NCBI. The main task of dbRBC is to provide access to publicly available genomic, protein and structural information linked to the red blood cell antigens. The site offers a number of resources:
* BGMUT Database
* Alignment Viewer
* SBT Tool
* Probe/Primer Resource
* Typing Kit Interface
* Obstacle
http://www.ncbi.nlm.nih.gov/About/disclaimer.html
The Gene database provides information on gene sequence, structure, location, and function for annotated genes from the NCBI database. Users can search by accession ID or keyword, compare and identify sequences using BLAST, or submit Gene References into Function (GeneRIFs) based on experimental results. Bulk download and an update mailing list are available.
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
Data and information collection and repository for biological activities of small molecules and small interfering RNAs (siRNAs) hosted by the US National Institutes of Health (NIH). Used to select and summarize the bioactivities of tested substances.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a “top-down” strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins’ structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO’s superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/.
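For readers unfamiliar with the sum-of-the-pairs scoring system that point (iii) contrasts GISMO against, the following minimal sketch shows how such a score is typically computed: every pair of residues in every alignment column contributes a pairwise score, and the contributions are added up. The match, mismatch, and gap values here are illustrative assumptions, not parameters from GISMO or from any of the programs named above, which generally use substitution matrices and more elaborate gap models.

```python
from itertools import combinations

# Toy scoring scheme (assumed values for illustration only).
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score a single pair of aligned symbols under the toy scheme."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs contribute nothing in this sketch
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Add up pair scores over every column and every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-G", "ACTG", "A-TG"]))
```

Because the score is a plain sum over pairs, it assigns a value to any arrangement of rows and has no built-in notion of whether aligning a region is statistically justified, which is the limitation the abstract points to.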
This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. Nearly complete sets of MIPs have been identified from the completed genome sequences of organisms available at NCBI. The structural models of MIP proteins were created using a defined protocol. The database aims to provide key information on MIPs based on both sequence and structure. This will further help to decipher the function of uncharacterized MIPs. For each MIP entry, this database contains information about the source, gene structure, sequence features, substitutions in the conserved NPA motifs, structural model, the residues forming the selectivity filter and channel radius profile. For a selected set of MIPs, it is possible to derive structure-based sequence alignments and evolutionary relationships. Sequences and structures of selected MIPs can be downloaded from the MIPModDB database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Links are provided to published protein structures for the apoenzyme DmdA from Pelagibacter ubique, as well as for DmdA co-crystals soaked with the substrate DMSP or the cofactor tetrahydrofolate (THF), accessible via NCBI's Molecular Modeling Database (MMDB).
Experimental design, methods, and results are further described in:
D. J. Schuller, C. R. Reisch, M. A. Moran, W. B. Whitman, and W. N. Lanzilotta (2012). Structures of dimethylsulfoniopropionate-dependent demethylase from the marine organism Pelegabacter ubique. Protein Science, vol. 21, p. 289. doi: 10.1002/pro.2015
Database providing information on structure of assembled genomes, assembly names and other meta-data, statistical reports, and links to genomic sequence data. The Archive links the raw sequence information found in the Trace Archive with assembly information found in publicly available sequence repositories (GenBank/EMBL/DDBJ).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NCBI hexapod transcriptomes.
A database of allergenic proteins. It contains various computational tools that can assist structural biology studies related to allergens. SDAP is an important tool in the investigation of the cross-reactivity between known allergens, in testing the FAO/WHO allergenicity rules for new proteins, and in predicting the IgE-binding potential of genetically modified food proteins. Using this Internet service through a browser, it is possible to retrieve information related to an allergen from the most common protein sequence and structure databases (SwissProt, PIR, NCBI, PDB), to find sequence and structural neighbors for an allergen, and to search for the presence of an epitope over the whole collection of allergens.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identification of high-risk missense SNPs of the human PC.
Structural variation database designed to store data on variant DNA ≥ 1 bp in size from all organisms. Associations of defined variants with phenotype information are also provided. Users can browse data containing the number of variant calls from each study, and filter studies by organism, study type, method and genomic variant. Organisms include human, mouse, cattle and several additional animals.
The Human Intermediate Filament Database is a continuously updated review of the intermediate filament field. It is hoped that users will contribute to the development and expansion of the database on a regular basis. Contributions may include novel variants and new patients with previously discovered sequence and allelic variants. Suggestions on ways to improve the database are also welcome. The entire database can be searched through the Browse and Search options. A number of different parameters can be used to search the database, including unique identifier, intermediate filament, disease, DNA variations, amino acid variations, domain, date accepted, author and abstract. Output from the search is returned in a table containing all the pertinent cross-referenced information. Multiple sequence alignment can also be performed via the CLUSTALW program to determine cDNA or protein sequence conservation. The database is linked to multiple other resources including NCBI RefSeq, PDB, OMIM, UCSC genome browser, NCBI Gene, HomoloGene, PubMed and HGNC. In the case of HGNC, reciprocal links from HGNC to the Human Intermediate Filament Database are also available. Because the Human Intermediate Filament Database is protein-centric and HGNC is gene-centric, an HGNC record may link to multiple records in this database when alternative splicing is present. In such an event, the Human Intermediate Filament Database will present to the user a list of all the protein records resulting from the HGNC gene record. The database uses Jalview and Jmol applets for the visualization of multiple sequence alignment and structure, respectively. The database contains information on disease phenotypes of a variety of different intermediate filament related diseases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of "26 high-risk missense SNPs of human PC" identified by six in silico programs.
Biodiversity changes due to human activities highlight the need for efficient biodiversity monitoring approaches. Environmental DNA (eDNA) metabarcoding offers a non-invasive method used for biodiversity monitoring and ecosystem assessment, but its accuracy depends on comprehensive DNA reference databases. Natural history collections often contain rare or difficult-to-obtain samples that can serve as a valuable resource to fill gaps in eDNA reference databases. Here, we discuss the utility of specimens from natural history collections in supporting future eDNA applications. Museomics, the application of -omics techniques to museum specimens, offers a promising avenue for improving eDNA reference databases by increasing species coverage. Furthermore, museomics can provide transferable methodological advancements for extracting genetic material from samples with low and degraded DNA. The integration of natural history collections, museomics, and eDNA approaches has the potential to signific...
Dataset for analyzing the potential of museum specimens to improve the DNA reference database
To examine the cumulative number of species sequenced for a given DNA barcode/mitochondrial genome (also referred to as mitogenome) over the years, we retrieved all data available from NCBI using the R package rentrez v1.2.3 (Winter 2017). We searched the nucleotide database for the rRNA 12S, rRNA 16S, rRNA 18S, cytochrome B (cytB), cytochrome oxidase I (COI) barcodes, as well as for the complete mitogenomes for all fish orders. In addition, we also retrieved all the fish species with available data on the Sequence Read Archive (SRA) using Entrez Direct (Kans 2024), which provides access to the NCBI databases from a Unix terminal window.
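The retrieval and counting steps described above can be sketched as follows. This is a hedged Python illustration, not the authors' rentrez workflow: `esearch_url` only builds a request URL for the public NCBI E-utilities ESearch endpoint without sending it, the search term in the usage example is a hypothetical query rather than one taken from the dataset, and `cumulative_species` assumes that each retrieved record has already been reduced to a (year, species) pair.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 100) -> str:
    """Build (but do not send) an ESearch request URL."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def cumulative_species(records):
    """Count distinct species sequenced up to and including each year.

    records: iterable of (year, species) pairs (assumed input shape).
    Returns a dict mapping year -> cumulative number of species; only
    years in which at least one species first appears are included.
    """
    first_seen = {}
    for year, species in records:
        if species not in first_seen or year < first_seen[species]:
            first_seen[species] = year
    counts = {}
    running = 0
    for year in sorted(set(first_seen.values())):
        running += sum(1 for y in first_seen.values() if y == year)
        counts[year] = running
    return counts


if __name__ == "__main__":
    # Hypothetical query; the field tags are standard Entrez syntax.
    print(esearch_url("nucleotide", "Perciformes[Organism] AND COI[Gene]"))
    demo = [(2005, "Salmo salar"), (2003, "Salmo salar"),
            (2005, "Gadus morhua"), (2010, "Danio rerio")]
    print(cumulative_species(demo))
```

Counting each species only in the year it first appears keeps the curve monotonic even when the same species is sequenced repeatedly in later years.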
To highlight the potential of museum specimens for increasing the number of species with an available barcode/mitogenome sequence, we first downloaded all available datasets on the Global Biodiversity Information Facility (GBIF) listing fish specimens store...
# Unlocking natural history collections to improve eDNA reference databases and biodiversity monitoring
The dataset consists of a main folder, data.zip.
Various
barcodes_data
output from the cumul_barcodes_plot.R script.
occurence_data
contains different types of species lists (museum, 12S availability, etc.)
museum_potential/1_process_gbif_datasets.R. Contains all the species of fish found in the main natural ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title:
Unveiling Host-Parasite Relationships through Conserved MITEs in Prokaryote and Viral Genomes
Authors:
Francisco Nadal-Molero(1), Riccardo Roselli(1), Silvia Garcia-Juan(1), Alicia Campos-Lopez(1), Ana-Belen Martin-Cuadrado(1*)
SUPPLEMENTARY FILES
Supplementary File S1. Sequences of cMITEs detected in Bacteria genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.
Supplementary File S2. Sequences of cMITEs detected in the Archaea genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.
Supplementary File S3. Sequences of vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy are stated in the name of each sequence. The structure of the MITE name is:
“Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Virus|Name|Host”.
Supplementary File S4. Sequences of si-vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy are stated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|Ident.Method.by.DB|Host”.
Supplementary File S5. Cytoscape networks. (A) Figure 1A, (B) Figure 1B.
Supplementary File S6. Sequences of cMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is:
“Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.
Supplementary File S7. Sequences of si-vMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is: “Accession|Genome|start|end|Host”.
Supplementary File S8. Sequences of cMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is:
“Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.
Supplementary File S9. Sequences of si-vMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is: “Accession|Genome|start|end|Host”.
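The pipe-delimited naming schemes listed above lend themselves to a small parser. The sketch below is an illustration under stated assumptions rather than code shipped with the supplementary files: the function name `parse_mite_header` is hypothetical, the example header is invented, `start` and `end` are assumed to be integers, and the last field is allowed to absorb any extra `|` characters in case a lineage or host string contains them.

```python
# Field layouts copied from the supplementary file descriptions.
CMITE_FIELDS = ["Accession", "Genome", "start", "end", "TSD",
                "TIRlength", "MITETracker_group", "Lineage"]
SI_VMITE_FIELDS = ["Accession", "Genome", "start", "end",
                   "Ident.Method.by.DB", "Host"]


def parse_mite_header(header: str, fields=CMITE_FIELDS) -> dict:
    """Split a FASTA header into named fields (hypothetical helper)."""
    text = header.strip().lstrip(">")
    # Limit the number of splits so the final field keeps any remaining '|'.
    parts = text.split("|", len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
    record = dict(zip(fields, parts))
    # Assumption: coordinates are plain integers.
    for key in ("start", "end"):
        record[key] = int(record[key])
    return record


if __name__ == "__main__":
    # Invented header used only to exercise the parser.
    example = ">ACC0001|Genome_A|100|250|TA|15|group_1|Bacteria;Example"
    rec = parse_mite_header(example)
    print(rec["Accession"], rec["end"] - rec["start"], rec["Lineage"])
```

Passing a different field list, such as `SI_VMITE_FIELDS` for Supplementary File S4, reuses the same function for the other naming schemes.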