57 datasets found
  1. NCBI Structure

    • rrid.site
    • scicrunch.org
    Updated Jul 12, 2025
    Cite
    (2025). NCBI Structure [Dataset]. http://identifiers.org/RRID:SCR_004218
    Dataset updated
    Jul 12, 2025
    Description

    Database of three-dimensional structures of macromolecules that allows the user to retrieve structures for specific molecule types as well as structures for genes and proteins of interest. Structure comprises three main databases: the Molecular Modeling Database; Conserved Domains and Protein Classification; and the BioSystems Database. Structure also links to the PubChem databases to connect biological activity data to the macromolecular structures. Users can locate structural templates for proteins and interactively view structures and sequence data to closely examine sequence-structure relationships.

    • Macromolecular structures: The three-dimensional structures of biomolecules provide a wealth of information on their biological function and evolutionary relationships. The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more. It is possible, for example, to find 3D structures for homologs of a protein of interest by following the Related Structure link in an Entrez Protein sequence record.
    • Conserved domains and protein classification: Conserved domains are functional units within a protein that act as building blocks in molecular evolution and recombine in various arrangements to make proteins with different functions. The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, in addition to NCBI-curated domains that explicitly use 3D-structure information to define domain boundaries and provide insights into sequence/structure/function relationships.
    • Small molecules and their biological activity: The PubChem project provides information on the biological activities of small molecules and is a component of NIH's Molecular Libraries Roadmap Initiative. PubChem includes three databases: PCSubstance, PCBioAssay, and PCCompound. The PubChem data are linked to other data types in the Entrez system, making it possible, for example, to retrieve information about a compound, link to its biological activity data, retrieve 3D protein structures bound to the compound and interactively view their active sites, and find biosystems that include the compound as a component.
    • Biological systems: A biosystem, or biological system, is a group of molecules that interact directly or indirectly, where the grouping is relevant to the characterization of living matter. The NCBI BioSystems Database provides centralized access to biological pathways from several source databases and connects the biosystem records with associated literature, molecular, and chemical data throughout the Entrez system. BioSystem records list and categorize components, such as the genes, proteins, and small molecules involved in a biological system. The companion FLink tool, in turn, allows you to input a list of proteins, genes, or small molecules and retrieve a ranked list of biosystems.
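
    The Related Structure navigation from a protein record to MMDB can also be requested programmatically through NCBI's E-utilities ELink service. The sketch below only builds the request URL (dbfrom, db, and id parameters) and makes no network call; the helper name and the protein UIDs are hypothetical examples.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities ELink service.
ELINK_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def related_structures_url(protein_ids, email=None):
    """Build an ELink URL linking Entrez Protein UIDs to Entrez Structure (MMDB).

    protein_ids: iterable of protein UIDs (the values used below are hypothetical).
    email: optional contact address, which NCBI asks E-utilities users to supply.
    """
    params = {
        "dbfrom": "protein",  # source database
        "db": "structure",    # target database (MMDB)
        "id": ",".join(str(i) for i in protein_ids),  # comma-separated UID list
    }
    if email:
        params["email"] = email
    return ELINK_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical UIDs; fetching this URL (e.g. with urllib.request.urlopen) returns XML.
    print(related_structures_url(["12345", "67890"]))
```

    Keeping URL construction separate from fetching makes it easy to add rate limiting or an API key later without touching the query logic.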

  2. NCBI Structure

    • datadiscoverystudio.org
    Updated Mar 28, 2017
    Cite
    (2017). NCBI Structure [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/a999691cab304221894b23b942846b4f/html
    Available download formats: resource URL
    Dataset updated
    Mar 28, 2017
    Description

    Link Function: information

  3. Molecular Modeling DataBase

    • rrid.site
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). Molecular Modeling DataBase [Dataset]. http://identifiers.org/RRID:SCR_010623/resolver?q=*&i=rrid
    Dataset updated
    Jan 29, 2022
    Description

    The Molecular Modeling DataBase (MMDB), also known as Entrez Structure, is a database of experimentally determined structures obtained from the RCSB Protein Data Bank (PDB). MMDB is developed by the Structure Group of the NCBI Computational Biology Branch. The data processing procedure at NCBI adds a number of useful features that facilitate computation on the data and link them to many other data types in the Entrez system. The structure database is considerably smaller than Entrez's Protein or Nucleotide databases, but a large fraction of all known protein sequences have homologs in this set, and one can often learn more about a protein by examining the 3D structures of its homologs. These are accessible as Related Structures in the Links menu of Entrez Protein sequence records. The query protein can then be aligned to the structure-based sequence. Additional resources can be used along with MMDB to interactively view the structures, find similar 3D structures, learn which types of interactions and bound chemicals have been observed among the similar 3D structures, and more.

  4. NCBIFAM

    • ebi.ac.uk
    Updated Dec 16, 2024
    Cite
    (2024). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Dec 16, 2024
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs), and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families, developed at The Institute for Genomic Research and later at the J. Craig Venter Institute (Rockville, MD, US).

  5. CDD

    • ebi.ac.uk
    Updated Apr 18, 2024
    Cite
    (2024). CDD [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Apr 18, 2024
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CDD is a protein annotation resource that consists of a collection of annotated multiple sequence alignment models for ancient domains and full-length proteins. These are available as position-specific score matrices (PSSMs) for fast identification of conserved domains in protein sequences via RPS-BLAST. CDD content includes NCBI-curated domain models, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases.
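
    CDD models are searched as position-specific score matrices (PSSMs). As a toy illustration of the idea only (RPS-BLAST itself adds word seeding, gapped extension, and E-value statistics), the sketch below slides a hypothetical three-column PSSM along a sequence and reports the best-scoring ungapped window. The matrix values and the default score are made up for the example.

```python
# Score given to any residue not listed in a PSSM column (illustrative value).
DEFAULT = -2

# Hypothetical 3-column PSSM favouring a "C-[AG]-H"-like motif; values are made up.
TOY_PSSM = [
    {"C": 5, "S": 1},
    {"A": 1, "G": 1},
    {"H": 6, "Y": 2},
]


def window_score(pssm, window):
    """Sum the position-specific scores of an ungapped window against the PSSM."""
    return sum(column.get(residue, DEFAULT) for column, residue in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (start, score) of the highest-scoring ungapped placement of the PSSM."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    scores = [
        (start, window_score(pssm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]
    # max() keeps the first placement when several windows tie.
    return max(scores, key=lambda item: item[1])


if __name__ == "__main__":
    print(best_hit(TOY_PSSM, "MKCAHLL"))  # (2, 12)
```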

  6. NCBI Protein Database

    • neuinfo.org
    • dknet.org
    Updated Jun 16, 2025
    Cite
    (2025). NCBI Protein Database [Dataset]. http://identifiers.org/RRID:SCR_003257
    Dataset updated
    Jun 16, 2025
    Description

    Databases of protein sequences and 3D protein structures. The collection draws sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq, and TPA, as well as records from SwissProt, PIR, PRF, and PDB.

  7. CATH-Gene3D

    • ebi.ac.uk
    Updated Oct 21, 2020
    Cite
    (2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Oct 21, 2020
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Predicted structural and sequence domains are mapped using libraries of hidden Markov models representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.
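
    The family-building step above starts with Markov clustering (MCL). A minimal, dependency-free sketch of the basic MCL iteration (expansion by squaring the matrix, inflation by element-wise powering and column re-normalisation) on a small hypothetical graph follows. It is not the CATH-Gene3D pipeline, omits the subsequent multi-linkage step, and the inflation value of 2.0 is only an illustrative default.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Basic Markov clustering; returns clusters as sorted tuples of node indices."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along longer paths
        # Inflation: strengthen strong flows, weaken weak ones, then re-normalise.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; their non-zero columns form one cluster.
    clusters = set()
    for row in m:
        members = tuple(j for j, v in enumerate(row) if v > tol)
        if members:
            clusters.add(members)
    return sorted(clusters)
```

    For a graph made of two disconnected triangles (nodes 0-2 and 3-5), the sketch returns [(0, 1, 2), (3, 4, 5)].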

  8. SFLD

    • ebi.ac.uk
    Updated Sep 7, 2018
    Cite
    (2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Sep 7, 2018
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.

  9. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns, and profiles that help reliably identify which known protein family a new sequence belongs to. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
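
    PROSITE patterns use a compact syntax: x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the sequence ends. A simplified sketch that translates such a pattern into a Python regular expression follows, assuming only those elements; it does not handle the bracketed end-of-sequence form such as [G>], and it does not cover profiles, which are scored rather than matched. The example pattern N-{P}-[ST]-{P} is the well-known N-glycosylation site pattern; the sequences are made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern (e.g. "N-{P}-[ST]-{P}.") into a regex string."""
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        starts_at_n_term = element.startswith("<")
        ends_at_c_term = element.endswith(">")
        element = element.strip("<>")
        repeat = ""
        if element.endswith(")"):
            # "(n)" or "(n,m)" repetition becomes "{n}" or "{n,m}" in regex syntax.
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."  # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except those listed
        else:
            core = element  # single residue or [..] class: same meaning in regex
        regex_parts.append(
            ("^" if starts_at_n_term else "") + core + repeat + ("$" if ends_at_c_term else "")
        )
    return "".join(regex_parts)


def find_sites(pattern, sequence):
    """Return 0-based start positions of (possibly overlapping) pattern matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))            # N[^P][ST][^P]
    print(find_sites("N-{P}-[ST]-{P}.", "AANGSANCTA"))    # [2, 6]
```

    The lookahead in find_sites lets overlapping sites be reported, which matters for short motifs in repetitive sequences.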

  10. NCBI Protein

    • datadiscoverystudio.org
    Updated Mar 28, 2017
    Cite
    (2017). NCBI Protein [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/55b0aa4ecdd74e008788dd6086f46fa2/html
    Available download formats: resource URL
    Dataset updated
    Mar 28, 2017
    Description

    Link Function: information

  11. RefSeq virus protein structure prediction database

    • uvaauas.figshare.com
    Updated Mar 19, 2025
    Cite
    W.E.W. Schravesande; Adriaan Verhage; M.V. Cligge; Raoul Frijters; H.A. van den Burg (2025). RefSeq virus protein structure prediction database [Dataset]. http://doi.org/10.21942/uva.28417079.v1
    Available download formats: zip
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    W.E.W. Schravesande; Adriaan Verhage; M.V. Cligge; Raoul Frijters; H.A. van den Burg
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Custom virus database: A custom Foldseek target database was created, including all protein sequences derived from plant-infecting viruses currently found in the NCBI RefSeq database. In total, 8,191 protein sequences were extracted and used as templates for protein structure prediction. ColabFold v1.5.2 (using localcolabfold), which is based on AlphaFold v2.3.1 (40), was used for protein model prediction with the settings --random-seed 101 --num-seeds 3 --use-dropout --num-models 1 --num-recycle 8 --recycle-early-stop-tolerance 0.5. No templates were used during protein model prediction. The uniref30_2302 and colabfold_envdb_202108 databases were used to generate the multiple sequence alignments (https://colabfold.mmseqs.com/). The predicted structures were filtered by pLDDT value, resulting in a set of 7,545 protein structures with a pLDDT ≥ 50.

    Files:
    • modelling_stats.txt: tab-separated file containing the modelling statistics for each structure prediction
    • pdb_files/all: folder containing all PDB files resulting from the structure prediction
    • pdb_files/pLDDT50: folder containing the PDB files resulting from the structure prediction with a pLDDT score of 50 or higher
    • VIRAL_PROTEIN_PLANT_REFSEQ.fasta: FASTA file containing all protein sequences extracted from plant-infecting viral genomes in the NCBI RefSeq database
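
    The pLDDT ≥ 50 filter described above could be re-applied to the statistics file with the standard library alone. This is a sketch under an explicit assumption: the column names "model" and "mean_plddt" are hypothetical placeholders, because the header of modelling_stats.txt is not described here, so check the real header and adjust them.

```python
import csv
import io

PLDDT_CUTOFF = 50.0  # threshold stated in the dataset description


def models_passing(stats_file, cutoff=PLDDT_CUTOFF, name_col="model", plddt_col="mean_plddt"):
    """Yield model names whose pLDDT is at or above the cutoff.

    stats_file: an open tab-separated file (e.g. open("modelling_stats.txt", newline="")).
    name_col, plddt_col: HYPOTHETICAL column names; replace with the real header fields.
    """
    reader = csv.DictReader(stats_file, delimiter="\t")
    for row in reader:
        if float(row[plddt_col]) >= cutoff:
            yield row[name_col]


if __name__ == "__main__":
    # Tiny in-memory stand-in for modelling_stats.txt (made-up values).
    demo = io.StringIO("model\tmean_plddt\nprotA\t72.4\nprotB\t41.0\nprotC\t50.0\n")
    print(list(models_passing(demo)))  # ['protA', 'protC']
```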

  12. SMART

    • ebi.ac.uk
    Updated Feb 14, 2020
    Cite
    (2020). SMART [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 14, 2020
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. SMART is based at EMBL, Heidelberg, Germany.

  13. Search NCBI databases

    • integbio.jp
    Updated May 25, 2017
    Cite
    NCBI (National Center for Biotechnology Information) (2017). Search NCBI databases [Dataset]. https://integbio.jp/dbcatalog/en/record/nbdc00055?jtpl=56
    Dataset updated
    May 25, 2017
    Dataset provided by
    National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/
    Description

    This search engine retrieves information from over 30 major NCBI databases, including PubMed, nucleic acid and amino acid sequences, expression data, PubChem (small molecules with biochemical functions), protein structures, sequenced genomes, and taxonomy. It provides links to the search results as well as to other related databases.
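
    Individual Entrez databases can be queried programmatically through the E-utilities ESearch endpoint (db, term, and retmax parameters). The sketch below builds one ESearch URL per database for the same query term; the database list, the query, and the helper name are illustrative choices, and no network request is made.

```python
from urllib.parse import urlencode

ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# A few Entrez database names chosen for illustration; NCBI search covers many more.
DATABASES = ["pubmed", "protein", "structure", "gene", "taxonomy"]


def esearch_urls(term, databases=DATABASES, retmax=20):
    """Return {database: ESearch URL} for one query term across several Entrez databases."""
    return {
        db: ESEARCH_BASE + "?" + urlencode({"db": db, "term": term, "retmax": retmax})
        for db in databases
    }


if __name__ == "__main__":
    for db, url in esearch_urls("human insulin").items():
        print(db, url)
```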

  14. SUPERFAMILY

    • ebi.ac.uk
    Updated Nov 8, 2010
    Cite
    (2010). SUPERFAMILY [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Nov 8, 2010
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY is based at the University of Bristol, UK.

  15. PIRSF

    • ebi.ac.uk
    Updated Apr 7, 2020
    Cite
    (2020). PIRSF [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Apr 7, 2020
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PIRSF protein classification system is a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationships of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Center, Washington DC, US.

  16. MIPModDB

    • scicrunch.org
    • neuinfo.org
    Updated Nov 14, 2011
    Cite
    (2011). MIPModDB [Dataset]. http://identifiers.org/RRID:SCR_006058
    Dataset updated
    Nov 14, 2011
    Description

    This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. Nearly complete sets of MIPs have been identified from the completed genome sequences of organisms available at NCBI, and structural models of these MIPs were created using a defined protocol. The database aims to provide key information on MIPs based on both sequence and structure, which should further help decipher the function of uncharacterized MIPs. For each MIP entry, the database contains information about the source, gene structure, sequence features, substitutions in the conserved NPA motifs, structural model, the residues forming the selectivity filter, and the channel radius profile. For a selected set of MIPs, structure-based sequence alignments and evolutionary relationships can be derived. Sequences and structures of selected MIPs can be downloaded from MIPModDB.
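
    As a rough illustration of what "substitutions in the conserved NPA motifs" means at the sequence level, the sketch below scans a sequence for tripeptides within one mismatch of NPA (Asn-Pro-Ala). This is a naive scan under an assumed definition of a substituted motif, not MIPModDB's annotation procedure; on a real protein it will also report spurious one-mismatch hits outside the true motif regions. The example sequence is made up.

```python
MOTIF = "NPA"  # canonical Asn-Pro-Ala motif of MIP channels


def npa_like_sites(sequence, max_mismatches=1):
    """Return (0-based position, tripeptide) for windows within max_mismatches of NPA."""
    hits = []
    for start in range(len(sequence) - len(MOTIF) + 1):
        window = sequence[start:start + len(MOTIF)]
        # Hamming distance between the window and the canonical motif.
        mismatches = sum(a != b for a, b in zip(window, MOTIF))
        if mismatches <= max_mismatches:
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    print(npa_like_sites("GSGANPATLLNPSAV"))  # [(4, 'NPA'), (10, 'NPS')]
```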

  17. NCBI Gene

    • integbio.jp
    • bioregistry.io
    Updated Jun 9, 2019
    Cite
    National Center for Biotechnology Information (2019). NCBI Gene [Dataset]. https://integbio.jp/dbcatalog/en/record/nbdc00073?jtpl=56
    Dataset updated
    Jun 9, 2019
    Dataset provided by
    National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/
    License

    http://www.ncbi.nlm.nih.gov/About/disclaimer.html

    Description

    The Gene database provides information on gene sequence, structure, location, and function for annotated genes from the NCBI database. Users can search by accession ID or keyword, compare and identify sequences using BLAST, or submit Gene References into Function (GeneRIFs) based on experimental results. Bulk download and an update mailing list are available.

  18. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific...

    • figshare.com
    Updated May 30, 2023
    Cite
    Andrew F. Neuwald; Stephen F. Altschul (2023). Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties [Dataset]. http://doi.org/10.1371/journal.pcbi.1004936
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Andrew F. Neuwald; Stephen F. Altschul
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a “top-down” strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins’ structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO’s superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/.
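
    For reference, the sum-of-pairs (SP) scoring system that GISMO is contrasted with adds up a pairwise score for every pair of rows in every alignment column. The sketch below uses a deliberately simple, hypothetical scheme (match +1, mismatch -1, residue against gap -2, gap against gap 0, linear gaps); real aligners use substitution matrices such as BLOSUM62 and affine gap penalties. It illustrates SP scoring only and is not GISMO's Bayesian criterion.

```python
from itertools import combinations

MATCH, MISMATCH, GAP, GAP_GAP = 1, -1, -2, 0  # hypothetical toy scores


def pair_score(a, b):
    """Score one pair of characters taken from the same alignment column."""
    if a == "-" and b == "-":
        return GAP_GAP
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment):
    """Sum pair scores over every column and every pair of rows in an MSA."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-", "AC-", "AG-"]))  # 2
```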

  19. SDAP: Structural Database of Allergenic Proteins

    • scicrunch.org
    Cite
    SDAP: Structural Database of Allergenic Proteins [Dataset]. http://identifiers.org/RRID:SCR_012806
    Description

    A database of allergenic proteins that provides various computational tools to assist structural biology studies related to allergens. SDAP is an important tool for investigating cross-reactivity between known allergens, testing the FAO/WHO allergenicity rules for new proteins, and predicting the IgE-binding potential of genetically modified food proteins. Through its web interface, users can retrieve information related to an allergen from the most common protein sequence and structure databases (SwissProt, PIR, NCBI, PDB), find sequence and structural neighbors of an allergen, and search for the presence of an epitope across the whole collection of allergens.
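
    One kind of check behind the FAO/WHO allergenicity rules mentioned above is a search for stretches of identical contiguous amino acids shared between a query protein and a known allergen. A minimal sketch of that exact-match check follows; the default word length of 6 is an illustrative assumption, the sequences are made up, and the complementary window-based identity criterion is not implemented here.

```python
def shared_words(query, allergen, word_length=6):
    """Return sorted contiguous stretches of word_length residues present in both sequences."""
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    # All contiguous words of the allergen, stored in a set for O(1) lookups.
    allergen_words = {
        allergen[i:i + word_length] for i in range(len(allergen) - word_length + 1)
    }
    return sorted({
        query[i:i + word_length]
        for i in range(len(query) - word_length + 1)
        if query[i:i + word_length] in allergen_words
    })


if __name__ == "__main__":
    print(shared_words("MKTAYIAKQR", "GGTAYIAKGG"))  # ['TAYIAK']
```

    Shorter word lengths flag more candidate matches but also more chance hits, so any threshold should be treated as a screening parameter rather than a verdict.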

  20. Links to published DMSP-dependent protein structures for the apoenzyme DmdA...

    • search.dataone.org
    • bco-dmo.org
    Updated Dec 5, 2021
    Cite
    Mary Ann Moran; Ronald P. Kiene; William Whitman (2021). Links to published DMSP-dependent protein structures for the apoenzyme DmdA from Pelagibacter ubique at NCBI's MMDB (En-Gen DMSP Cycling project) [Dataset]. https://search.dataone.org/view/sha256%3A1922d63f543c2b47dceacc3308d3017d5443fd62af993825f93fea95e25b403f
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Mary Ann Moran; Ronald P. Kiene; William Whitman
    Description

    Links are provided to published protein structures for the apoenzyme DmdA from Pelagibacter ubique, as well as for DmdA co-crystals soaked with substrate DMSP or the cofactor tetrahydrofolate (THF) accessible via NCBI's Molecular Modeling Database (MMDB).

    Experimental design, methods, and results are further described in:
    D. J. Schuller, C. R. Reisch, M. A. Moran, W. B. Whitman, and W. N. Lanzilotta (2012). Structures of dimethylsulfoniopropionate-dependent demethylase from the marine organism Pelegabacter ubique. Protein Science, vol. 21, p. 289. doi: 10.1002/pro.2015


NCBI Structure

Identifiers: RRID:SCR_004218, nlx_23947

294 scholarly articles cite this dataset
