100+ datasets found
  1. Data from: PROSITE

    • prosite.expasy.org
    • identifiers.org
    • +7more
    Updated Oct 15, 2025
    Cite
    (2025). PROSITE [Dataset]. https://prosite.expasy.org/
    Explore at:
    Dataset updated
    Oct 15, 2025
    Description

    PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.

  2. CharProtDB: Characterized Protein Database

    • scicrunch.org
    • rrid.site
    • +2more
    Updated Dec 4, 2011
    Cite
    (2011). CharProtDB: Characterized Protein Database [Dataset]. http://identifiers.org/RRID:SCR_005872
    Explore at:
    Dataset updated
    Dec 4, 2011
    Description

    The Characterized Protein Database, CharProtDB, is designed and being developed as a resource of expertly curated, experimentally characterized proteins described in published literature. Each CharProtDB protein record can store several data types: functional annotation (several instances of protein names and gene symbols), taxonomic classification, literature links, specific Gene Ontology (GO) terms and GO evidence codes, EC (Enzyme Commission) and TC (Transport Classification) numbers, and the protein sequence. Additionally, each protein record is associated with cross links to all public accessions in major protein databases as "synonymous accessions". Each of the above data types can be linked to as many literature references as possible. Every CharProtDB entry requires a minimum set of data types: protein name, GO terms, and supporting reference(s) associated with GO evidence codes. Annotating with the GO system is important for several reasons: the GO system captures defined concepts (the GO terms) with unique IDs, which can be attached to specific genes, and the three controlled vocabularies of the GO capture much more annotation information than is traditionally captured in protein common names, including, for example, not just the function of the protein but its location as well. GO evidence codes implemented in CharProtDB directly correlate with the GO consortium definitions of experimental codes. CharProtDB tools link characterization data from multiple input streams through synonymous accessions or direct sequence identity. CharProtDB can represent multiple characterizations of the same protein, with proper attribution and links to database sources. Users can search the database with a variety of terms, including protein name, gene symbol, EC number, organism name, accessions, or any text. Following the search, a display page lists all the proteins that match the search term. Click on the protein name to view more detailed annotated information for each protein. Additionally, each protein record can be annotated.

  3. TM Function Database

    • neuinfo.org
    • rrid.site
    • +2more
    Updated Nov 16, 2024
    Cite
    (2024). TM Function Database [Dataset]. http://identifiers.org/RRID:SCR_007058
    Explore at:
    Dataset updated
    Nov 16, 2024
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on October 29, 2025. Database of functional residues in alpha-helical and beta-barrel membrane proteins. Each protein is identified by its name and source along with its UniProt code. Protein Data Bank (PDB) codes are also given where available. The database records the methods and experimental parameters used, for example affinity, dissociation constant, IC50, and activity. It also provides the numerical experimental value for each residue (or mutant) in a protein. The experimental data are collected from the literature, both by searching journals and by keyword search in PubMed. In addition, a complete reference is given for each entry, with journal citation and PMID number. TMFunction is cross-linked with the sequence database UniProt, the structural database PDB, and the literature database PubMed. The web interface lets users search the data by various terms, with different display options for the output.

  4. NCBI Protein Database

    • rrid.site
    • neuinfo.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2001). NCBI Protein Database [Dataset]. http://identifiers.org/RRID:SCR_003257
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Databases of protein sequences and 3D structures of proteins. Collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.

  5. Bioinformatics Protein Dataset - Simulated

    • kaggle.com
    zip
    Updated Dec 27, 2024
    + more versions
    Cite
    Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
    Explore at:
    Available download formats: zip (12928905 bytes)
    Dataset updated
    Dec 27, 2024
    Authors
    Rafael Gallo
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Subtitle

    "Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

    Description

    Introduction

    This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

    Columns Included

    • ID_Protein: Unique identifier for each protein.
    • Sequence: String of amino acids.
    • Molecular_Weight: Molecular weight calculated from the sequence.
    • Isoelectric_Point: Estimated isoelectric point based on the sequence composition.
    • Hydrophobicity: Average hydrophobicity calculated from the sequence.
    • Total_Charge: Sum of the charges of the amino acids in the sequence.
    • Polar_Proportion: Percentage of polar amino acids in the sequence.
    • Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.
    • Sequence_Length: Total number of amino acids in the sequence.
    • Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

    Inspiration and Sources

    While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:

    • UniProt: A comprehensive database of protein sequences and annotations.
    • Kyte-Doolittle Scale: Calculations of hydrophobicity.
    • Biopython: A tool for analyzing biological sequences.

    Proposed Uses

    This dataset is ideal for:

    • Training classification models for proteins.
    • Exploratory analysis of physicochemical properties of proteins.
    • Building machine learning pipelines in bioinformatics.

    How This Dataset Was Created

    1. Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.
    2. Property Calculation: Physicochemical properties were calculated using the Biopython library.
    3. Class Assignment: Classes were randomly assigned for classification purposes.
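
    The three steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the author's script: the description says Biopython was used for the property calculation, whereas this sketch computes only the average Kyte-Doolittle hydropathy by hand. The ID format, function names, and seed handling are assumptions made for the example.

```python
import random

# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)
# The five classes named in the column description.
CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]


def random_protein(rng, min_len=50, max_len=300):
    """Build one synthetic record: random sequence, one property, random class."""
    # Step 1: random chain with a length between 50 and 300 residues.
    length = rng.randint(min_len, max_len)
    sequence = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    # Step 2 (simplified): average hydropathy over the sequence.
    hydrophobicity = sum(KYTE_DOOLITTLE[aa] for aa in sequence) / length
    # Step 3: the class is drawn at random, independent of the sequence.
    return {
        "Sequence": sequence,
        "Sequence_Length": length,
        "Hydrophobicity": round(hydrophobicity, 3),
        "Class": rng.choice(CLASSES),
    }


def make_dataset(n, seed=0):
    """Generate n records; the 'PROT_00000' ID format is hypothetical."""
    rng = random.Random(seed)
    return [dict(ID_Protein=f"PROT_{i:05d}", **random_protein(rng)) for i in range(n)]


if __name__ == "__main__":
    for record in make_dataset(3, seed=42):
        print(record["ID_Protein"], record["Sequence_Length"],
              record["Hydrophobicity"], record["Class"])
```

    Because the class is drawn independently of the sequence, a classifier trained on data generated this way should not be expected to beat chance, which is consistent with the limitations listed for this dataset.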

    Limitations

    • The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.
    • The functional classes are simulated and do not correspond to actual biological characteristics.

    Data Split

    The dataset is divided into two subsets:

    • Training: 16,000 samples (proteinas_train.csv).
    • Testing: 4,000 samples (proteinas_test.csv).

    Acknowledgment

    This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.

  6. Data from: PROSITE

    • rrid.site
    • dknet.org
    • +1more
    Updated Jan 29, 2022
    Cite
    (2022). PROSITE [Dataset]. http://identifiers.org/RRID:SCR_003457
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database of protein families and domains that is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. ScanProsite finds matches of your protein sequences to PROSITE signatures. PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains. Each of these signatures comes with documentation providing background information on the structure and function of these proteins. The database is available via FTP.
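
    To illustrate what matching a sequence against a pattern-type signature involves, here is a minimal sketch that translates a PROSITE-style pattern string into a Python regular expression. It is not ScanProsite and does not handle profiles or ProRule; the function name is hypothetical, and only the basic syntax elements are covered (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` repeat counts, and `<` / `>` terminal anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by '-'
        prefix = suffix = ""
        if element.startswith("<"):        # '<' anchors at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # '>' anchors at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(3) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                     # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # '[ABC]' and single letters are already valid regex fragments.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    print(bool(re.search(regex, "AANGTAA")))
```

    Note that `re.finditer` reports only non-overlapping matches, so a real scanner that needs every overlapping hit would have to step through start positions explicitly.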

  7. An information table for proteins.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso (2023). An information table for proteins. [Dataset]. http://doi.org/10.1371/journal.pone.0171702.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    An information table for proteins.

  8. PSCDB - Protein Structural Change DataBase

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Nov 12, 2011
    + more versions
    Cite
    (2011). PSCDB - Protein Structural Change DataBase [Dataset]. http://identifiers.org/RRID:SCR_006116
    Explore at:
    Dataset updated
    Nov 12, 2011
    Description

    Database of protein structural changes upon ligand binding, classified into 7 classes according to the ligand binding sites and the location where the dominant motion occurs:

    1. Coupled domain motions are domain motions induced upon ligand binding.
    2. Independent domain motions are domain motions observed regardless of ligand binding.
    3. Coupled local motions are local motions induced upon ligand binding.
    4. Independent local motions are local motions observed regardless of ligand binding.
    5. Burying ligand motions are motions presumably required to hold the ligand inside the protein.
    6. No significant motion means that no notable structural change occurs.
    7. Other motions are motions that could not be classified as domain or local motions.

    Proteins are flexible molecules that undergo structural changes to function. The Protein Data Bank contains multiple entries for identical proteins determined under different conditions, e.g. with and without a ligand molecule, which provides important information for understanding the structural changes related to protein functions. We gathered 839 protein structural pairs of ligand-free and ligand-bound states from monomeric or homo-dimeric proteins, and constructed the Protein Structural Change DataBase (PSCDB). In the database, we focused on whether the motions were coupled with ligand binding. As a result, the protein structural changes were classified into seven classes, i.e. coupled domain motion (59 structural changes), independent domain motion (70), coupled local motion (125), independent local motion (135), burying ligand motion (104), no significant motion (311) and other type motion (35). PSCDB provides lists of each class. On each entry page, users can view detailed information about the motion, accompanied by a morphing animation of the structural changes.
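
    The per-class counts quoted in this description can be checked against the stated total of 839 structural pairs with a few lines of arithmetic. The sketch below only restates the numbers from the description; the dictionary and function names are hypothetical.

```python
# Class counts exactly as quoted in the PSCDB description.
PSCDB_CLASS_COUNTS = {
    "coupled domain motion": 59,
    "independent domain motion": 70,
    "coupled local motion": 125,
    "independent local motion": 135,
    "burying ligand motion": 104,
    "no significant motion": 311,
    "other type motion": 35,
}


def class_shares(counts):
    """Return the total count and each class's share in percent."""
    total = sum(counts.values())
    return total, {name: 100.0 * n / total for name, n in counts.items()}


total, shares = class_shares(PSCDB_CLASS_COUNTS)

if __name__ == "__main__":
    print(f"total structural changes: {total}")
    for name, pct in shares.items():
        print(f"{name:<28} {PSCDB_CLASS_COUNTS[name]:>4} {pct:5.1f}%")
```

    The seven counts add up to 839, matching the number of structural pairs, and the largest class, no significant motion, accounts for roughly 37% of them.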

  9. H-Invitational Database: Protein-Protein Interaction Viewer

    • neuinfo.org
    • rrid.site
    • +2more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). H-Invitational Database: Protein-Protein Interaction Viewer [Dataset]. http://identifiers.org/RRID:SCR_008054/resolver?q=&i=rrid
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    The PPI view displays H-InvDB human protein-protein interaction (PPI) information. It is constructed by assigning interaction data to H-InvDB proteins, which were originally predicted from transcriptional products generated by the H-Invitational project. The PPI view currently provides 32,198 human PPIs involving 9,268 H-InvDB proteins. H-Invitational Database (H-InvDB) is an integrated database of human genes and transcripts. By extensive analyses of all human transcripts, we provide curated annotations of human genes and transcripts that include gene structures, alternative splicing isoforms, non-coding functional RNAs, protein functions, functional domains, sub-cellular localizations, metabolic pathways, protein 3D structure, genetic polymorphisms (SNPs, indels and microsatellite repeats), relation with diseases, gene expression profiling, molecular evolutionary features, protein-protein interactions (PPIs) and gene families/groups. Sponsors: This research is financially supported by the Ministry of Economy, Trade and Industry of Japan (METI), the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) and the Japan Biological Informatics Consortium (JBIC). This work is also partly supported by the Research Grant for the RIKEN Genome Exploration Research Project from MEXT to Y.H. and the Grant for the RIKEN Frontier Research System, Functional RNA research program.

  10. Data from: ProtNote: a multimodal method for protein-function annotation

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 13, 2024
    Cite
    Char, Samir; Corley, Nathaniel; Alamdari, Sarah; Yang, Kevin K.; Amini, Ava P. (2024). ProtNote: a multimodal method for protein-function annotation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13897919
    Explore at:
    Dataset updated
    Oct 13, 2024
    Dataset provided by
    University of Washington
    Microsoft Research
    Microsoft (United States)
    Authors
    Char, Samir; Corley, Nathaniel; Alamdari, Sarah; Yang, Kevin K.; Amini, Ava P.
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Understanding protein sequence-function relationships is essential for advancing protein biology and engineering. However, fewer than 1% of known protein sequences have human-verified functions, and scientists continually update the set of possible functions. While deep learning methods have demonstrated promise for protein function prediction, current models are limited to predicting only those functions on which they were trained. Here, we introduce ProtNote, a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its train set, but also generalizes to unseen and novel functions in zero-shot test settings. We envision that ProtNote will enhance protein function discovery by enabling scientists to use free text inputs, without restriction to predefined labels – a necessary capability for navigating the dynamic landscape of protein biology.

  11. CAFA 5 Protein Database Files (PDB)

    • kaggle.com
    zip
    Updated Jul 25, 2023
    Cite
    A Merii (2023). CAFA 5 Protein Database Files (PDB) [Dataset]. https://www.kaggle.com/datasets/amerii/cafa-5-pdbs
    Explore at:
    Available download formats: zip (12654687498 bytes)
    Dataset updated
    Jul 25, 2023
    Authors
    A Merii
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    This dataset contains 3D protein structure files in PDB format, gathered via the AlphaFoldDB API, for the Critical Assessment of protein Function Annotation (CAFA) 5 challenge protein entries.

    The AlphaFoldDB is a comprehensive database that stores protein structures predicted by AlphaFold2 - an AI model developed by DeepMind that predicts the 3D structure of a protein based on its sequence. AlphaFold's predictions have been recognized for their remarkable accuracy, often comparable to those obtained from experimental methods.

    The CAFA challenge is a community-wide effort to assess computational methods that predict protein function. The protein entries in this dataset are specifically related to the 5th iteration of the challenge - CAFA 5.

    The dataset provides the following information for each protein:

    The naming conventions for the files are: `

  12. A functional update of the

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). A functional update of the [Dataset]. https://data.virginia.gov/dataset/a-functional-update-of-the
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Since the genome of Escherichia coli K-12 was initially annotated in 1997, additional functional information based on biological characterization and functions of sequence-similar proteins has become available. On the basis of this new information, an updated version of the annotated chromosome has been generated.

    Results: The E. coli K-12 chromosome is currently represented by 4,401 genes encoding 116 RNAs and 4,285 proteins. The boundaries of the genes identified in the GenBank Accession U00096 were used. Some protein-coding sequences are compound and encode multimodular proteins. The coding sequences (CDSs) are represented by modules (protein elements of at least 100 amino acids with biological activity and independent evolutionary history). There are 4,616 identified modules in the 4,285 proteins. Of these, 48.9% have been characterized, 29.5% have an imputed function, 2.1% have a phenotype and 19.5% have no function assignment. Only 7% of the modules appear unique to E. coli, and this number is expected to be reduced as more genome data becomes available. The imputed functions were assigned on the basis of manual evaluation of functions predicted by BLAST and DARWIN analyses and by the MAGPIE genome annotation system.

    Conclusions: Much knowledge has been gained about functions encoded by the E. coli K-12 genome since the 1997 annotation was published. The data presented here should be useful for analysis of E. coli gene products as well as gene products encoded by other genomes.
    
  13. Large protein databases reveal structural complementarity and functional...

    • figshare.com
    bin
    Updated Jul 12, 2025
    Cite
    Paweł Szczerbiak; Tomasz Kosciolek; Lukasz Szydlowski; Witold Wydmański; P. Douglas Renfrew; Julia Koehler Leman (2025). Large protein databases reveal structural complementarity and functional locality [Dataset]. http://doi.org/10.6084/m9.figshare.27203073.v3
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Paweł Szczerbiak; Tomasz Kosciolek; Lukasz Szydlowski; Witold Wydmański; P. Douglas Renfrew; Julia Koehler Leman
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Recent breakthroughs in protein structure prediction have led to an unprecedented surge in high-quality 3D models, highlighting the need for efficient computational solutions to manage and analyze this wealth of structural data. In our work, we comprehensively examine the structural clusters obtained from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP). We create a single cohesive low-dimensional representation of the resulting protein space. Our results show that, while each database occupies distinct regions within the protein structure space, they collectively exhibit significant overlap in their functional profiles. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. By creating a single, cohesive low-dimensional representation of protein structure space integrating data from diverse sources, localizing functional annotations within this space, and providing an open-access web-server for exploration, this work offers insights for future research concerning protein sequence-structure-function relationships, enabling various biological questions to be asked about taxonomic assignments, environmental factors, or functional specificity. This approach is generalizable to other or future datasets, enabling further discovery beyond findings presented here.

  14. Data from: Protein Clusters

    • catalog.data.gov
    • datadiscovery.nlm.nih.gov
    • +1more
    Updated Jun 19, 2025
    Cite
    National Library of Medicine (2025). Protein Clusters [Dataset]. https://catalog.data.gov/dataset/protein-clusters
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    A collection of Reference Sequence (RefSeq) proteins, from the complete genomes of prokaryotes, plasmids, and organelles, that have been grouped and annotated based on sequence similarity and protein function.

  15. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    • +3more
    Updated Sep 30, 2024
    Cite
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Explore at:
    Dataset updated
    Sep 30, 2024
    Description

    CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.

  16. Data from: Towards understanding the first genome sequence of a crenarchaeon...

    • odgavaprod.ogopendata.com
    • catalog.data.gov
    html
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs) [Dataset]. https://odgavaprod.ogopendata.com/dataset/towards-understanding-the-first-genome-sequence-of-a-crenarchaeon-by-genome-annotation-using-cl
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi.

    Results: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix.

    Conclusions: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
    
  17. Data for: 'FAS: assessing the similarity between proteins using...

    • data.niaid.nih.gov
    Updated May 4, 2023
    Cite
    Julian Dosch; Holger Bergmann; Vinh Tran; Ingo Ebersberger (2023). Data for: 'FAS: assessing the similarity between proteins using multi-layered feature architectures' [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7896005
    Explore at:
    Dataset updated
    May 4, 2023
    Dataset provided by
    Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, Germany
    Authors
    Julian Dosch; Holger Bergmann; Vinh Tran; Ingo Ebersberger
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Raw data and result data for the analyses made for the manuscript:

    'FAS: assessing the similarity between proteins using multi-layered feature architectures'

    https://doi.org/10.1093/bioinformatics/btad226

    This dataset contains raw data obtained from QFO Orthobench and Gene Ontology database. Analyses were made to showcase the different uses of the FAS algorithm.

  18. MfunGD - MIPS Mouse Functional Genome Database

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). MfunGD - MIPS Mouse Functional Genome Database [Dataset]. http://identifiers.org/RRID:SCR_007783
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on August 16, 2019. Database of annotated mouse proteins and their occurrence in protein networks. It contains cDNA and protein sequences, annotation, gene models and mapping, FunCat, UCSC Genome Viewer, SIMAP, pseudogenes (Genome Viewer Track), InterPro, and splice variants. Protein function annotation is performed using the Functional Catalogue (FunCat) annotation scheme, which is a hierarchically structured classification system. To provide up-to-date similarity search results and InterPro domain analyses, the protein entries are interconnected with the SIMAP database. The gene models are based on the RefSeq mouse cDNAs. The work of our group is focussed on the annotation of biological systems. Therefore, results from the Mammalian Protein-Protein Interaction Database and the Comprehensive Resource of Mammalian Protein Complexes are linked to the MfunGD dataset. Links to external resources are also provided. MfunGD is implemented in GenRE, a J2EE-based, component-oriented multi-tier architecture.

  19. Data from: Sequence-structure-function relationships in the microbial...

    • data.niaid.nih.gov
    Updated Jun 4, 2022
    Cite
    Koehler Leman, Julia; Szczerbiak, Pawel; Renfrew, P. Douglas; Gligorijevic, Vladimir; Berenberg, Daniel; Vatanen, Tommi; Taylor, Bryn C.; Chandler, Chris; Janssen, Stefan; Pataki, Andras; Carriero, Nick; Fisk, Ian; Xavier, Ramnik J.; Knight, Rob; Bonneau, Richard; Kosciolek, Tomasz (2022). Sequence-structure-function relationships in the microbial protein universe [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6477241
    Explore at:
    Dataset updated
    Jun 4, 2022
    Dataset provided by
    Simons Foundation (https://www.simonsfoundation.org/)
    Broad Institute, Cambridge, MA, USA
    Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
    Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
    Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA 92093, USA
    Authors
    Koehler Leman, Julia; Szczerbiak, Pawel; Renfrew, P. Douglas; Gligorijevic, Vladimir; Berenberg, Daniel; Vatanen, Tommi; Taylor, Bryn C.; Chandler, Chris; Janssen, Stefan; Pataki, Andras; Carriero, Nick; Fisk, Ian; Xavier, Ramnik J.; Knight, Rob; Bonneau, Richard; Kosciolek, Tomasz
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Microbiome Immunity Project (MIP) dataset contains models predicted with both Rosetta and DMPFold (folder dataset/). It also contains DeepFRI function predictions for all models.

    The metadata folder contains additional data which may be useful for searching the MIP database (FASTA files, BLAST databases and useful scripts for structure/function search) as well as retrieving the sequence/structural annotations.

    The intermediate_data folder contains preprocessed output for reproducing many of the figures in our manuscript in conjunction with scripts and Jupyter notebooks found in our git repository: https://github.com/microbiome-immunity-project/protein_universe .

    More information about the dataset and associated metadata is provided in the README.md file.

    We are also providing workflows to search the MIP database against a protein sequence or structure or function of interest (see SEARCHING.md for more details).

  20. Data underlying the paper: Plasmonic Enhancement of Protein Function

    • data.4tu.nl
    zip
    Updated Aug 16, 2024
    Cite
    Marco Locarno; Qiangrui Dong; Xin Meng; Cristiano Glessi; Nynke Hettema; Nidas Brandsma; Sebbe Blokhuizen; Alejandro Castañeda Garcia; Srividya Ganapathy; Marco Post; Thieme Schmidt; Lars van Roemburg; Bing Xu; Chun-Ting Cho; Liedewij Laan; Miao-Ping Chien; Daan Brinks (2024). Data underlying the paper: Plasmonic Enhancement of Protein Function [Dataset]. http://doi.org/10.4121/909b7170-816f-4d81-a8b0-12b113f29207.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Marco Locarno; Qiangrui Dong; Xin Meng; Cristiano Glessi; Nynke Hettema; Nidas Brandsma; Sebbe Blokhuizen; Alejandro Castañeda Garcia; Srividya Ganapathy; Marco Post; Thieme Schmidt; Lars van Roemburg; Bing Xu; Chun-Ting Cho; Liedewij Laan; Miao-Ping Chien; Daan Brinks
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Dataset funded by
    European Research Council
    Dutch Research Council
    Description

    These data are part of the paper Plasmonic Enhancement of Protein Function. They contain physics data on coupling plasmonic nanoparticles to proteins to enhance their fluorescence and modify their function. The data are predominantly image data (.TIF format) obtained in fluorescence imaging experiments.
