90 datasets found
  1. SFLD

    • ebi.ac.uk
    Updated Sep 7, 2018
    Cite
    (2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Sep 7, 2018
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.

  2. PIRSF

    • ebi.ac.uk
    Updated Apr 7, 2020
    Cite
    (2020). PIRSF [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Apr 7, 2020
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PIRSF protein classification system is a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationships of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Center, Washington DC, US.

  3. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    • +3 more
    Updated Sep 30, 2024
    Cite
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Dataset updated
    Sep 30, 2024
    Description

    CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.

  4. CATH-Gene3D

    • ebi.ac.uk
    Updated Oct 21, 2020
    Cite
    (2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Oct 21, 2020
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Predicted structure and sequence domains are mapped using hidden Markov model libraries representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.

  5. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

  6. CharProtDB: Characterized Protein Database

    • rrid.site
    • dknet.org
    • +1 more
    Updated Dec 4, 2011
    Cite
    (2011). CharProtDB: Characterized Protein Database [Dataset]. http://identifiers.org/RRID:SCR_005872/resolver?q=*&i=rrid
    Dataset updated
    Dec 4, 2011
    Description

    The Characterized Protein Database, CharProtDB, is designed and being developed as a resource of expertly curated, experimentally characterized proteins described in published literature. Each protein record in CharProtDB can store several data types: functional annotation (several instances of protein names and gene symbols), taxonomic classification, literature links, specific Gene Ontology (GO) terms and GO evidence codes, EC (Enzyme Commission) and TC (Transport Classification) numbers, and protein sequence. Additionally, each protein record is cross-linked to all public accessions in major protein databases as "synonymous accessions". Each of these data types can be linked to as many literature references as possible. Every CharProtDB entry requires a minimum set of data types: protein name, GO terms, and supporting reference(s) associated with GO evidence codes. Annotating with the GO system is important for several reasons: the GO system captures defined concepts (the GO terms) with unique IDs, which can be attached to specific genes, and the three controlled vocabularies of the GO capture much more annotation information than is traditionally captured in protein common names, including, for example, not just the function of the protein but its location as well. GO evidence codes implemented in CharProtDB directly correlate with the GO Consortium definitions of experimental codes. CharProtDB tools link characterization data from multiple input streams through synonymous accessions or direct sequence identity. CharProtDB can represent multiple characterizations of the same protein, with proper attribution and links to database sources. Users can search the database with a variety of terms, including protein name, gene symbol, EC number, organism name, accessions, or any text. Following the search, a display page lists all the proteins that match the search term. Click on the protein name to view more detailed annotated information for each protein. Additionally, each protein record can be annotated.

  7. An information table for proteins.

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso (2023). An information table for proteins. [Dataset]. http://doi.org/10.1371/journal.pone.0171702.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An information table for proteins.

  8. SUPERFAMILY

    • ebi.ac.uk
    Updated Nov 8, 2010
    Cite
    (2010). SUPERFAMILY [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Nov 8, 2010
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY is based at the University of Bristol, UK.

  9. Comparison of the proposed three way classification method with top...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso (2023). Comparison of the proposed three way classification method with top performing methods of the field. [Dataset]. http://doi.org/10.1371/journal.pone.0171702.t005
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The target classes comprise broader Gene Ontology terms for Saccharomyces cerevisiae proteins.

  10. PRINTS

    • ebi.ac.uk
    Updated Jun 14, 2012
    Cite
    (2012). PRINTS [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Jun 14, 2012
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family or domain. PRINTS is based at the University of Manchester, UK.

  11. SPD -- Secreted Protein database

    • opendata.pku.edu.cn
    Updated Nov 20, 2015
    Cite
    Peking University Open Research Data Platform (2015). SPD -- Secreted Protein database [Dataset]. http://doi.org/10.18170/DVN/DY1KWU
    Dataset updated
    Nov 20, 2015
    Dataset provided by
    Peking University Open Research Data Platform
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Access to Data: due to limited resources, the SPD database has not been updated since July 06, 2006. A secreted protein is a protein that is secreted outside the cell membrane. Because secreted proteins are present in the extracellular space, they are often accessible to various drug delivery mechanisms. Based on a bioinformatic pipeline and manual checking, we have developed a collection of secreted proteins from the Human, Mouse and Rat proteomes, which includes sequences from SwissProt, Trembl, Ensembl and Refseq. This collection is named SPD, the Secreted Protein Database. The 18152 entries were classified into fourteen functional categories, including "apolipoprotein", "cytokine", "protease", "toxin", etc. To make the dataset more comprehensive, nine related datasets were also collected, such as SPDI, Riken mouse secretome, SwissProt vertebrate secreted proteins, SubLoc, etc. The SPD web interface includes five modules:
    • Browse: browse SPD proteins according to chromosomes and functional classification or GO assignments.
    • Search: search for a protein of interest via protein ID, keywords, description or sequence.
    • Download: download SPD sequences.
    • Data statistics: how much data SPD contains in each division.
    • Help: frequently asked questions about SPD, including the SPD construction pipeline, an introduction to the protein entry display page, etc.

  12. HAMAP

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.

  13. NCBI Structure

    • rrid.site
    • scicrunch.org
    • +2 more
    Updated Jul 12, 2025
    Cite
    (2025). NCBI Structure [Dataset]. http://identifiers.org/RRID:SCR_004218
    Dataset updated
    Jul 12, 2025
    Description

    Database of three-dimensional structures of macromolecules that allows the user to retrieve structures for specific molecule types as well as structures for genes and proteins of interest. Three main databases make up Structure: the Molecular Modeling Database; Conserved Domains and Protein Classification; and the BioSystems Database. Structure also links to the PubChem databases to connect biological activity data to the macromolecular structures. Users can locate structural templates for proteins and interactively view structures and sequence data to closely examine sequence-structure relationships.
    * Macromolecular structures: The three-dimensional structures of biomolecules provide a wealth of information on their biological function and evolutionary relationships. The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more. It is possible, for example, to find 3D structures for homologs of a protein of interest by following the Related Structure link in an Entrez Protein sequence record.
    * Conserved domains and protein classification: Conserved domains are functional units within a protein that act as building blocks in molecular evolution and recombine in various arrangements to make proteins with different functions. The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, in addition to NCBI-curated domains that use 3D-structure information explicitly to define domain boundaries and provide insights into sequence/structure/function relationships.
    * Small molecules and their biological activity: The PubChem project provides information on the biological activities of small molecules and is a component of NIH's Molecular Libraries Roadmap Initiative. PubChem includes three databases: PCSubstance, PCBioAssay, and PCCompound. The PubChem data are linked to other data types in the Entrez system, making it possible, for example, to retrieve information about a compound and then link to its biological activity data, retrieve 3D protein structures bound to the compound and interactively view their active sites, and find biosystems that include the compound as a component.
    * Biological systems: A biosystem, or biological system, is a group of molecules that interact directly or indirectly, where the grouping is relevant to the characterization of living matter. The NCBI BioSystems Database provides centralized access to biological pathways from several source databases and connects the biosystem records with associated literature, molecular, and chemical data throughout the Entrez system. BioSystem records list and categorize components, such as the genes, proteins, and small molecules involved in a biological system. The companion FLink tool, in turn, allows you to input a list of proteins, genes, or small molecules and retrieve a ranked list of biosystems.

  14. DAVID

    • neuinfo.org
    • rrid.site
    • +1 more
    Updated Aug 17, 2024
    Cite
    (2024). DAVID [Dataset]. http://identifiers.org/RRID:SCR_001881
    Dataset updated
    Aug 17, 2024
    Description

    Bioinformatics resource system including web server and web service for functional annotation and enrichment analyses of gene lists. Consists of comprehensive knowledgebase and set of functional analysis tools. Includes gene centered database integrating heterogeneous gene annotation resources to facilitate high throughput gene functional analysis.

  15. COG

    • rrid.site
    • dknet.org
    • +3 more
    Updated Jul 19, 2025
    Cite
    (2025). COG [Dataset]. http://identifiers.org/RRID:SCR_007139
    Dataset updated
    Jul 19, 2025
    Description

    A database for the phylogenetic classification of proteins encoded in complete genomes. Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes, representing major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Please be aware that COGs has not been updated in many years and will not be updated.

  16. Drug ADME Associated Protein Database

    • dknet.org
    Updated Aug 18, 2024
    Cite
    (2024). Drug ADME Associated Protein Database [Dataset]. http://identifiers.org/RRID:SCR_013501/resolver
    Dataset updated
    Aug 18, 2024
    Description

    A database for facilitating the search for drug Absorption, Distribution, Metabolism, Excretion (ADME) associated proteins. It contains information about known drug ADME-associated proteins, functions, similarities, substrates / ligands, tissue distributions, and other properties of the targets. Associated references are also included. Drug absorption, distribution, metabolism and excretion (ADME) often involve interaction of a drug with specific proteins. Knowledge about these ADME-associated proteins is important in facilitating the study of the molecular mechanism of disposition and individual response as well as therapeutic action of drugs. It is also useful in the development and testing of pharmacokinetics prediction tools. Several databases describing specific classes of ADME-associated proteins have appeared. A new database, ADME-associated proteins (ADME-AP), is introduced to provide comprehensive information about all classes of ADME-associated proteins described in the literature, including the physiological function of each protein, pharmacokinetic effect, ADME classification, direction and driving force of disposition, location and tissue distribution, substrates, synonyms, gene name and protein availability in other species. Cross-links to other databases are also provided to facilitate access to information about the sequence, 3D structure, function, polymorphisms, genetic disorders, nomenclature, ligand binding properties and related literature for each protein. ADME-AP currently contains entries for 321 proteins and 964 substrates.
    ADME Class: Based on their respective roles in pharmacokinetics, ADME-associated proteins can be classified into four categories:
    A: This category includes proteins involved in the absorption or re-absorption of drugs into the systemic system.
    D: This category includes proteins responsible for facilitating the distribution of drugs from the systemic system to the target sites, or away from the target sites back to the systemic system. Certain plasma proteins and intracellular binding proteins may alter free drug concentration by acting as drug storage depots. These proteins thus play a regulatory role in drug distribution and are included in Category D. Based on their role in drug distribution, proteins in this category can be further divided into three groups: D1, D2, and D3. The first group, D1, includes transporters capable of transporting chemicals across the membranes of various tissue barriers from the systemic system into the target sites. The blood-brain barrier and the placenta barrier are examples of tissue barriers. Proteins in the second group, D2, are responsible for transporting drugs back into the systemic system. Proteins in the third group, D3, mainly function as drug storage depots. These include ligand binding proteins in plasma and intracellular proteins.
    M: Proteins in category M are drug-metabolizing enzymes. These enzymes can be further divided into two separate groups, M1 and M2, according to whether the corresponding enzymatic reaction is phase I or phase II.
    E: This category includes proteins that enable the excretion or presystemic elimination of drugs.
    Some proteins belong to more than one category: e.g. P-glycoprotein both limits intestinal absorption and excludes drugs from the brain back to the blood. It thus belongs to both Category E and D. For proteins capable of transporting natural substrates without a literature report of interaction with a drug, the postfix "potential" is attached to their respective classification to indicate that their specific role in ADME is yet to be confirmed.
    Use of ADME-AP for commercial purposes is not allowed.
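    The class codes described in this entry (A, D with groups D1 to D3, M with groups M1 and M2, E, and the "potential" postfix) could be handled programmatically along the following lines. This is only a minimal sketch: the label format "D1 potential", the function name `parse_adme_label` and the dictionaries are hypothetical illustrations, not part of ADME-AP itself.

```python
from typing import Optional, Tuple

# Top-level ADME categories as described in the entry text.
CATEGORIES = {
    "A": "absorption or re-absorption into the systemic system",
    "D": "distribution to or away from target sites",
    "M": "drug-metabolizing enzymes",
    "E": "excretion or presystemic elimination",
}

# Subgroups named in the entry text.
SUBGROUPS = {
    "D1": "transport across tissue barriers into target sites",
    "D2": "transport back into the systemic system",
    "D3": "drug storage depot",
    "M1": "phase I reaction",
    "M2": "phase II reaction",
}


def parse_adme_label(label: str) -> Tuple[str, Optional[str], bool]:
    """Split a label such as 'D1 potential' into (category, subgroup, is_potential).

    The textual label format is an assumption made for this sketch.
    """
    parts = label.split()
    if not parts:
        raise ValueError("empty label")
    code = parts[0].upper()
    is_potential = len(parts) > 1 and parts[1].lower() == "potential"
    category = code[0]
    if category not in CATEGORIES:
        raise ValueError(f"unknown ADME category: {code!r}")
    subgroup = code if len(code) > 1 else None
    if subgroup is not None and subgroup not in SUBGROUPS:
        raise ValueError(f"unknown ADME subgroup: {code!r}")
    return category, subgroup, is_potential


# A protein may carry several labels, as the entry notes for P-glycoprotein (E and D).
p_glycoprotein = [parse_adme_label("E"), parse_adme_label("D")]
```

    Keeping categories and subgroups in separate tables lets a record carry a bare category (as in the P-glycoprotein example) or a more specific subgroup.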

  17. SMART

    • ebi.ac.uk
    Updated Feb 14, 2020
    Cite
    (2020). SMART [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 14, 2020
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. SMART is based at EMBL, Heidelberg, Germany.

  18. Results of accuracy and generality for GTRS.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso (2023). Results of accuracy and generality for GTRS. [Dataset]. http://doi.org/10.1371/journal.pone.0171702.t003
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Hafeez Ur Rehman; Nouman Azam; JingTao Yao; Alfredo Benso
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of accuracy and generality for GTRS.

  19. Data from: CluSTr

    • dknet.org
    • neuinfo.org
    • +1 more
    Updated Jan 29, 2022
    Cite
    (2022). CluSTr [Dataset]. http://identifiers.org/RRID:SCR_007600
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented May 10, 2017. A pilot effort that has developed a centralized, web-based biospecimen locator that presents biospecimens collected and stored at participating Arizona hospitals and biospecimen banks, which are available for acquisition and use by researchers. Researchers may use this site to browse, search and request biospecimens to use in qualified studies. The development of the ABL was guided by the Arizona Biospecimen Consortium (ABC), a consortium of hospitals and medical centers in the Phoenix area, and is now being piloted by this Consortium under the direction of ABRC. You may browse by type (cells, fluid, molecular, tissue) or disease. Common data elements decided by the ABC Standards Committee, based on data elements in the National Cancer Institute's (NCI's) Common Biorepository Model (CBM), are displayed. These describe the minimum set of data elements that the NCI determined were most important for a researcher to see about a biospecimen. The ABL currently does not display information on whether or not clinical data is available to accompany the biospecimens. However, a requester has the ability to solicit clinical data in the request. Once a request is approved, the biospecimen provider will contact the requester to discuss the request (and the requester's questions) before finalizing the invoice and shipment. The ABL is available to the public to browse. In order to request biospecimens from the ABL, the researcher will be required to submit the required information. Upon submission of the information, shipment of the requested biospecimen(s) will depend on scientific and institutional review approval. Account required. Registration is open to everyone. Documented June 24, 2013, as per the Miriam database (http://www.ebi.ac.uk/miriam/main/collections/MIR:00000021).
    The CluSTr database offers an automatic classification of UniProt Knowledgebase and IPI proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, Gene3D, SUPERFAMILY, PIR Superfamily and PANTHER. To date (2011), CluSTr contains the following information:
    * 9,450,285 sequences from UniProt Knowledgebase release 15.6
    * 308,281 sequences from IPI
    * 3,636,831,744 similarities, with pairwise alignments generated on-the-fly
    * 17,616,060 clusters
    * Clustering for 972 organisms with completely sequenced genomes. For the full list of the genomes see Integr8
    * Putative homologue predictions for the above species. For more information see Homologue Selection at Integr8
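    As an illustration of how groups of related proteins can be derived from pairwise comparisons, the sketch below links any two sequences whose similarity score reaches a threshold and reports the resulting connected groups. It is a generic single-linkage example using union-find; the function name, the score scale and the threshold are assumptions for illustration, and the entry does not state that CluSTr works exactly this way.

```python
def cluster_by_similarity(n_items, scored_pairs, threshold):
    """Group items 0..n_items-1 into single-linkage clusters.

    scored_pairs is an iterable of (i, j, score) tuples; two items end up in
    the same cluster when a chain of pairs with score >= threshold joins them.
    """
    parent = list(range(n_items))

    def find(x):
        # Walk to the root of x's tree, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in scored_pairs:
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    clusters = {}
    for item in range(n_items):
        clusters.setdefault(find(item), []).append(item)
    return sorted(clusters.values())


# Hypothetical similarity scores between five sequences.
pairs = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.2)]
groups = cluster_by_similarity(5, pairs, threshold=0.5)
```

    Running the same pairs at several thresholds gives nested groupings, which is one simple way to obtain a hierarchy of clusters.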

  20. Data from: Plasma membrane proteomics

    • figshare.com
    zip
    Updated May 6, 2020
    Cite
    Yunmin Wei (2020). Plasma membrane proteomics [Dataset]. http://doi.org/10.6084/m9.figshare.12251273.v1
    Available download formats: zip
    Dataset updated
    May 6, 2020
    Dataset provided by
    figshare
    Authors
    Yunmin Wei
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The resulting MS/MS data were processed using the MaxQuant search engine (v.1.5.2.8). Spectra were searched against the soybean genome sequence database downloaded from the Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html, containing 88,647 unigenes) concatenated with a reverse decoy database. The FDR for protein identification and PSM identification was set to 1%. To be considered differentially expressed, proteins were required to exhibit a P value ≤ 0.05 calculated by the software. For protein abundance ratios measured using TMT, we considered a 1.3-fold change and a P value < 0.05 as the thresholds for identifying significant changes.
    Gene Ontology (GO) annotation of the proteome was derived from the UniProt-GOA database (http://www.ebi.ac.uk/GOA/). First, identified protein IDs were converted to UniProt IDs and then mapped to GO IDs by protein ID. If some identified proteins were not annotated by the UniProt-GOA database, the InterProScan software was used to annotate the protein's GO function based on a protein sequence alignment method. Proteins were then classified by Gene Ontology annotation into three categories: biological process, cellular component and molecular function. Domain functional descriptions of identified proteins were annotated by InterProScan (a sequence analysis application) based on a protein sequence alignment method, using the InterPro domain database. InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences.
    The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used to annotate protein pathways. First, the KEGG online service tool KAAS was used to annotate each protein's KEGG database description. The annotation results were then mapped onto the KEGG pathway database using the KEGG online service tool KEGG Mapper. We used WoLF PSORT, a subcellular localization prediction program, to predict subcellular localization. WoLF PSORT is an updated version of PSORT/PSORT II for the prediction of eukaryotic sequences.
    Proteins were classified by GO annotation into three categories: biological process, cellular compartment and molecular function. For each category, a two-tailed Fisher's exact test was employed to test the enrichment of the differentially expressed proteins against all identified proteins. GO terms with a corrected p-value < 0.05 were considered significant. The KEGG database was used to identify enriched pathways, again using a two-tailed Fisher's exact test to test the enrichment of the differentially expressed proteins against all identified proteins. Pathways with a corrected p-value < 0.05 were considered significant. These pathways were classified into hierarchical categories according to the KEGG website. For each protein category, the InterPro database (a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites) was searched, and a two-tailed Fisher's exact test was employed to test the enrichment of the differentially expressed proteins against all identified proteins. Protein domains with a p-value < 0.05 were considered significant. For further hierarchical clustering based on different protein functional classifications (such as GO, Domain, Pathway, Complex), we first collated all the categories obtained after enrichment along with their P values, and then filtered for those categories which were enriched in at least one of the clusters with P value
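    The enrichment step described in this entry, a two-tailed Fisher's exact test of differentially expressed proteins against all identified proteins, can be sketched in plain Python as follows. The function names and the example counts are hypothetical, the 2x2 table layout is an assumption about how the counts would be arranged, and the multiple-testing correction mentioned in the description is not shown because the description does not name the method used.

```python
from math import comb


def hypergeom_pmf(k, total, annotated, drawn):
    """Probability of exactly k annotated items among `drawn` items taken
    from `total` items, of which `annotated` carry the annotation."""
    return comb(annotated, k) * comb(total - annotated, drawn - k) / comb(total, drawn)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: differentially expressed proteins with the term
    b: differentially expressed proteins without the term
    c: other identified proteins with the term
    d: other identified proteins without the term
    """
    total = a + b + c + d
    annotated = a + c
    drawn = a + b
    p_observed = hypergeom_pmf(a, total, annotated, drawn)
    lowest = max(0, drawn - (total - annotated))
    highest = min(annotated, drawn)
    p_value = 0.0
    for k in range(lowest, highest + 1):
        p_k = hypergeom_pmf(k, total, annotated, drawn)
        # Sum every table that is no more likely than the observed one;
        # the small tolerance guards against floating-point ties.
        if p_k <= p_observed * (1 + 1e-9):
            p_value += p_k
    return min(p_value, 1.0)


# Hypothetical counts for one GO term: 12 of 40 differentially expressed
# proteins carry the term, versus 30 of the remaining 960 identified proteins.
p = fisher_exact_two_tailed(12, 28, 30, 930)
is_enriched = p < 0.05
```

    In practice a library routine such as `scipy.stats.fisher_exact` would usually be used instead; the hand-written version only makes the underlying hypergeometric sum explicit.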
