8 datasets found
  1. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    • +3 more
    Updated Sep 30, 2024
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Description

    CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.

  2. Gene3D

    • rrid.site
    Updated Jan 29, 2022
    (2022). Gene3D [Dataset]. http://identifiers.org/RRID:SCR_007672
    Description

    A large database of CATH protein domain assignments for Ensembl genomes and UniProt sequences. Gene3D is a resource for studying proteins and their component domains. Gene3D takes CATH domains from Protein Data Bank (PDB) structures and assigns them to the millions of protein sequences with no PDB structures using hidden Markov models. Assigning a CATH superfamily to a region of a protein sequence gives information on the gross 3D structure of that region of the protein. CATH superfamilies have a limited set of functions, so the domain assignment provides some functional insights. Furthermore, most proteins have several different domains in a specific order, so looking for proteins with a similar domain organization provides further functional insights. Strict confidence cut-offs are used to ensure the reliability of the domain assignments. Gene3D imports functional information from sources such as UniProt and KEGG. It also imports experimental datasets on request to help researchers integrate their data with the corpus of the literature. The website allows users to view descriptions for single proteins and genes as well as large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. The Gene3D web services provide programmatic access to the CATH-Gene3D annotation resources and in-house software tools. These services include Gene3DScan for identifying structural domains within protein sequences, access to pre-calculated annotations for the major sequence databases, and linked functional annotation from UniProt, GO and KEGG. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.

  3. CATH-Gene3D

    • ebi.ac.uk
    Updated May 12, 2020
    (2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/entry/cathgene3d/
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the type entry from the database CATH-Gene3D - version 4.3.0

  4. Predicting Protein Function with Hierarchical Phylogenetic Profiles: The...

    • figshare.com
    ppt
    Updated Jun 7, 2023
    Juan A. G Ranea; Corin Yeats; Alastair Grant; Christine A Orengo (2023). Predicting Protein Function with Hierarchical Phylogenetic Profiles: The Gene3D Phylo-Tuner Method Applied to Eukaryotic Genomes [Dataset]. http://doi.org/10.1371/journal.pcbi.0030237
    Dataset provided by
    PLOS Computational Biology
    Authors
    Juan A. G Ranea; Corin Yeats; Alastair Grant; Christine A Orengo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    “Phylogenetic profiling” is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity—from 30% to 100%—and phylogenetic profiles built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence–absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes.
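The profile comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how profiles are normalised, so scaling each profile by its maximum copy number and dividing the squared distance by the number of genomes are assumptions, and the copy numbers and identity levels below are hypothetical.

```python
import math


def normalise(profile):
    """Scale copy numbers by the profile's maximum (an assumed scheme)."""
    peak = max(profile)
    if peak <= 0:
        return [0.0] * len(profile)
    return [x / peak for x in profile]


def profile_distance(a, b):
    """Euclidean distance between two normalised profiles, scaled by sqrt(n)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)) / len(a))


def best_level_pair(levels_a, levels_b):
    """Return (distance, level_a, level_b) for the closest pair of profiles."""
    return min(
        (profile_distance(pa, pb), la, lb)
        for la, pa in levels_a.items()
        for lb, pb in levels_b.items()
    )


# Hypothetical copy numbers across five genomes, keyed by % sequence identity.
levels_a = {30: [2, 4, 8, 4, 2], 60: [1, 1, 2, 1, 1]}
levels_b = {30: [8, 4, 2, 4, 8], 60: [1, 2, 4, 2, 1]}

distance, level_a, level_b = best_level_pair(levels_a, levels_b)
print(f"closest profiles: A at {level_a}%, B at {level_b}% (distance {distance:.3f})")
```

Searching over every pair of identity levels, rather than comparing like with like, mirrors the idea that two families may show their co-evolutionary signal at different similarity levels.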

  5. Data from: CluSTr

    • neuinfo.org
    • rrid.site
    • +2 more
    Updated Sep 7, 2012
    (2012). CluSTr [Dataset]. http://identifiers.org/RRID:SCR_007600
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented May 10, 2017. A pilot effort that has developed a centralized, web-based biospecimen locator that presents biospecimens collected and stored at participating Arizona hospitals and biospecimen banks, which are available for acquisition and use by researchers. Researchers may use this site to browse, search and request biospecimens to use in qualified studies. The development of the ABL was guided by the Arizona Biospecimen Consortium (ABC), a consortium of hospitals and medical centers in the Phoenix area, and is now being piloted by this Consortium under the direction of ABRC. You may browse by type (cells, fluid, molecular, tissue) or disease. Common data elements decided by the ABC Standards Committee, based on data elements in the National Cancer Institute's (NCI's) Common Biorepository Model (CBM), are displayed. These describe the minimum set of data elements that the NCI determined were most important for a researcher to see about a biospecimen. The ABL currently does not display information on whether clinical data is available to accompany the biospecimens. However, a requester can solicit clinical data in the request. Once a request is approved, the biospecimen provider will contact the requester to discuss the request (and the requester's questions) before finalizing the invoice and shipment. The ABL is available to the public to browse. To request biospecimens from the ABL, the researcher must submit the required information. After submission, shipment of the requested biospecimen(s) depends on scientific and institutional review approval. Account required. Registration is open to everyone. Documented June 24, 2013, as per the Miriam database (http://www.ebi.ac.uk/miriam/main/collections/MIR:00000021).

    The CluSTr database offers an automatic classification of UniProt Knowledgebase and IPI proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, Gene3D, SUPERFAMILY, PIR Superfamily and PANTHER. To date (2011), CluSTr contains the following information:

    • 9,450,285 sequences from UniProt Knowledgebase release 15.6
    • 308,281 sequences from IPI
    • 3,636,831,744 similarities, with pairwise alignments generated on-the-fly
    • 17,616,060 clusters
    • Clustering for 972 organisms with completely sequenced genomes. For the full list of the genomes see Integr8
    • Putative homologue predictions for the above species. For more information see Homologue Selection at Integr8

  6. Mouse Genome Informatics: The Mouse Gene Expression Information Resource...

    • scicrunch.org
    Updated Oct 17, 2019
    (2019). Mouse Genome Informatics: The Mouse Gene Expression Information Resource Project [Dataset]. http://identifiers.org/RRID:SCR_006630
    Description

    A unified resource that combines text-based and 3D graphical methods to store, display, and analyze mouse developmental gene expression information. The Mouse Gene Expression Information Resource will integrate the following components:

    • Gene Expression Database (GXD) - Integrates different types of expression data and provides links to many other resources to place the data into the larger biological and analytical context.
    • Anatomy Database - Provides the standard nomenclature for developmental anatomy.
    • 3D Atlas / Graphical Gene Expression Database - Provides a high-resolution digital representation of mouse anatomy reconstructed from serial sections of single embryos at each representative developmental stage, enabling 3D graphical display and analysis of in situ expression data.

  7. Key-Residue-Annotate's Intermediary Files (resources/)

    • nde-dev.biothings.io
    Updated Apr 7, 2025
    Horta Santos, Eduardo (2025). Key-Residue-Annotate's Intermediary Files (resources/) [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_15171018
    Dataset authored and provided by
    Horta Santos, Eduardo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Key-Residue-Annotate: databases and required files

    Upon extraction to a desired directory, use its absolute path in the following arguments inside the config.ini or command line for your KRA run:

    Paths.resource_dir: the absolute path of the extracted directory.

    Inputs.hmm: the resource_dir path with "/hmm/Pfam-A.hmm" appended.

    Refer to the following example config.ini for the H. sapiens reference proteome, UP000005640:

    [Inputs]
    fasta = /home/eduardohorta/KRA/data/fasta/proteomes/HUMAN_UP000005640_9606_31_03_2025.fasta
    hmm = /home/eduardohorta/KRA/resources/hmm/Pfam-A.hmm

    [Paths]
    iprscan_path = /home/eduardohorta/my_interproscan/interproscan-5.73-104.0/interproscan.sh
    resource_dir = /home/eduardohorta/KRA/resources/
    output_dir = /home/eduardohorta/KRA/git_repos/KRA/results/validation/HUMAN/
    python = /home/eduardohorta/anaconda3/envs/key_residue_annotate/bin/python3
    log = /home/eduardohorta/KRA/git_repos/KRA/logs/validation/HUMAN/executor_human.log

    [Parameters]
    output_format_iprscan = TSV
    cpu_cores_iprscan = 11
    number_jobs_iprscan = 1
    seq_batch_size_iprscan = 2000
    analyses_iprscan = panther,pfam,smart,gene3d,superfamily,prositepatterns,prositeprofiles,pirsf
    enable_precalc_iprscan = True # Should be False in actual use with novel proteins
    disable_res_iprscan = False
    threads = 11
    total_memory = 14
    nucleotide = false
    eco_codes =
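A config.ini of this shape can be sanity-checked before a run. The sketch below is illustrative only: it assumes the file is read with Python's standard `configparser` (how KRA itself parses it is not stated here), uses shortened placeholder paths, and the helper name `load_settings` is hypothetical.

```python
import configparser
import os

# Abbreviated copy of the example config; the paths are placeholders.
EXAMPLE_CONFIG = """
[Inputs]
fasta = /path/to/proteome.fasta
hmm = /path/to/resources/hmm/Pfam-A.hmm

[Paths]
resource_dir = /path/to/resources/

[Parameters]
cpu_cores_iprscan = 11
analyses_iprscan = panther,pfam,smart,gene3d,superfamily,prositepatterns,prositeprofiles,pirsf
enable_precalc_iprscan = True # Should be False in actual use with novel proteins
eco_codes =
"""


def load_settings(text):
    """Parse the INI text and return a few typed values (hypothetical helper)."""
    # inline_comment_prefixes lets the trailing "# ..." note be ignored.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    params = parser["Parameters"]
    return {
        "hmm": parser["Inputs"]["hmm"],
        "resource_dir": parser["Paths"]["resource_dir"],
        "cpu_cores": params.getint("cpu_cores_iprscan"),
        "analyses": [a.strip() for a in params["analyses_iprscan"].split(",") if a.strip()],
        "enable_precalc": params.getboolean("enable_precalc_iprscan"),
        "eco_codes": [c.strip() for c in params.get("eco_codes", "").split(",") if c.strip()],
    }


settings = load_settings(EXAMPLE_CONFIG)

# Inputs.hmm should be resource_dir extended with "hmm/Pfam-A.hmm".
expected_hmm = os.path.join(settings["resource_dir"], "hmm", "Pfam-A.hmm")
print("hmm path consistent with resource_dir:", settings["hmm"] == expected_hmm)
```

Without `inline_comment_prefixes`, the comment after `True` would be read as part of the value and `getboolean` would raise a `ValueError`.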

  8. Results for 2,230 UK Biobank binary and continuous traits

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Mar 7, 2021
    Shiyang Ma; Chen Wang; Iuliana Ionita-Laza (2021). Results for 2,230 UK Biobank binary and continuous traits [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4397932
    Dataset provided by
    Columbia University
    Authors
    Shiyang Ma; Chen Wang; Iuliana Ionita-Laza
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results for 2,230 UK Biobank binary and continuous traits.

    We applied the gene-based tests (Gene1D, Gene3D, GeneScan1D and GeneScan3D) to 1,403 UK Biobank binary phecodes and 827 continuous phenotypes (797 continuous traits + 30 biomarkers) using GWAS summary statistics on 28 million imputed variants.

    The results are in 3 different zipped folders: 'GeneScan3D_UKBB_1403binary_results.zip', 'GeneScan3D_UKBB_797continuous_results.zip' and 'GeneScan3D_UKBB_30biomarkers_results.zip'. A list of all 2,230 binary and continuous phenotypes is available in the Excel file 'UKBB_phenotype_description.xlsx'.

    Reference: Ma, S., Dalgleish, J. L., Lee, J., Wang, C., Liu, L., Gill, R., Buxbaum, J. D., Chung, W., Aschard, H., Silverman, E. K., Cho, M. H., He, Z. and Ionita-Laza, I. "Improved gene-based testing by integrating long-range chromatin interactions and knockoff statistics", 2021
