8 datasets found

Protein Structural Domain Classification
cathdb.info
ec.i4cologne.com
Updated Sep 30, 2024
Cite
(2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
Explore at:
Unique identifier
https://identifiers.org/MIR:00100005
Dataset updated
Sep 30, 2024
Description
CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.
Gene3D
rrid.site
Updated Jan 29, 2022
Cite
(2022). Gene3D [Dataset]. http://identifiers.org/RRID:SCR_007672
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007672
Dataset updated
Jan 29, 2022
Description
A large database of CATH protein domain assignments for Ensembl genomes and UniProt sequences. Gene3D is a resource for studying proteins and their component domains. Gene3D takes CATH domains from Protein Data Bank (PDB) structures and assigns them to the millions of protein sequences with no PDB structures using hidden Markov models. Assigning a CATH superfamily to a region of a protein sequence gives information on the gross 3D structure of that region of the protein. CATH superfamilies have a limited set of functions, so the domain assignment provides some functional insights. Furthermore, most proteins have several different domains in a specific order, so looking for proteins with a similar domain organization provides further functional insights. Strict confidence cut-offs are used to ensure the reliability of the domain assignments. Gene3D imports functional information from sources such as UniProt and KEGG. It also imports experimental datasets on request to help researchers integrate their data with the corpus of the literature. The website allows users to view descriptions for single proteins and genes as well as large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. The Gene3D web services provide programmatic access to the CATH-Gene3D annotation resources and in-house software tools. These services include Gene3DScan for identifying structural domains within protein sequences, access to pre-calculated annotations for the major sequence databases, and linked functional annotation from UniProt, GO and KEGG. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
CATH-Gene3D
ebi.ac.uk
Updated May 12, 2020
Cite
(2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/entry/cathgene3d/
Explore at:
Dataset updated
May 12, 2020
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset of the type entry from the database CATH-Gene3D - version 4.3.0
Predicting Protein Function with Hierarchical Phylogenetic Profiles: The...
figshare.com
ppt
Updated Jun 7, 2023
Cite
Juan A. G Ranea; Corin Yeats; Alastair Grant; Christine A Orengo (2023). Predicting Protein Function with Hierarchical Phylogenetic Profiles: The Gene3D Phylo-Tuner Method Applied to Eukaryotic Genomes [Dataset]. http://doi.org/10.1371/journal.pcbi.0030237
Explore at:
Available download formats: ppt
Unique identifier
https://doi.org/10.1371/journal.pcbi.0030237
Dataset updated
Jun 7, 2023
Dataset provided by
PLOS Computational Biology
Authors
Juan A. G Ranea; Corin Yeats; Alastair Grant; Christine A Orengo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
“Phylogenetic profiling” is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity—from 30% to 100%—and phylogenetic profiles built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence–absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes.
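The profile comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the description does not say how the Euclidean distance is normalised, so z-scoring each profile before taking the distance is an assumption made here, and the function names and the toy copy-number data are hypothetical.

```python
import math


def zscore(profile):
    """Standardise a copy-number profile (one value per genome).

    Assumption: "normalised" is taken to mean z-scoring each profile.
    """
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        # A flat profile carries no signal; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / sd for x in profile]


def profile_distance(a, b):
    """Euclidean distance between two z-scored profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    za, zb = zscore(a), zscore(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def best_matching_levels(profiles_a, profiles_b):
    """Compare every pair of identity levels and return the closest pair.

    Each argument maps a sequence-identity level (e.g. 30, 60) to the
    copy-number profile of that family's subclusters at that level.
    Returns (distance, level_a, level_b).
    """
    return min(
        (profile_distance(pa, pb), la, lb)
        for la, pa in profiles_a.items()
        for lb, pb in profiles_b.items()
    )


# Hypothetical copy numbers across four genomes at two identity levels.
family_a = {30: [5, 5, 5, 6], 60: [1, 2, 3, 4]}
family_b = {30: [9, 1, 7, 2], 60: [2, 4, 6, 8]}
distance, level_a, level_b = best_matching_levels(family_a, family_b)
```

In this toy data the two families change copy number in step only at the 60% level, so that pair of levels gives the smallest distance. This is the kind of level-specific signal the description refers to as "auto-tuning".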
Data from: CluSTr
neuinfo.org
rrid.site
Updated Sep 7, 2012
Cite
(2012). CluSTr [Dataset]. http://identifiers.org/RRID:SCR_007600
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007600
Dataset updated
Sep 7, 2012
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented May 10, 2017. A pilot effort that has developed a centralized, web-based biospecimen locator that presents biospecimens collected and stored at participating Arizona hospitals and biospecimen banks, which are available for acquisition and use by researchers. Researchers may use this site to browse, search and request biospecimens to use in qualified studies. The development of the ABL was guided by the Arizona Biospecimen Consortium (ABC), a consortium of hospitals and medical centers in the Phoenix area, and is now being piloted by this Consortium under the direction of ABRC. You may browse by type (cells, fluid, molecular, tissue) or disease. Common data elements decided by the ABC Standards Committee, based on data elements in the National Cancer Institute's (NCI's) Common Biorepository Model (CBM), are displayed. These describe the minimum set of data elements that the NCI determined were most important for a researcher to see about a biospecimen. The ABL currently does not display information on whether clinical data is available to accompany the biospecimens. However, a requester can solicit clinical data in the request. Once a request is approved, the biospecimen provider will contact the requester to discuss the request (and the requester's questions) before finalizing the invoice and shipment. The ABL is available to the public to browse. To request biospecimens from the ABL, the researcher must submit the required information. After submission, shipment of the requested biospecimen(s) depends on scientific and institutional review approval. Account required. Registration is open to everyone. Documented June 24, 2013, as per the Miriam database (http://www.ebi.ac.uk/miriam/main/collections/MIR:00000021).
The CluSTr database offers an automatic classification of UniProt Knowledgebase and IPI proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, Gene3D, SUPERFAMILY, PIR Superfamily and PANTHER. To date (2011), CluSTr contains the following information:
* 9,450,285 sequences from UniProt Knowledgebase release 15.6
* 308,281 sequences from IPI
* 3,636,831,744 similarities, with pairwise alignments generated on-the-fly
* 17,616,060 clusters
* Clustering for 972 organisms with completely sequenced genomes. For the full list of the genomes see Integr8
* Putative homologue predictions for the above species. For more information see Homologue Selection at Integr8
Mouse Genome Informatics: The Mouse Gene Expression Information Resource...
scicrunch.org
Updated Oct 17, 2019
Cite
(2019). Mouse Genome Informatics: The Mouse Gene Expression Information Resource Project [Dataset]. http://identifiers.org/RRID:SCR_006630
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_006630
Dataset updated
Oct 17, 2019
Description
A unified resource that combines text-based and 3D graphical methods to store, display, and analyze mouse developmental gene expression information. The Mouse Gene Expression Information Resource will integrate the following components:
* Gene Expression Database (GXD) - Integrates different types of expression data and provides links to many other resources to place the data into the larger biological and analytical context.
* Anatomy Database - Provides the standard nomenclature for developmental anatomy.
* 3D Atlas / Graphical Gene Expression Database - Provides a high-resolution digital representation of mouse anatomy reconstructed from serial sections of single embryos at each representative developmental stage, enabling 3D graphical display and analysis of in situ expression data.
Key-Residue-Annotate's Intermediary Files (resources/)
nde-dev.biothings.io
Updated Apr 7, 2025
Cite
Horta Santos, Eduardo (2025). Key-Residue-Annotate's Intermediary Files (resources/) [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_15171018
Explore at:
Dataset updated
Apr 7, 2025
Dataset authored and provided by
Horta Santos, Eduardo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Key-Residue-Annotate: databases and required files

Upon extraction to a desired directory, use its absolute path in the following arguments inside the config.ini or command line for your KRA run:

Paths.resource_dir: the absolute path of the extracted directory.

Inputs.hmm: the resource_dir path extended with "/hmm/Pfam-A.hmm".

Refer to the following example config.ini for the H. sapiens reference proteome, UP000005640:

[Inputs]
fasta = /home/eduardohorta/KRA/data/fasta/proteomes/HUMAN_UP000005640_9606_31_03_2025.fasta
hmm = /home/eduardohorta/KRA/resources/hmm/Pfam-A.hmm

[Paths]
iprscan_path = /home/eduardohorta/my_interproscan/interproscan-5.73-104.0/interproscan.sh
resource_dir = /home/eduardohorta/KRA/resources/
output_dir = /home/eduardohorta/KRA/git_repos/KRA/results/validation/HUMAN/
python = /home/eduardohorta/anaconda3/envs/key_residue_annotate/bin/python3
log = /home/eduardohorta/KRA/git_repos/KRA/logs/validation/HUMAN/executor_human.log

[Parameters]
output_format_iprscan = TSV
cpu_cores_iprscan = 11
number_jobs_iprscan = 1
seq_batch_size_iprscan = 2000
analyses_iprscan = panther,pfam,smart,gene3d,superfamily,prositepatterns,prositeprofiles,pirsf
enable_precalc_iprscan = True # Should be False in actual use with novel proteins
disable_res_iprscan = False
threads = 11
total_memory = 14
nucleotide = false
eco_codes =
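As a rough illustration of how the two path settings relate, the following Python sketch reads a trimmed copy of the example configuration with the standard-library configparser and checks that Inputs.hmm equals resource_dir extended with hmm/Pfam-A.hmm. It says nothing about how KRA itself reads config.ini; the helper name expected_hmm_path is hypothetical, and treating "#" as an inline comment marker is an assumption based on the comment in the example.

```python
import configparser
from pathlib import PurePosixPath

# Trimmed copy of the example config.ini (values taken from the example).
EXAMPLE_CONFIG = """
[Inputs]
hmm = /home/eduardohorta/KRA/resources/hmm/Pfam-A.hmm

[Paths]
resource_dir = /home/eduardohorta/KRA/resources/

[Parameters]
enable_precalc_iprscan = True # Should be False in actual use with novel proteins
threads = 11
eco_codes =
"""


def expected_hmm_path(resource_dir):
    """Hypothetical helper: resource_dir extended with hmm/Pfam-A.hmm."""
    return str(PurePosixPath(resource_dir) / "hmm" / "Pfam-A.hmm")


# Assumption: "#" after a value starts an inline comment, as in the example.
config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read_string(EXAMPLE_CONFIG)

resource_dir = config["Paths"]["resource_dir"]
hmm_path = config["Inputs"]["hmm"]
hmm_matches = hmm_path == expected_hmm_path(resource_dir)

# Typed accessors convert the string values on request.
use_precalc = config["Parameters"].getboolean("enable_precalc_iprscan")
threads = config["Parameters"].getint("threads")
```

PurePosixPath is used so that the trailing slash on resource_dir does not produce a doubled separator when the suffix is appended.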
Results for 2,230 UK Biobank binary and continuous traits
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Mar 7, 2021
Cite
Shiyang Ma; Chen Wang; Iuliana Ionita-Laza (2021). Results for 2,230 UK Biobank binary and continuous traits [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4397932
Explore at:
Dataset updated
Mar 7, 2021
Dataset provided by
Columbia University
Authors
Shiyang Ma; Chen Wang; Iuliana Ionita-Laza
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Results for 2,230 UK Biobank binary and continuous traits.

We applied the gene-based tests (Gene1D, Gene3D, GeneScan1D and GeneScan3D) to 1,403 UK Biobank binary phecodes and 827 continuous phenotypes (797 continuous traits + 30 biomarkers) using GWAS summary statistics on 28 million imputed variants.

The results are in three zipped folders: 'GeneScan3D_UKBB_1403binary_results.zip', 'GeneScan3D_UKBB_797continuous_results.zip' and 'GeneScan3D_UKBB_30biomarkers_results.zip'. A list of all 2,230 binary and continuous phenotypes is available in the Excel file 'UKBB_phenotype_description.xlsx'.

Reference: Ma, S., Dalgleish, J. L., Lee, J., Wang, C., Liu, L., Gill, R., Buxbaum, J. D., Chung, W., Aschard, H., Silverman, E. K., Cho, M. H., He, Z. and Ionita-Laza, I. "Improved gene-based testing by integrating long-range chromatin interactions and knockoff statistics", 2021