Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Predicted structure and sequence domains are mapped using hidden Markov model libraries representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.
CATH Domain Classification List (latest release): protein structural domains classified into the CATH hierarchy.
A large database of CATH protein domain assignments for Ensembl genomes and UniProt sequences. Gene3D is a resource for studying proteins and their component domains. Gene3D takes CATH domains from Protein Data Bank (PDB) structures and assigns them to the millions of protein sequences with no PDB structures using hidden Markov models. Assigning a CATH superfamily to a region of a protein sequence gives information on the gross 3D structure of that region of the protein. CATH superfamilies have a limited set of functions, so the domain assignment provides some functional insights. Furthermore, most proteins have several different domains in a specific order, so looking for proteins with a similar domain organization provides further functional insights. Strict confidence cut-offs are used to ensure the reliability of the domain assignments. Gene3D imports functional information from sources such as UniProt and KEGG. It also imports experimental datasets on request to help researchers integrate their data with the corpus of the literature. The website allows users to view descriptions both for single proteins and genes and for large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. The Gene3D web services provide programmatic access to the CATH-Gene3D annotation resources and in-house software tools. These services include Gene3DScan for identifying structural domains within protein sequences, access to pre-calculated annotations for the major sequence databases, and linked functional annotation from UniProt, GO and KEGG.
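The idea of comparing proteins by domain organization can be illustrated with a minimal sketch. It is not Gene3D code: the protein IDs and coordinates are invented, the function names are hypothetical, and the only assumption is that each assignment gives a start, an end and a CATH superfamily ID for a region of the sequence.

```python
from collections import defaultdict

# Hypothetical domain assignments: protein ID -> list of
# (start, end, CATH superfamily). IDs and coordinates are invented.
assignments = {
    "protA": [(150, 300, "1.10.10.10"), (5, 140, "3.40.50.300")],
    "protB": [(10, 160, "3.40.50.300"), (170, 310, "1.10.10.10")],
    "protC": [(1, 120, "1.10.10.10")],
}


def architecture(domains):
    """Return the superfamilies ordered by start position along the sequence."""
    return tuple(superfamily for _, _, superfamily in sorted(domains))


def group_by_architecture(assignments):
    """Map each domain architecture to the sorted list of proteins sharing it."""
    groups = defaultdict(list)
    for protein, domains in assignments.items():
        groups[architecture(domains)].append(protein)
    return {arch: sorted(proteins) for arch, proteins in groups.items()}


groups = group_by_architecture(assignments)
for arch, proteins in groups.items():
    print(" -> ".join(arch), proteins)
```

Here protA and protB fall into the same group because they carry the same superfamilies in the same order, even though the assignments were listed in a different order; protC, with a single domain, forms its own group.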
THIS RESOURCE IS NO LONGER IN SERVICE, documented May 10, 2017. A pilot effort that has developed a centralized, web-based biospecimen locator that presents biospecimens collected and stored at participating Arizona hospitals and biospecimen banks, which are available for acquisition and use by researchers. Researchers may use this site to browse, search and request biospecimens to use in qualified studies. The development of the ABL was guided by the Arizona Biospecimen Consortium (ABC), a consortium of hospitals and medical centers in the Phoenix area, and is now being piloted by this Consortium under the direction of ABRC. You may browse by type (cells, fluid, molecular, tissue) or disease. Common data elements decided by the ABC Standards Committee, based on data elements in the National Cancer Institute's (NCI's) Common Biorepository Model (CBM), are displayed. These describe the minimum set of data elements that the NCI determined were most important for a researcher to see about a biospecimen. The ABL currently does not display information on whether clinical data are available to accompany the biospecimens. However, a requester can solicit clinical data in the request. Once a request is approved, the biospecimen provider will contact the requester to discuss the request (and the requester's questions) before finalizing the invoice and shipment. The ABL is available to the public to browse. To request biospecimens from the ABL, the researcher must submit the required information. After the information is submitted, shipment of the requested biospecimen(s) depends on scientific and institutional review approval. Account required. Registration is open to everyone. Documented June 24, 2013 as per the Miriam database (http://www.ebi.ac.uk/miriam/main/collections/MIR:00000021).
The CluSTr database offers an automatic classification of UniProt Knowledgebase and IPI proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, Gene3D, SUPERFAMILY, PIR Superfamily and PANTHER. To date (2011), CluSTr contains the following information:
* 9,450,285 sequences from UniProt Knowledgebase release 15.6
* 308,281 sequences from IPI
* 3,636,831,744 similarities, with pairwise alignments generated on-the-fly
* 17,616,060 clusters
* Clustering for 972 organisms with completely sequenced genomes. For the full list of the genomes see Integr8
* Putative homologue predictions for the above species. For more information see Homologue Selection at Integr8
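How clusters can be derived from pairwise similarities may be easier to see with a small sketch. The description above does not say which linkage rule or score scale CluSTr uses, so the single-linkage rule, the threshold values, the sequence IDs and the scores below are all assumptions chosen for illustration only.

```python
def single_linkage(ids, similarities, threshold):
    """Group IDs so that any pair scoring >= threshold ends up in one cluster.

    `similarities` is an iterable of (id_a, id_b, score) tuples. This is a
    generic union-find sketch, not the CluSTr implementation.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Walk up to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in similarities:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Invented sequence IDs and similarity scores.
ids = ["P1", "P2", "P3", "P4"]
similarities = [("P1", "P2", 9.0), ("P2", "P3", 6.5), ("P3", "P4", 2.0)]

loose = single_linkage(ids, similarities, threshold=5.0)
strict = single_linkage(ids, similarities, threshold=8.0)
print(loose)
print(strict)
```

Raising the threshold splits clusters into smaller groups, which is one way a single set of pairwise comparisons can yield clusters at several levels of relatedness.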
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data item of the type homologous_superfamily from the database cathgene3d with accession G3DSA:1.10.10.1120 and name Lysin B, C-terminal linker domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the functional annotation sheets of four manakins. The translated gene coding sequences were aligned to the SwissProt database (release-2020_05). Motifs and domains of each gene were annotated with the PRINTS, SMART, PANTHER, ProSiteProfiles, ProSitePatterns, CDD, SFLD, Gene3D, SUPERFAMILY, and TMHMM modules of InterPro. The pathways in which each gene might be involved were identified by aligning the protein sequence of each gene to the KEGG database (release-93).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Protein-coding genes in the O. furnacalis genome were predicted by integrating homology-based, transcript-based, and ab initio approaches. Homology-based predictions were performed by aligning protein sequences from nine reference species (Anopheles gambiae, Chilo suppressalis, Danaus plexippus, Drosophila melanogaster, Helicoverpa armigera, O. furnacalis, Plutella xylostella, Bombyx mori, and Spodoptera frugiperda) to the assembled genome using Exonerate (v2.4.7). Transcript-based predictions utilized RNA-seq data, which were assembled into transcripts using Trinity (v2.11.0) [29], and gene models were predicted from these transcripts using PASA (v2.3.1) [30]. Additionally, the RNA-seq data were aligned to the genome assembly using HISAT2 (v2.2.1) [31], and the alignments were further processed using StringTie (v2.1.4) [32] to assemble transcripts. These transcripts were then analyzed by TransDecoder (v5.5.0) (https://github.com/TransDecoder/TransDecoder/wiki) to identify protein-coding genes. For ab initio predictions, Augustus (v3.3.3) [33] and GeneMark (v4.61) [34] were employed, incorporating transcript-based predictions as hints. Gene predictions from all three approaches were integrated using EvidenceModeler (v1.1.1) [30], resulting in 16,272 predicted genes with an average of 6.44 exons per gene and an average exon length of 220.96 bp. BUSCO (v4) (BUSCO, RRID: SCR_015008) [35] analysis identified 1,334 (97.6%) complete or partial BUSCO profiles, reflecting a high level of gene prediction accuracy. Among these, 1,296 were classified as single-copy genes, while 30 were identified as duplicated copies (Table S5). Functional annotation of the predicted protein-coding genes was performed using InterProScan (v5.55) [36] and DIAMOND (v2.0.14.152) [37], with protein sequences aligned to the UniProt-TrEMBL database using a threshold parameter of ‘-e 1e-10’.
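The effect of the ‘-e 1e-10’ threshold can be sketched as a post-hoc filter on alignment results. The description does not state which output format was used, so this sketch assumes the 12-column BLAST tabular layout (E-value in column 11, bit score in column 12); the gene and hit names, the scores and the best-hit-by-bit-score rule are invented for illustration.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # mirrors the '-e 1e-10' threshold quoted above

# Invented example rows in 12-column BLAST tabular layout:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
example = (
    "gene1\thitA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180.0\n"
    "gene1\thitB\t60.0\t150\t60\t0\t1\t150\t1\t150\t1e-20\t95.0\n"
    "gene2\thitC\t35.0\t80\t52\t0\t1\t80\t1\t80\t1e-5\t40.0\n"
)


def best_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Keep, per query, the highest-scoring hit whose E-value passes the cutoff."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # hit is weaker than the threshold, so discard it
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best


hits = best_hits(io.StringIO(example))
print(hits)
```

In this toy input gene2 ends up without an annotation because its only hit has an E-value above the cutoff, which is how a strict threshold trades annotation coverage for reliability.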
A total of 14,116 genes (86.8%) were successfully annotated, identifying major functional domains across multiple databases. Key annotations included Pfam (11,224 genes, 69.0%), PANTHER (11,527 genes, 70.8%), Gene3D (9,683 genes, 59.5%), CDD (4,578 genes, 28.1%), SMART (5,167 genes, 31.8%), and SUPERFAMILY (9,099 genes, 55.9%). Additional contributions came from MobiDBLite (6,513 genes, 40.0%), PROSITE Patterns (3,280 genes, 20.2%), and PROSITE Profiles (5,902 genes, 36.3%) (Table S6). These results highlight the comprehensive functional insights provided by integrating multiple annotation resources.
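The percentages above follow directly from the gene counts and the total of 16,272 predicted genes. The short check below recomputes them; the counts are copied from the text, while the dictionary layout and the function name are only illustrative.

```python
TOTAL_GENES = 16272  # predicted genes reported in the annotation summary

# Annotated gene counts as quoted in the text.
annotated = {
    "Any database": 14116,
    "Pfam": 11224,
    "PANTHER": 11527,
    "Gene3D": 9683,
    "CDD": 4578,
    "SMART": 5167,
    "SUPERFAMILY": 9099,
    "MobiDBLite": 6513,
    "PROSITE Patterns": 3280,
    "PROSITE Profiles": 5902,
}


def percent(count, total=TOTAL_GENES):
    """Share of all predicted genes, rounded to one decimal place."""
    return round(100 * count / total, 1)


coverage = {name: percent(count) for name, count in annotated.items()}
for name, value in coverage.items():
    print(f"{name}: {value}%")
```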