Database of manually annotated protein complexes from mammalian organisms. Annotation includes protein complex function, localization, subunit composition, literature references, and more. All information is obtained from individual experiments published in scientific articles; data from high-throughput experiments are excluded. The majority of protein complexes in CORUM originate from human (65%), followed by mouse (14%) and rat (14%).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Predicted complexes in human not present in the CORUM and PCDq references.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Best clustering-metric results (with CYC2008 and CORUM references) obtained with DAPG (complexes of minimum size 3) using different node-ordering algorithms and applying sorting (ϕ function) to large PPI networks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of predicted complexes that perfectly match complexes in the references (CYC2008 and CORUM), i.e., with OS = 1.0.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
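The entry above does not define OS. The sketch below assumes it is the overlap score commonly used in complex-prediction benchmarks, OS(A, B) = |A ∩ B|² / (|A| · |B|), which equals 1.0 only when a predicted complex and a reference complex contain exactly the same proteins. The function names are hypothetical and this is not necessarily how the DAPG evaluation was implemented.

```python
def overlap_score(predicted, reference):
    """Assumed overlap score |A & B|^2 / (|A| * |B|); 1.0 iff the two sets are identical."""
    a, b = set(predicted), set(reference)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def count_perfect_matches(predicted_complexes, reference_complexes):
    """Count predicted complexes with OS == 1.0 against at least one reference complex."""
    return sum(
        1
        for p in predicted_complexes
        if any(overlap_score(p, r) == 1.0 for r in reference_complexes)
    )


# Toy protein names, not real CYC2008 or CORUM entries.
predicted = [{"A", "B", "C"}, {"D", "E", "F"}, {"G", "H", "I", "J"}]
reference = [{"A", "B", "C"}, {"D", "E"}, {"G", "H", "I"}]
print(count_perfect_matches(predicted, reference))
```

Only the first toy prediction matches exactly; the others overlap a reference complex partially and therefore score below 1.0.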
Results of adding random interactions to yeast and human PPI networks (with CYC2008 and CORUM references), obtained with DAPG (complexes of minimum size 3).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
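The entry above does not say how the random interactions were generated or how many were added. A minimal sketch of one common noise model, assuming edges between uniformly random protein pairs are added in proportion to the original number of interactions; the function name, the `fraction` parameter, and the seed handling are hypothetical.

```python
import random


def add_random_interactions(edges, nodes, fraction, seed=None):
    """Return the undirected edge set plus round(fraction * |E|) new random edges."""
    rng = random.Random(seed)
    node_list = sorted(nodes)
    existing = {frozenset(e) for e in edges}
    target = len(existing) + round(fraction * len(existing))
    max_edges = len(node_list) * (len(node_list) - 1) // 2
    if target > max_edges:
        raise ValueError("not enough node pairs to add that many interactions")
    noisy = set(existing)
    while len(noisy) < target:
        # Draw two distinct proteins; duplicates of existing edges are ignored by the set.
        u, v = rng.sample(node_list, 2)
        noisy.add(frozenset((u, v)))
    return noisy


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
nodes = {"A", "B", "C", "D", "E"}
noisy = add_random_interactions(edges, nodes, 0.5, seed=1)
print(len(noisy))
```

The perturbed network would then be clustered again and scored against the same references to see how performance degrades with noise.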
Columns 1 and 2: gene names. Column 3 shows the representative CORUM complex to which the gene of interest belongs. Columns 4 and 5 denote the increase in prediction performance of the elastic net model using the CORUM feature set over the single-feature (self-transcript) model. Column 6 shows the number of transcripts used to predict the protein level of the gene of interest in the CORUM feature set. Column 7 shows the top trans-locus contributor to the protein level of the gene of interest, ranked by absolute coefficient in the elastic net model; proteins whose own transcripts are the top predictors are marked with (self). Column 8 gives the number of MSigDB C2 CGP (chemical and genetic perturbation) gene sets in which the gene appears. Column 9 gives the top Disease Ontology term significantly associated with the gene of interest in the literature.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
WD40 repeat (WDR) domains are protein interaction scaffolds that constitute one of the largest protein families in humans, and a first WDR inhibitor, an allosteric antagonist of polycomb repressive complex 2, has just entered the clinic. A systematic analysis of the CORUM database of protein complexes shows that WDR is the most represented domain in transcriptional regulation and one of the most prevalent in the ubiquitin proteasome system, two pathways of high relevance to drug discovery. Parsing the literature and the vulnerability of cancer cell lines to CRISPR knockout indicate that WDR proteins are targets of interest in oncology and other disease areas. A quantitative analysis of WDR structures reveals that druggable binding pockets can be found on multiple surfaces of these multifaceted protein interaction platforms. These data support the development of chemical probes to further interrogate WDR proteins as an emerging therapeutic target class.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Table S4. Subset of CORUM core complexes that consistently co-fractionate (Feb 2017 CORUM release). Complexes were chosen if they were significantly enriched for pairwise interactions in three published co-fractionation interactomes (Wan et al. 2015, Havugimana et al. 2012, and Kirkwood et al. 2013). Enrichment was calculated with a hypergeometric test, and significance was evaluated at four thresholds: p < 1, p < 1e-2, p < 1e-6, and p < 1e-10. All data aside from columns “p < 1”, “p < 1e-2”, “p < 1e-6”, and “p < 1e-10” are taken from the CORUM core data file. (XLSX 202 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
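The entry above names a hypergeometric test but not how the population and draws were defined. A sketch under the assumption that the population is all protein pairs among measured proteins (N), the successes are pairs reported as co-fractionation interactions (K), the draws are the within-complex pairs (n), and k of those are observed interactions. It computes the upper tail P(X ≥ k) with exact integer arithmetic and flags the four thresholds used as table columns. The helper names, and the rule that a complex is "consistent" only if it passes in every interactome, are hypothetical.

```python
from math import comb

THRESHOLDS = (1, 1e-2, 1e-6, 1e-10)  # the "p < 1" ... "p < 1e-10" columns


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def threshold_flags(p):
    """Map each threshold to whether p falls below it."""
    return {t: p < t for t in THRESHOLDS}


def consistent_at(pvalues, threshold):
    """Hypothetical rule: enriched in every interactome at this threshold."""
    return all(p < threshold for p in pvalues)


p = hypergeom_sf(3, 20, 5, 4)  # toy numbers, not taken from the table
print(p, threshold_flags(p))
```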
Table S9. Connected components within the reduced set of rewired proteins. Listed are all connected components (CCs) of the direction-specific subnetworks of the reference PPIN (up- and downregulated interactions) defined by the reduced set of rewired proteins. We only included CCs spanning at least 3 proteins. For each CC, we report the number of proteins that were members of the component, the direction of the regulation, and which CORUM complexes were completely included in the component. The size of the respective complexes is given in brackets. (XLSX 16.5 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
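The per-component summary described in the entry above can be sketched as follows, assuming each direction-specific subnetwork is given as a list of protein pairs and that a complex counts as "completely included" when every one of its subunits is a node of the component. The function names and the toy complex names are hypothetical placeholders, not CORUM identifiers.

```python
from collections import defaultdict


def connected_components(edges):
    """Connected components of an undirected graph given as (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def summarize_components(edges, complexes, min_size=3):
    """(component size, [(complex, complex size), ...]) for components with >= min_size proteins."""
    rows = []
    for component in connected_components(edges):
        if len(component) < min_size:
            continue
        contained = sorted(
            (name, len(members))
            for name, members in complexes.items()
            if set(members) <= component
        )
        rows.append((len(component), contained))
    return rows


up_edges = [("A", "B"), ("B", "C"), ("D", "E")]
complexes = {"complex1": {"A", "B"}, "complex2": {"A", "D"}}
print(summarize_components(up_edges, complexes))
```

Running the same summary separately on the up- and downregulated subnetworks gives the direction column.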
Functional analysis of large sets of genes and proteins is becoming increasingly necessary with the growth of omic-scale experimental biomolecular data. Enrichment analysis is by far the most popular methodology for deriving the functional implications of sets of cooperating genes. The problem with these techniques lies in the redundancy of the resulting information, which in most cases generates many trivial results that risk masking key biological events. We present and describe a computational method, called GeneTerm Linker, that filters and links enriched output data, identifying sets of associated genes and terms and producing metagroups of coherent biological significance. The method uses fuzzy reciprocal linkage between genes and terms to unravel their functional convergence and associations. The algorithm is tested with a small set of well-known interacting proteins from yeast and with a large collection of reference sets from three heterogeneous resources: multiprotein complexes (CORUM), cellular pathways (SGD), and human diseases (OMIM). Precision, recall, and balanced F-score show robust results, even when different levels of random noise are included in the test sets. Although we could not find an equivalent method, we present a comparative analysis with a widely used method that combines enrichment and functional annotation clustering. A web application implementing the proposed method is available at http://gtlinker.cnb.csic.es.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
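The entry above reports precision, recall, and balanced F-score without giving the matching scheme. A sketch of the standard set-based definitions, assuming a predicted gene set is compared directly with one reference set (for example, the subunits of a CORUM complex): precision = TP / |predicted|, recall = TP / |reference|, and F1 = 2PR / (P + R). The function name is hypothetical.

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall, and balanced F-score (F1)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy gene names: 3 of 4 predicted genes are in the 5-gene reference set.
print(precision_recall_f1({"A", "B", "C", "D"}, {"B", "C", "D", "E", "F"}))
```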
Code to calculate half-lives from MaxQuant output files, filter data, and generate figures. Includes:
- R code for generating half-lives and heatmaps using PSM values from a MaxQuant-derived evidence.txt file (related to Figure 2, Figure 3, and SF1)
- Python scripts to filter data by RSQ and PSM counts (related to Figure 2G and Figure 3)
- Python scripts to assign proteins to complexes via CORUM, perform KS testing, plot distributions, and map half-lives onto a cryo-EM structure of the respirasome (related to Figure 4 and SF4)
- All associated input/output files.
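The entry above does not describe the half-life model. A minimal sketch, assuming first-order decay, fraction(t) = exp(-k·t), fitted by ordinary least squares on ln(fraction) versus time, so that t½ = ln 2 / k, and assuming RSQ is the R² of that fit. The function names and the filter cutoffs (`min_rsq`, `min_psms`) are hypothetical and are not the dataset's actual R or Python code.

```python
import math


def fit_half_life(times, fractions):
    """Fit ln(fraction) = a - k*t by OLS; return (half_life, r_squared)."""
    points = [(t, math.log(f)) for t, f in zip(times, fractions) if f > 0]
    if len(points) < 3:
        raise ValueError("need at least three time points with positive signal")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    k = -slope  # decay rate constant
    half_life = math.log(2) / k if k > 0 else math.inf
    return half_life, r_squared


def passes_filters(r_squared, psm_count, min_rsq=0.8, min_psms=3):
    """Hypothetical RSQ and PSM-count cutoffs."""
    return r_squared >= min_rsq and psm_count >= min_psms


# Toy data that halves every 12 time units.
half_life, rsq = fit_half_life([0, 12, 24, 48], [1.0, 0.5, 0.25, 0.0625])
print(round(half_life, 6), round(rsq, 6), passes_filters(rsq, psm_count=5))
```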