Non-interacting protein pairs from NEGATOME2.0
We map UniProt IDs with the ID mapping tool and record entries in which both sequences are found. PFAM entries are omitted. Each split corresponds to a section of the dataset, labeled the same as the image below:
Please cite their paper if you use this dataset in your work @article{Blohm2013, title = {Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein… See the full description on the dataset page: https://huggingface.co/datasets/Synthyra/NEGATOME.
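The pair-filtering step described in this card (keep an entry only when both identifiers resolve to a sequence) could look roughly like the sketch below. It is not the dataset authors' code: the function name `keep_resolved_pairs`, the example accessions, and the toy sequences are all hypothetical, and the `sequences` dictionary stands in for whatever the ID mapping tool returns.

```python
def keep_resolved_pairs(pairs, sequences):
    """Keep only pairs for which both identifiers have a mapped sequence.

    pairs: iterable of (id_a, id_b) tuples.
    sequences: dict mapping an identifier to its amino-acid sequence
               (assumed here to be the output of an ID mapping step).
    """
    kept = []
    for id_a, id_b in pairs:
        seq_a = sequences.get(id_a)
        seq_b = sequences.get(id_b)
        # Entries with an unmapped member (for example a PFAM family
        # identifier with no single sequence) are dropped.
        if seq_a and seq_b:
            kept.append((id_a, id_b, seq_a, seq_b))
    return kept


# Hypothetical example data: the accessions and sequences are made up.
pairs = [("P00001", "P00002"), ("P00001", "PF00069"), ("P00003", "P00004")]
sequences = {"P00001": "MKT", "P00002": "MSA", "P00003": "MGL"}
resolved = keep_resolved_pairs(pairs, sequences)
```

Only the first pair survives in this toy input, because the other two each contain an identifier with no mapped sequence.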
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is taken from Negatome.
https://spdx.org/licenses/CC0-1.0.html
Proteins involved in interactions throughout the course of evolution tend to co-evolve and compensatory changes may occur in interacting proteins to maintain or refine such interactions. However, certain residue pair alterations may prove to be detrimental for functional interactions. Hence, determining co-evolutionary pairings that could be structurally or functionally relevant for maintaining the conservation of an inter-protein interaction is important. Inter-protein co-evolution analysis in several complexes utilizing multiple existing methodologies suggested that co-evolutionary pairings can occur in spatially proximal and distant regions in inter-protein interactions. Subsequently, the Co-Var (Correlated Variation) method based on mutual information and Bhattacharyya coefficient was developed, validated, and found to perform relatively better than CAPS and EV-complex. Interestingly, while applying the Co-Var measure and EV-complex program on a set of protein-protein interaction complexes, co-evolutionary pairings were obtained in interface and non-interface regions in protein complexes. The Co-Var approach involves determining high degree co-evolutionary pairings that include multiple co-evolutionary connections between particular co-evolved residue positions in one protein with multiple residue positions in the binding partner. Detailed analyses of high degree co-evolutionary pairings in protein-protein complexes involved in cancer metastasis suggested that most of the residue positions forming such co-evolutionary connections mainly occurred within functional domains of constituent proteins and substitution mutations were also common among these positions. The physiological relevance of these predictions suggests that Co-Var can predict residues that could be crucial for preserving functional protein-protein interactions. 
Finally, the Co-Var web server (http://www.hpppi.iicb.res.in/ishi/covar/index.html), which implements this methodology, identifies co-evolutionary pairings in intra- and inter-protein interactions.
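The description names mutual information and the Bhattacharyya coefficient as the two ingredients of Co-Var but does not give the scoring formula. The sketch below therefore only shows the textbook definitions of both quantities, computed on two alignment columns; how Co-Var combines them is not reproduced here. The function names and toy columns are hypothetical, and comparing the residue frequency distributions of the two columns is an assumption made for illustration.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def bhattacharyya_coefficient(col_a, col_b):
    """Bhattacharyya coefficient between the residue frequency
    distributions of two columns: sum over residues of sqrt(p * q)."""
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    n_a, n_b = len(col_a), len(col_b)
    shared = freq_a.keys() & freq_b.keys()
    return sum(math.sqrt((freq_a[r] / n_a) * (freq_b[r] / n_b)) for r in shared)


# Toy columns: one residue per sequence in a paired alignment.
col_x = list("AAVV")
col_y = list("LLII")  # varies in lockstep with col_x
col_z = list("LILI")  # varies independently of col_x
mi_coupled = mutual_information(col_x, col_y)
mi_independent = mutual_information(col_x, col_z)
bc_same = bhattacharyya_coefficient(col_x, col_x)
bc_disjoint = bhattacharyya_coefficient(col_x, col_y)
```

In this toy case the perfectly coupled columns carry one bit of mutual information and the independent ones carry none, while the Bhattacharyya coefficient is 1 for identical distributions and 0 for distributions with no residues in common.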
Methods
A number of protein-protein interaction complexes [100] were identified from previously published data (1-3), and complexes involving proteins with a sufficient number of homologs and an available crystal structure were selected. Around 50 protein complexes were considered the “positive set”. Additionally, non-interacting proteins from the Negatome database (4) were considered the “negative set”. Close orthologs or similar sequences were determined using DELTA-BLAST (Domain enhanced lookup time accelerated BLAST) (5), and taxonomy-filtered non-redundant sequences with E-value <= 1E-04, query coverage >= 70%, and sequence identity >= 45% were used to prepare multiple sequence alignments (MSAs) representative of each sequence family in MAFFT (6). Alignments for homologous sequences of the representative interacting and non-interacting proteins in the “positive set” and the “negative set” were prepared in this manner.
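The three homolog-selection thresholds above can be applied as a simple filter over search hits. The sketch below is an illustration only, not the authors' pipeline: the `Hit` record, its field names, and the example values are hypothetical, and it assumes the E-value, query coverage, and identity have already been parsed from the DELTA-BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit; field names are hypothetical, values are pre-parsed."""
    subject_id: str
    evalue: float
    query_coverage: float  # percent
    identity: float        # percent


def passes_thresholds(hit, max_evalue=1e-4, min_coverage=70.0, min_identity=45.0):
    """Apply the cut-offs quoted in the Methods text (all bounds inclusive)."""
    return (
        hit.evalue <= max_evalue
        and hit.query_coverage >= min_coverage
        and hit.identity >= min_identity
    )


# Made-up hits covering each way a hit can pass or fail.
hits = [
    Hit("seqA", 1e-30, 95.0, 62.0),  # passes all three
    Hit("seqB", 1e-3, 90.0, 80.0),   # E-value too high
    Hit("seqC", 1e-10, 65.0, 70.0),  # coverage too low
    Hit("seqD", 1e-4, 70.0, 45.0),   # exactly on every threshold
]
selected = [h.subject_id for h in hits if passes_thresholds(h)]
```

The taxonomy and redundancy filtering mentioned in the text is not covered by this sketch; the surviving sequences would then be passed to the alignment step.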
References
1. Mintseris, J., & Weng, Z. (2003). Atomic contact vectors in protein-protein recognition. Proteins, 53, 629-639. doi:10.1002/prot.10432
2. Sowmya, G., Breen, E. J., & Ranganathan, S. (2015). Linking structural features of protein complexes and biological function. Protein Science: a publication of the Protein Society, 24(9), 1486-94.
3. Rodriguez-Rivas, J., Marsili, S., Juan, D., & Valencia, A. (2016). Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone. Proceedings of the National Academy of Sciences of the United States of America, 113(52), 15018–1502. doi:10.1073/pnas.1611861114
4. Smialowski, P., Pagel, P., Wong, P., Brauner, B., Dunger, I., Fobo, G., Frishman, G., Montrone, C., Rattei, T., Frishman, D., et al. (2009). The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Research, 38(Database issue), D540-4.
5. Boratyn, G. M., Schäffer, A. A., Agarwala, R., Altschul, S. F., Lipman, D. J., & Madden, T. L. (2012). Domain enhanced lookup time accelerated BLAST. Biology Direct, 7, 12. doi:10.1186/1745-6150-7-12
6. Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059–3066.
NEGATOME and multi-validated BIOGRID pairs, balanced 50-50.
Derived from lhallee/Stringv12ModelOrgPairs90, with NEGATOME sequences removed and splits formed. The test set has no sequence overlap with the training set. The validation set is essentially a random split (left over from test creation).
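One way to obtain a test set with no sequence overlap with the training set is to hold out a set of proteins first and then assign pairs by membership. The sketch below shows that idea under stated assumptions; it is not the procedure used to build this dataset. The function name, the `test_fraction` and `seed` parameters, and the third "leftover" bucket are hypothetical, and "overlap" here means shared identifiers rather than sequence similarity.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Split protein pairs so that no protein appears in both train and test.

    Pairs with both members held out go to test, pairs with neither go to
    train, and mixed pairs are set aside in a leftover bucket.
    """
    rng = random.Random(seed)
    proteins = sorted({p for pair in pairs for p in pair})
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    held_out = set(proteins[:n_test])

    train, test, leftover = [], [], []
    for a, b in pairs:
        n_held = (a in held_out) + (b in held_out)
        if n_held == 2:
            test.append((a, b))
        elif n_held == 0:
            train.append((a, b))
        else:
            leftover.append((a, b))
    return train, test, leftover


# Made-up pair list using placeholder identifiers.
example_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
                 ("E", "F"), ("A", "F"), ("B", "E"), ("C", "F")]
train, test, leftover = split_pairs(example_pairs, test_fraction=0.5, seed=1)
train_proteins = {p for pair in train for p in pair}
test_proteins = {p for pair in test for p in pair}
```

By construction `train_proteins` and `test_proteins` cannot intersect, at the cost of discarding or repurposing the mixed pairs.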