5 datasets found

NEGATOME
huggingface.co
Updated Apr 15, 2025
Cite
Synthyra (2025). NEGATOME [Dataset]. https://huggingface.co/datasets/Synthyra/NEGATOME
Dataset updated
Apr 15, 2025
Dataset authored and provided by
Synthyra
Description
Non-interacting protein pairs from NEGATOME2.0

We map UniProt IDs with the ID mapping tool and record entries in which both sequences are found. PFAM entries are omitted. Each split corresponds to a section of the dataset, labeled as in the figure on the dataset page.

Please cite their paper if you use this dataset in your work @article{Blohm2013, title = {Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein… See the full description on the dataset page: https://huggingface.co/datasets/Synthyra/NEGATOME.
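The filtering step described above (keep a pair only when both mapped IDs resolve to a sequence) could look roughly like the sketch below. It is an illustration, not the dataset authors' code: the function name `keep_resolved_pairs`, the placeholder IDs, and the toy sequences are all hypothetical.

```python
from typing import Dict, List, Tuple


def keep_resolved_pairs(
    pairs: List[Tuple[str, str]],
    id_to_sequence: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Keep only the pairs for which both IDs have a known, non-empty sequence.

    `pairs` holds (id_a, id_b) tuples; `id_to_sequence` is the result of an
    ID-mapping step (hypothetical here). Pairs with a missing or empty
    sequence on either side are dropped.
    """
    resolved = []
    for id_a, id_b in pairs:
        seq_a = id_to_sequence.get(id_a)
        seq_b = id_to_sequence.get(id_b)
        if seq_a and seq_b:
            resolved.append((id_a, id_b, seq_a, seq_b))
    return resolved


# Toy input with placeholder IDs (not real accessions).
example_pairs = [("ID_A", "ID_B"), ("ID_A", "ID_C")]
example_sequences = {"ID_A": "MKT", "ID_B": "MAG"}
example_resolved = keep_resolved_pairs(example_pairs, example_sequences)
```

In this toy run only the first pair survives, because `ID_C` has no mapped sequence.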
Non interacting protein protein dataset [Negatome]
zenodo.org
tsv
Updated Jun 11, 2020
Cite
Ankit Kumar (2020). Non interacting protein protein dataset [Negatome] [Dataset]. http://doi.org/10.1234/ankcorp.2
Available download formats: tsv
Unique identifier
https://doi.org/10.1234/ankcorp.2
Dataset updated
Jun 11, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ankit Kumar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is taken from Negatome.

Link: http://mips.helmholtz-muenchen.de/proj/ppi/negatome/
Dataset for article: Co-evolutionary landscape at the interface and...
data.niaid.nih.gov
datasetcatalog.nlm.nih.gov
+1 more
zip
Updated Jul 26, 2021
Cite
Ishita Mukherjee; Saikat Chakrabarti (2021). Dataset for article: Co-evolutionary landscape at the interface and non-interface regions of protein-protein interaction complexes [Dataset]. http://doi.org/10.5061/dryad.zgmsbcc8g
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.zgmsbcc8g
Dataset updated
Jul 26, 2021
Dataset provided by
Indian Institute of Chemical Biology
Authors
Ishita Mukherjee; Saikat Chakrabarti
License
CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Description
Proteins involved in interactions tend to co-evolve over the course of evolution, and compensatory changes may occur in interacting proteins to maintain or refine such interactions. However, certain residue pair alterations may prove detrimental to functional interactions. Hence, it is important to determine co-evolutionary pairings that could be structurally or functionally relevant for maintaining the conservation of an inter-protein interaction.

Inter-protein co-evolution analysis in several complexes, utilizing multiple existing methodologies, suggested that co-evolutionary pairings can occur in both spatially proximal and distant regions in inter-protein interactions. Subsequently, the Co-Var (Correlated Variation) method, based on mutual information and the Bhattacharyya coefficient, was developed, validated, and found to perform relatively better than CAPS and EV-complex. Interestingly, when the Co-Var measure and the EV-complex program were applied to a set of protein-protein interaction complexes, co-evolutionary pairings were obtained in both interface and non-interface regions of the complexes.

The Co-Var approach involves determining high degree co-evolutionary pairings, in which particular co-evolved residue positions in one protein form multiple co-evolutionary connections with multiple residue positions in the binding partner. Detailed analyses of high degree co-evolutionary pairings in protein-protein complexes involved in cancer metastasis suggested that most of the residue positions forming such connections occurred within functional domains of the constituent proteins, and that substitution mutations were also common among these positions. The physiological relevance of these predictions suggests that Co-Var can predict residues that could be crucial for preserving functional protein-protein interactions.

Finally, the Co-Var web server (http://www.hpppi.iicb.res.in/ishi/covar/index.html), which implements this methodology, identifies co-evolutionary pairings in intra- and inter-protein interactions.
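To make the two quantities named in the description concrete, the sketch below computes mutual information and a Bhattacharyya coefficient for a pair of alignment columns. It is not the Co-Var implementation: the description does not say how Co-Var combines the two measures, so comparing the observed joint distribution against the product of the marginals is an assumption made here, and `column_pair_scores` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def column_pair_scores(col_a: Sequence[str], col_b: Sequence[str]) -> Tuple[float, float]:
    """Return (mutual information in bits, Bhattacharyya coefficient).

    col_a[i] and col_b[i] are the residues seen at two alignment positions
    in the same (paired) sequence i. The Bhattacharyya coefficient is taken
    between the observed joint distribution and the product of the marginals
    (an assumption of this sketch): it equals 1.0 when the columns are
    independent and decreases as they become more strongly coupled.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))

    mutual_information = 0.0
    bhattacharyya = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        p_independent = (count_a[x] / n) * (count_b[y] / n)
        mutual_information += p_xy * math.log2(p_xy / p_independent)
        bhattacharyya += math.sqrt(p_xy * p_independent)
    return mutual_information, bhattacharyya


# Perfectly coupled toy columns versus independent toy columns.
coupled = column_pair_scores("AAGG", "CCTT")
independent = column_pair_scores("AAGG", "CTCT")
```

For the coupled toy columns the mutual information is 1 bit and the coefficient is about 0.707; for the independent columns the values are 0 bits and 1.0.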

Methods

A number of protein-protein interaction complexes [100] were identified from previously published data (1-3), and complexes involving proteins with a sufficient number of homologs and an available crystal structure were selected. Around 50 protein complexes were considered as the “positive set”. Additionally, non-interacting proteins from the Negatome database (4) were considered as the “negative set”. Close orthologs or similar sequences were determined using DELTA-BLAST (Domain enhanced lookup time accelerated BLAST) (5), and taxonomy-filtered non-redundant sequences having E-value <= 1E-04, query coverage >= 70%, and sequence identity >= 45% were used to prepare multiple sequence alignments (MSA) representative of each sequence family in MAFFT (6). Alignments for homologous sequences of the representative interacting and non-interacting proteins in the “positive set” and the “negative set” were prepared in this manner.
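The homolog selection thresholds stated above (E-value <= 1E-04, query coverage >= 70%, sequence identity >= 45%) can be expressed as a simple filter. This is a minimal sketch, not the authors' pipeline: the `Hit` record and its field names are hypothetical, and parsing real DELTA-BLAST output as well as the taxonomy and redundancy filtering are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One search hit (hypothetical record; fields are assumptions)."""
    subject_id: str
    evalue: float
    query_coverage: float  # percent, 0-100
    identity: float        # percent, 0-100


def passes_thresholds(
    hit: Hit,
    max_evalue: float = 1e-4,
    min_coverage: float = 70.0,
    min_identity: float = 45.0,
) -> bool:
    """Apply the three cutoffs quoted in the Methods text."""
    return (
        hit.evalue <= max_evalue
        and hit.query_coverage >= min_coverage
        and hit.identity >= min_identity
    )


def filter_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep the hits that satisfy all three cutoffs, preserving input order."""
    return [hit for hit in hits if passes_thresholds(hit)]


# Toy hits with placeholder identifiers.
example_hits = [
    Hit("seq_keep", 1e-20, 95.0, 60.0),
    Hit("seq_low_identity", 1e-20, 95.0, 30.0),
    Hit("seq_low_coverage", 1e-20, 50.0, 80.0),
    Hit("seq_high_evalue", 1e-2, 95.0, 80.0),
    Hit("seq_boundary", 1e-4, 70.0, 45.0),
]
kept_ids = [hit.subject_id for hit in filter_hits(example_hits)]
```

Because the cutoffs are inclusive, a hit sitting exactly on all three boundaries is retained.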

References

1. Mintseris, J., & Weng, Z. (2003). Atomic contact vectors in protein‐protein recognition. Proteins, 53, 629-639. doi:10.1002/prot.10432
2. Sowmya, G., Breen, E. J., & Ranganathan, S. (2015). Linking structural features of protein complexes and biological function. Protein Science: a publication of the Protein Society, 24(9), 1486-94.
3. Rodriguez-Rivas, J., Marsili, S., Juan, D., & Valencia, A. (2016). Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone. Proceedings of the National Academy of Sciences of the United States of America, 113(52), 15018–1502. doi:10.1073/pnas.1611861114
4. Smialowski, P., Pagel, P., Wong, P., Brauner, B., Dunger, I., Fobo, G., Frishman, G., Montrone, C., Rattei, T., Frishman, D., et al. (2009). The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Research, 38(Database issue), D540-4.
5. Boratyn, G. M., Schäffer, A. A., Agarwala, R., Altschul, S. F., Lipman, D. J., & Madden, T. L. (2012). Domain enhanced lookup time accelerated BLAST. Biology Direct, 7, 12. doi:10.1186/1745-6150-7-12
6. Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059–3066.
PPI_test_set
huggingface.co
Updated Jul 10, 2024
Cite
Logan Hallee (2024). PPI_test_set [Dataset]. https://huggingface.co/datasets/lhallee/PPI_test_set
Dataset updated
Jul 10, 2024
Authors
Logan Hallee
Description
NEGATOME and multi-validated BIOGRID even 50-50
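One way an even 50-50 split between negative and positive pairs could be assembled is by downsampling the larger class, as sketched below. This is an assumption about the general technique, not a description of how this dataset was actually built; `balance_pairs` and the toy pairs are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]


def balance_pairs(
    negatives: List[Pair],
    positives: List[Pair],
    seed: int = 0,
) -> List[Tuple[Pair, int]]:
    """Return a shuffled list of (pair, label) with equal class counts.

    Label 0 marks a non-interacting pair and label 1 an interacting pair.
    The larger class is randomly downsampled to the size of the smaller one.
    """
    rng = random.Random(seed)
    n = min(len(negatives), len(positives))
    sampled_negatives = rng.sample(negatives, n)
    sampled_positives = rng.sample(positives, n)
    labeled = [(pair, 0) for pair in sampled_negatives]
    labeled += [(pair, 1) for pair in sampled_positives]
    rng.shuffle(labeled)
    return labeled


# Toy data: two negative pairs and five positive pairs (placeholder names).
toy_negatives = [("n1", "n2"), ("n3", "n4")]
toy_positives = [(f"p{i}", f"q{i}") for i in range(5)]
balanced = balance_pairs(toy_negatives, toy_positives)
```

With the toy input the result holds four labeled pairs, two from each class.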
stringv12_modelorgs_9090
huggingface.co
Updated Jun 18, 2025
Cite
Gleghorn Lab (2025). stringv12_modelorgs_9090 [Dataset]. https://huggingface.co/datasets/GleghornLab/stringv12_modelorgs_9090
Dataset updated
Jun 18, 2025
Dataset authored and provided by
Gleghorn Lab
Description
Derived from lhallee/Stringv12ModelOrgPairs90, with Negatome sequences removed and splits formed. The test set has no sequence overlap with the training set. The valid set is essentially a random split (left over from test creation).
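The no-overlap property claimed for the test set can be checked with a small set intersection, as in the sketch below. It assumes each example is a pair of sequence strings; the function name `shared_sequences` and the toy sequences are hypothetical, and the check covers exact matches only, not sequence similarity.

```python
from typing import Iterable, Set, Tuple

SequencePair = Tuple[str, str]


def shared_sequences(
    train_pairs: Iterable[SequencePair],
    test_pairs: Iterable[SequencePair],
) -> Set[str]:
    """Return every sequence that appears in both splits (exact match only)."""
    train_sequences = {seq for pair in train_pairs for seq in pair}
    test_sequences = {seq for pair in test_pairs for seq in pair}
    return train_sequences & test_sequences


# Toy splits: the first test split is clean, the second leaks one sequence.
toy_train = [("MKTA", "MAGL"), ("MKTA", "MSSP")]
clean_test = [("MQQR", "MTTV")]
leaky_test = [("MQQR", "MAGL")]
clean_overlap = shared_sequences(toy_train, clean_test)
leaky_overlap = shared_sequences(toy_train, leaky_test)
```

An empty result means the two splits share no sequence; any returned sequence identifies a leak.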
NEGATOME (Synthyra/NEGATOME): 311 scholarly articles cite this dataset.
