5 datasets found
  1. NEGATOME

    • huggingface.co
    Updated Apr 15, 2025
    Cite
    Synthyra (2025). NEGATOME [Dataset]. https://huggingface.co/datasets/Synthyra/NEGATOME
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Synthyra
    Description

    Non-interacting protein pairs from NEGATOME2.0

    We map UniProt IDs with the ID mapping tool and record entries in which both sequences are found. Pfam entries are omitted. Each split corresponds to a section of the dataset, labeled the same as in the figure on the dataset page.

    Please cite their paper if you use this dataset in your work @article{Blohm2013, title = {Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein… See the full description on the dataset page: https://huggingface.co/datasets/Synthyra/NEGATOME.
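    The mapping-and-filtering step described above can be sketched as follows. This is a minimal illustration, not Synthyra's code: it assumes the ID-mapping results have already been collected into a dictionary from UniProt accession to sequence, and the helper name `keep_mapped_pairs` and the toy IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def keep_mapped_pairs(
    pairs: Iterable[Tuple[str, str]],
    id_to_seq: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Keep only pairs whose two IDs both mapped to a sequence.

    Hypothetical helper: `id_to_seq` is assumed to hold the output of an
    ID-mapping run (accession -> amino-acid sequence).
    """
    kept = []
    for id_a, id_b in pairs:
        seq_a = id_to_seq.get(id_a)
        seq_b = id_to_seq.get(id_b)
        # Record the entry only if both sequences were found.
        if seq_a and seq_b:
            kept.append((id_a, id_b, seq_a, seq_b))
    return kept


if __name__ == "__main__":
    # Toy IDs and sequences, for illustration only.
    mapping = {"TOY1": "MKT", "TOY2": "MAL"}
    pairs = [("TOY1", "TOY2"), ("TOY1", "TOY3")]
    print(keep_mapped_pairs(pairs, mapping))
```

    Pairs where either ID failed to map (here `TOY3`) are dropped rather than kept with a missing sequence, which matches the "both sequences are found" rule.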

  2. Non interacting protein protein dataset [Negatome]

    • zenodo.org
    tsv
    Updated Jun 11, 2020
    Cite
    Ankit Kumar (2020). Non interacting protein protein dataset [Negatome] [Dataset]. http://doi.org/10.1234/ankcorp.2
    Available download formats
    tsv
    Dataset updated
    Jun 11, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ankit Kumar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is taken from Negatome.

    Link: http://mips.helmholtz-muenchen.de/proj/ppi/negatome/

  3. Dataset for article: Co-evolutionary landscape at the interface and...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated Jul 26, 2021
    Cite
    Ishita Mukherjee; Saikat Chakrabarti (2021). Dataset for article: Co-evolutionary landscape at the interface and non-interface regions of protein-protein interaction complexes [Dataset]. http://doi.org/10.5061/dryad.zgmsbcc8g
    Available download formats
    zip
    Dataset updated
    Jul 26, 2021
    Dataset provided by
    Indian Institute of Chemical Biology
    Authors
    Ishita Mukherjee; Saikat Chakrabarti
    License

    CC0 1.0 Universal, https://spdx.org/licenses/CC0-1.0.html

    Description

    Proteins involved in interactions throughout the course of evolution tend to co-evolve, and compensatory changes may occur in interacting proteins to maintain or refine such interactions. However, certain residue-pair alterations may prove detrimental to functional interactions. Hence, it is important to determine co-evolutionary pairings that could be structurally or functionally relevant for maintaining the conservation of an inter-protein interaction. Inter-protein co-evolution analysis of several complexes using multiple existing methodologies suggested that co-evolutionary pairings can occur in both spatially proximal and distant regions of inter-protein interactions. Subsequently, the Co-Var (Correlated Variation) method, based on mutual information and the Bhattacharyya coefficient, was developed and validated, and was found to perform relatively better than CAPS and EV-complex. Interestingly, when the Co-Var measure and the EV-complex program were applied to a set of protein-protein interaction complexes, co-evolutionary pairings were obtained in both interface and non-interface regions. The Co-Var approach involves determining high-degree co-evolutionary pairings, in which a particular co-evolved residue position in one protein is connected to multiple residue positions in the binding partner. Detailed analyses of high-degree co-evolutionary pairings in protein-protein complexes involved in cancer metastasis suggested that most residue positions forming such connections occurred within functional domains of the constituent proteins, and that substitution mutations were also common at these positions. The physiological relevance of these predictions suggests that Co-Var can predict residues that could be crucial for preserving functional protein-protein interactions. Finally, the Co-Var web server (http://www.hpppi.iicb.res.in/ishi/covar/index.html), which implements this methodology, identifies co-evolutionary pairings in intra- and inter-protein interactions.
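    The description names two ingredients of Co-Var, mutual information and the Bhattacharyya coefficient, without giving the exact scoring formula. The sketch below computes only the textbook versions of these two quantities for a pair of alignment columns; it is not the Co-Var score itself, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Mutual information (natural log) between two aligned MSA columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    p_a = Counter(col_a)
    p_b = Counter(col_b)
    p_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in p_ab.items():
        p_xy = count / n
        # Sum of p(x,y) * log(p(x,y) / (p(x) * p(y))) over observed residue pairs.
        mi += p_xy * math.log(p_xy / ((p_a[x] / n) * (p_b[y] / n)))
    return mi


def column_distribution(col: Sequence[str]) -> Dict[str, float]:
    """Relative residue frequencies in one alignment column."""
    counts = Counter(col)
    n = len(col)
    return {aa: c / n for aa, c in counts.items()}


def bhattacharyya_coefficient(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Overlap of two discrete distributions: sum over k of sqrt(p_k * q_k)."""
    return sum(math.sqrt(p[k] * q.get(k, 0.0)) for k in p)
```

    For example, two perfectly covarying columns such as `"AAKK"` and `"DDEE"` give a mutual information of ln 2, while `"AAKK"` and `"DEDE"` give 0. The Bhattacharyya coefficient is 1 for identical distributions and 0 for distributions with no residue in common.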

    Methods

    A set of 100 protein-protein interaction complexes was identified from previously published data (1-3), and complexes involving proteins with a sufficient number of homologs and an available crystal structure were selected. Around 50 protein complexes were considered the “positive set”. Additionally, non-interacting proteins from the Negatome database (4) were considered the “negative set”. Close orthologs or similar sequences were determined using DELTA-BLAST (Domain Enhanced Lookup Time Accelerated BLAST) (5), and taxonomy-filtered, non-redundant sequences with E-value ≤ 1E-04, query coverage ≥ 70%, and sequence identity ≥ 45% were used to prepare multiple sequence alignments (MSAs) representative of each sequence family in MAFFT (6). Alignments of homologous sequences for the representative interacting and non-interacting proteins in the “positive set” and the “negative set” were prepared in this manner.
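    The homolog-selection thresholds quoted in the Methods can be expressed as a simple filter. This is a sketch only: it assumes each DELTA-BLAST hit has already been parsed into a record carrying its E-value, query coverage (%), and percent identity. The `Hit` type and its field names are hypothetical, and taxonomy filtering and redundancy removal are not shown.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as stated in the Methods text.
MAX_EVALUE = 1e-4
MIN_QUERY_COVERAGE = 70.0  # percent
MIN_IDENTITY = 45.0        # percent


@dataclass
class Hit:
    """Hypothetical parsed BLAST hit; field names are illustrative."""
    subject_id: str
    evalue: float
    query_coverage: float  # percent of the query covered by the alignment
    identity: float        # percent sequence identity


def passes_thresholds(hit: Hit) -> bool:
    """True if the hit meets all three cut-offs (inclusive bounds)."""
    return (
        hit.evalue <= MAX_EVALUE
        and hit.query_coverage >= MIN_QUERY_COVERAGE
        and hit.identity >= MIN_IDENTITY
    )


def select_homologs(hits: List[Hit]) -> List[str]:
    """Return subject IDs of hits that pass every threshold, in input order."""
    return [h.subject_id for h in hits if passes_thresholds(h)]
```

    The bounds are inclusive, so a hit sitting exactly at E-value 1E-04, 70% coverage, and 45% identity is retained, matching the "≤" and "≥" in the text.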

    References

    (1) Mintseris, J., & Weng, Z. (2003). Atomic contact vectors in protein-protein recognition. Proteins, 53, 629-639. doi:10.1002/prot.10432
    (2) Sowmya, G., Breen, E. J., & Ranganathan, S. (2015). Linking structural features of protein complexes and biological function. Protein Science, 24(9), 1486-1494.
    (3) Rodriguez-Rivas, J., Marsili, S., Juan, D., & Valencia, A. (2016). Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone. Proceedings of the National Academy of Sciences of the United States of America, 113(52), 15018–1502. doi:10.1073/pnas.1611861114
    (4) Smialowski, P., Pagel, P., Wong, P., Brauner, B., Dunger, I., Fobo, G., Frishman, G., Montrone, C., Rattei, T., Frishman, D., et al. (2009). The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Research, 38(Database issue), D540-D544.
    (5) Boratyn, G. M., Schäffer, A. A., Agarwala, R., Altschul, S. F., Lipman, D. J., & Madden, T. L. (2012). Domain enhanced lookup time accelerated BLAST. Biology Direct, 7, 12. doi:10.1186/1745-6150-7-12
    (6) Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059-3066.
    
  4. PPI_test_set

    • huggingface.co
    Updated Jul 10, 2024
    Cite
    Logan Hallee (2024). PPI_test_set [Dataset]. https://huggingface.co/datasets/lhallee/PPI_test_set
    Dataset updated
    Jul 10, 2024
    Authors
    Logan Hallee
    Description

    NEGATOME (non-interacting) pairs and multi-validated BioGRID (interacting) pairs, split evenly 50-50.
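    A 50-50 balance of this kind can be obtained by downsampling the larger class. The sketch below is a hypothetical illustration of that idea, not the procedure used to build PPI_test_set: pairs are plain tuples, the function name is made up, and the labels (1 for interacting, 0 for non-interacting) are an assumption.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]


def balance_50_50(
    positives: List[Pair], negatives: List[Pair], seed: int = 0
) -> List[Tuple[Pair, int]]:
    """Downsample the larger class so both labels appear equally often."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    pos = rng.sample(positives, n)
    neg = rng.sample(negatives, n)
    # Label 1 = interacting (e.g. BioGRID), 0 = non-interacting (e.g. NEGATOME).
    labeled = [(p, 1) for p in pos] + [(p, 0) for p in neg]
    rng.shuffle(labeled)
    return labeled
```

    Downsampling discards some pairs from the larger class; a fixed `seed` keeps the resulting test set reproducible.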

  5. stringv12_modelorgs_9090

    • huggingface.co
    Updated Jun 18, 2025
    Cite
    Gleghorn Lab (2025). stringv12_modelorgs_9090 [Dataset]. https://huggingface.co/datasets/GleghornLab/stringv12_modelorgs_9090
    Dataset updated
    Jun 18, 2025
    Dataset authored and provided by
    Gleghorn Lab
    Description

    Derived from lhallee/Stringv12ModelOrgPairs90, with NEGATOME sequences removed and train/validation/test splits formed. The test set has no sequence overlap with the training set. The validation set is essentially a random split (left over from test creation).
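    One way to remove a set of sequences and then obtain a test set with no sequence overlap with training is to hold out a random subset of sequences and assign each pair according to whether it touches them. The sketch below is a hypothetical illustration under that assumption (exact-string overlap only, no similarity clustering, validation split not shown); it is not the Gleghorn Lab procedure, and the function names and parameters are made up.

```python
import random
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def remove_sequences(pairs: Iterable[Pair], excluded: Set[str]) -> List[Pair]:
    """Drop every pair that contains an excluded sequence (e.g. NEGATOME ones)."""
    return [(a, b) for a, b in pairs if a not in excluded and b not in excluded]


def split_no_overlap(
    pairs: List[Pair], test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs so that no sequence in the test set also appears in train.

    Pairs with one held-out and one training sequence are discarded.
    """
    sequences = sorted({s for pair in pairs for s in pair})
    if not sequences:
        return [], []
    rng = random.Random(seed)
    n_held = max(1, int(len(sequences) * test_fraction))
    held_out = set(rng.sample(sequences, n_held))
    train, test = [], []
    for a, b in pairs:
        if a in held_out and b in held_out:
            test.append((a, b))
        elif a not in held_out and b not in held_out:
            train.append((a, b))
        # Mixed pairs fall through and are dropped to avoid leakage.
    return train, test
```

    Dropping mixed pairs is the price of a strict no-overlap guarantee; with many sequences shared across pairs, a noticeable fraction of the data can be lost this way.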


NEGATOME (Synthyra/NEGATOME): 311 scholarly articles cite this dataset, according to Google Scholar.