9 datasets found
  1. Characterizing Changes in the Rate of Protein-Protein Dissociation upon...

    • plos.figshare.com
    txt
    Updated Jun 1, 2023
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates (2023). Characterizing Changes in the Rate of Protein-Protein Dissociation upon Interface Mutation Using Hotspot Energy and Organization [Dataset]. http://doi.org/10.1371/journal.pcbi.1003216
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations can be used to estimate off-rate changes for mutations to any residue type and for multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson's Correlation Coefficient of 0.79 with experimental off-rates and a Matthews Correlation Coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models, we identify descriptors that are highly specific to, and conversely broadly important for, predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces depends more strongly on complex size than on interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize.
    The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors that model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects.
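
    The two evaluation metrics quoted above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes Δlog10(koff) is log10(koff of the mutant) minus log10(koff of the wild type) and, hypothetically, treats a mutation as stabilizing when that value is below zero (a slower off-rate). The function names pearson, matthews and is_stabilizing are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def matthews(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (True = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any confusion-matrix margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def is_stabilizing(dlog_koff, threshold=0.0):
    # Hypothetical rule: a mutation is "stabilizing" if it slows dissociation,
    # i.e. its change in log10(koff) falls below the threshold.
    return dlog_koff < threshold
```

    Predicted and experimental Δlog10(koff) lists would go to pearson directly; for the rare-stabilizing-mutation task, both lists would first be mapped through is_stabilizing and then passed to matthews. MCC is used here because it stays informative when the positive class is rare.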

  2. Relationship between experimental ΔΔG, Δlog10(koff), Δlog10(kon) and change...

    • figshare.com
    xls
    Updated Jun 2, 2023
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates (2023). Relationship between experimental ΔΔG, Δlog10(koff), Δlog10(kon) and change in interface hotspot energy (Int_HS_Energy) for 713 mutations in SKEMPI. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003216.t003
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (A) PCC of experimental ΔΔG with Δlog10(koff) and Δlog10(kon) for single-point alanine, single-point non-alanine, multi-point and all 713 mutations. (B) PCC of Int_HS_Energy with ΔΔG, Δlog10(koff) and Δlog10(kon) for the same four mutation sets. Experimental values for the 713 mutations used here are extracted from SKEMPI [41] and are presented in Dataset S1.

  3. atom3d-msp

    • huggingface.co
    Updated Aug 21, 2025
    Vector Institute (2025). atom3d-msp [Dataset]. https://huggingface.co/datasets/vector-institute/atom3d-msp
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 21, 2025
    Dataset authored and provided by
    Vector Institute
    Description

    Mutation Stability Prediction

      Overview
    

    The Mutation Stability Prediction (MSP) task involves classifying whether mutations in the SKEMPI 2.0 database (J. Jankauskaite, B. Jiménez-García et al., 2019) are stabilizing or not using the provided protein structures. Each mutation in the MSP task includes a PDB file with the residue of interest transformed to the specified mutant amino acid as well as the native PDB file. A total of 4148 mutant structures accompanied by their… See the full description on the dataset page: https://huggingface.co/datasets/vector-institute/atom3d-msp.
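
    Binary stabilizing labels of this kind are typically derived from the binding affinities recorded in SKEMPI. The sketch below assumes the label compares mutant and wild-type dissociation constants (Kd); the exact rule used by ATOM3D is not stated here, so msp_style_label is hypothetical. ΔΔG follows from ΔG = RT ln Kd, so ΔΔG = RT ln(Kd_mut / Kd_wt), with positive values meaning weaker binding; the 298 K default temperature is an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_wt, kd_mut, temperature_k=298.0):
    """Binding ddG in kcal/mol: RT ln(Kd_mut / Kd_wt). Positive means weaker binding."""
    return R_KCAL * temperature_k * math.log(kd_mut / kd_wt)


def msp_style_label(kd_wt, kd_mut):
    """Hypothetical binary label: 1 if the mutant binds more tightly (lower Kd), else 0."""
    return int(kd_mut < kd_wt)
```

    A tenfold increase in Kd at 298 K corresponds to roughly 1.36 kcal/mol of destabilization, which is why affinity datasets are often analyzed on a log scale.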

  4. Replication Data for: Persistent spectral based ensemble learning...

    • researchdata.ntu.edu.sg
    rar
    Updated Jun 23, 2023
    JunJie Wee; Kelin Xia (2023). Replication Data for: Persistent spectral based ensemble learning (PerSpect-EL) for protein-protein binding affinity prediction [Dataset]. http://doi.org/10.21979/N9/MEDJN1
    Available download formats: rar (12220199)
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    DR-NTU (Data)
    Authors
    JunJie Wee; Kelin Xia
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Dataset funded by
    Ministry of Education (MOE)
    Nanyang Technological University
    Description

    Protein–protein interactions (PPIs) play a significant role in nearly all cellular and biological activities. Data-driven machine learning models have demonstrated great power in PPI studies. However, designing efficient molecular featurization remains a major challenge for all learning models of PPIs. Here, we propose, for the first time, a persistent spectral (PerSpect) based PPI representation and featurization, and PerSpect-based ensemble learning (PerSpect-EL) models for PPI binding affinity prediction. In our model, a sequence of Hodge (or combinatorial) Laplacian (HL) matrices at different scales is generated from a specially designed filtration process. PerSpect attributes, which are statistical and combinatorial properties of the spectra of these HL matrices, are used as features for PPI characterization. Each PerSpect attribute is fed into a 1D convolutional neural network (CNN), and these CNNs are stacked together in our PerSpect-based ensemble learning models. We systematically test our model on the two most commonly used datasets, namely SKEMPI and AB-Bind. To the best of our knowledge, our model achieves state-of-the-art results and outperforms all existing models.
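
    To make the idea of spectra along a filtration concrete, the sketch below builds only the 0-th Hodge Laplacian (the ordinary graph Laplacian L0 = D - A) of a point cloud at several distance cutoffs and summarizes each spectrum. The chosen attributes (number of zero eigenvalues, and the smallest, mean and sum of the nonzero eigenvalues) are hypothetical stand-ins for PerSpect attributes; the published method also uses higher-order Laplacians and a specially designed filtration that this sketch does not reproduce.

```python
import numpy as np


def laplacian_0(coords, cutoff):
    """Graph Laplacian L0 = D - A, where A links distinct points within `cutoff`."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist <= cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return np.diag(adj.sum(axis=1)) - adj


def spectral_attributes(coords, cutoffs, tol=1e-8):
    """One row of hypothetical spectral attributes per filtration value (cutoff)."""
    rows = []
    for r in cutoffs:
        eig = np.linalg.eigvalsh(laplacian_0(coords, r))  # L0 is symmetric
        nonzero = eig[eig > tol]
        rows.append([
            int(np.sum(eig <= tol)),                   # zero eigenvalues = connected components (Betti-0)
            nonzero.min() if nonzero.size else 0.0,    # smallest nonzero eigenvalue
            nonzero.mean() if nonzero.size else 0.0,   # mean nonzero eigenvalue
            nonzero.sum(),                             # equals 2 * number of edges for L0
        ])
    return np.array(rows)
```

    Stacking the rows over increasing cutoffs gives a fixed-length sequence per structure, which is the kind of input a 1D CNN can consume.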

  5. Pearson's Correlation Coefficient (PCC) of hotspot descriptors with...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Dec 2, 2015
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates (2015). Pearson's Correlation Coefficient (PCC) of hotspot descriptors with experimental Δlog10(koff) for the 713 off-rate mutations in SKEMPI. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003216.t002
    Available download formats: xls
    Dataset updated
    Dec 2, 2015
    Dataset provided by
    PLOS Computational Biology
    Authors
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pearson's Correlation Coefficient (PCC) of hotspot descriptors with experimental Δlog10(koff) for the 713 off-rate mutations in SKEMPI.

  6. PPIRef

    • zenodo.org
    zip
    Updated Aug 4, 2024
    Anton Bushuiev (2024). PPIRef [Dataset]. http://doi.org/10.5281/zenodo.13208732
    Available download formats: zip
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anton Bushuiev
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PPIRef is a dataset of 3D structures of protein-protein interfaces. See the GitHub repository for more details.

    File description

    1. ppi_6A.zip stores the PPIRef dataset: .pdb files with all 6 Å-distance interfaces from the PDB as of January 2024.
    2. ppi_6A_stats.zip stores statistics about the ppi_6A dataset. This includes indexes for fast search with MMseqs2 and iDist, as well as a .csv file with the main statistics for all interfaces.
    3. ppi_10A.zip stores .pdb files with all 10 Å-distance interfaces from the PDB, downloaded in June 2024.
    4. ppi_10A_stats.zip stores statistics about the ppi_10A dataset. This includes iDist embeddings, as well as a .csv file with the main statistics for all interfaces.
    5. pdb_redo_ppi_10A.zip stores .pdb files with all 10 Å-distance interfaces from PDB-REDO, downloaded in June 2024.
    6. pdb_redo_ppi_10A_stats.zip stores statistics about the pdb_redo_ppi_10A dataset. This includes iDist embeddings, as well as a .csv file with the main statistics for all interfaces.
    7. skempi2.zip stores PPI interfaces from the SKEMPI v2.0 dataset.
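
    One plausible reading of a "6A-distance interface" is the set of residues on each chain with at least one atom within 6 Å of the partner chain. The sketch below assumes that definition (all atoms, plain Euclidean cutoff); PPIRef's exact criterion is documented in its GitHub repository and may differ, for example in atom selection. interface_residues is a hypothetical helper, not part of the ppiref package.

```python
import numpy as np


def interface_residues(coords_a, res_ids_a, coords_b, res_ids_b, cutoff=6.0):
    """Residue IDs on chains A and B with at least one atom within `cutoff`
    angstroms of any atom on the other chain.

    coords_*: (n_atoms, 3) atom coordinates; res_ids_*: residue ID for each atom.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # all atom pairs
    close = dist <= cutoff
    in_a = {res_ids_a[i] for i in np.flatnonzero(close.any(axis=1))}
    in_b = {res_ids_b[j] for j in np.flatnonzero(close.any(axis=0))}
    return in_a, in_b
```

    Raising cutoff from 6.0 to 10.0 gives the looser interfaces stored in the 10A archives under the same assumed definition.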


    How to use

    It is recommended to download and extract the files into the PPIRef/ppiref/data/ppiref directory. This can be done automatically via the ppiref package. For example, to download and extract the ppi_6A.zip archive, run:
    from ppiref.utils.misc import download_from_zenodo
    download_from_zenodo('ppi_6A.zip')
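
    If you prefer not to extract an archive, Python's standard zipfile module can list and read the interface files directly. This is a hedged sketch that assumes the downloaded archive (for example ppi_6A.zip) is a standard zip containing .pdb members; the internal directory layout is not described here, so the helpers filter on the file extension only. list_pdb_members and read_pdb_member are hypothetical names, not part of the ppiref package.

```python
import zipfile


def list_pdb_members(zip_path):
    """Sorted names of the .pdb entries inside a downloaded archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".pdb"))


def read_pdb_member(zip_path, member):
    """Read one interface file as text without extracting the whole archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member).decode("utf-8")
```

    Reading members on demand avoids writing a very large number of small files to disk, at the cost of reopening the archive for each call.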

  7. The final data sets used in this work.

    • plos.figshare.com
    bin
    Updated Sep 18, 2023
    Youzhi Zhang; Sijie Yao; Peng Chen (2023). The final data sets used in this work. [Dataset]. http://doi.org/10.1371/journal.pone.0290899.t003
    Available download formats: bin
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Youzhi Zhang; Sijie Yao; Peng Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanisms linking proteins to their functions and for drug target design. Current research has mostly used machine learning methods to predict hotspots from known interface residues, which requires manually extracting features of amino acid residues from sequence, structure, evolution, energy and other information to train and test the models. This process is somewhat cumbersome, time-consuming and laborious. This paper proposes Embed-1dCNN, which combines a pre-trained protein sequence embedding model with a one-dimensional convolutional neural network to predict protein hotspot residues. To obtain a large number of samples, this work integrates data from the ASEdb, BID, SKEMPI and dbMPIKT datasets into a new dataset, and uses the SMOTE algorithm to oversample positive samples in the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set, outperforming other hotspot prediction methods.
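
    The SMOTE step mentioned above creates synthetic minority-class (hotspot) samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The from-scratch sketch below illustrates that idea on feature vectors, together with the F1 score used for evaluation; it is not the authors' pipeline, and in practice the SMOTE class from the imbalanced-learn package is a common choice. The parameter defaults (k=5, fixed seed) are assumptions.

```python
import numpy as np


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating randomly chosen minority
    samples toward one of their k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(minority, dtype=float)
    k = min(k, len(x) - 1)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]    # k nearest neighbours per sample
    base = rng.integers(0, len(x), size=n_new)      # samples to start from
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return x[base] + gap * (x[partner] - x[base])


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
```

    Because synthetic points lie on segments between real minority samples, oversampling should be applied to the training set only, so that the test-set F1 is computed on real residues.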

  8. Data from: S5 File -

    • plos.figshare.com
    zip
    Updated Sep 18, 2023
    Youzhi Zhang; Sijie Yao; Peng Chen (2023). S5 File - [Dataset]. http://doi.org/10.1371/journal.pone.0290899.s005
    Available download formats: zip
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Youzhi Zhang; Sijie Yao; Peng Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanisms linking proteins to their functions and for drug target design. Current research has mostly used machine learning methods to predict hotspots from known interface residues, which requires manually extracting features of amino acid residues from sequence, structure, evolution, energy and other information to train and test the models. This process is somewhat cumbersome, time-consuming and laborious. This paper proposes Embed-1dCNN, which combines a pre-trained protein sequence embedding model with a one-dimensional convolutional neural network to predict protein hotspot residues. To obtain a large number of samples, this work integrates data from the ASEdb, BID, SKEMPI and dbMPIKT datasets into a new dataset, and uses the SMOTE algorithm to oversample positive samples in the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set, outperforming other hotspot prediction methods.

  9. Composition of MIX (new) set.

    • plos.figshare.com
    bin
    Updated Sep 18, 2023
    + more versions
    Youzhi Zhang; Sijie Yao; Peng Chen (2023). Composition of MIX (new) set. [Dataset]. http://doi.org/10.1371/journal.pone.0290899.t002
    Available download formats: bin
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Youzhi Zhang; Sijie Yao; Peng Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanisms linking proteins to their functions and for drug target design. Current research has mostly used machine learning methods to predict hotspots from known interface residues, which requires manually extracting features of amino acid residues from sequence, structure, evolution, energy and other information to train and test the models. This process is somewhat cumbersome, time-consuming and laborious. This paper proposes Embed-1dCNN, which combines a pre-trained protein sequence embedding model with a one-dimensional convolutional neural network to predict protein hotspot residues. To obtain a large number of samples, this work integrates data from the ASEdb, BID, SKEMPI and dbMPIKT datasets into a new dataset, and uses the SMOTE algorithm to oversample positive samples in the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set, outperforming other hotspot prediction methods.


Characterizing Changes in the Rate of Protein-Protein Dissociation upon Interface Mutation Using Hotspot Energy and Organization

20 scholarly articles cite this dataset (per Google Scholar).