6 datasets found
  1. Characterizing Changes in the Rate of Protein-Protein Dissociation upon...

    • plos.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates (2023). Characterizing Changes in the Rate of Protein-Protein Dissociation upon Interface Mutation Using Hotspot Energy and Organization [Dataset]. http://doi.org/10.1371/journal.pcbi.1003216
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations may be used for the estimation of off-rate mutations to any residue type and also multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson's Correlation Coefficient of 0.79 with experimental off-rates and a Matthews Correlation Coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models we identify descriptors that are highly specific and, conversely, broadly important to predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces is a function of complex size more strongly than interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize. The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors which model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects.
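
The two evaluation metrics quoted in this description can be written out from their standard definitions. The sketch below is not the authors' code: the function names are hypothetical, and it only shows how a Pearson correlation (for predicted versus experimental off-rate changes) and a Matthews correlation (for detecting stabilizing mutations from a confusion matrix) would typically be computed.

```python
import math


def pearson_cc(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

The Matthews coefficient is a common choice when one class is rare, as with the stabilizing mutations mentioned above, because it uses all four confusion-matrix cells rather than only the positive class.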

  2. MuToN dataset

    • zenodo.org
    zip
    Updated May 30, 2024
    Cite
    Pengpai Li (2024). MuToN dataset [Dataset]. http://doi.org/10.5281/zenodo.10445253
    Available download formats: zip
    Dataset updated
    May 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pengpai Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 2023
    Description

    This dataset is associated with the MuToN project hosted on GitHub (https://github.com/zpliulab/MuToN). It includes mutation records, PPI complexes, and pre-computed LLM embeddings for the SKEMPI dataset.

    Contents:

    ├── data/
    │   ├── skempi_v2.csv                # contains the mutation records in SKEMPI dataset.
    │   ├── SKEMPI/
    │   │   ├── raws/                    # contains the PPI complexes in SKEMPI dataset.
    │   │   │   ├── 1CSE.pdb
    │   │   ├── raw_pdb/                 # contains the single wild-type and mutant structures.
    │   │   │   ├── 1CSE_E.pdb           # extracted from 1CSE.pdb
    │   │   │   ├── 1CSE_I.pdb
    │   │   │   ├── 1CSE_I.mut.38_E.pdb  # computed mutant structure of 1CSE_E.pdb
    │   │   ├── llm_embedding/           # contains the pre-computed LLM embeddings using ESM-2.
    │   │   │   ├── 1CSE_E.npy           # LLM embedding of 1CSE_E.pdb, shape=(L, 1280)
    │   │   │   ├── 1CSE_I.npy
    │   │   │   ├── 1CSE_I.mut.38_E.npy
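
The file names in this layout appear to follow a pattern of complex ID, chain, and an optional `.mut.` suffix. The sketch below splits a name into those parts so that a structure in `raw_pdb/` can be paired with its embedding in `llm_embedding/`. The pattern is an assumption inferred only from the example names above; `StructureName` and `parse_structure_name` are hypothetical helpers, and the text after `.mut.` is kept as an opaque tag because the listing does not define its meaning.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class StructureName(NamedTuple):
    complex_id: str              # e.g. "1CSE"
    chain: str                   # e.g. "E" or "I"
    mutation_tag: Optional[str]  # e.g. "38_E"; None for wild-type files


def parse_structure_name(filename):
    """Split a raw_pdb/ or llm_embedding/ file name into its parts.

    Assumed pattern (inferred from the listing, not documented):
    <complex>_<chain>[.mut.<tag>].<pdb|npy>
    """
    name = PurePosixPath(filename).name
    base, _, ext = name.rpartition(".")
    if ext not in ("pdb", "npy") or not base:
        raise ValueError(f"unexpected file name: {filename!r}")
    head, sep, tag = base.partition(".mut.")
    complex_id, _, chain = head.partition("_")
    if not complex_id or not chain:
        # Whole-complex files such as raws/1CSE.pdb carry no chain suffix.
        raise ValueError(f"no chain suffix in: {filename!r}")
    return StructureName(complex_id, chain, tag if sep else None)
```

For example, `parse_structure_name("1CSE_I.mut.38_E.npy")` yields complex `1CSE`, chain `I`, and tag `38_E`, which matches the stem of the corresponding `.pdb` file.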
  3. atom3d-msp

    • huggingface.co
    Updated Aug 26, 2024
    Cite
    Vector Institute (2024). atom3d-msp [Dataset]. https://huggingface.co/datasets/vector-institute/atom3d-msp
    Dataset updated
    Aug 26, 2024
    Dataset authored and provided by
    Vector Institute
    Description

    Mutation Stability Prediction

      Overview
    

    The Mutation Stability Prediction (MSP) task involves classifying whether mutations in the SKEMPI 2.0 database (J. Jankauskaite, B. Jiménez-García et al., 2019) are stabilizing or not using the provided protein structures. Each mutation in the MSP task includes a PDB file with the residue of interest transformed to the specified mutant amino acid as well as the native PDB file. A total of 4148 mutant structures accompanied by their… See the full description on the dataset page: https://huggingface.co/datasets/vector-institute/atom3d-msp.

  4. Relationship between experimental ΔΔG, Δlog10(koff), Δlog10(kon) and change...

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates (2023). Relationship between experimental ΔΔG, Δlog10(koff), Δlog10(kon) and change in interface hotspot energy (Int_HS_Energy) for 713 mutations in SKEMPI. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003216.t003
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (A) Pearson's Correlation Coefficient (PCC) between experimental ΔΔG and the respective Δlog10(koff) and Δlog10(kon) for single-point alanine, single-point non-alanine, multi-point and all 713 mutations. (B) PCC between Int_HS_Energy and the respective ΔΔG, Δlog10(koff) and Δlog10(kon) for single-point alanine, single-point non-alanine, multi-point and all 713 mutations. Experimental values for the 713 mutations used here are extracted from SKEMPI [41] and are presented in Dataset S1.
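
The three experimental quantities in this table are linked by a standard thermodynamic identity, which helps when reading the correlations. For simple two-state binding, Kd = koff/kon and ΔG = RT ln(Kd), so ΔΔG = RT ln(10) × (Δlog10(koff) − Δlog10(kon)). The sketch below assumes that each Δ is mutant minus wild type, that energies are in kcal/mol, and a default temperature of 298.15 K; none of these conventions is stated in the description, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)


def ddg_from_rate_changes(dlog10_koff, dlog10_kon, temperature=298.15):
    """Binding free-energy change (kcal/mol) implied by changes in rate constants.

    Assumes two-state binding, so that Kd = koff / kon, and that each delta
    is the mutant value minus the wild-type value (an assumed convention).
    """
    rt_ln10 = GAS_CONSTANT * temperature * math.log(10)
    return rt_ln10 * (dlog10_koff - dlog10_kon)
```

Under these assumptions a tenfold increase in koff with unchanged kon corresponds to roughly 1.36 kcal/mol of destabilization, and equal shifts in koff and kon cancel.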

  5. Pearson's Correlation Coefficient (PCC) of hotspot descriptors with...

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 30, 2023
    Cite
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates (2023). Pearson's Correlation Coefficient (PCC) of hotspot descriptors with experimental Δlog10(koff) for the 713 off-rate mutations in SKEMPI. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003216.t002
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rudi Agius; Mieczyslaw Torchala; Iain H. Moal; Juan Fernández-Recio; Paul A. Bates
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pearson's Correlation Coefficient (PCC) of hotspot descriptors with experimental Δlog10(koff) for the 713 off-rate mutations in SKEMPI.

  6. Composition of MIX set.

    • plos.figshare.com
    bin
    Updated Sep 18, 2023
    Cite
    Youzhi Zhang; Sijie Yao; Peng Chen (2023). Composition of MIX set. [Dataset]. http://doi.org/10.1371/journal.pone.0290899.t001
    Available download formats: bin
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Youzhi Zhang; Sijie Yao; Peng Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanism from protein to function and for designing drug targets. Current research has mostly focused on using machine learning methods to predict hot spots from known interface residues, which artificially extract the corresponding features of amino acid residues from sequence, structure, evolution, energy, and other information to train and test machine learning models. The process is cumbersome, time-consuming and laborious to some extent. This paper proposes a novel idea that develops a pre-trained protein sequence embedding model combined with a one-dimensional convolutional neural network, called Embed-1dCNN, to predict protein hotspot residues. In order to obtain large data samples, this work integrates and extracts data from the datasets of ASEdb, BID, SKEMPI and dbMPIKT to generate a new dataset, and adopts the SMOTE algorithm to expand positive samples to form the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set. Compared with other hot spot prediction methods, our model achieved better prediction performance.
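
The description names two standard techniques, SMOTE oversampling and the F1 score, that are easy to illustrate in isolation. The sketch below is a minimal, dependency-free version of both and is not the paper's implementation: the function names, the default `k`, and the seeding are assumptions, and a production pipeline would normally use a maintained library instead.

```python
import math
import random


def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall, from raw counts."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolation (basic SMOTE).

    minority: list of equal-length numeric tuples, at least two of them.
    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours by Euclidean distance, excluding the sample itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Because the synthetic points are interpolations of existing positives, oversampling is usually applied to the training set only, as the description indicates, so that test-set scores are computed on real samples.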


Characterizing Changes in the Rate of Protein-Protein Dissociation upon Interface Mutation Using Hotspot Energy and Organization

20 scholarly articles cite this dataset.