13 datasets found
  1. FireProtDB + PDB Structural Protein Stability Dataset

    • data.niaid.nih.gov
    Updated Jan 30, 2024
    Cite
    Brocidiacono, Michael (2024). FireProtDB + PDB Structural Protein Stability Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8169288
    Dataset updated
    Jan 30, 2024
    Dataset provided by
    Brocidiacono, Michael
    Dieckhaus, Henry
    Randolph, Nicholas
    Kuhlman, Brian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset compiled and curated for use in the ThermoMPNN paper: https://doi.org/10.1073/pnas.2314853121

    Dataset for training models for prediction of thermodynamic stability changes (ddG) of protein point mutations given a wildtype protein structure (PDB) file. Data was assembled by matching sequence-based ddG measurements in FireProtDB to structures from the RCSB Protein Data Bank (PDB). For details, see the Methods section of our manuscript.

    Citing this work: If you choose to use this dataset for your own research, please cite this repository and the ThermoMPNN paper: https://doi.org/10.1073/pnas.2314853121.

    Contents:

    pdbs/ directory contains all PDB files

    csvs/ directory contains all CSVs with mutation data

    csvs/4_fireprotDB_bestpH.csv is the main (full) dataset file with 3,438 mutations across 100 proteins.

    csvs/fireprot_splits.pkl contains the dataset splits (train/val/test) used in our study

    csvs/splits/ contains csvs for each of the splits (train/val/test/homologue-free) indexed from the full dataset csv.

    Important CSV columns:

    pdb_id_corrected: corresponds to the PDB in the pdbs/ directory (after curation and disambiguation)

    ddG: ddG value for mutation (mutant - WT)

    wild_type: wild-type amino acid (1-letter code)

    mutation: mutant amino acid (1-letter code)

    pdb_position: 0-based index of the mutated residue in the PDB file (may be different from position in the original FireProtDB sequence entry)
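
The column descriptions above can be illustrated with a minimal pandas sketch. It only assumes the documented column names; the rows are hypothetical placeholders rather than dataset values, and `mutation_labels` is an illustrative helper, not part of the dataset or of ThermoMPNN. Because `pdb_position` is documented as 0-based, the sketch adds 1 when building a conventional label such as A13G.

```python
import pandas as pd


def mutation_labels(df: pd.DataFrame) -> pd.Series:
    """Build labels such as 'A13G' from the documented columns.

    pdb_position is 0-based, so 1 is added to get a 1-based label.
    """
    position = df["pdb_position"].astype(int) + 1
    return df["wild_type"] + position.astype(str) + df["mutation"]


# Hypothetical rows for illustration only; not taken from the dataset.
sample = pd.DataFrame(
    {
        "pdb_id_corrected": ["1ABC", "1ABC", "2XYZ"],
        "wild_type": ["A", "L", "K"],
        "mutation": ["G", "P", "E"],
        "pdb_position": [12, 40, 0],
        "ddG": [1.2, -0.4, 0.0],
    }
)

sample["label"] = mutation_labels(sample)

# Number of mutations and mean ddG (mutant - WT) per structure.
per_protein = sample.groupby("pdb_id_corrected")["ddG"].agg(["count", "mean"])
```

With the real data, `sample` would presumably be replaced by `pd.read_csv("csvs/4_fireprotDB_bestpH.csv")`, assuming the archive has been unpacked into the working directory.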

  2. Protein Data Bank (PDB) dataset for peptide design - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Protein Data Bank (PDB) dataset for peptide design - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/protein-data-bank--pdb--dataset-for-peptide-design
    Dataset updated
    Dec 16, 2024
    Description

    A dataset of protein-peptide complexes for training a generative model for full-atom peptide design with Geometric Latent Diffusion.

  3. pdb-rna_secondary_structure

    • huggingface.co
    Updated Apr 18, 2025
    Cite
    MultiMolecule (2025). pdb-rna_secondary_structure [Dataset]. https://huggingface.co/datasets/multimolecule/pdb-rna_secondary_structure
    Dataset updated
    Apr 18, 2025
    Dataset authored and provided by
    MultiMolecule
    License

    https://choosealicense.com/licenses/agpl-3.0/

    Description

    pdb-rna_secondary_structure

    [!IMPORTANT] The pdb-rna_secondary_structure dataset is in beta test. This dataset card may not accurately reflect the data content. The data content and this dataset card may be subject to change. Please contact the MultiMolecule team on GitHub issues should you have any feedback.

    [!CAUTION] This dataset is converted from the dataset released by the authors of SPOT-RNA. The MultiMolecule is aware of a potential issue in data quality. We are working on… See the full description on the dataset page: https://huggingface.co/datasets/multimolecule/pdb-rna_secondary_structure.

  4. Network Visualization Map Data

    • springernature.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Luigi Di Costanzo; Christopher Markosian (2023). Network Visualization Map Data [Dataset]. http://doi.org/10.6084/m9.figshare.6121436.v1
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Luigi Di Costanzo; Christopher Markosian
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data used to generate a co-occurrence network map of publication keywords using the VOSviewer server (Version 1.6.5). Approximately 227,000 keywords were extracted from citation titles and abstracts from the Web of Science. A network was computed for a total of 2,460 terms selected by the full-counting method and relevance scoring as implemented within VOSviewer. For analysis, we reviewed co-occurrence network maps for thresholds between 5 and 40. The map shown uses the default cutoff of 30 term co-occurrences.

  5. pdb-data

    • neuinfo.org
    • dknet.org
    • +1 more
    Updated Oct 16, 2019
    Cite
    (2019). pdb-data [Dataset]. http://identifiers.org/RRID:SCR_000386
    Dataset updated
    Oct 16, 2019
    Description

    Search for carbohydrate containing PDB entries by criteria like species or the compound / classification terms. You can choose predefined, frequent terms from the pull-down-menus or enter your own queries manually.

  6. India Pdb Export | List of Pdb Exporters & Suppliers

    • seair.co.in
    + more versions
    Cite
    Seair Exim, India Pdb Export | List of Pdb Exporters & Suppliers [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    India
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  7. Data from: Redocking the PDB

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 6, 2023
    Cite
    Flachsenberg, Florian (2023). Redocking the PDB [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7579501
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Ehrt, Christiane
    Rarey, Matthias
    Flachsenberg, Florian
    Gutermuth, Torben
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary data to the journal article 'Redocking the PDB' by Flachsenberg et al. (https://doi.org/10.1021/acs.jcim.3c01573)[1]. In this paper, we described two datasets: The PDBScan22 dataset with a large set of 322,051 macromolecule–ligand binding sites generally suitable for redocking and the PDBScan22-HQ dataset with 21,355 binding sites passing different structure quality filters. These datasets were further characterized by calculating properties of the ligand (e.g., molecular weight), properties of the binding site (e.g., volume), and structure quality descriptors (e.g., crystal structure resolution). Additionally, we performed redocking experiments with our novel JAMDA structure preparation and docking workflow[1] and with AutoDock Vina[2,3]. Details for all these experiments and the dataset composition can be found in the journal article[1].

    Here, we provide all the datasets, i.e., the PDBScan22 and PDBScan22-HQ datasets as well as the docking results and the additionally calculated properties (for the ligand, the binding sites, and structure quality descriptors). Furthermore, we give a detailed description of their content (i.e., the data types and a description of the column values). All datasets consist of CSV files with the actual data and associated metadata JSON files describing their content. The CSV/JSON files are compliant with the CSV on the web standard (https://csvw.org/).

    General hints

    All docking experiment results consist of two CSV files, one with general information about the docking run (e.g., was it successful?) and one with individual pose results (i.e., score and RMSD to the crystal structure).

    All files (except for the docking pose tables) can be indexed uniquely by the column tuple '(pdb, name)' containing the PDB code of the complex (e.g., 1gm8) and the ligand name (in the format '_', e.g., 'SOX_B_1559').

    All files (except for the docking pose tables) have exactly the same number of rows as the dataset they were calculated on (e.g., PDBScan22 or PDBScan22-HQ). However, some CSV files may have missing values (see also the JSON metadata files) in some or even all columns (except for 'pdb' and 'name').

    The docking pose tables also contain the 'pdb' and 'name' columns. However, these are unique only together with the 'rank' column (i.e., there might be multiple poses for each docking run or none).

    Example usage

    Using the pandas library (https://pandas.pydata.org/) in Python, we can calculate the number of protein-ligand complexes in the PDBScan22-HQ dataset with a top-ranked pose RMSD to the crystal structure ≤ 2.0 Å in the JAMDA redocking experiment and a molecular weight between 100 Da and 200 Da:

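        import pandas as pd

        # Load the HQ dataset, the JAMDA pose table, and the ligand properties
        df = pd.read_csv('PDBScan22-HQ.csv')
        df_poses = pd.read_csv('PDBScan22-HQ_JAMDA_NL_NR_poses.csv')
        df_properties = pd.read_csv('PDBScan22_ligand_properties.csv')

        # Attach ligand properties via the unique (pdb, name) key
        merged = df.merge(df_properties, how='left', on=['pdb', 'name'])

        # Keep ligands with 100 Da <= MW <= 200 Da and join their top-ranked pose
        merged = merged[(merged['MW'] >= 100) & (merged['MW'] <= 200)].merge(
            df_poses[df_poses['rank'] == 1], how='left', on=['pdb', 'name'])

        # Count top-ranked poses within 2.0 Å and complexes without a top-ranked pose
        nof_successful_top_ranked = (merged['rmsd_ai'] <= 2.0).sum()
        nof_no_top_ranked = merged['rmsd_ai'].isna().sum()

    Datasets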

    PDBScan22.csv: This is the PDBScan22 dataset[1]. This dataset was derived from the PDB[4]. It contains macromolecule–ligand binding sites (defined by PDB code and ligand identifier) that can be read by the NAOMI library[5,6] and pass basic consistency filters.

    PDBScan22-HQ.csv: This is the PDBScan22-HQ dataset[1]. It contains macromolecule–ligand binding sites from the PDBScan22 dataset that pass certain structure quality filters described in our publication[1].

    PDBScan22-HQ-ADV-Success.csv: This is a subset of the PDBScan22-HQ dataset without 336 binding sites where AutoDock Vina[2,3] fails.

    PDBScan22-HQ-Macrocycles.csv: This is a subset of the PDBScan22-HQ dataset without 336 binding sites where AutoDock Vina[2,3] fails and only contains molecules with macrocycles with at least ten atoms.

    Properties for PDBScan22

    PDBScan22_ligand_properties.csv: Conformation-independent properties of all ligand molecules in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].

    PDBScan22_StructureProfiler_quality_descriptors.csv: Structure quality descriptors for the binding sites in the PDBScan22 dataset calculated using the StructureProfiler tool[7].

    PDBScan22_basic_complex_properties.csv: Simple properties of the binding sites in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].

    Properties for PDBScan22-HQ

    PDBScan22-HQ_DoGSite3_pocket_descriptors.csv: Binding site descriptors calculated for the binding sites in the PDBScan22-HQ dataset using the DoGSite3 tool[8].

    PDBScan22-HQ_molecule_types.csv: Assignment of ligands in the PDBScan22-HQ dataset (without 336 binding sites where AutoDock Vina fails) to different molecular classes (i.e., drug-like, fragment-like, oligosaccharide, oligopeptide, cofactor, macrocyclic). A detailed description of the assignment can be found in our publication[1].

    Docking results on PDBScan22

    PDBScan22_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22 dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    PDBScan22_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22 dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    Docking results on PDBScan22-HQ

    PDBScan22-HQ_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    PDBScan22-HQ_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    PDBScan22-HQ_JAMDA_NL_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

    PDBScan22-HQ_JAMDA_NL_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

    PDBScan22-HQ_JAMDA_NW_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    PDBScan22-HQ_JAMDA_NW_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    PDBScan22-HQ_JAMDA_NW_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

    PDBScan22-HQ_JAMDA_NW_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

    PDBScan22-HQ_JAMDA_WL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_WL_NR_poses.csv'. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    PDBScan22-HQ_JAMDA_WL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand

  8. PDB 8TYZ

    • data.sbgrid.org
    Updated Jul 9, 2024
    + more versions
    Cite
    (2024). PDB 8TYZ [Dataset]. http://doi.org/10.2210/pdb8TYZ/pdb
    Dataset updated
    Jul 9, 2024
    Description

    Protein Data Bank Entry 8TYZ is listed as the structure corresponding to this dataset

  9. Project files provided as supporting information to the manuscript "A deep...

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Marco Giulini (2020). Project files provided as supporting information to the manuscript "A deep learning approach to the structural analysis of proteins" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3356842
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Marco Giulini
    Raffaello Potestio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    README file to the project files provided as supporting information to the manuscript “A deep learning approach to the structural analysis of proteins”

    Dec. 30, 2018

    Authors: Marco Giulini and Raffaello Potestio

    ==================================

    The dataset contains the following files:

    • datasets.zip: archive containing five .csv files, namely:

        - decoys_cm.csv : all the data for 10728 protein decoys, training set
      
        - evaluation_cm.csv : all data for 146 proteins in the evaluation set
      
        - random_CG.csv : 1200 Coulomb matrices. 100 CG models for each protein with 120 amino acids
      
        - 1e5g_centered_sphere.csv : 100 CG models in which the central atoms in 1e5g are not removed
      
        - 1e5g_random_sphere.csv : 10 CG models for 10 different (random) locations for the sphere that includes atoms that have to be retained. 100 CG models in total
      
    • decoys_labels.lab containing the labels associated to the 10728 decoys present in the training set

    • evaluation_labels.lab containing the labels associated to the 146 pdb files in the evaluation set

    • random_CG_labels.lab containing the labels associated to the 6 proteins with 120 amino acids

    • network_development_training: a python script that performs cross validation and full training of the model

    • saved_networks.zip FOLDER containing 10 networks: the architecture is included in .json files while weight parameters are inside .hs files

    • pdb_files.zip FOLDER containing the PDB files that have been employed in the project, namely:

        - pdb_files_len100 : pdb files with 100 amino acids
      
        - pdb_files_len101-110 : pdb files with a number of amino acids between 101 and 110
      
        - decoys : decoys of length 100 extracted from the above folder: name syntax == PDBNAME_decoy_STARTRES_ENDRES.pdb
      
              EXAMPLE 6gsp.pdb will give rise to 6gsp_decoy_0_100.pdb , 6gsp_decoy_1_101.pdb , 6gsp_decoy_2_102.pdb , 6gsp_decoy_3_103.pdb , 6gsp_decoy_4_104.pdb
      
        - pdb_files_len100 : 6 pdb files with 120 amino acids
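
The decoy naming syntax listed above (PDBNAME_decoy_STARTRES_ENDRES.pdb) can be sketched as a sliding window over the chain. This is only an illustration of the naming pattern: the function name `decoy_names`, the 0-based start index, and the assumed chain length of 104 residues for 6gsp are inferred from the five names in the README example rather than stated by the authors.

```python
def decoy_names(pdb_name: str, n_residues: int, window: int = 100) -> list[str]:
    """Return one decoy file name per window of `window` consecutive residues.

    Each name has the form PDBNAME_decoy_START_END.pdb, where START is the
    0-based window start and END = START + window (an assumption that
    matches the README example).
    """
    if n_residues < window:
        return []
    return [
        f"{pdb_name}_decoy_{start}_{start + window}.pdb"
        for start in range(n_residues - window + 1)
    ]


# Assumed chain length of 104, chosen so the output matches the README example.
names = decoy_names("6gsp", n_residues=104)
```

Under these assumptions a chain of N residues yields N - 100 + 1 decoys, so a 104-residue chain gives the five file names shown in the example.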
      
  10. SPERM WHALE MYOGLOBIN H64A N-BUTYL ISOCYANIDE AT PH 9.0

    • ebi.ac.uk
    Updated Nov 4, 2019
    Cite
    (2019). SPERM WHALE MYOGLOBIN H64A N-BUTYL ISOCYANIDE AT PH 9.0 [Dataset]. https://www.ebi.ac.uk/interpro/structure/PDB/
    Dataset updated
    Nov 4, 2019
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data item of the type ? from the database pdb with accession 103m and name SPERM WHALE MYOGLOBIN H64A N-BUTYL ISOCYANIDE AT PH 9.0

  11. ‘PDB Electric Power Load History’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘PDB Electric Power Load History’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-pdb-electric-power-load-history-f3b9/69b765ba/?iid=004-966&v=presentation
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘PDB Electric Power Load History’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ashfakyeafi/pbd-load-history on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Inspiration

    With this data, many works can be done in the Electrical Engineering sector.

    --- Original source retains full ownership of the source dataset ---

  12. PDB 8TYX

    • data.sbgrid.org
    Updated Jul 9, 2024
    + more versions
    Cite
    (2024). PDB 8TYX [Dataset]. http://doi.org/10.2210/pdb8TYX/pdb
    Dataset updated
    Jul 9, 2024
    Description

    Protein Data Bank Entry 8TYX is listed as the structure corresponding to this dataset

  13. NR2F1 modeling and simulation data

    • explore.openaire.eu
    Updated Jan 22, 2024
    Cite
    VALERIO MARINO (2024). NR2F1 modeling and simulation data [Dataset]. http://doi.org/10.5281/zenodo.10551664
    Dataset updated
    Jan 22, 2024
    Authors
    VALERIO MARINO
    Description

    Homology modeling: NR2F1

    Active form: NR2F1_act.pdb

    Auto-repressed form: NR2F1_rep.pdb

    Molecular Dynamics simulations

    Original models: nr2f1_lbd_wt.pdb, nr2f1_lbd_q244x.pdb, nr2f1_lbd_e400x.pdb

    Structures after 4 ns equilibration: nr2f1_lbd_wt_start.pdb, nr2f1_lbd_q244x_start.pdb, nr2f1_lbd_e400x_start.pdb

    Trajectories in GROMACS compressed format aligned with the equilibrated structure: nr2f1_lbd_wt_clean.xtc, nr2f1_lbd_q244x_clean.xtc, nr2f1_lbd_e400x_clean.xtc

    Final structure after 500 ns productive MD simulations: nr2f1_lbd_wt_500ns.pdb, nr2f1_lbd_q244x_500ns.pdb, nr2f1_lbd_e400x_500ns.pdb

    Docking simulations

    For each docking simulation performed with PIPER we provide the best solution as detailed in the manuscript.

    Homodimer: NR2F1_act_dimer.pdb (active), NR2F1_rep_dimer.pdb (auto-repressed)

    Heterodimer with NR2F2: NR2F1_act_NR2F2.pdb (active), NR2F1_rep_NR2F2.pdb (auto-repressed)

    Heterodimer with RXRa: NR2F1_act_RXRa.pdb (active), NR2F1_rep_RXRa.pdb (auto-repressed)

    Heterodimer with CRABP2: NR2F1_act_CRABP2_apo.pdb (apo), NR2F1_act_CRABP2_holo.pdb (holo)
