26 datasets found
  1. PDB 6P7O

    • data.sbgrid.org
    Updated Dec 20, 2019
    (2019). PDB 6P7O [Dataset]. http://doi.org/10.2210/pdb6P7O/pdb
    Description

    Protein Data Bank entry 6P7O is listed as the structure corresponding to this dataset.

  2. Protein Data Bank (PDB) dataset for peptide design - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    (2024). Protein Data Bank (PDB) dataset for peptide design - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/protein-data-bank--pdb--dataset-for-peptide-design
    Description

    A dataset of protein-peptide complexes for training a generative model for full-atom peptide design with Geometric Latent Diffusion.

  3. PDB 5ZLE

    • data.sbgrid.org
    Updated Aug 17, 2018
    (2018). PDB 5ZLE [Dataset]. http://doi.org/10.2210/pdb5ZLE/pdb
    Description

    Protein Data Bank entry 5ZLE is listed as the structure corresponding to this dataset.

  4. PDB 2I4A

    • data.sbgrid.org
    PDB 2I4A [Dataset]. http://doi.org/10.2210/pdb2I4A/pdb
    Description

    Protein Data Bank entry 2I4A is listed as the structure corresponding to this dataset.

  5. Network Visualization Map Data

    • springernature.figshare.com
    txt
    Updated Jun 2, 2023
    Luigi Di Costanzo; Christopher Markosian (2023). Network Visualization Map Data [Dataset]. http://doi.org/10.6084/m9.figshare.6121436.v1
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Luigi Di Costanzo; Christopher Markosian
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data used to generate a co-occurrence network map of keywords from publication data using the VOSviewer server (version 1.6.5). Approximately 227,000 keywords were extracted from citation titles and abstracts in the Web of Science. A network was computed for a total of 2,460 terms selected by the full-counting method and relevance scoring as implemented in VOSviewer. For the analysis, we reviewed co-occurrence network maps for thresholds between 5 and 40; the map shown uses the default cutoff of 30 term co-occurrences.
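    The counting step described above can be illustrated with a small sketch. It assumes each document has already been reduced to a list of extracted terms, uses full counting in the VOSviewer sense (every occurrence of a term in a document is counted, rather than only its presence), keeps terms whose total count reaches a cutoff, and weights each link by the number of documents containing both terms. The helper name `term_map` is hypothetical, relevance scoring is not modeled, and this is not VOSviewer's own implementation.

```python
from collections import Counter
from itertools import combinations


def term_map(documents, min_count):
    """Simplified term co-occurrence map (illustrative only, not VOSviewer).

    documents: one list of extracted terms per document; a term may repeat.
    min_count: cutoff on a term's total occurrence count.
    """
    # Full counting: every occurrence of a term in a document is counted.
    counts = Counter(term for doc in documents for term in doc)
    kept = {term for term, n in counts.items() if n >= min_count}

    # Link weight: number of documents in which both kept terms appear.
    links = Counter()
    for doc in documents:
        present = sorted(set(doc) & kept)
        for pair in combinations(present, 2):
            links[pair] += 1
    return kept, links


# Toy input, purely illustrative.
docs = [["protein", "ligand", "protein"], ["protein", "docking"], ["ligand", "protein"]]
kept, links = term_map(docs, min_count=2)
print(sorted(kept), dict(links))  # ['ligand', 'protein'] {('ligand', 'protein'): 2}
```

    If the cutoff of 30 in the description refers to pairwise co-occurrences rather than to term occurrences, the same threshold can be applied to the counts in `links` instead.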

  6. Data from: Redocking the PDB

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 6, 2023
    Flachsenberg, Florian; Ehrt, Christiane; Gutermuth, Torben; Rarey, Matthias (2023). Redocking the PDB [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7579501
    Dataset provided by
    Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
    Authors
    Flachsenberg, Florian; Ehrt, Christiane; Gutermuth, Torben; Rarey, Matthias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary data to the journal article 'Redocking the PDB' by Flachsenberg et al. (https://doi.org/10.1021/acs.jcim.3c01573)[1]. In this paper, we described two datasets: the PDBScan22 dataset, a large set of 322,051 macromolecule–ligand binding sites generally suitable for redocking, and the PDBScan22-HQ dataset, with 21,355 binding sites passing different structure quality filters. These datasets were further characterized by calculating properties of the ligand (e.g., molecular weight), properties of the binding site (e.g., volume), and structure quality descriptors (e.g., crystal structure resolution). Additionally, we performed redocking experiments with our novel JAMDA structure preparation and docking workflow[1] and with AutoDock Vina[2,3]. Details of all these experiments and of the dataset composition can be found in the journal article[1]. Here, we provide all the datasets, i.e., the PDBScan22 and PDBScan22-HQ datasets, as well as the docking results and the additionally calculated properties (for the ligands, the binding sites, and structure quality descriptors). Furthermore, we give a detailed description of their content (i.e., the data types and a description of the column values). All datasets consist of CSV files with the actual data and associated JSON metadata files describing their content. The CSV/JSON files comply with the CSV on the Web standard (https://csvw.org/).

    General hints

    All docking experiment results consist of two CSV files: one with general information about the docking run (e.g., whether it was successful) and one with individual pose results (i.e., score and RMSD to the crystal structure). All files (except for the docking pose tables) can be indexed uniquely by the column tuple '(pdb, name)', which contains the PDB code of the complex (e.g., 1gm8) and the ligand name (underscore-separated, e.g., 'SOX_B_1559'). All files (except for the docking pose tables) have exactly the same number of rows as the dataset they were calculated on (e.g., PDBScan22 or PDBScan22-HQ). However, some CSV files may have missing values (see also the JSON metadata files) in some or even all columns (except for 'pdb' and 'name'). The docking pose tables also contain the 'pdb' and 'name' columns, but these are unique only together with the 'rank' column (i.e., a docking run may have multiple poses or none).

    Example usage

    Using the pandas library (https://pandas.pydata.org/) in Python, we can calculate the number of protein-ligand complexes in the PDBScan22-HQ dataset with a top-ranked pose RMSD to the crystal structure ≤ 2.0 Å in the JAMDA redocking experiment and a molecular weight between 100 Da and 200 Da:

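```python
import pandas as pd

# Binding sites, rank-wise JAMDA poses (NL_NR experiment), and ligand properties.
df = pd.read_csv('PDBScan22-HQ.csv')
df_poses = pd.read_csv('PDBScan22-HQ_JAMDA_NL_NR_poses.csv')
df_properties = pd.read_csv('PDBScan22_ligand_properties.csv')

# Attach ligand properties to every binding site via the (pdb, name) key.
merged = df.merge(df_properties, how='left', on=['pdb', 'name'])

# Keep ligands with 100 Da <= MW <= 200 Da, then attach the rank-1 pose (if any).
merged = merged[(merged['MW'] >= 100) & (merged['MW'] <= 200)].merge(
    df_poses[df_poses['rank'] == 1], how='left', on=['pdb', 'name']
)

# Complexes whose top-ranked pose is within 2.0 Å RMSD of the crystal structure.
nof_successful_top_ranked = (merged['rmsd_ai'] <= 2.0).sum()
# Complexes without a top-ranked pose (rmsd_ai is missing after the left merge).
nof_no_top_ranked = merged['rmsd_ai'].isna().sum()
```

    Datasets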

    • PDBScan22.csv: This is the PDBScan22 dataset[1]. It was derived from the PDB[4] and contains macromolecule–ligand binding sites (defined by PDB code and ligand identifier) that can be read by the NAOMI library[5,6] and pass basic consistency filters.
    • PDBScan22-HQ.csv: This is the PDBScan22-HQ dataset[1]. It contains macromolecule–ligand binding sites from the PDBScan22 dataset that pass certain structure quality filters described in our publication[1].
    • PDBScan22-HQ-ADV-Success.csv: This is a subset of the PDBScan22-HQ dataset that excludes the 336 binding sites where AutoDock Vina[2,3] fails.
    • PDBScan22-HQ-Macrocycles.csv: This is a subset of the PDBScan22-HQ dataset that excludes the 336 binding sites where AutoDock Vina[2,3] fails and contains only molecules with macrocycles of at least ten atoms.

    Properties for PDBScan22

    • PDBScan22_ligand_properties.csv: Conformation-independent properties of all ligand molecules in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].
    • PDBScan22_StructureProfiler_quality_descriptors.csv: Structure quality descriptors for the binding sites in the PDBScan22 dataset, calculated using the StructureProfiler tool[7].
    • PDBScan22_basic_complex_properties.csv: Simple properties of the binding sites in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].

    Properties for PDBScan22-HQ

    • PDBScan22-HQ_DoGSite3_pocket_descriptors.csv: Binding site descriptors calculated for the binding sites in the PDBScan22-HQ dataset using the DoGSite3 tool[8].
    • PDBScan22-HQ_molecule_types.csv: Assignment of ligands in the PDBScan22-HQ dataset (excluding the 336 binding sites where AutoDock Vina fails) to different molecular classes (i.e., drug-like, fragment-like, oligosaccharide, oligopeptide, cofactor, macrocyclic). A detailed description of the assignment can be found in our publication[1].

    Docking results on PDBScan22

    • PDBScan22_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22 dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    • PDBScan22_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22 dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    Docking results on PDBScan22-HQ

    • PDBScan22-HQ_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    • PDBScan22-HQ_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    • PDBScan22-HQ_JAMDA_NL_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    • PDBScan22-HQ_JAMDA_NL_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    • PDBScan22-HQ_JAMDA_NW_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    • PDBScan22-HQ_JAMDA_NW_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    • PDBScan22-HQ_JAMDA_NW_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    • PDBScan22-HQ_JAMDA_NW_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    • PDBScan22-HQ_JAMDA_WL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_WL_NR_poses.csv'. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    • PDBScan22-HQ_JAMDA_WL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
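    The JAMDA result files listed above follow a regular naming pattern. The sketch below decodes a file name into the experiment settings stated in those descriptions; the meaning assigned to each two-letter code (NL, WL, NW, NR, WR) is read off the descriptions rather than taken from any official specification, and the helper `describe_jamda_file` and its output keys are hypothetical.

```python
import re

# Settings per code, as read from the file descriptions (an interpretation).
PREPROCESSING = {
    "NL": {"ligand_considered": False, "all_waters_removed": False},
    "WL": {"ligand_considered": True, "all_waters_removed": False},
    "NW": {"ligand_considered": False, "all_waters_removed": True},
}
RESTRICTION = {"NR": False, "WR": True}  # binding site restriction mode

FILE_NAME = re.compile(
    r"^(?P<dataset>PDBScan22(?:-HQ)?)_JAMDA_"
    r"(?P<prep>NL|WL|NW)_(?P<restr>NR|WR)(?P<poses>_poses)?\.csv$"
)


def describe_jamda_file(filename):
    """Return the experiment settings encoded in a JAMDA result file name."""
    match = FILE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a JAMDA result file name: {filename!r}")
    return {
        "dataset": match["dataset"],
        **PREPROCESSING[match["prep"]],
        "restriction_enabled": RESTRICTION[match["restr"]],
        # Files ending in _poses hold per-pose results; others are overviews.
        "table": "poses" if match["poses"] else "overview",
    }


print(describe_jamda_file("PDBScan22-HQ_JAMDA_NW_WR_poses.csv"))
```

    Decoding the names this way makes it easy to group, for example, all pose tables with the restriction mode enabled when comparing experiments.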

  7. Structural Protein Sequences

    • kaggle.com
    zip
    Updated Feb 3, 2018
    SHAHIR (2018). Structural Protein Sequences [Dataset]. https://www.kaggle.com/datasets/shahir/protein-data-set
    Available download formats: zip (28,782,775 bytes)
    Authors
    SHAHIR
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    This is a protein data set retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).

    The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to the others in the molecule. They then deposit this information, which is annotated and publicly released into the archive by the wwPDB.

    The constantly growing PDB reflects the research that is happening in laboratories across the world. This can make it both exciting and challenging to use the database in research and education. Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures for ribosomes, oncogenes, drug targets, and even whole viruses. However, it can be a challenge to find the information that you need, since the PDB archives so many different structures. You will often find multiple structures for a given molecule, partial structures, or structures that have been modified or inactivated from their native form.

    Content

    There are two data files. Both are keyed on the "structureId" of the protein:

    • pdb_data_no_dups.csv contains protein metadata, including details on protein classification, extraction methods, etc.

    • data_seq.csv contains >400,000 protein structure sequences.
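    Because both files share the "structureId" column, they can be joined on it. The sketch below uses only the Python standard library; it assumes a structure may have several rows in data_seq.csv (for example, one per chain) and keeps only sequences whose structureId also appears in the metadata file. The helper names and default paths are hypothetical, and any column other than "structureId" is an assumption about the files.

```python
import csv


def join_on_structure_id(meta_rows, seq_rows):
    """Inner-join sequence rows to metadata rows on "structureId".

    meta_rows, seq_rows: iterables of dicts (e.g. from csv.DictReader).
    Where both rows share a column name, the sequence row's value wins.
    """
    # Index metadata by ID; if an ID were repeated, the last row would win.
    meta_by_id = {row["structureId"]: row for row in meta_rows}
    joined = []
    for seq in seq_rows:
        meta = meta_by_id.get(seq["structureId"])
        if meta is not None:  # drop sequences that have no metadata row
            joined.append({**meta, **seq})
    return joined


def load_joined(meta_path="pdb_data_no_dups.csv", seq_path="data_seq.csv"):
    """Read both CSV files (paths are assumptions) and join them."""
    with open(meta_path, newline="") as meta_file, open(seq_path, newline="") as seq_file:
        return join_on_structure_id(csv.DictReader(meta_file), csv.DictReader(seq_file))


# Toy rows, purely illustrative.
meta = [{"structureId": "1ABC", "classification": "HYDROLASE"}]
seqs = [{"structureId": "1ABC", "sequence": "MKT"}, {"structureId": "9XYZ", "sequence": "GGG"}]
print(join_on_structure_id(meta, seqs))
```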

    Acknowledgements

    Original data set downloaded from http://www.rcsb.org/pdb/

    Inspiration

    The Protein Data Bank has helped the life-science community study different diseases and develop new drugs and solutions that aid human survival.

  8. PDB 6P7Q

    • data.sbgrid.org
    Updated Dec 20, 2019
    (2019). PDB 6P7Q [Dataset]. http://doi.org/10.2210/pdb6P7Q/pdb
    Description

    Protein Data Bank entry 6P7Q is listed as the structure corresponding to this dataset.

  9. Cryo-EM structure of human KATP bound to ATP and ADP in propeller form

    • ebi.ac.uk
    application/cif
    Updated Jan 24, 2018
    Protein Data Bank in Europe (PDBe) (2018). Cryo-EM structure of human KATP bound to ATP and ADP in propeller form [Dataset]. http://doi.org/10.2210/pdb/6c3p/pdb
    Dataset provided by
    European Bioinformatics Institute: http://www.ebi.ac.uk/
    Authors
    Protein Data Bank in Europe (PDBe)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Measurement technique
    Electron Microscopy
    Description

    Cryo-EM structure of human KATP bound to ATP and ADP in propeller form

  10. A Systematic Analysis of the Structures of Heterologously Expressed Proteins...

    • plos.figshare.com
    xlsx
    Updated Jun 1, 2023
    Ren-Bin Zhou; Hui-Meng Lu; Jie Liu; Jian-Yu Shi; Jing Zhu; Qin-Qin Lu; Da-Chuan Yin (2023). A Systematic Analysis of the Structures of Heterologously Expressed Proteins and Those from Their Native Hosts in the RCSB PDB Archive [Dataset]. http://doi.org/10.1371/journal.pone.0161254
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ren-Bin Zhou; Hui-Meng Lu; Jie Liu; Jian-Yu Shi; Jing Zhu; Qin-Qin Lu; Da-Chuan Yin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recombinant expression of proteins has become an indispensable tool in modern research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, it has been reported in the literature that recombinant proteins show some differences in structure and function compared with native ones. More than 100,000 structures (from both recombinant and native sources) are now publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate whether any proteins in the RCSB PDB archive have identical sequences but different structures. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible, and TM-align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements, and hydrogen bonds were used to assess structural similarity. A thorough analysis of the PDB archive identified 517 pairs of native and recombinant proteins with identical sequences. No pair of proteins had the same sequence and a significantly different structural fold, which supports the hypothesis that proteins expressed in a heterologous host usually fold correctly into their native forms.
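    For readers unfamiliar with two of the similarity scores named above, the sketch below computes RMSD and TM-score for two coordinate sets that are already superposed and matched residue by residue (for example, Cα atoms). It does not perform the alignment or superposition search that CE, FATCAT, or TM-align carry out, so it is not a reimplementation of those tools. The d0 expression is the usual TM-score normalization; the 0.5 Å floor for short chains is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD of two equal-length, already superposed (x, y, z) coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score(coords_a, coords_b, target_length):
    """TM-score of matched residue pairs under a fixed superposition.

    target_length: number of residues in the reference (target) protein.
    TM-align additionally searches for the alignment and superposition
    that maximize this value; that search is not done here.
    """
    if target_length > 21:
        d0 = 1.24 * (target_length - 15) ** (1 / 3) - 1.8
    else:
        d0 = 0.5  # assumed floor for short chains
    total = sum(1 / (1 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(coords_a, coords_b))
    return total / target_length


# Toy 30-residue "chain" and a copy shifted by 1 Å along y.
chain = [(3.8 * i, 0.0, 0.0) for i in range(30)]
shifted = [(x, y + 1.0, z) for x, y, z in chain]
print(rmsd(chain, shifted), tm_score(chain, shifted, 30))
```

    Note how the two scores behave differently: RMSD grows without bound with local deviations, whereas each residue's contribution to the TM-score is capped at 1, which makes the TM-score less sensitive to a few badly matched regions.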

  11. Crystal structure of alpha-1-antitrypsin, crystal form A

    • ebi.ac.uk
    application/cif
    Updated Aug 12, 2008
    Protein Data Bank in Europe (PDBe) (2008). Crystal structure of alpha-1-antitrypsin, crystal form A [Dataset]. http://doi.org/10.2210/pdb/2qug/pdb
    Dataset provided by
    European Bioinformatics Institute: http://www.ebi.ac.uk/
    Authors
    Protein Data Bank in Europe (PDBe)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Measurement technique
    X-ray diffraction
    Description

    Crystal structure of alpha-1-antitrypsin, crystal form A

  12. Enzyme Structures Database

    • dknet.org
    • neuinfo.org
    • +2 more
    Updated Sep 5, 2024
    (2024). Enzyme Structures Database [Dataset]. http://identifiers.org/RRID:SCR_007125
    Description

    Database of known enzyme structures that have been deposited in the Protein Data Bank (PDB). The enzyme structures are classified by their EC number from the ENZYME Data Bank. Users can browse the classification hierarchy or enter an EC number or search string. As of 23 February 2013, there were 45,638 PDB-enzyme entries in the PDB, involving 38,109 separate PDB files; some files have more than one EC number associated with them.
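    The classification hierarchy mentioned above follows the four levels of an EC number (class, subclass, sub-subclass, serial number). A minimal sketch of how a full or partial EC number such as '3.4.-.-' can be split into those levels and matched against a branch of the hierarchy; the helper names are hypothetical, this is not the database's own search interface, and only numeric fields and '-' are accepted.

```python
def ec_levels(ec):
    """Split a full or partial EC number into its hierarchy prefixes.

    '3.4.21.5' -> ['3', '3.4', '3.4.21', '3.4.21.5']; '3.4.-.-' -> ['3', '3.4'].
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    prefixes = []
    for i, part in enumerate(parts):
        if part == "-":
            # Once a level is unspecified, all deeper levels must be too.
            if any(p != "-" for p in parts[i:]):
                raise ValueError(f"specified level after '-': {ec!r}")
            break
        if not part.isdigit():
            raise ValueError(f"unsupported field {part!r} in {ec!r}")
        prefixes.append(".".join(parts[: i + 1]))
    return prefixes


def in_branch(ec, branch):
    """True if EC number `ec` lies under the (possibly partial) `branch`."""
    branch_levels = ec_levels(branch)
    if not branch_levels:  # '-.-.-.-' covers every enzyme
        return True
    return branch_levels[-1] in ec_levels(ec)


print(ec_levels("3.4.21.5"), in_branch("3.4.21.5", "3.4.-.-"))
```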

  13. Crystal structure of human Tut1 bound with MgUTP, form II

    • ebi.ac.uk
    application/cif
    Updated May 31, 2017
    Protein Data Bank in Europe (PDBe) (2017). Crystal structure of human Tut1 bound with MgUTP, form II [Dataset]. http://doi.org/10.2210/pdb/5wu3/pdb
    Dataset provided by
    European Bioinformatics Institute: http://www.ebi.ac.uk/
    Authors
    Protein Data Bank in Europe (PDBe)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Measurement technique
    X-ray diffraction
    Description

    Crystal structure of human Tut1 bound with MgUTP, form II

  14. Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the...

    • zenodo.org
    • nde-dev.biothings.io
    • +1 more
    application/gzip, png
    Updated Jul 19, 2024
    Joseph H. Lubin; Christine Zardecki; Elliott M. Dolan; Changpeng Lu; Zhuofan Shen; Shuchismita Dutta; John D. Westbrook; Brian P. Hudson; David S. Goodsell; Jonathan K. Williams; Maria Voigt; Vidur Sarma; Lingjun Xie; Thejasvi Venkatachalam; Steven Arnold; Luz Helena Alfaro Alvarado; Kevin Catalfano; Aaliyah Khan; Erika McCarthy; Sophia Staggers; Brea Tinsley; Alan Trudeau; Jitendra Singh; Lindsey Whitmore; Helen Zheng; Matthew Benedek; Jenna Currier; Mark Dresel; Ashish Duvvuru; Britney Dyszel; Emily Fingar; Elizabeth M. Hennen; Michael Kirsch; Ali A. Khan; Charlotte Labrie-Cleary; Stephanie Laporte; Evan Lenkeit; Kailey Martin; Marilyn Orellana; Melanie Ortiz-Alvarez de la Campa; Isaac Paredes; Baleigh Wheeler; Allison Rupert; Andrew Sam; Katherine See; Santiago Soto Zapata; Paul A. Craig; Bonnie L. Hall; Jennifer Jiang; Julia R. Koeppe; Stephen A. Mills; Michael J. Pikaart; Rebecca Roberts; Yana Bromberg; J. Steen Hoyer; Siobain Duffy; Jay Tischfield; Francesc X. Ruiz; Eddy Arnold; Jean Baum; Jesse Sandberg; Grace Brannigan; Sagar D. Khare; Stephen K. Burley (2024). Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first six months of the COVID-19 pandemic -- Supplementary Tables and Models [Dataset]. http://doi.org/10.5281/zenodo.4293973
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Joseph H. Lubin; Christine Zardecki; Elliott M. Dolan; Changpeng Lu; Zhuofan Shen; Shuchismita Dutta; John D. Westbrook; Brian P. Hudson; David S. Goodsell; Jonathan K. Williams; Maria Voigt; Vidur Sarma; Lingjun Xie; Thejasvi Venkatachalam; Steven Arnold; Luz Helena Alfaro Alvarado; Kevin Catalfano; Aaliyah Khan; Erika McCarthy; Sophia Staggers; Brea Tinsley; Alan Trudeau; Jitendra Singh; Lindsey Whitmore; Helen Zheng; Matthew Benedek; Jenna Currier; Mark Dresel; Ashish Duvvuru; Britney Dyszel; Emily Fingar; Elizabeth M. Hennen; Michael Kirsch; Ali A. Khan; Charlotte Labrie-Cleary; Stephanie Laporte; Evan Lenkeit; Kailey Martin; Marilyn Orellana; Melanie Ortiz-Alvarez de la Campa; Isaac Paredes; Baleigh Wheeler; Allison Rupert; Andrew Sam; Katherine See; Santiago Soto Zapata; Paul A. Craig; Bonnie L. Hall; Jennifer Jiang; Julia R. Koeppe; Stephen A. Mills; Michael J. Pikaart; Rebecca Roberts; Yana Bromberg; J. Steen Hoyer; Siobain Duffy; Jay Tischfield; Francesc X. Ruiz; Eddy Arnold; Jean Baum; Jesse Sandberg; Grace Brannigan; Sagar D. Khare; Stephen K. Burley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first six months of the COVID-19 pandemic

    https://covid-19_proteome_evolution_paper.iqb.rutgers.edu

    Legends for Supplementary Figures for 29 SARS-CoV-2 Study Proteins

    Separate analysis of protein changes was performed for each study protein and complex. Description below applies to all figures.

    A: Greyscale representation of observed frequencies for all USV substitutions of Native Residue (i.e., amino acid type in the reference protein sequence) changing to Substituted Residue for a given protein/complex. Red boxes enclose conservative substitutions for hydrophobic, uncharged polar, positively charged, and negatively charged amino acids, respectively, in order from upper left to lower right. Cysteine, glycine, and proline are excluded from these groupings.

    B-D: Normalized frequency histograms of ΔΔGApp calculated for all USVs of a given protein/complex. These were calculated using three methods, which we refer to as hard-hard (B), soft-hard (C), and soft-soft (D), based on the scoring functions used for side-chain rotamer optimization and gradient-based energy minimization, respectively (see methods). All energy values described in the text were obtained using the soft-hard method. Each panel overlays the energy histogram with a fitted bi-Gaussian curve (solid red line) and fitted single Gaussian curves for subsets of USVs with surface (green), boundary layer (yellow), or core (blue) substitutions. USVs with multiple substitutions were included in single Gaussian fitting when all substitutions mapped to the same region of the study protein. The data used for fitting include the energies of all unique protein models produced by a given method, excluding extreme outliers with energy values more than 3 standard deviations from the central mean.

    E-G: USV count histograms show, for each site, the number of USVs in the full set for a given protein that include a substitution at that site. Sites are separated by burial layer. Substitutions at sites that are absent from the available crystal structures are excluded from the histograms. In most cases, only a single protein is analyzed, and only panel E is included. In the case of complexes, a separate histogram is provided for each protein in the complex: for methyltransferase nsp10-nsp16, E is nsp10 and F is nsp16; for RDRP nsp12-nsp7-nsp8, E is nsp7, F is nsp8, and G is nsp12.
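    The outlier rule and per-layer fits in the B-D legend can be approximated as follows. This sketch drops values more than 3 standard deviations from the mean of the full set and then summarizes each burial layer with a single Gaussian, using the mean and population standard deviation (the maximum-likelihood estimates) in place of the curve fitting used in the study. It assumes the "central mean" is the plain mean of all values, skips multi-substitution USVs whose sites span more than one layer, and does not fit the bi-Gaussian; the function names are hypothetical.

```python
import statistics


def drop_outliers(values, k=3.0):
    """Keep values within k population standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= k * sd]


def layer_gaussians(usvs, k=3.0):
    """Single-Gaussian summary (mean, SD) of ddG values per burial layer.

    usvs: list of (ddg, layers) pairs, where layers holds the burial layer
    ('surface', 'boundary' or 'core') of each substitution in the USV.
    """
    kept = set(drop_outliers([ddg for ddg, _ in usvs], k))
    by_layer = {}
    for ddg, layers in usvs:
        # Multi-substitution USVs count only if all sites share one layer.
        if ddg in kept and len(set(layers)) == 1:
            by_layer.setdefault(layers[0], []).append(ddg)
    return {layer: (statistics.fmean(v), statistics.pstdev(v))
            for layer, v in by_layer.items()}


# Toy values, purely illustrative.
toy = [(1.0, ["surface"]), (3.0, ["surface"]), (2.0, ["core"]), (5.0, ["core", "surface"])]
print(layer_gaussians(toy))
```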

    Legends for Supplementary Tables for 29 SARS-CoV-2 Study Proteins

    Table: USVs: All identified USVs for a protein/complex. Columns are:

    • date: Date of first collection of a strain with the USV reported to GISAID
    • gisaid_count: The number of sequences in the GISAID database that include the USV
    • id: The GISAID strain identification for the first collected instance of the USV
    • location: The country in which the first strain including the USV was collected
    • substitutions: All substitutions in the USV, in the form [chain]_[sequence][site][substitution], with multiple substitutions separated by semicolons
    • is_in_PDB: Whether each substitution is present in the PDB model used to generate the USV structure, with multiple substitutions separated by semicolons
    • multiple: Whether more than one amino acid substitution is present in the USV
    • conservative: Whether each substitution is conservative, with multiple substitutions separated by semicolons
    • layer: Identification of the burial layer (surface, boundary, or core) of a substitution in the reference structure, with multiple substitutions separated by semicolons and substitutions absent from the PDB excluded
    • sh_rmsd: The RMSD of the USV to the reference structure when modeled using the soft-hard method
    • sh_ddG: The ΔΔGApp of the USV when modeled using the soft-hard method
    • hh_rmsd: The RMSD of the USV to the reference structure when modeled using the hard-hard method
    • hh_ddG: The ΔΔGApp of the USV when modeled using the hard-hard method
    • ss_rmsd: The RMSD of the USV to the reference structure when modeled using the soft-soft method
    • ss_ddG: The ΔΔGApp of the USV when modeled using the soft-soft method
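    The substitutions column uses the form [chain]_[sequence][site][substitution]. The sketch below parses it, assuming that [sequence] is the one-letter reference residue and [substitution] the one-letter mutant residue (so a hypothetical entry would read A_D614G), that chain IDs are alphanumeric, and that no insertion codes appear. It also derives the multiple flag from the number of parsed substitutions.

```python
import re
from typing import NamedTuple

# Assumed layout: chain, underscore, reference residue, site number, mutant residue.
SUBSTITUTION_RE = re.compile(r"^(?P<chain>[A-Za-z0-9]+)_(?P<ref>[A-Z])(?P<site>-?\d+)(?P<mut>[A-Z])$")


class Substitution(NamedTuple):
    chain: str
    reference: str
    site: int
    mutant: str


def parse_substitutions(field):
    """Split a semicolon-separated substitutions field into Substitution records."""
    subs = []
    for token in field.split(";"):
        token = token.strip()
        if not token:
            continue
        match = SUBSTITUTION_RE.match(token)
        if match is None:
            raise ValueError(f"unrecognised substitution: {token!r}")
        subs.append(Substitution(match["chain"], match["ref"], int(match["site"]), match["mut"]))
    return subs


def is_multiple(field):
    """True when the USV carries more than one amino acid substitution."""
    return len(parse_substitutions(field)) > 1
```

Raising on unrecognised tokens, rather than skipping them, makes any deviation from the assumed format visible immediately.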

    Table: Substitutions: All substitutions identified for a protein/complex

    • chain: The chain identifier of the protein in the PDB file in which the substitution is present
    • site: The residue number at which the substitution is present
    • reference: The one-letter amino acid name of the residue in the reference sequence
    • mutant: The one-letter amino acid name of the residue in a USV
    • conservative: Indication of whether a substitution is conservative
    • in_pdb: Whether the substitution site is present in the PDB model used to generate the USV structure
    • layer: Identification of the burial layer (surface, boundary, or core) of a substitution in the reference structure
    • date: Date of first collection of a strain with the substitution reported to GISAID
    • location: The country in which the first strain including the substitution was collected
    • gisaid_count: The number of sequences in the GISAID database including the substitution
    • usv_count: The number of identified USVs including the substitution
    • ddG: The soft-hard ΔΔGApp of the USV that includes only the substitution, left empty if no single-substitution USV was identified with the substitution
    • single: Indication of whether the substitution was present in a single-substitution USV
    • multiple: Indication of whether the substitution was present in a USV with multiple substitutions
    • associates: List of all other substitutions that were identified in a USV that included the substitution
    • strains: List of all USV-representative GISAID strains that included the substitution, with the single-substitution USV strain listed first if one was available
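    Several Substitutions columns (usv_count, single, multiple, associates, strains) can in principle be derived from the USV list. A minimal sketch, assuming USVs are supplied as a dict from the representative GISAID strain ID to that USV's substitution strings; the function and key names are hypothetical.

```python
from collections import defaultdict


def summarize_substitutions(usvs):
    """Derive per-substitution summary columns from {strain_id: [substitution, ...]}."""
    summary = defaultdict(lambda: {
        "usv_count": 0, "single": False, "multiple": False,
        "associates": set(), "strains": [],
    })
    for strain, subs in usvs.items():
        unique = set(subs)
        for sub in unique:
            row = summary[sub]
            row["usv_count"] += 1
            if len(unique) == 1:
                row["single"] = True
                row["strains"].insert(0, strain)  # single-substitution USV listed first
            else:
                row["multiple"] = True
                row["strains"].append(strain)
            # Every other substitution seen alongside this one in the same USV.
            row["associates"].update(unique - {sub})
    return dict(summary)
```

Because each USV is a unique sequence, at most one single-substitution USV exists per substitution, so inserting it at the front keeps the ordering the legend describes.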

    Table: Gaussian Fit Statistics: Fitted models for the energies of all USVs either together (ALL) or by study protein.

    • fit: The number of Gaussian curves in the fitted energy model
    • protein: The protein/complex name
    • method: The modeling method used to calculate energy values
    • layer: The subset burial layer (surface, boundary, or core) of USVs for which the energy model was fitted, excluding all USVs with substitutions not in that layer
    • μ1: Mean of the first Gaussian in the fitted model
    • σ1: Variance of the first Gaussian in the fitted model
    • wt1: Weight of the first Gaussian in the fitted model
    • μ2: Mean of the second Gaussian in the fitted model
    • σ2: Variance of the second Gaussian in the fitted model
    • wt2: Weight of the second Gaussian in the fitted model
    • R2: R-squared value indicating the goodness of fit
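    To make the Gaussian Fit Statistics columns concrete, the sketch below evaluates the fitted model wt1·N(μ1, σ1) + wt2·N(μ2, σ2) and an R² score. Following the legend, σ1 and σ2 are treated as variances; if the table actually stores standard deviations, square them before calling. A one-Gaussian fit corresponds to wt2 = 0. It is assumed that R² compares observed histogram heights with model values at the bin centers; the parameters themselves would come from a nonlinear least-squares fit (for example scipy.optimize.curve_fit), which is not shown. Function names are illustrative.

```python
import math


def gaussian(x, mu, var):
    """Normal probability density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bigaussian(x, mu1, var1, wt1, mu2, var2, wt2):
    """Weighted sum of two normal densities; wt2 = 0 gives a single Gaussian."""
    return wt1 * gaussian(x, mu1, var1) + wt2 * gaussian(x, mu2, var2)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot
```

If the observed heights are per-bin fractions rather than densities, the model values need to be multiplied by the bin width before computing R².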

    Description of Computed Structural Models for Unique Sequence Variants for 29 SARS-CoV-2 Study Proteins.

    USV Computed Structural Models. Computed structural models for all amino-acid-substituted USVs. We provide the structural models for all study proteins modeled using the soft-hard method (see Methods). Each model is named with the GISAID strain identification of the first strain in which the USV was identified, followed by an underscore-separated list of substitutions in the form [chain]_[sequence][site][substitution]. Atomic coordinates for each model are provided in the legacy Protein Data Bank format used by most molecular graphics software tools (see https://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html for a detailed description).
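    The naming scheme and file format described above can be illustrated with a short Python sketch: it composes a model file name from a strain ID and substitution strings, reads Cα coordinates from ATOM records using the fixed columns of the PDB v3.3 format, and computes a plain coordinate RMSD. The helper names and the .pdb suffix are hypothetical, and the RMSD assumes model and reference already share a coordinate frame; the procedure behind sh_rmsd is not specified here and may use superposition or a different atom set.

```python
import math


def model_filename(strain_id, substitutions, suffix=".pdb"):
    """Hypothetical naming helper: strain ID, then each substitution, joined by underscores."""
    return "_".join([strain_id, *substitutions]) + suffix


def read_ca_atoms(lines):
    """Map (chain, residue number) to the (x, y, z) of each CA atom in PDB ATOM records.

    Insertion codes (column 27) are ignored.
    """
    atoms = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, HEADER, REMARK, etc.
        if line[12:16].strip() != "CA":
            continue  # atom name, columns 13-16
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location (column 17)
        chain = line[21]  # chain identifier, column 22
        res_seq = int(line[22:26])  # residue sequence number, columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))  # columns 31-54
        atoms[(chain, res_seq)] = xyz
    return atoms


def ca_rmsd(model, reference):
    """Coordinate RMSD over shared CA atoms, with no superposition step."""
    shared = sorted(model.keys() & reference.keys())
    if not shared:
        raise ValueError("no CA atoms in common")
    total = sum(
        sum((a - b) ** 2 for a, b in zip(model[key], reference[key])) for key in shared
    )
    return math.sqrt(total / len(shared))
```

Slicing by fixed columns, rather than splitting on whitespace, is what the legacy format requires, since adjacent numeric fields can run together.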

  15. HUMAN CD69 - TETRAGONAL FORM

    • ebi.ac.uk
    application/cif
    Updated Sep 26, 2000
    Protein Data Bank in Europe (PDBe) (2000). HUMAN CD69 - TETRAGONAL FORM [Dataset]. http://doi.org/10.2210/pdb/1e8i/pdb
    Available download formats: application/cif
    Dataset updated
    Sep 26, 2000
    Dataset provided by
    European Bioinformatics Institute (http://www.ebi.ac.uk/)
    Authors
    Protein Data Bank in Europe (PDBe)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Measurement technique
    X-ray diffraction
    Description

    HUMAN CD69 - TETRAGONAL FORM

  16. Cryo-EM structure of human full-length extrasynaptic beta3delta GABA(A)R in...

    • ebi.ac.uk
    application/cif
    Updated Apr 13, 2022
    Protein Data Bank in Europe (PDBe) (2022). Cryo-EM structure of human full-length extrasynaptic beta3delta GABA(A)R in complex with THIP (gaboxadol), histamine and nanobody Nb25 [Dataset]. http://doi.org/10.2210/pdb/7qnd/pdb
    Available download formats: application/cif
    Dataset updated
    Apr 13, 2022
    Dataset provided by
    European Bioinformatics Institute (http://www.ebi.ac.uk/)
    Authors
    Protein Data Bank in Europe (PDBe)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Measurement technique
    Electron Microscopy
    Description

    Cryo-EM structure of human full-length extrasynaptic beta3delta GABA(A)R in complex with THIP (gaboxadol), histamine and nanobody Nb25

  17. PDB: 1YNT, Structure of the monomeric form of T. gondii SAG1 surface antigen...

    • dataverse.harvard.edu
    Updated Sep 30, 2025
    + more versions
    Razvan Stan (2025). PDB: 1YNT, Structure of the monomeric form of T. gondii SAG1 surface antigen bound to a human Fab (310K, 37°C, 500 ns) [Dataset]. http://doi.org/10.7910/DVN/AXYSCV
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Razvan Stan
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Molecular Dynamics Simulations (500 nanoseconds) of T. gondii SAG1 surface antigen bound to a human Fab at 310K (37°C). Includes PDB files obtained every 50 ns.

  18. PDB 5IAT

    • data.sbgrid.org
    PDB 5IAT [Dataset]. http://doi.org/10.2210/pdb5IAT/pdb
    Description

    Protein Data Bank Entry 5IAT is listed as the structure corresponding to this dataset

  19. PDB 5K6Y

    • data.sbgrid.org
    + more versions
    PDB 5K6Y [Dataset]. http://doi.org/10.2210/pdb5K6Y/pdb
    Description

    Protein Data Bank Entry 5K6Y is listed as the structure corresponding to this dataset

  20. RNA 3D Backbone Dataset

    • kaggle.com
    zip
    Updated May 8, 2025
    Karan Patel (2025). RNA 3D Backbone Dataset [Dataset]. https://www.kaggle.com/datasets/thekapiswild/rna-pretraining-dataset
    Available download formats: zip (4260813654 bytes)
    Dataset updated
    May 8, 2025
    Authors
    Karan Patel
    License

    Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    Source: Protein Data Bank. RNA structures, experimentally determined, with resolution between 0.5 and 3.5 Å. Batches 1 to 68. PDBx/mmCIF format.

    There are 119 batches in total, comprising 5921 files if all files were downloaded directly. The dataset will be updated with the full set of batches. It is intended for the Stanford RNA 3D Competition but can be used for general purposes.
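    The resolution window in this description could be applied to PDBx/mmCIF files roughly as follows. This is a sketch, not the dataset author's procedure: it reads the _refine.ls_d_res_high item (crystallographic entries) or _em_3d_reconstruction.resolution (cryo-EM entries) from simple key-value lines, ignores loop_ tables, and treats entries with no numeric resolution, such as NMR structures, as outside the window. The function names are hypothetical.

```python
# Resolution items in PDBx/mmCIF: crystallographic refinement and cryo-EM reconstruction.
RESOLUTION_ITEMS = ("_refine.ls_d_res_high", "_em_3d_reconstruction.resolution")


def read_resolution(cif_text):
    """Return the first numeric resolution (in Å) found as a key-value pair, else None.

    Only single-row ("key value") items are handled; loop_ tables are not parsed.
    """
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in RESOLUTION_ITEMS:
            try:
                return float(parts[1])
            except ValueError:
                continue  # "?" or "." mark unknown or inapplicable values
    return None


def within_resolution(cif_text, low=0.5, high=3.5):
    """True if the entry reports a resolution between low and high Å, inclusive."""
    resolution = read_resolution(cif_text)
    return resolution is not None and low <= resolution <= high
```

A full mmCIF parser (for example Gemmi or Biopython's MMCIF2Dict) would also cover values stored inside loop_ tables.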

PDB 6P7O

(2019). PDB 6P7O [Dataset]. http://doi.org/10.2210/pdb6P7O/pdb
2 scholarly articles cite this dataset
Dataset updated
Dec 20, 2019
Description

Protein Data Bank Entry 6P7O is listed as the structure corresponding to this dataset
