100+ datasets found
  1. AlphaFold Protein Structure Database

    • scicrunch.org
    • rrid.site
    Updated Nov 19, 2021
    Cite
    (2021). AlphaFold Protein Structure Database [Dataset]. http://identifiers.org/RRID:SCR_023662
    Description

    Database of AlphaFold protein structure predictions that are freely and openly available to the global scientific community. It includes nearly all catalogued proteins known to science and provides programmatic access to, and interactive visualization of, predicted atomic coordinates, per-residue and pairwise model confidence estimates, and predicted aligned errors.

  2. AlphaFold Protein Structure Database

    • console.cloud.google.com
    Updated Aug 9, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=en-GB (2023). AlphaFold Protein Structure Database [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold?hl=en-GB
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    License
    Description

    The AlphaFold Protein Structure Database is a collection of protein structure predictions made using the machine learning model AlphaFold. AlphaFold was developed by DeepMind, and this database was created in partnership with EMBL-EBI. For information on how to interpret, download and query the data, as well as on which proteins are included or excluded, and for the change log, please see our main dataset guide and FAQs. To interactively view individual entries or to download proteomes or Swiss-Prot, please visit https://alphafold.ebi.ac.uk/. The current release aims to cover most of the over 200M sequences in UniProt (a commonly used reference set of annotated proteins). The files provided for each entry include the structure plus two model confidence metrics (pLDDT and PAE). The files can be found in the Google Cloud Storage bucket gs://public-datasets-deepmind-alphafold-v4, with metadata in the BigQuery table bigquery-public-data.deepmind_alphafold.metadata.

    If you use this data, please cite:
    Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021).
    Varadi, M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2021).

    This public dataset is hosted in Google Cloud Storage and is available free to use. See the quick start guide to learn how to access public datasets on Google Cloud Storage.
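
    As a rough illustration of how the per-entry files in the bucket might be addressed, the sketch below builds object URIs for one entry. Only the bucket name comes from the description; the function name `entry_uris` and the three file-name patterns are assumptions to check against the dataset guide.

```python
BUCKET = "gs://public-datasets-deepmind-alphafold-v4"  # bucket named in the description


def entry_uris(uniprot_accession: str, fragment: int = 1, version: int = 4) -> dict:
    """Build candidate object URIs for one AlphaFold DB entry.

    The file-name patterns are an assumption based on the public
    AF-<accession>-F<fragment> naming; verify them against the dataset guide.
    """
    entry_id = f"AF-{uniprot_accession}-F{fragment}"
    return {
        "structure": f"{BUCKET}/{entry_id}-model_v{version}.cif",
        "plddt": f"{BUCKET}/{entry_id}-confidence_v{version}.json",
        "pae": f"{BUCKET}/{entry_id}-predicted_aligned_error_v{version}.json",
    }


if __name__ == "__main__":
    for kind, uri in entry_uris("P69905").items():
        print(kind, uri)
```

    The URIs are plain strings, so they can be passed to whichever Cloud Storage client is already in use.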

  3. The Encyclopedia of Domains (TED) structural domains assignments for...

    • zenodo.org
    application/gzip, bz2 +1
    Updated Oct 31, 2024
    Cite
    Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones (2024). The Encyclopedia of Domains (TED) structural domains assignments for AlphaFold Database v4 [Dataset]. http://doi.org/10.5281/zenodo.13369203
    Available download formats: application/gzip, bz2, zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description:

    The Encyclopedia of Domains (TED) is a joint effort by CATH (Orengo group) and the Jones group at University College London to identify and classify protein domains in AlphaFold2 models from AlphaFold Database version 4, covering over 188 million unique sequences and 324 million domain assignments.

    In this data release, we will be making available to the community a table of domain boundaries and additional metadata on quality (pLDDT, globularity, number of secondary structures), taxonomy and putative CATH SuperFamily or Fold assignments for all 324 million domains in TED100.

    For all chains in the TED-redundant dataset, the attached file contains boundary predictions, consensus level and information on the TED100 representative.

    Additionally, an archive with chain-level consensus domain assignments is available for 21 model organisms and 25 global health proteomes.

    For both TED100 and TED-redundant we provide the domain boundary predictions output by each of the three methods employed in the project (Chainsaw, Merizo, UniDoc).

    We are making available PDB files for 7,427 novel folds identified during the TED classification process, with an annotation table sorted by novelty.

    Please use the gunzip command to extract files with a '.gz' extension.
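
    For very large tables it can be convenient to stream a '.gz' file without extracting it first. The following is a minimal sketch using only the Python standard library; the helper name `iter_tsv_gz` is hypothetical, and no column layout is assumed beyond tab separation.

```python
import csv
import gzip


def iter_tsv_gz(path, limit=None):
    """Yield rows (lists of strings) from a gzip-compressed, tab-separated file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            yield row
```

    Passing a small `limit` makes it possible to inspect the first few rows of a multi-gigabyte file before deciding how to process it.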

    CATH annotations have been assigned using the FoldSeek algorithm applied in various modes and the FoldClass algorithm, both of which are used to report significant structural similarity to a known CATH domain.
    Note: The TED protocol differs from that of our standard CATH Assignment protocol for superfamily assignment, which also involves HMM-based protocols and manual curation for remote matches.


    This dataset contains:

    • ted_214m_per_chain_segmentation.tsv
      The file contains all 214M protein chains in TED with consensus domain boundaries and proteome information in the following columns.
      1. AFDB_model_ID: chain identifier from AFDB in the format AF-
    • ted_365m_domain_boundaries_consensus_level.tsv.gz
      The file contains all domain assignments in TED100 and TED-redundant (365M) in the format:
      1. TED_ID: TED domain identifier in the format AF-
    • ted_100_324m.domain_summary.cath.globularity.taxid.tsv and novel_folds_set.domain_summary.tsv are header-less with the following columns separated by tabs (.tsv).
    • ted_324m_seq_clustering.cathlabels.tsv
      The file contains the results of the domain sequences clustering with MMseqs2.
      Columns:
      1. Cluster_representative
      2. Cluster_member
      3. CATH code assignment if available i.e. 3.40.50.300 for a domain with a homologous match or 3.20.20 for a domain matching at the fold level in the CATH classification
      4. CATH assignment type - either Foldseek-T, Foldseek-H or Foldclass
    • novel_folds_set.domain_summary.tsv is sorted by novelty.
      1. ted_id - TED domain identifier in the format AF-
    • Domain assignments for TED redundant using single-chain and multi-chain consensus in ted_redundant_39m.multichain.consensus_domain_summary.taxid.tsv and ted_redundant_39m.singlechain.consensus_domain_summary.taxid.tsv
      The files contain a header with the following fields. Each column is tab-separated (.tsv).
      1. TED_redundant_id - TED chain identifier in the format AF-
    • and ted_redundant_39m.singlechain.consensus_domain_summary.taxid.tsv
      The file contains a header with the following fields. Each column is tab-separated (.tsv).
      1. TED_redundant_id - TED chain identifier in the format AF-
    • novel_folds_set_models.tar.gz contains PDB files of all novel folds identified in TED100.
    • All per-tool domain boundary predictions are in the same format with the following columns.
      1. TED_chainID - TED chain identifier in the format AF-
    • Domain boundary predictions share the same format, with segments separated by '_' and segment boundaries (start, stop) separated by '-'.

      For example, the domain prediction by Merizo for AF-A0A000-F1-model_v4:
      AF-A0A000-F1-model_v4 e8872c7a0261b9e88e6ff47eb34e4162 394 2 10-52_289-394,53-288 0.90077

      Merizo predicts one discontinuous domain and one continuous domain:
      Domain 1 (discontinuous): 10-52_289-394
        segment 1: 10-52
        segment 2: 289-394
      Domain 2 (continuous): 53-288
        segment 1: 53-288
    • ted-tools-main.zip - copy of the https://github.com/psipred/ted-tools repository, containing tools and software used to generate TED.
    • cath-alphaflow-main.zip - copy of CATH-AlphaFlow, used to generate globularity scores for TED domains.
    • ted-web-master.zip - copy of TED-web, containing code to generate the web interface of TED (https://ted.cathdb.info)
    • gofocus_data.tar.bz2 - GOFocus model weights
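
    The domain boundary format described in the file list above (segments joined by '_', start and stop joined by '-') can be parsed with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the use of ',' between domains is inferred from the Merizo example, and treating segment bounds as inclusive is an assumption.

```python
def parse_domains(boundaries: str):
    """Parse a boundaries string such as '10-52_289-394,53-288'.

    Returns a list of domains; each domain is a list of (start, stop) segments.
    """
    domains = []
    for domain in boundaries.split(","):      # assumed: ',' separates domains
        segments = []
        for segment in domain.split("_"):     # '_' separates segments
            start, stop = segment.split("-")  # '-' separates start and stop
            segments.append((int(start), int(stop)))
        domains.append(segments)
    return domains


def domain_length(segments):
    """Residues covered by a domain, assuming inclusive segment bounds."""
    return sum(stop - start + 1 for start, stop in segments)


if __name__ == "__main__":
    for number, segments in enumerate(parse_domains("10-52_289-394,53-288"), 1):
        kind = "discontinuous" if len(segments) > 1 else "continuous"
        print(number, kind, segments, domain_length(segments))
```

    A domain with more than one segment is discontinuous, which matches the Merizo example for AF-A0A000-F1-model_v4.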
  4. AlphaFold Unmasked data sets

    • demo.researchdata.se
    • figshare.scilifelab.se
    Updated Jan 27, 2025
    Cite
    Claudio Mirabello; Björn Wallner; Björn Nystedt; Marta Carroni (2025). AlphaFold Unmasked data sets [Dataset]. http://doi.org/10.17044/SCILIFELAB.24198669
    Dataset provided by
    Linköping University
    Authors
    Claudio Mirabello; Björn Wallner; Björn Nystedt; Marta Carroni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This deposit contains all predictions generated for the test cases presented in "AlphaFold Unmasked: integration of experiments and predictions with a smarter template mechanism" (doi: https://doi.org/10.1101/2023.09.20.558579), along with the log files necessary to reproduce the experiments.

    Each tar.gz file includes one or more AlphaFold experiments, where multiple predictions have been generated either with AlphaFold-Multimer (standard pipeline, v2.2 and/or v2.3 parameters) or with AF_unmasked. An experiment is made of a set of 3D structure predictions (.pdb files) along with the ancillary data generated by AlphaFold (pickle files) and the corresponding inputs (Multiple Sequence Alignments, sequences). Scripts to reproduce the results are included along with the log files generated during the experiments.

    H1111, H1142, T1109 and T1110 are multimeric prediction targets from CASP15 (https://predictioncenter.org/casp15/), chosen because most or all predictors failed to correctly predict these complexes in the 2022 edition of CASP.

    Rubisco, NF1 and ClpB are examples of large and/or challenging targets for which cryo-EM data are available to be integrated into the prediction pipeline.

    The PDB benchmark is made of a set of protein heterodimeric structures deposited in the PDB before January 2022, i.e. before AlphaFold v2.3 was trained and released. These heterodimers have been redundancy-reduced by structural similarity (MMalign score threshold: 0.4) to increase their diversity.

  5. Reciprocal Best Structure Hits (RBSH)

    • repository.cam.ac.uk
    bin
    Updated Sep 22, 2022
    Cite
    Monzon, Vivian; Paysan-Lafosse, Typhaine; Wood, Valerie; Bateman, Alex (2022). Reciprocal Best Structure Hits (RBSH) [Dataset]. http://doi.org/10.17863/CAM.87873
    Available download formats: bin (171535 bytes), bin (155431 bytes), bin (79489 bytes), bin (84547 bytes), bin (39107 bytes)
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Monzon, Vivian; Paysan-Lafosse, Typhaine; Wood, Valerie; Bateman, Alex
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this work, we use AlphaFold structure models to find the closest homologous proteins between Homo sapiens and D. melanogaster, C. elegans, S. cerevisiae and S. pombe, as well as between S. cerevisiae and S. pombe. We use the structure aligner Foldseek to run all-against-all searches and look for the best-scoring hit in both directions to detect the Reciprocal Best Structure Hits (RBSH). We compare the results to protein pairs detected by their sequence similarity as Reciprocal Best Hits (RBH) and verify the results using the PANTHER family classification files.

    Note: This dataset is an updated version of the dataset at https://doi.org/10.17863/CAM.85487.
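
    The reciprocal-best-hit idea can be sketched independently of any aligner: take the best-scoring hit in each direction and keep the pairs that point at each other. The code below is a minimal illustration on made-up scores, not the authors' pipeline; the function names and the `{(query, target): score}` input shape are assumptions, and parsing of Foldseek output is deliberately left out.

```python
def best_hits(scores):
    """Map each query to its best-scoring target.

    `scores` is a dict {(query, target): score}; higher scores are better.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    human_vs_yeast = {("h1", "y1"): 90, ("h1", "y2"): 50, ("h2", "y2"): 70, ("h3", "y2"): 80}
    yeast_vs_human = {("y1", "h1"): 88, ("y2", "h3"): 75, ("y2", "h2"): 60}
    print(reciprocal_best_hits(human_vs_yeast, yeast_vs_human))
```

    In the toy data, h2 also points at y2, but y2's best hit is h3, so only the h3/y2 pair is reciprocal.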

  6. Prediction and Visualization of Human Transmembrane Proteins using AlphaFold...

    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard (2024). Prediction and Visualization of Human Transmembrane Proteins using AlphaFold and Protein Language Models [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6816082
    Dataset provided by
    Technical University Munich
    Authors
    Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description: TMvis ("TMvis496.tar.gz") is a dataset containing 496 3D structures of predicted human transmembrane proteins (TMPs) and their predicted membrane embedding. The method TMbed [1], based on the protein language model ProtT5 [2], predicted 4,967 TMPs for the human proteome (20,375 proteins, UniProt [3] version April 2022; excluding TITIN_HUMAN due to length). For these proteins, we obtained AlphaFold [4] structures from AlphaFoldDB [5] with an average per-residue confidence score (pLDDT) of more than 90. This resulted in the 496 proteins of TMvis, which are listed in "TMvis496.fasta". The membrane embedding was predicted using the methods ANVIL [6], PPM3 [7], and per-residue TMbed predictions. As the three methods are based on different approaches, we decided to publish results for all. The figure “TMvis_project_overview.png” provides a graphical overview of each step described above.
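
    AlphaFold PDB files store the per-residue pLDDT in the B-factor column, so an average-pLDDT filter like the one described can be approximated with fixed-width parsing. This is a minimal sketch, not the authors' code: the function names are hypothetical, and averaging over C-alpha atoms (one value per residue) is an assumption about how the mean was taken.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor column over C-alpha atoms of a PDB file's text."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


def passes_cutoff(pdb_text: str, cutoff: float = 90.0) -> bool:
    """True if the mean pLDDT is strictly above the cutoff."""
    return mean_plddt(pdb_text) > cutoff
```

    A call such as `passes_cutoff(open(path).read())` would apply the check to one downloaded model, where `path` is a local PDB file.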

    TMvis Folder Structure: TMvis is separated into “alpha”, containing predicted alpha-helical TMPs, and “beta”, containing predicted beta-barrel TMPs. Within these folders, each protein is assigned one folder, identifiable by its unique UniProt ID. Each protein folder consists of:
    • “UniprotID.fasta” with UniProt ID, sequence, TMbed per-residue prediction
    • “AF-UniprotID-F1-model_v2.pdb” with the AlphaFold structure
    • “AF-UniprotID-F1-model_v2.cif” with the AlphaFold structure
    • “AF-UniprotID-F1-model_v2_ANVIL.pdb” with predicted ANVIL membrane embedding
    • “AF-UniprotID-F1-model_v2_ppm.pdb” with predicted PPM3 membrane embedding

    TMvis
    |
    ├── alpha
    │   │
    │   ├── A0A087X1C5
    │   │   ├── A0A087X1C5.fasta
    │   │   ├── AF-A0A087X1C5-F1-model_v2.pdb
    │   │   ├── AF-A0A087X1C5-F1-model_v2.cif
    │   │   ├── AF-A0A087X1C5-F1-model_v2_ANVIL.pdb
    │   │   └── AF-A0A087X1C5-F1-model_v2_ppm.PDB
    │   └── ...
    └── beta
        └── P45880

    TMvis visualization: The 3D visualization of every protein in the TMvis dataset can be easily accessed using the Jupyter Notebook “TMvis.ipynb”. It contains detailed descriptions of the different membrane prediction tools ANVIL, PPM3, and TMbed, as well as the respective code. Additionally, it allows visualization of the per-residue confidence scores (pLDDT) of AlphaFold.

    ——————————————————————————————————————————————————————————————————————————

    References:

    [1] TMbed - Bernhofer, Michael, and Burkhard Rost. 2022. “TMbed – Transmembrane Proteins Predicted through Language Model Embeddings.” bioRxiv.

    [2] ProtT5 - Elnaggar, A., et al. “ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.” IEEE Transactions on Pattern Analysis and Machine Intelligence. doi: 10.1109/TPAMI.2021.3095381.

    [3] UniProt - UniProt Consortium. 2021. “UniProt: The Universal Protein Knowledgebase in 2021.” Nucleic Acids Research 49 (D1): D480–D489.

    [4] AlphaFold - Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.

    [5] AlphaFold DB - Varadi, Mihaly, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, et al. 2022. “AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models.” Nucleic Acids Research 50 (D1): D439–44.

    [6] ANVIL - Postic, Guillaume, Yassine Ghouzam, Vincent Guiraud, and Jean-Christophe Gelly. 2016. “Membrane Positioning for High- and Low-Resolution Protein Structures through a Binary Classification Approach.” Protein Engineering, Design & Selection: PEDS 29 (3): 87–91.

    [7] PPM3 - Lomize, Mikhail A., Irina D. Pogozheva, Hyeon Joo, Henry I. Mosberg, and Andrei L. Lomize. 2012. “OPM Database and PPM Web Server: Resources for Positioning of Proteins in Membranes.” Nucleic Acids Research 40 (Database issue): D370–76.

    ——————————————————————————————————————————————————————————————————————————

    License:

    This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0).

  7. AF-M predictions accompanying the manuscript: Predictomes: A...

    • data.sbgrid.org
    Updated Feb 4, 2025
    Cite
    Schmid, Ernst; Walter, Johannes (2025). AF-M predictions accompanying the manuscript: Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions [Dataset]. http://doi.org/10.15785/SBGRID/1155
    Dataset provided by
    SBGrid Data Bank
    Authors
    Schmid, Ernst; Walter, Johannes
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The set of all AlphaFold-Multimer (AF-M) v2.3 pairwise structure predictions accompanying the publication "Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions". This dataset includes prediction pairs used for training random forest classifiers including SPOC, pairs used for 30 ranking experiments, all pairs that belong to the genome maintenance matrix on predictomes.org, and three proteome-wide in silico interaction screens conducted with human DONSON, human STK19, and human USP37. All pairs were generated with ColabFold v1.5.2. All our predictions used the AF-M multimer version 3 weights, models 1, 2, and 4, with 3 recycles, templates enabled, 1 ensemble, no dropout, and no AMBER relaxation. The multiple sequence alignments (MSAs; unpaired + paired) supplied to AF-M were generated by the MMseqs2 server using default settings. Sequences were generally capped at 3,600 amino acids in total to avoid memory exhaustion on GPUs.

  8. AlphaFold Predictions

    • data.niaid.nih.gov
    Updated Nov 9, 2022
    Cite
    AlphaFold (2022). AlphaFold Predictions [Dataset]. https://data.niaid.nih.gov/resources?id=ds_dc1c8cb24c
    Dataset provided by
    European Bioinformatics Institute (http://www.ebi.ac.uk/)
    Authors
    AlphaFold
    Description

    AlphaFold is an artificial intelligence system created by DeepMind that predicts protein structure from amino acid sequences. DeepMind has worked with EMBL-EBI to create a publicly available database of structural predictions.

  9. The comparison of the AlphaFold and SwissModel Repository databases

    • data.niaid.nih.gov
    Updated Mar 9, 2023
    Cite
    Arthur Zalevsky (2023). The comparison of the AlphaFold and SwissModel Repository databases [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7709896
    Dataset provided by
    Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
    Authors
    Arthur Zalevsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supplements the code at https://github.com/aozalevsky/alphafold2_vs_swissmodel for the comparison of the AlphaFold2 database (https://alphafold.ebi.ac.uk) with the SwissModel Repository (https://swissmodel.expasy.org/repository). Results of the analysis were published as part of the AlphaFold community review https://www.nature.com/articles/s41594-022-00849-w

  10. AlphaFold structures with AlphaMissense scores

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 9, 2024
    Cite
    Hegedűs, Tamás (2024). AlphaFold structures with AlphaMissense scores [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10023059
    Dataset provided by
    Semmelweis University
    Authors
    Hegedűs, Tamás
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This repository provides:

    • NEW: AFwAM-pdb-qb.tar, which includes pdb.gz files for human protein structures from AlphaFold DB with the occupancy column set to the residue-wise mean over all amino acid variations and the temperature factor column set to the residue-wise mean over single nucleotide variations, together with a PyMOL plugin file (coloram-qb.py) for coloring these structures (coloram column=b or coloram column=q; b is the default).

    • AFwAM-pdb.tar, which includes pdb.gz files for human protein structures from AlphaFold DB with the occupancy and temperature factor columns set to the residue-wise mean of AlphaMissense scores.

    • A PyMOL plugin file (coloram.py) for coloring these structures.

    For data, Python scripts, and notebooks, please refer to the pub.zip file; detailed instructions are provided in the README.md within this archive and further explained in our manuscript.

    For alternative data access, please visit https://alphamissense.hegelab.org.

    Disclaimer: The AlphaMissense Database and other information provided on or linked to this site is for theoretical modelling only, caution should be exercised in use. It is provided "as-is" without any warranty of any kind, whether express or implied. For clarity, no warranty is given that use of the information shall not infringe the rights of any third party (and this disclaimer takes precedence over any contrary provisions in the Google Cloud Platform Terms of Service). The information provided is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.

    Data contained within the AlphaMissense Database is provided for non-commercial research use only under CC BY-NC-SA 4.0 license.

    DeepMind - AlphaMissense: https://doi.org/10.1126/science.adg7492

  11. Data from: MsmRho AlphaFold predictions

    • ourarchive.otago.ac.nz
    • figshare.com
    Updated Jun 24, 2024
    Cite
    Sofia Magalhaes Moreira (2024). MsmRho AlphaFold predictions [Dataset]. https://ourarchive.otago.ac.nz/esploro/outputs/dataset/MsmRho-AlphaFold-predictions/9926653800801891
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sofia Magalhaes Moreira
    Time period covered
    Jun 24, 2024
    Description

    This folder contains the files in CIF format generated by AlphaFold 3 to build Figure 4.1 of the thesis of Sofia Magalhães Moreira (https://hdl.handle.net/10523/43234). The data are embargoed in Figshare until 24 June 2026.

  12. AlphaFold

    • ebi.ac.uk
    Updated Mar 27, 2019
    Cite
    (2019). AlphaFold [Dataset]. https://www.ebi.ac.uk/ebisearch/search.ebi?db=allebi&t=SPCH
    Description

    AlphaFold DB provides open access to over 200 million protein structure predictions to accelerate scientific research.

  13. Data from: af3cli: Streamlining AlphaFold3 Input Preparation

    • acs.figshare.com
    zip
    Updated Apr 9, 2025
    Cite
    Philipp Döpner; Stefan Kemnitz; Mark Doerr; Lukas Schulig (2025). af3cli: Streamlining AlphaFold3 Input Preparation [Dataset]. http://doi.org/10.1021/acs.jcim.5c00276.s001
    Available download formats: zip
    Dataset provided by
    ACS Publications
    Authors
    Philipp Döpner; Stefan Kemnitz; Mark Doerr; Lukas Schulig
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    With the release of AlphaFold3, modeling capabilities have expanded beyond protein structure prediction to embrace the inherent complexity of biomolecular systems, including nucleic acids, ions, small molecules, and their interactions. The increased complexity of these assemblies is reflected in the input file generation process, presenting a significant hurdle for researchers without advanced computational expertise. While AlphaFold Server comes with a user-friendly graphical user interface, it supports only a subset of the features of AlphaFold3. To address this, we present af3cli, an open-source tool designed to facilitate the generation of AlphaFold3 input files, specifically tailored to the standalone version of AlphaFold3 and its unrestricted functionality. Featuring a user-friendly command-line interface and an accompanying Python library, af3cli simplifies the input generation process while maintaining flexibility and customization, which makes af3cli especially useful for fast (automated) generation of a large number of input files since it enables direct incorporation of FASTA files, keeps track of IDs, and validates the JSON file. Through practical examples, we demonstrate its capabilities for constructing input data for diverse biological structures, ranging from simple proteins to complex systems, and demonstrate its seamless integration into both manual and automated workflows.
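
    To make the input-generation problem concrete, the sketch below assembles a job dictionary from FASTA text using only the Python standard library. It does not use af3cli, and it is not that tool's API: the function names are hypothetical, and the JSON field names ("name", "modelSeeds", "sequences", "dialect", "version") are an assumption based on the published AlphaFold 3 input format that should be verified against its documentation.

```python
import json
from string import ascii_uppercase


def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_input(name, fasta_text, seeds=(1,)):
    """Assemble a job dictionary with one protein chain per FASTA record."""
    records = read_fasta(fasta_text)
    if len(records) > len(ascii_uppercase):
        raise ValueError("this sketch only assigns single-letter chain IDs")
    # Chain IDs are assigned in order (A, B, C, ...) so they stay unique.
    sequences = [
        {"protein": {"id": chain_id, "sequence": seq}}
        for chain_id, (_, seq) in zip(ascii_uppercase, records)
    ]
    # Field names are assumed, not confirmed by the source text.
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": sequences,
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    job = build_input("demo", ">chain one\nMKT\nAA\n>chain two\nGGS\n")
    print(json.dumps(job, indent=2))
```

    Even this small example has to track chain IDs and join wrapped sequence lines, which hints at why a dedicated tool helps once ligands, nucleic acids and many jobs are involved.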

  14. UltraScan Solution Modeler (US-SOMO) hydrodynamic parameter, structural...

    • dataone.org
    • search.dataone.org
    Updated Jul 17, 2025
    Cite
    Emre Brookes; Mattia Rocco (2025). UltraScan Solution Modeler (US-SOMO) hydrodynamic parameter, structural small angle scattering and SESCA circular dichroism (CD) calculations on AlphaFold predicted structures [Dataset]. http://doi.org/10.5061/dryad.jq2bvq89s
    Dataset provided by
    Dryad Digital Repository
    Authors
    Emre Brookes; Mattia Rocco
    Time period covered
    Jan 1, 2021
    Description

    Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting "folding frenzy" has already produced predicted protein structure databases for the entire human and other organisms' proteomes. However, rapidly ascertaining a predicted structure's reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients (D0t(20,w), s0(20,w)) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US‑SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding D0t(20,w), s0(20,w), [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed u...

    Production of this dataset required three major steps: collect the AlphaFold entries and additional metadata; prepare the structures for hydrodynamic, structural and CD calculations; and compute the hydrodynamic, structural and CD properties. Briefly, each entry in the entire AlphaFold database was first compared with the corresponding entry in the UniProt database to find the (putative) initiator methionine, signal peptide and transit peptide regions, which were subsequently removed from the AlphaFold PDB files. Additional variants were created when propeptides were found. Potential disulfides were identified (subsequently allowing a better evaluation of the partial specific volume and of M) and written as SSBOND records in the cured PDBs, together with HELIX and SHEET information identified using the DSSP implementation in UCSF Chimera (Pettersen et al., 2004. Journal of Computational Chemistry, 25(13), pp. 1605-1612). Batch-mode US-SOMO was then used to calculate the mass M, the translat...

    This is a tar archive of all datasets for each AlphaFold entry. It includes a csv file containing all hydrodynamic parameters, a pdb file containing the cured pdb structure, an mmCIF file containing the cured pdb structure, a data file containing the circular dichroism spectrum, and a p(r) vs. r dat file. Use "tar xf somoaf_all_data.tar" to extract the primary archive. This will result in 1,002,038 individual .txz files, each representing one UniProt accession code and containing 5 files. When propeptides are identified and removed, the extracted file name will contain a -pp#, where # is a list of the propeptides removed. To extract the data from an individual txz file, use "tar Jxf xxxx.txz", where xxxx is replaced by the appropriate name containing the accession code. Further details are in the provided README.md file.
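
    With roughly a million .txz files, scripted extraction may be more practical than running "tar Jxf" by hand. The following is a minimal sketch using Python's standard tarfile module, which reads xz-compressed tar archives with mode "r:xz"; the function names are hypothetical and nothing is assumed about the member file names.

```python
import tarfile


def list_txz(path):
    """Return the member names of an xz-compressed tar archive (.txz)."""
    with tarfile.open(path, mode="r:xz") as archive:
        return archive.getnames()


def extract_txz(path, destination):
    """Extract a .txz archive into `destination`."""
    with tarfile.open(path, mode="r:xz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe paths inside the archive.
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)
```

    Calling `list_txz` first is a cheap way to confirm that an archive holds the expected five files before extracting it.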

  15. Data from: Structure-guided isoform identification for the human...

    • figshare.com
    bin
    Updated Jun 1, 2023
    Cite
    Markus Sommer; Sooyoung Cha; Ales Varabyou; Natalia Rincon; Sukhwan Park; Ilia Minkin; Mihaela Pertea; Martin Steinegger; Steven L. Salzberg (2023). Structure-guided isoform identification for the human transcriptome [Dataset]. http://doi.org/10.6084/m9.figshare.21802476.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Figshare (http://figshare.com/)
    Authors
    Markus Sommer; Sooyoung Cha; Ales Varabyou; Natalia Rincon; Sukhwan Park; Ilia Minkin; Mihaela Pertea; Martin Steinegger; Steven L. Salzberg
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein structure prediction files for the CHESS human protein structure database version 1.2. AlphaFold2/ColabFold predictions of the GTEx assembled human proteome.

  16. Phytoplasma AlphaFold-2 Structural Models from the AlphaFold Database (NCBI...

    • plos.figshare.com
    xlsx
    Updated Nov 13, 2025
    Cite
    Federico G. Mirkin; Sam T. Mugford; Vera Thole; Mar Marzo; Saskia A. Hogenhout (2025). Phytoplasma AlphaFold-2 Structural Models from the AlphaFold Database (NCBI Taxonomy ID: txid33926) Exhibiting SAP05-like folds. [Dataset]. http://doi.org/10.1371/journal.pgen.1011946.s024
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Federico G. Mirkin; Sam T. Mugford; Vera Thole; Mar Marzo; Saskia A. Hogenhout
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Phytoplasma AlphaFold-2 Structural Models from the AlphaFold Database (NCBI Taxonomy ID: txid33926) Exhibiting SAP05-like folds.

  17. Predictomes

    • rrid.site
    • scicrunch.org
    Updated Mar 31, 2025
    Cite
    (2025). Predictomes [Dataset]. http://identifiers.org/RRID:SCR_026691
    Explore at:
    Dataset updated
    Mar 31, 2025
    Description

    Interactive, classifier-curated database of protein-protein interactions modeled by AlphaFold-Multimer.

  18. d

    Data from: Deep Green Unannotated Protein Structures

    • catalog.data.gov
    • data.openei.org
    • +1 more
    Updated Jan 20, 2025
    Cite
    National Renewable Energy Laboratory (2025). Deep Green Unannotated Protein Structures [Dataset]. https://catalog.data.gov/dataset/deep-green-unannotated-protein-structures-ba494
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    The Deep Green list is based on the identification and curation of conserved unannotated proteins in three green lineage (Viridiplantae) model organisms: Arabidopsis thaliana, Chlamydomonas reinhardtii, and Setaria viridis. Preliminary characterization of Deep Green proteins and genes was done using various informatics tools and published data sets and is presented in Knoshaug, Sun, et al., 2023, submitted. The structures of these unannotated proteins were also predicted using AlphaFold (Jumper et al., 2021). The data deposited here are the AlphaFold structural predictions having the highest pLDDT score and thus identified as the best folded structure (ranked_0). These data enable others to do in-depth structural characterizations to aid in functional characterization, leading to a deeper understanding of plant biology. References: Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P. and Hassabis, D. (2021) Highly accurate protein structure prediction with AlphaFold. Nature, 596:583-589. Knoshaug, E. P., Sun, P., Nag, A., Nguyen, H., Mattoon, E. M., Zhang, N., Liu, J., Chen, C., Cheng, J., Zhang, R., St. John, P., and Umen, J. (submitted) Identification and preliminary characterization of conserved uncharacterized proteins from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Setaria viridis.

  19. q

    Using AlphaFold to predict the PDB structure of CFTR G551D

    • qubeshub.org
    Updated Aug 13, 2025
    Cite
    Keith Johnson (2025). Using AlphaFold to predict the PDB structure of CFTR G551D [Dataset]. http://doi.org/10.25334/HMZY-ZM20
    Explore at:
    Dataset updated
    Aug 13, 2025
    Dataset provided by
    QUBES
    Authors
    Keith Johnson
    Description

    This brief worksheet explains how to generate a protein structure for a mutant that is not currently in the PDB. Using AlphaFold 3 and the wild-type sequence (from UniProt or another resource), the mutant protein structure can be predicted in a variety of formats. The example illustrated is the G551D mutation of CFTR, which influences ATP binding.

  20. List of Mollicute AlphaFold-2 structural models in the AlphaFold database...

    • plos.figshare.com
    xlsx
    Updated Nov 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Federico G. Mirkin; Sam T. Mugford; Vera Thole; Mar Marzo; Saskia A. Hogenhout (2025). List of Mollicute AlphaFold-2 structural models in the AlphaFold database (NCBI Taxonomy ID: txid544448). [Dataset]. http://doi.org/10.1371/journal.pgen.1011946.s028
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Federico G. Mirkin; Sam T. Mugford; Vera Thole; Mar Marzo; Saskia A. Hogenhout
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    List of Mollicute AlphaFold-2 structural models in the AlphaFold database (NCBI Taxonomy ID: txid544448).

Cite
(2021). AlphaFold Protein Structure Database [Dataset]. http://identifiers.org/RRID:SCR_023662

AlphaFold Protein Structure Database

RRID:SCR_023662, r3d100013615, AlphaFold Protein Structure Database (RRID:SCR_023662), AlphaFold DB

Explore at:
41 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Nov 19, 2021
Description

Database of protein structure predictions by AlphaFold that are freely and openly available to the global scientific community. Included are nearly all catalogued proteins known to science. Provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model confidence estimates, and predicted aligned errors.
