39 datasets found
  1. AlphaFold Protein Structure Database

    • scicrunch.org
    • rrid.site
    Updated Nov 19, 2021
    Cite
    (2021). AlphaFold Protein Structure Database [Dataset]. http://identifiers.org/RRID:SCR_023662
    Description

    Database of protein structure predictions by AlphaFold that are freely and openly available to the global scientific community. It includes nearly all catalogued proteins known to science and provides programmatic access to, and interactive visualization of, predicted atomic coordinates, per-residue and pairwise model confidence estimates, and predicted aligned errors.

  2. AlphaFold Protein Structure Database

    • console.cloud.google.com
    Updated Aug 9, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=en-GB (2023). AlphaFold Protein Structure Database [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold?hl=en-GB
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    License
    Description

    The AlphaFold Protein Structure Database is a collection of protein structure predictions made with the machine learning model AlphaFold. AlphaFold was developed by DeepMind, and this database was created in partnership with EMBL-EBI. For information on how to interpret, download and query the data, which proteins are included or excluded, and the change log, please see the main dataset guide and FAQs. To interactively view individual entries or to download proteomes or Swiss-Prot, please visit https://alphafold.ebi.ac.uk/. The current release aims to cover most of the over 200M sequences in UniProt (a commonly used reference set of annotated proteins). The files provided for each entry include the structure plus two model confidence metrics (pLDDT and PAE). The files can be found in the Google Cloud Storage bucket gs://public-datasets-deepmind-alphafold-v4, with metadata in the BigQuery table bigquery-public-data.deepmind_alphafold.metadata.

    If you use this data, please cite:
    Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021)
    Varadi, M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2021)

    This public dataset is hosted in Google Cloud Storage and is free to use. Use this quick start guide to learn how to access public datasets on Google Cloud Storage.
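    A minimal sketch of how per-entry object paths in the bucket above might be assembled. The bucket and BigQuery table names come from the description; the file-name pattern AF-<accession>-F<fragment>-<suffix>_v4 and the metadata column names in the SQL string are assumptions to check against the dataset guide and the table schema before use.

```python
# Bucket name as given in the dataset description.
BUCKET = "gs://public-datasets-deepmind-alphafold-v4"

# ASSUMPTION: per-entry files are named AF-<accession>-F<fragment>-<suffix>.
FILE_SUFFIXES = {
    "structure_cif": "model_v4.cif",
    "plddt_json": "confidence_v4.json",
    "pae_json": "predicted_aligned_error_v4.json",
}


def entry_paths(accession: str, fragment: int = 1) -> dict[str, str]:
    """Expected object paths for one AlphaFold DB entry (structure + confidence files)."""
    entry_id = f"AF-{accession}-F{fragment}"
    return {kind: f"{BUCKET}/{entry_id}-{suffix}" for kind, suffix in FILE_SUFFIXES.items()}


# ASSUMPTION: column names of the metadata table; inspect the schema first.
HUMAN_ENTRIES_SQL = """
SELECT entryId, uniprotAccession
FROM `bigquery-public-data.deepmind_alphafold.metadata`
WHERE organismScientificName = 'Homo sapiens'
"""

if __name__ == "__main__":
    for kind, path in entry_paths("P69905").items():
        print(kind, path)
```

    The paths can be handed to `gsutil cp`, and the SQL string can be run with the `google-cloud-bigquery` client; querying the metadata table first avoids listing a bucket with hundreds of millions of objects.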

  3. Prediction and Visualization of Human Transmembrane Proteins using AlphaFold...

    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard (2024). Prediction and Visualization of Human Transmembrane Proteins using AlphaFold and Protein Language Models [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6816082
    Dataset provided by
    Technical University Munich
    Authors
    Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description: TMvis ("TMvis496.tar.gz") is a dataset containing 496 3D structures of predicted human transmembrane proteins (TMPs) and their predicted membrane embedding. The method TMbed [1], based on the protein language model ProtT5 [2], predicted 4,967 TMPs in the human proteome (20,375 proteins, UniProt [3] version April 2022; TITIN_HUMAN was excluded due to its length). For these proteins, we retrieved AlphaFold [4] structures from AlphaFoldDB [5] and kept those with an average per-residue confidence score (pLDDT) above 90. This resulted in the 496 proteins of TMvis, listed in "TMvis496.fasta". The membrane embedding was predicted using ANVIL [6], PPM3 [7], and the per-residue TMbed predictions. Because the three methods are based on different approaches, we decided to publish the results of all three. The figure “TMvis_project_overview.png” provides a graphical overview of each step described above.
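    The pLDDT filter described above can be sketched as follows. AlphaFold PDB files store each residue's pLDDT in the B-factor column of every atom of that residue, so one CA atom per residue is read. This is an illustrative reimplementation, not the authors' code; the function names and the strict "above 90" comparison are assumptions.

```python
from statistics import mean


def mean_plddt(pdb_text: str) -> float:
    """Average per-residue pLDDT of an AlphaFold model in PDB format.

    pLDDT sits in the B-factor field (columns 61-66); reading only CA
    atoms yields one value per residue.
    """
    values = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not values:
        raise ValueError("no CA atoms found")
    return mean(values)


def select_confident(models: dict[str, str], threshold: float = 90.0) -> list[str]:
    """Names of models whose average pLDDT is strictly above `threshold`."""
    return [name for name, text in models.items() if mean_plddt(text) > threshold]
```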

    TMvis Folder Structure: TMvis is split into “alpha”, containing predicted alpha-helical TMPs, and “beta”, containing predicted beta-barrel TMPs. Within these folders, each protein has its own folder, named by its unique UniProt ID. Each protein folder contains:
    - “UniprotID.fasta” with the UniProt ID, sequence and TMbed per-residue prediction
    - “AF-UniprotID-F1-model_v2.pdb” with the AlphaFold structure
    - “AF-UniprotID-F1-model_v2.cif” with the AlphaFold structure
    - “AF-UniprotID-F1-model_v2_ANVIL.pdb” with the predicted ANVIL membrane embedding
    - “AF-UniprotID-F1-model_v2_ppm.pdb” with the predicted PPM3 membrane embedding

    TMvis
    │
    ├── alpha
    │   │
    │   ├── A0A087X1C5
    │   │   ├── A0A087X1C5.fasta
    │   │   ├── AF-A0A087X1C5-F1-model_v2.pdb
    │   │   ├── AF-A0A087X1C5-F1-model_v2.cif
    │   │   ├── AF-A0A087X1C5-F1-model_v2_ANVIL.pdb
    │   │   └── AF-A0A087X1C5-F1-model_v2_ppm.PDB
    │   └── ...
    └── beta
        └── P45880
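    The per-protein layout lends itself to a small path helper. This sketch is hypothetical (the function and key names are ours); it matches file names case-insensitively because the tree shows a ".PDB" extension for the PPM3 file while the file list uses ".pdb", and it returns the expected path when a file is not on disk.

```python
from pathlib import Path

# File-name patterns from the per-protein layout described above.
PATTERNS = {
    "fasta": "{uid}.fasta",
    "pdb": "AF-{uid}-F1-model_v2.pdb",
    "cif": "AF-{uid}-F1-model_v2.cif",
    "anvil": "AF-{uid}-F1-model_v2_ANVIL.pdb",
    "ppm": "AF-{uid}-F1-model_v2_ppm.pdb",
}


def protein_files(root, tm_class: str, uid: str) -> dict[str, Path]:
    """Map each file kind to its path for one protein in TMvis."""
    if tm_class not in ("alpha", "beta"):
        raise ValueError("tm_class must be 'alpha' or 'beta'")
    folder = Path(root) / tm_class / uid
    # Index existing files by lower-cased name so '.PDB' and '.pdb' both match.
    on_disk = {p.name.lower(): p for p in folder.iterdir()} if folder.is_dir() else {}
    files = {}
    for kind, pattern in PATTERNS.items():
        name = pattern.format(uid=uid)
        files[kind] = on_disk.get(name.lower(), folder / name)
    return files
```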

    TMvis visualization: The 3D visualization of every protein in TMvis can be accessed using the Jupyter Notebook “TMvis.ipynb”. It contains detailed descriptions of the membrane prediction tools ANVIL, PPM3 and TMbed, as well as the respective code. It can also visualize the per-residue confidence scores (pLDDT) of AlphaFold.

    ——————————————————————————————————————————————————————————————————————————

    References:

    [1] TMbed - Bernhofer, Michael, and Burkhard Rost. 2022. “TMbed – Transmembrane Proteins Predicted through Language Model Embeddings.” bioRxiv.

    [2] ProtT5 - Elnaggar, A. et al. “ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.” IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2021.3095381.

    [3] UniProt - UniProt Consortium. 2021. “UniProt: the universal protein knowledgebase in 2021.” Nucleic Acids Research 49 (D1): D480–D489.

    [4] AlphaFold - Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.

    [5] AlphaFold DB - Varadi, Mihaly, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, et al. 2022. “AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models.” Nucleic Acids Research 50 (D1): D439–44.

    [6] ANVIL - Postic, Guillaume, Yassine Ghouzam, Vincent Guiraud, and Jean-Christophe Gelly. 2016. “Membrane Positioning for High- and Low-Resolution Protein Structures through a Binary Classification Approach.” Protein Engineering, Design & Selection: PEDS 29 (3): 87–91.

    [7] PPM3 - Lomize, Mikhail A., Irina D. Pogozheva, Hyeon Joo, Henry I. Mosberg, and Andrei L. Lomize. 2012. “OPM Database and PPM Web Server: Resources for Positioning of Proteins in Membranes.” Nucleic Acids Research 40 (Database issue): D370–76.

    ——————————————————————————————————————————————————————————————————————————

    License:

    This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0).

  4. AlphaFold structures with AlphaMissense scores

    • zenodo.org
    • data.niaid.nih.gov
    tar, text/x-python +1
    Updated Nov 21, 2023
    Cite
    Tamás Hegedűs (2023). AlphaFold structures with AlphaMissense scores [Dataset]. http://doi.org/10.5281/zenodo.10171520
    Available download formats: tar, zip, text/x-python
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tamás Hegedűs
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This repository provides:

    1. An archive of pdb.gz files for human protein structures from AlphaFold DB, with the occupancy and temperature factor columns set to the residue-wise mean of the AlphaMissense scores.
    2. A PyMOL plugin file (coloram.py) for coloring these structures.
    3. Python scripts and additional files (e.g. Jinja2 templates, matplotlib styles) are in pub.zip.
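    The first item, per-residue means written into the occupancy and temperature-factor columns, can be sketched as below. This is not the code shipped in pub.zip: the (chain, residue, score) input shape is hypothetical, and only the fixed PDB column positions (occupancy in columns 55-60, temperature factor in 61-66, both F6.2) are taken from the PDB format.

```python
from collections import defaultdict


def residue_means(variant_scores):
    """Mean AlphaMissense score per residue.

    variant_scores: iterable of (chain_id, residue_number, score), one tuple
    per substitution. This input shape is hypothetical.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for chain, resseq, score in variant_scores:
        totals[(chain, resseq)] += score
        counts[(chain, resseq)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def write_scores(pdb_lines, per_residue):
    """Copy each residue's value into the occupancy (55-60) and
    temperature-factor (61-66) fields of its ATOM/HETATM records.

    Both fields are F6.2, so values are rounded to two decimals.
    Residues without a value keep their original fields.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 66:
            value = per_residue.get((line[21], int(line[22:26])))
            if value is not None:
                line = f"{line[:54]}{value:6.2f}{value:6.2f}{line[66:]}"
        out.append(line)
    return out
```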

    Please see details at https://alphamissense.hegelab.org

    Disclaimer: The AlphaMissense Database and other information provided on or linked to this site is for theoretical modelling only, caution should be exercised in use. It is provided "as-is" without any warranty of any kind, whether express or implied. For clarity, no warranty is given that use of the information shall not infringe the rights of any third party (and this disclaimer takes precedence over any contrary provisions in the Google Cloud Platform Terms of Service). The information provided is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.

    Data contained within the AlphaMissense Database is provided for non-commercial research use only under CC BY-NC-SA 4.0 license.

    DeepMind - AlphaMissense: https://doi.org/10.1126/science.adg7492

  5. Large protein databases reveal structural complementarity and functional...

    • figshare.com
    bin
    Updated Jul 12, 2025
    Cite
    Paweł Szczerbiak; Tomasz Kosciolek; Lukasz Szydlowski; Witold Wydmański; P. Douglas Renfrew; Julia Koehler Leman (2025). Large protein databases reveal structural complementarity and functional locality [Dataset]. http://doi.org/10.6084/m9.figshare.27203073.v3
    Available download formats: bin
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Paweł Szczerbiak; Tomasz Kosciolek; Lukasz Szydlowski; Witold Wydmański; P. Douglas Renfrew; Julia Koehler Leman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recent breakthroughs in protein structure prediction have led to an unprecedented surge in high-quality 3D models, highlighting the need for efficient computational solutions to manage and analyze this wealth of structural data. In our work, we comprehensively examine the structural clusters obtained from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP). We create a single cohesive low-dimensional representation of the resulting protein space. Our results show that, while each database occupies distinct regions within the protein structure space, they collectively exhibit significant overlap in their functional profiles. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. By creating a single, cohesive low-dimensional representation of protein structure space integrating data from diverse sources, localizing functional annotations within this space, and providing an open-access web-server for exploration, this work offers insights for future research concerning protein sequence-structure-function relationships, enabling various biological questions to be asked about taxonomic assignments, environmental factors, or functional specificity. This approach is generalizable to other or future datasets, enabling further discovery beyond findings presented here.

  6. AlphaFold Predictions

    • data.niaid.nih.gov
    Updated Nov 9, 2022
    Cite
    AlphaFold (2022). AlphaFold Predictions [Dataset]. https://data.niaid.nih.gov/resources?id=ds_dc1c8cb24c
    Dataset provided by
    European Bioinformatics Institute (http://www.ebi.ac.uk/)
    Authors
    AlphaFold
    Description

    AlphaFold is an artificial intelligence system developed by DeepMind that predicts protein structure from amino acid sequence. DeepMind has worked with EMBL-EBI to create a publicly available database of these structural predictions.

  7. AlphaFold

    • ebi.ac.uk
    Updated Mar 27, 2019
    Cite
    (2019). AlphaFold [Dataset]. https://www.ebi.ac.uk/ebisearch/search.ebi?db=allebi&t=SPCH
    Description

    AlphaFold DB provides open access to over 200 million protein structure predictions to accelerate scientific research.

  8. AlphaFold3 ensembles of 100 randomly selected human proteins with 100...

    • zenodo.org
    zip
    Updated Jan 7, 2025
    Cite
    Gunnar Jeschke (2025). AlphaFold3 ensembles of 100 randomly selected human proteins with 100 conformers per ensemble [Dataset]. http://doi.org/10.5281/zenodo.14609656
    Available download formats: zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gunnar Jeschke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 7, 2025
    Description

    100 proteins were selected from the set of human proteins covered by the AlphaFold Protein Structure Database. For each protein, 20 random seed values in the range from 1 to 999999999 were generated. Based on the canonical peptide sequence of the protein, JSON input files for the AlphaFold3 server were created and then submitted manually. The output (five models per seed, giving the 100 conformers per ensemble) was downloaded manually and processed with Matlab scripts that are contained in the folders for the individual proteins. This processing requires MMMx, which is available at https://github.com/gjeschke/MMMx. The main folder contains two additional Matlab scripts for analysis of the whole set of ensembles.
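    The seed and input-file step above might look like the following sketch. The seed range (1 to 999999999) and the count of 20 come from the description; drawing distinct seeds, writing one job per file, and every JSON field name (modeled on our reading of the AlphaFold Server "alphafoldserver" input dialect) are assumptions to verify against the server documentation. The original workflow used manual submission and Matlab, not this code.

```python
import json
import random
from pathlib import Path


def make_seeds(n: int = 20, rng_seed=None) -> list[int]:
    """Draw n distinct seeds from 1..999999999 (inclusive)."""
    rng = random.Random(rng_seed)
    return rng.sample(range(1, 1_000_000_000), n)


def make_server_jobs(name: str, sequence: str, seeds: list[int]) -> list[dict]:
    """One single-chain job per seed.

    ASSUMPTION: field names follow the AlphaFold Server JSON format as we
    understand it; check them before submitting.
    """
    return [
        {
            "name": f"{name}_seed{seed}",
            "modelSeeds": [seed],
            "sequences": [{"proteinChain": {"sequence": sequence, "count": 1}}],
            "dialect": "alphafoldserver",
            "version": 1,
        }
        for seed in seeds
    ]


def write_jobs(jobs: list[dict], out_dir) -> list[Path]:
    """Write each job to its own JSON file (a list holding one job)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for job in jobs:
        path = out / f"{job['name']}.json"
        path.write_text(json.dumps([job], indent=2))
        paths.append(path)
    return paths
```

    Fixing `rng_seed` makes the seed list reproducible, which is useful when an ensemble has to be regenerated later.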

  9. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    • +3 more
    Updated Sep 30, 2024
    Cite
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Description

    CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.

  10. Data from: PreTA-mediated metabolism of 5-Fluorouracil by intratumoral...

    • figshare.com
    txt
    Updated Sep 26, 2025
    Cite
    Weiben Xu (2025). PreTA-mediated metabolism of 5-Fluorouracil by intratumoral Citrobacter freundii drives chemoresistance in pancreatic cancer [Dataset]. http://doi.org/10.6084/m9.figshare.30214981.v1
    Available download formats: txt
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Weiben Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AlphaFold2 structure prediction and Foldseek-based homology analysis of PreT/PreA proteins. The protein structures of E. coli PreT (UniProt: P76440) and PreA (UniProt: P25889), P. fluorescens PreT (UniProt: A0A5E7TWM2) and PreA (UniProt: A0A854XEI5), and human DPD (UniProt: Q12882) were retrieved from the AlphaFold Protein Structure Database for subsequent homology comparison. Additionally, PreT and PreA protein sequences of C. freundii, L. reuteri, and P. putida strains were obtained from the NCBI protein database. These sequences, in FASTA format, were used as input for AlphaFold2 to predict their 3D structures and generate PDB files. Based on the AlphaFold2 structures of C. freundii PreT and PreA, the PreTA protein complex was constructed in PyMOL. Structural homology was evaluated using the Foldseek online tool. Specifically, the C. freundii PreTA complex was compared with human DPD. Furthermore, the PreT and PreA protein structures of the E. coli, L. reuteri, P. putida, and P. fluorescens strains were uploaded to the Foldseek platform and compared with the PreT and PreA proteins of the C. freundii strain to assess structural similarity. Multiple sequence alignment (MSA) local distance difference test (LDDT) scores [25] were used to compare local inter-residue distances between the predicted structures.
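    To make the LDDT idea concrete, here is a simplified, CA-only global variant: residue pairs closer than 15 Å in the reference are scored, a distance counts as preserved at tolerances of 0.5, 1, 2 and 4 Å, and the score is the preserved fraction averaged over the tolerances. The full LDDT is computed over all atoms, and this sketch is not the implementation Foldseek uses; it assumes the two coordinate lists are already residue-aligned.

```python
import math
from itertools import combinations

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance tolerances in angstroms
INCLUSION_RADIUS = 15.0  # only pairs closer than this in the reference are scored


def ca_lddt(reference, model, radius=INCLUSION_RADIUS, thresholds=THRESHOLDS):
    """Simplified global LDDT over CA atoms (one (x, y, z) tuple per residue).

    A reference distance is preserved at tolerance t if the model distance
    differs from it by less than t. Returns a value in [0, 1].
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same residues")
    pairs = [
        (i, j)
        for i, j in combinations(range(len(reference)), 2)
        if math.dist(reference[i], reference[j]) < radius
    ]
    if not pairs:
        raise ValueError("no residue pairs within the inclusion radius")
    preserved = 0
    for i, j in pairs:
        delta = abs(math.dist(model[i], model[j]) - math.dist(reference[i], reference[j]))
        preserved += sum(delta < t for t in thresholds)
    return preserved / (len(pairs) * len(thresholds))
```

    Because only distances are compared, no superposition is needed, which is why LDDT is robust to domain movements that inflate RMSD.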

  11. The Encyclopedia of Domains (TED) structural domains assignments for...

    • zenodo.org
    Updated Mar 27, 2024
    + more versions
    Cite
    Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones (2024). The Encyclopedia of Domains (TED) structural domains assignments for AlphaFold Database v4 [Dataset]. http://doi.org/10.5281/zenodo.10848710
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description:

    The Encyclopedia of Domains (TED) is a joint effort by CATH (Orengo group) and the Jones group at University College London to identify and classify protein domains in AlphaFold2 models from AlphaFold Database version 4, covering over 188 million unique sequences and 324 million domain assignments.

    In this data release, we make available to the community a table of domain boundaries and additional metadata on quality (pLDDT, globularity, number of secondary structures), taxonomy and putative CATH SuperFamily or Fold assignments for all 324 million domains in TED100.

    For all chains in the TED-redundant dataset, the attached file contains boundary predictions, the consensus level and information on the TED100 representative.

    Additionally, an archive with chain-level consensus domain assignments is available for 21 model organisms and 25 global health proteomes:

    Organism TaxonID

    arabidopsis_thaliana 3702
    caenorhabditis_elegans 6239
    candida_albicans 237561
    danio_rerio 7955
    dictyostelium_discoideum 44689
    drosophila_melanogaster 7227
    escherichia_coli 83333
    glycine_max 3847
    homo_sapiens 9606
    methanocaldococcus_jannaschii 243232
    mus_musculus 10090
    oryza_sativa 39947
    rattus_norvegicus 10116
    saccharomyces_cerevisiae 559292
    schizosaccharomyces_pombe 284812
    zea_mays 4577
    ajellomyces_capsulatus 447093
    brugia_malayi 6279
    campylobacter_jejuni 192222
    cladophialophora_carrionii 86049
    dracunculus_medinensis 318479
    fonsecaea_pedrosoi 1442368
    haemophilus_influenzae 71421
    helicobacter_pylori 85962
    klebsiella_pneumoniae 1125630
    leishmania_infantum 5671
    madurella_mycetomatis 100816
    mycobacterium_leprae 272631
    mycobacterium_tuberculosis 83332
    mycobacterium_ulcerans 1299332
    neisseria_gonorrhoeae 242231
    nocardia_brasiliensis 1133849
    onchocerca_volvulus 6282
    paracoccidioides_lutzii 502779
    plasmodium_falciparum 36329
    pseudomonas_aeruginosa 208964
    salmonella_typhimurium 99287
    schistosoma_mansoni 6183
    shigella_dysenteriae 300267
    sporothrix_schenckii 1391915
    staphylococcus_aureus 93061
    streptococcus_pneumoniae 171101
    strongyloides_stercoralis 6248
    trypanosoma_brucei 185431
    trypanosoma_cruzi 353153
    wuchereria_bancrofti 6293


    For both TED100 and TED-redundant, we provide the domain boundary predictions output by each of the three methods used in the project (Chainsaw, Merizo, UniDoc).

    We also provide PDB files for the 7,427 novel folds identified during the TED classification process, with an annotation table sorted by novelty.


    This dataset contains:

    • ted_100_324m.domain_summary.cath.globularity.taxid.tsv and novel_folds_set.domain_summary.tsv are header-less, tab-separated (.tsv) files with the following columns (novel_folds_set.domain_summary.tsv is sorted by novelty):
      1. ted_id
      2. md5_domain
      3. consensus_level
      4. chopping
      5. nres_domain
      6. num_segments
      7. plddt
      8. num_helix_strand_turn
      9. num_helix
      10. num_strand
      11. num_helix_strand
      12. num_turn
      13. proteome_id
      14. cath_label
      15. cath_assignment_level
      16. cath_assignment_method
      17. packing_density
      18. norm_rg
      19. tax_common_name
      20. tax_scientific_name
      21. tax_lineage
    • ted_redundant_39m.consensus_domain_summary.taxid.tsv contains the domain assignments for TED-redundant.
      The file is tab-separated (.tsv) and has a header row with the following fields:
      1. TED_redundant_id
      2. md5
      3. nres
      4. n_high
      5. n_med
      6. high_consensus
      7. med_consensus
      8. ndom_consensus
      9. n_targets
      10. proteome_id
      11. TED_redundant_species
      12. TED100_chain_rep
      13. TED100_chain_rep_species
    • novel_folds_set_models.tar.gz contains PDB files of all novel folds identified in TED100.
    • All per-tool domain boundary predictions share the same format, with the following columns:
      1. TED_chainID
      2. TED_chain_md5
      3. TED_chain_length
      4. ndoms
      5. Domain boundaries
      6. Prediction probability
    • Domain boundary strings share the same format: domains are separated by ',', the segments of a domain by '_', and the start and stop of each segment by '-'.

      e.g. the Merizo domain prediction for AF-A0A000-F1-model_v4:
      AF-A0A000-F1-model_v4 e8872c7a0261b9e88e6ff47eb34e4162 394 2 10-52_289-394,53-288 0.90077

      Merizo predicts one discontinuous and one continuous domain:
      Domain 1 (discontinuous): 10-52_289-394
        segment 1: 10-52
        segment 2: 289-394
      Domain 2 (continuous): 53-288
        segment 1: 53-288
    • model_organisms_and_global_health_proteomes.tar.gz - domain assignments for 21 model organisms and 25 global health proteomes
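    The boundary format above can be parsed with a few string splits. This is a minimal sketch, not code distributed with TED; it assumes segment ranges are inclusive (consistent with domain 2 starting at 53 right after segment 10-52) and that every row has at least one domain, since the encoding of rows without domains is not described here.

```python
def parse_chopping(chopping):
    """Split a boundary string into domains, each a list of (start, stop) segments.

    Domains are separated by ',', segments by '_', start and stop by '-'.
    """
    domains = []
    for domain in chopping.split(","):
        segments = []
        for segment in domain.split("_"):
            start, stop = segment.split("-")
            segments.append((int(start), int(stop)))
        domains.append(segments)
    return domains


def domain_lengths(domains):
    """Residue count of each domain, treating segments as inclusive ranges."""
    return [sum(stop - start + 1 for start, stop in segments) for segments in domains]


def parse_tool_prediction(line):
    """Parse one per-tool prediction row with the six columns listed above."""
    chain_id, chain_md5, length, ndoms, boundaries, probability = line.split()
    domains = parse_chopping(boundaries)
    if len(domains) != int(ndoms):
        raise ValueError(f"{chain_id}: expected {ndoms} domains, parsed {len(domains)}")
    return {
        "chain_id": chain_id,
        "md5": chain_md5,
        "length": int(length),
        "ndoms": int(ndoms),
        "domains": domains,
        "probability": float(probability),
    }
```

    Applied to the Merizo example, `domain_lengths` gives 149 residues for the discontinuous domain and 236 for the continuous one.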

  12. Reciprocal Best Structure Hits (RBSH)

    • repository.cam.ac.uk
    bin
    Updated Sep 22, 2022
    + more versions
    Cite
    Monzon, Vivian; Paysan-Lafosse, Typhaine; Wood, Valerie; Bateman, Alex (2022). Reciprocal Best Structure Hits (RBSH) [Dataset]. http://doi.org/10.17863/CAM.87873
    Available download formats: bin (171535 bytes), bin (155431 bytes), bin (79489 bytes), bin (84547 bytes), bin (39107 bytes)
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Monzon, Vivian; Paysan-Lafosse, Typhaine; Wood, Valerie; Bateman, Alex
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this work, we use AlphaFold structure models to find the closest homologous proteins between Homo sapiens and D. melanogaster, C. elegans, S. cerevisiae and S. pombe, as well as between S. cerevisiae and S. pombe. We use the structure aligner Foldseek to run all-against-all comparisons and search for the best-scoring hit in both directions to detect Reciprocal Best Structure Hits (RBSH). We compare the results to protein pairs detected by sequence similarity as Reciprocal Best Hits (RBH) and verify the results using the PANTHER family classification files. Note: This dataset is an updated version of the dataset at https://doi.org/10.17863/CAM.85487.
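    The reciprocal-best-hit criterion itself is simple to express. In this sketch each search direction is a list of (query, target, score) rows; it assumes higher scores are better (as with a bit score) and that ties keep the first row seen. It is a generic illustration, not the authors' pipeline, and parsing actual Foldseek output is left out.

```python
def best_hits(hits):
    """Best target per query from (query, target, score) rows.

    Assumes higher scores are better; ties keep the first row seen.
    """
    best = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

    If E-values are used for ranking instead, the comparison in `best_hits` must be inverted, since lower E-values are better.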

  13. Data from: Structure-guided isoform identification for the human...

    • figshare.com
    bin
    Updated Jun 1, 2023
    Cite
    Markus Sommer; Sooyoung Cha; Ales Varabyou; Natalia Rincon; Sukhwan Park; Ilia Minkin; Mihaela Pertea; Martin Steinegger; Steven L. Salzberg (2023). Structure-guided isoform identification for the human transcriptome [Dataset]. http://doi.org/10.6084/m9.figshare.21802476.v1
    Available download formats: bin
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Markus Sommer; Sooyoung Cha; Ales Varabyou; Natalia Rincon; Sukhwan Park; Ilia Minkin; Mihaela Pertea; Martin Steinegger; Steven L. Salzberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein structure prediction files for the CHESS human protein structure database version 1.2. AlphaFold2/ColabFold predictions of the GTEx assembled human proteome.

  14. Structural Models and Sequence Alignment Results of the Desulfovibrio...

    • osti.gov
    Updated Jul 12, 2023
    Cite
    Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States) (2023). Structural Models and Sequence Alignment Results of the Desulfovibrio vulgaris Proteome [Dataset]. http://doi.org/10.13139/ORNLNCCS/1988139
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
    Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
    Description

    This dataset contains the structural models for the primary transcripts of the Desulfovibrio vulgaris proteome, as well as sequence alignment results for a subset of the encoded proteins. For each protein, the five models inferred by AlphaFold 2 are provided. The highest pTM-scoring model for each protein was energy minimized; this minimized structure and its AlphaFold pickle output file are also provided. This set of structures represents an alternative source of models for the D. vulgaris proteome to those available in the AlphaFold Protein Structure Database (AFDB). Comparing the two is complicated by the fact that the proteins reported in the AFDB originate from an outdated version of the D. vulgaris sequence. The different versions of the D. vulgaris gene annotation are collected in the Chronology subdirectory; further analysis of the effect of these changes on the structural space of the proteome is currently underway. For proteins annotated as "hypothetical", sequence alignment results from the HHblits and SAdLSA alignment methods are provided. These methods are often better at resolving sequence homology than other methods, so the results from both are provided to identify possible homologs for these challenging proteins. Numerous sequence databases were used for these alignments.
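    Selecting the highest pTM-scoring model, as described above, can be sketched as follows. The assumption (not stated in the dataset) is that each model has a result_<model>.pkl pickle whose dict holds the score under "ptm"; verify this against your AlphaFold version. Unpickling real AlphaFold outputs also needs the libraries that created them (e.g. NumPy), and pickles should only be loaded from trusted sources.

```python
import pickle
from pathlib import Path


def load_ptm_scores(result_dir) -> dict[str, float]:
    """Collect pTM per model from AlphaFold 2 result pickles.

    ASSUMPTION: one result_<model>.pkl per model, each a dict with a
    "ptm" entry (present only for pTM-capable models).
    """
    scores = {}
    for path in sorted(Path(result_dir).glob("result_*.pkl")):
        with path.open("rb") as fh:
            result = pickle.load(fh)
        if "ptm" in result:
            scores[path.stem[len("result_"):]] = float(result["ptm"])
    return scores


def best_by_ptm(scores: dict[str, float]) -> str:
    """Name of the model with the highest pTM."""
    if not scores:
        raise ValueError("no pTM scores found")
    return max(scores, key=scores.get)
```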

  15. AF-M predictions accompanying the manuscript: Predictomes: A...

    • data.sbgrid.org
    Updated Feb 4, 2025
    Cite
    Schmid, Ernst; Walter, Johannes (2025). AF-M predictions accompanying the manuscript: Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions [Dataset]. http://doi.org/10.15785/SBGRID/1155
    Dataset provided by
    SBGrid Data Bank
    Authors
    Schmid, Ernst; Walter, Johannes
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The set of all AlphaFold-Multimer (AF-M) v2.3 pairwise structure predictions accompanying the publication "Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions". This dataset includes the prediction pairs used to train random forest classifiers (including SPOC), the pairs used for 30 ranking experiments, all pairs belonging to the genome maintenance matrix on predictomes.org, and three proteome-wide in silico interaction screens conducted with human DONSON, human STK19 and human USP37. All pairs were generated with ColabFold v1.5.2. All predictions used the AF-M multimer version 3 weights (models 1, 2 and 4) with 3 recycles, templates enabled, 1 ensemble, no dropout and no AMBER relaxation. The multiple sequence alignments (MSAs; unpaired + paired) supplied to AF-M were generated by the MMseqs2 server using default settings. Inputs were generally capped at 3,600 amino acids in total to avoid memory exhaustion on GPUs.
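    The 3,600-residue cap can be applied before submitting pairs. This sketch assumes the cap refers to the summed length of both chains in a pair (so a homodimer counts its chain twice); the function name and inputs are hypothetical.

```python
# Cap taken from the dataset description; applying it to the summed
# length of both chains in a pair is an assumption.
MAX_TOTAL_RESIDUES = 3600


def split_by_length_cap(pairs, sequences, cap=MAX_TOTAL_RESIDUES):
    """Split (id_a, id_b) pairs into those within the cap and those over it.

    sequences maps each protein ID to its amino acid sequence.
    """
    kept, skipped = [], []
    for a, b in pairs:
        total = len(sequences[a]) + len(sequences[b])
        (kept if total <= cap else skipped).append((a, b))
    return kept, skipped
```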

  16. Using AlphaFold to predict the PDB structure of CFTR G551D

    • qubeshub.org
    Updated Aug 13, 2025
    Cite
    Keith Johnson (2025). Using AlphaFold to predict the PDB structure of CFTR G551D [Dataset]. http://doi.org/10.25334/HMZY-ZM20
    Dataset provided by
    QUBES
    Authors
    Keith Johnson
    Description

    This brief worksheet explains how to generate the structure of a mutant protein that is not currently in the PDB. Using AlphaFold 3 and the wild-type sequence (from UniProt or another resource), the structure of the mutant protein can be predicted and obtained in a variety of formats. In this example, the G551D mutation of CFTR, which influences ATP binding, is illustrated.
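    Before submitting to AlphaFold 3, the wild-type sequence has to be edited to carry the mutation. A minimal sketch for a single substitution written in the usual notation (wild-type residue, 1-based position, mutant residue, as in G551D); checking the wild-type residue first catches numbering or isoform mismatches. The function name is hypothetical.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Return `sequence` with a substitution such as 'G551D' applied.

    The position is 1-based. Raises ValueError if the notation is malformed,
    the position is out of range, or the wild-type residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + mutant + sequence[position:]
```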

  17. Discoba protein sequences for protein structure predictions

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 13, 2021
    Cite
    Wheeler, Richard John (2021). Discoba protein sequences for protein structure predictions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5563073
    Dataset updated
    Nov 13, 2021
    Dataset provided by
    University of Oxford
    Authors
    Wheeler, Richard John
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comprehensive database of Discoba protein sequences, gathered to improve AlphaFold and RoseTTAFold structure predictions for Discoba species (including Trypanosoma and Leishmania). Originally gathered for use with: https://github.com/zephyris/discoba_alphafold

  18. Protein structure predictions

    • figshare.com
    txt
    Updated Sep 7, 2025
    Meret Arter (2025). Protein structure predictions [Dataset]. http://doi.org/10.6084/m9.figshare.30071176.v1
    Available download formats: txt
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Meret Arter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein structure predictions from the AlphaFold database or generated with ColabFold, as described in the article "Structure-informed evolutionary analysis of the meiotic recombination machinery".

  19. Protein Structure Prediction AI Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Dataintelo (2025). Protein Structure Prediction AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/protein-structure-prediction-ai-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Protein Structure Prediction AI Market Outlook

    According to our latest research, the global protein structure prediction AI market size reached USD 1.42 billion in 2024, reflecting the surging adoption of AI-powered technologies in life sciences. The market is expected to expand robustly at a CAGR of 19.2% from 2025 to 2033, positioning the sector to attain a remarkable USD 6.08 billion by 2033. This growth is primarily driven by the escalating demand for accurate, rapid protein structure analysis in drug discovery, disease diagnosis, and personalized medicine, as well as the increasing convergence of computational biology and artificial intelligence.
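
    The headline figures are tied together by the compound-growth relation end = start * (1 + r)^n, where r is the CAGR and n the number of years. The Python sketch below applies it; the function names are illustrative only. Taking the report's numbers at face value, compounding USD 1.42 billion at 19.2% for the nine years from 2024 to 2033 gives about USD 6.9 billion, while the stated USD 6.08 billion implies roughly 17.5% per year over nine years (or about 19.9% over eight), so the report presumably uses a different base value or rounding.

```python
def project(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate for `years` years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report, in USD billions; 2024 -> 2033 spans nine years.
print(round(project(1.42, 0.192, 9), 2))
print(round(implied_cagr(1.42, 6.08, 9), 3))
```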

    A key growth factor for the protein structure prediction AI market is the exponential increase in biological data generated by high-throughput sequencing and proteomics technologies. Traditional methods, such as X-ray crystallography and cryo-electron microscopy, are time-consuming and expensive, limiting their scalability. In contrast, AI-based protein structure prediction tools, leveraging deep learning and advanced machine learning algorithms, can analyze vast datasets, predict complex protein folds, and accelerate the overall research process. This technological leap is enabling pharmaceutical and biotechnology companies to shorten drug development cycles, reduce costs, and improve success rates in identifying novel therapeutic targets. The integration of AI into these workflows is transforming the landscape of structural biology and driving unprecedented growth in the market.

    Another significant driver is the growing collaboration between academia, research institutes, and industry stakeholders. Major breakthroughs, such as DeepMind’s AlphaFold, have demonstrated the potential of AI in predicting protein structures with near-experimental accuracy. This has led to increased investments from both public and private sectors, fostering innovation and the development of more sophisticated AI models. The proliferation of open-source platforms and databases is further democratizing access to protein structure prediction tools, empowering a wider range of researchers and organizations to leverage AI for scientific discovery. These collaborative efforts are catalyzing advancements across multiple disciplines, including drug discovery, disease diagnosis, and personalized medicine, thereby fueling market expansion.

    The rising prevalence of chronic and rare diseases, coupled with the urgent need for targeted therapies, is propelling the adoption of AI-driven protein structure prediction in clinical and translational research. Healthcare providers are increasingly relying on these technologies to understand disease mechanisms at the molecular level, identify biomarkers, and develop personalized treatment strategies. Furthermore, regulatory agencies are recognizing the value of AI in accelerating drug approval processes, creating a favorable environment for market growth. The convergence of AI, big data analytics, and cloud computing is also enabling real-time analysis and seamless integration of protein structure prediction tools into existing healthcare and research infrastructures, amplifying their impact across the value chain.

    Regionally, North America is expected to maintain its dominance in the protein structure prediction AI market, owing to its advanced healthcare infrastructure, substantial investments in AI research, and presence of leading pharmaceutical and biotechnology companies. Europe follows closely, driven by robust funding for life sciences and a strong emphasis on collaborative research. The Asia Pacific region is poised for the fastest growth, supported by increasing government initiatives, expanding healthcare expenditure, and a rapidly growing biotechnology sector. Latin America and the Middle East & Africa are gradually emerging as promising markets, benefiting from improving research capabilities and international collaborations. This dynamic regional landscape underscores the global significance and transformative potential of AI in protein structure prediction.

    Component Analysis

    The protein structure prediction AI market is segmented by component into software, hardware, and services, each playing a crucial role in driving the adoption and effectiveness of AI-based solutions. Software forms the backbone of this market, encompassing advanced algorithms, machine learning models, and user-friendly interfaces that enable researchers to predict protein stru

  20. UltraScan Solution Modeler (US-SOMO) hydrodynamic parameter, structural...

    • dataone.org
    • search.dataone.org
    • +2 more
    Updated Jul 17, 2025
    Emre Brookes; Mattia Rocco (2025). UltraScan Solution Modeler (US-SOMO) hydrodynamic parameter, structural small angle scattering and SESCA circular dichroism (CD) calculations on AlphaFold predicted structures [Dataset]. http://doi.org/10.5061/dryad.jq2bvq89s
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Emre Brookes; Mattia Rocco
    Time period covered
    Jan 1, 2021
    Description

    Recent spectacular advances by AI programs in 3D structure prediction from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting "folding frenzy" has already produced databases of predicted structures for the entire human proteome and the proteomes of other organisms. However, the reliability of a predicted structure should also be checked rapidly against properties measured in solution. Shape-sensitive hydrodynamic parameters such as the translational diffusion and sedimentation coefficients ($D^0_{t(20,w)}$, $s^0_{(20,w)}$) and the intrinsic viscosity ($[\eta]$) can provide a rapid assessment of the overall likelihood of a structure, and SAXS yields the structure-related pairwise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US‑SOMO) suite, a database was implemented that calculates from AlphaFold structures the corresponding $D^0_{t(20,w)}$, $s^0_{(20,w)}$, $[\eta]$, p(r) vs. r, and other parameters. Circular dichroism spectra were computed u... Production of this dataset required three major steps: collecting the AlphaFold entries and additional metadata; preparing the structures for hydrodynamic, structural and CD calculations; and computing the hydrodynamic, structural and CD properties. Briefly, each entry in the entire AlphaFold database was first compared with the corresponding entry in the UniProt database to find the (putative) initiator methionine, signal peptide and transit peptide regions, which were subsequently removed from the AlphaFold PDB files. Additional variants were created when propeptides were found. Potential disulfides were identified (subsequently allowing a better evaluation of the partial specific volume and of M) and written as SSBOND records in the cured PDBs, together with HELIX and SHEET information identified using the DSSP implementation in UCSF Chimera (Pettersen et al., 2004, Journal of Computational Chemistry, 25(13), pp. 1605-1612).
    Batch-mode US-SOMO was then used to calculate the mass M, the translat... This is a tar archive of all datasets for each AlphaFold entry. It includes a CSV file containing all hydrodynamic parameters, a PDB file and an mmCIF file containing the cured structure, a data file containing the circular dichroism spectrum, and a p(r) vs. r .dat file. Use "tar xf somoaf_all_data.tar" to extract the primary archive. This will produce 1,002,038 individual .txz files, each representing one UniProt accession code and containing 5 files. When propeptides are identified and removed, the extracted file name will contain -pp#, where # is a list of the propeptides removed. To extract the data from an individual .txz file, use "tar Jxf xxxx.txz", where xxxx is replaced by the appropriate name containing the accession code. Further details are in the provided README.md file.
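
The p(r) vs. r function mentioned above is, in essence, a histogram of intramolecular distances. The Python sketch below computes a crude, unweighted version from the C-alpha atoms of a PDB file such as the cured structures in this archive. It assumes one point per residue and equal weights for every pair, so it only illustrates the idea; US-SOMO's own p(r) calculation is more elaborate and is not reproduced here. Both function names are hypothetical.

```python
import math
from itertools import combinations


def read_ca_coordinates(pdb_lines):
    """Collect C-alpha coordinates from ATOM records of a PDB-format file.

    Uses the fixed PDB columns: atom name in columns 13-16 and x, y, z in
    columns 31-38, 39-46 and 47-54 (1-based).
    """
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def pair_distance_distribution(coords, bin_width=1.0):
    """Unweighted histogram of all pairwise distances, normalized to sum to 1.

    Returns (bin_centre, fraction) tuples in order of increasing r.
    """
    counts = {}
    for a, b in combinations(coords, 2):
        k = int(math.dist(a, b) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    total = sum(counts.values())
    return [((k + 0.5) * bin_width, counts[k] / total) for k in sorted(counts)]
```

Usage, assuming a local file named model.pdb: `with open("model.pdb") as fh: pr = pair_distance_distribution(read_ca_coordinates(fh), bin_width=2.0)`.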

(2021). AlphaFold Protein Structure Database [Dataset]. http://identifiers.org/RRID:SCR_023662

AlphaFold Protein Structure Database

RRID:SCR_023662, r3d100013615, AlphaFold Protein Structure Database (RRID:SCR_023662), AlphaFold DB

41 scholarly articles cite this dataset
Dataset updated
Nov 19, 2021
