5 datasets found
  1. oas_paired_human_sars_cov_2

    • huggingface.co
    Updated Aug 3, 2023
    Cite
    Brian Loyal (2023). oas_paired_human_sars_cov_2 [Dataset]. https://huggingface.co/datasets/bloyal/oas_paired_human_sars_cov_2
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors
    Brian Loyal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Paired SARS-CoV-2 heavy/light chain sequences from the Observed Antibody Space database

    Human paired heavy/light chain amino acid sequences from the Observed Antibody Space (OAS) database obtained from SARS-CoV-2 studies. https://opig.stats.ox.ac.uk/webapps/oas/ Please include the following citation in your work: Olsen, TH, Boyles, F, Deane, CM. Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Science.… See the full description on the dataset page: https://huggingface.co/datasets/bloyal/oas_paired_human_sars_cov_2.

  2. OASis peptide database

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 7, 2021
    Cite
    Laurence Fayadat-Dilman (2021). OASis peptide database [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5164684
    Dataset provided by
    Daniel Svozil
    Veronica Juan
    Andrew Waight
    David Prihoda
    Danny A. Bitton
    Jad Maamary
    Laurence Fayadat-Dilman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OASis human 9-mer peptide database, generated from 118 million human antibody sequences from the Observed Antibody Space database.

    Attached is a gzipped SQLite database containing two tables: "peptides" and "subjects".
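    As a minimal sketch, the gzipped SQLite file can be decompressed and inspected with Python's standard library alone. The column layout of the "peptides" and "subjects" tables is not described here, so the sketch only lists table names and row counts; the input path is whatever name the downloaded attachment has.

```python
import gzip
import shutil
import sqlite3
import tempfile


def open_gzipped_sqlite(gz_path: str) -> sqlite3.Connection:
    """Decompress a .gz SQLite file to a temporary file and open it.

    SQLite cannot read gzip-compressed files directly, so the database
    is streamed to an uncompressed copy on disk first. The temporary
    copy is not deleted automatically; the caller owns it.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".db", delete=False)
    with gzip.open(gz_path, "rb") as src, tmp:
        shutil.copyfileobj(src, tmp)
    return sqlite3.connect(tmp.name)


def summarize_tables(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Names come from sqlite_master itself, so quoting them here is safe.
    return {n: conn.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0]
            for n in names}
```

    Calling summarize_tables(open_gzipped_sqlite(path)) on the downloaded file should report the two tables named above.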

    Links:

    BioPhi codebase and documentation: https://github.com/Merck/BioPhi

    Public BioPhi server: https://biophi.dichlab.org

    OAS Database: http://opig.stats.ox.ac.uk/webapps/oas/

  3. ABodyBuilder2 predicted structures of paired antibody sequences from...

    • zenodo.org
    application/gzip
    Updated Dec 11, 2023
    Cite
    Alexander Greenshields-Watson; Brennan Abanades (2023). ABodyBuilder2 predicted structures of paired antibody sequences from Observed Antibody Space. [Dataset]. http://doi.org/10.5281/zenodo.10280181
    Available download formats: application/gzip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Greenshields-Watson; Brennan Abanades
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used ABodyBuilder2 (https://doi.org/10.1038/s42003-023-04927-7) to model ~1.5M paired antibody structures from paired antibody sequences in Observed Antibody Space (https://opig.stats.ox.ac.uk/webapps/oas/oas_paired/). We have saved the structures in folders and sub folders that correspond to the OAS files they came from. Parent folders are named according to study. Within each parent folder are sub folders named according to the files (named by SRA ID) containing the sequences. Each structure is then named with the parent file name followed by its row number in that file.
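    The layout described above (study folder, then one sub folder per OAS file, then one structure file per row) can be indexed with a short directory walk once the archive is extracted. This is a sketch under assumptions: the root path is whatever directory you extracted to, and because the exact separator between the parent file name and the row number is not specified, the code keeps each structure file's stem as-is instead of parsing out the row number.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class StructureEntry(NamedTuple):
    study: str      # parent folder: the OAS study
    oas_file: str   # sub folder: the OAS file (named by SRA ID)
    name: str       # structure file stem: parent file name + row number
    path: Path


def index_structures(root: str) -> Iterator[StructureEntry]:
    """Yield one entry per file found at depth study/oas_file/structure."""
    for study_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for file_dir in sorted(p for p in study_dir.iterdir() if p.is_dir()):
            for struct in sorted(p for p in file_dir.iterdir() if p.is_file()):
                yield StructureEntry(study_dir.name, file_dir.name,
                                     struct.stem, struct)
```

    Grouping the entries by study or by OAS file then makes it straightforward to join structures back to the original OAS rows.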

  4. Databases of human SARS-CoV-2 antibody peptides for bottom-up proteomics

    • zenodo.org
    bin
    Updated Feb 6, 2024
    Cite
    Xuan-Tung Trinh; Rebecca Freitag; Konrad Krawczyk; Veit Schwämmle (2024). Databases of human SARS-CoV-2 antibody peptides for bottom-up proteomics [Dataset]. http://doi.org/10.5281/zenodo.10566370
    Available download formats: bin
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xuan-Tung Trinh; Rebecca Freitag; Konrad Krawczyk; Veit Schwämmle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bottom-up proteomics approaches rely on database searches that compare experimental values of peptides to theoretical values derived from protein sequences in a database. While the human body can produce millions of distinct antibodies, current databases for human antibodies such as UniProtKB are limited to only 1095 sequences (as of January 2024). This limitation may hinder the identification of new antibodies using bottom-up proteomics. Extending these databases is therefore an important task for discovering new antibodies.

    Herein, we adopted an extensive collection of antibody sequences from Observed Antibody Space to conduct efficient database searches of publicly available proteomics data, with a focus on SARS-CoV-2. Thirty million heavy-chain antibody sequences from 146 SARS-CoV-2 patients in Observed Antibody Space were digested in silico to obtain 18 million unique peptides. These peptides were then used to create six databases (DB1-DB6) for bottom-up proteomics. We used these databases to search for antibody peptides in publicly available SARS-CoV-2 human plasma samples in the Proteomics Identification Database (PRIDE), and we consistently found new antibody peptides in those samples. The database searches were performed with the FragPipe software.
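    The description does not name the protease used for the in silico digestion. Assuming trypsin with the common rule (cleave after K or R unless the next residue is P) and no missed cleavages, the digestion and de-duplication steps can be sketched as follows. The peptide length bounds are illustrative assumptions, not parameters from the study.

```python
import re
from typing import Iterable

# Assumed trypsin rule: a zero-width cut after K or R, but not before P.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence: str, min_len: int = 7, max_len: int = 30) -> list[str]:
    """Digest one sequence with no missed cleavages, then filter by length."""
    pieces = _TRYPSIN_SITE.split(sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def unique_peptides(sequences: Iterable[str], min_len: int = 7,
                    max_len: int = 30) -> set[str]:
    """Collect the distinct peptides produced across many antibody sequences."""
    found: set[str] = set()
    for seq in sequences:
        found.update(tryptic_peptides(seq, min_len, max_len))
    return found
```

    Using a set for the collection step is what turns tens of millions of sequences into a much smaller pool of unique peptides, since framework regions shared across antibodies yield the same peptides repeatedly.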

    Table 1. Summary of the databases. In addition to human SARS-CoV-2 antibody peptides, every database also contains human protein sequences from the UniProt database and contaminants from the cRAP database.

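    | File      | Database | Number of human SARS-CoV-2 antibody peptides | Number of covered antibodies |
    |-----------|----------|----------------------------------------------|------------------------------|
    | DB1.fasta | DB1      | 100                                          | 1.28E7                       |
    | DB2.fasta | DB2      | 1E3                                          | 1.93E7                       |
    | DB3.fasta | DB3      | 1E4                                          | 2.40E7                       |
    | DB4.fasta | DB4      | 1E5                                          | 2.66E7                       |
    | DB5.fasta | DB5      | 1E6                                          | 2.83E7                       |
    | DB6.fasta | DB6      | 1E7                                          | 3.01E7                       |
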
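    Table 1 states that each database combines the antibody peptides with UniProt human proteins and cRAP contaminants. A minimal sketch of assembling such a search FASTA follows; the header scheme `sarscov2_pep_<n>` is hypothetical, and the actual headers in DB1.fasta to DB6.fasta may differ.

```python
from pathlib import Path
from typing import Iterable


def write_search_fasta(out_path: str, peptides: Iterable[str],
                       extra_fastas: Iterable[str]) -> int:
    """Write peptides as FASTA records, then append existing FASTA files.

    Returns the number of peptide records written. The header scheme
    'sarscov2_pep_<n>' is a hypothetical choice for illustration only.
    """
    n = 0
    with open(out_path, "w") as out:
        for n, pep in enumerate(peptides, start=1):
            out.write(f">sarscov2_pep_{n}\n{pep}\n")
        for path in extra_fastas:
            text = Path(path).read_text()
            # Ensure each appended file ends with a newline so records
            # from consecutive files are not glued together.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return n
```

    In this sketch the UniProt and cRAP files are passed as `extra_fastas`, so a larger database (for example DB6 versus DB1) differs only in how many peptides are written before them.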
  5. Training and test data for antibody humanness evaluation

    • zenodo.org
    application/gzip
    Updated Jan 24, 2024
    Cite
    Jonathan Parkinson (2024). Training and test data for antibody humanness evaluation [Dataset]. http://doi.org/10.5281/zenodo.10562968
    Available download formats: application/gzip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jonathan Parkinson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 23, 2024
    Description

    ### Training and test data for humanness evaluation

    These data were collected in conjunction with, and used for
    training and testing in, Parkinson / Wang et al. 2024. The
    data are organized as follows:

    - Heavy chain training and multispecies test data (under the heavy chain folder)
      - The consolidated cAb-Rep file contains the human training sequences
      - The test sample sequences folder contains FASTA files with test sequences for each species
    - Light chain training and multispecies test data (under the light chain folder)
      - The consolidated cAb-Rep file contains the human training sequences
      - The test sample sequences folder contains FASTA files with test sequences for each species
    - Abybank data (under the abybank compiled data folder)
      - This folder contains separate folders for heavy and light chains
      - Each subfolder contains test data for a more diverse set of species, with one FASTA file per species
    - Humanization test data (under the humanization test data folder)
      - The sequences in the parental.fa file were originally humanized as part of drug discovery programs
      - The experimental.fa file contains the humanization results
    - IMGT and ADA data (under the imgt test data folder)
      - The imgt mab db fa and tsv files contain sequences and species assignments for IMGT mAb DB
      - The thera ada fa file contains sequences evaluated in the clinic
      - The Therapeutic ADA txt file contains anti-drug antibody (ADA) results for those antibodies
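    A minimal sketch for loading the per-species test sets follows. It assumes each species is stored as one FASTA file whose stem names the species (for example `mouse.fa`; the accepted extensions are an assumption) and uses a small standard-library FASTA parser rather than any particular bioinformatics package.

```python
from pathlib import Path
from typing import Optional


def read_fasta(path: str) -> dict[str, str]:
    """Parse a FASTA file into {record_id: sequence}.

    The record ID is the first whitespace-separated token of the header.
    Multi-line sequences are joined; a repeated ID overwrites the earlier one.
    """
    records: dict[str, str] = {}
    current: Optional[str] = None
    chunks: list[str] = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            header = line[1:].strip()
            current = header.split()[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def load_species_sets(folder: str, suffixes=(".fa", ".fasta")) -> dict[str, dict[str, str]]:
    """Map each species (taken from the file stem) to its parsed FASTA records."""
    return {p.stem: read_fasta(str(p))
            for p in sorted(Path(folder).iterdir())
            if p.is_file() and p.suffix in suffixes}
```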

    The data was retrieved from the following sources.

    1. All heavy and light chain training data is from the cAb-Rep database from [Guo et al.](https://pubmed.ncbi.nlm.nih.gov/31649674/)
    2. All testing data is from the Observed Antibody Space [(OAS) database](https://opig.stats.ox.ac.uk/webapps/oas/)

    The training and test data shown here have already been filtered for quality. The test data were additionally randomly sampled to yield 50,000 sequences for each species, then filtered to remove duplicates. The human test data were checked to ensure no overlap with the human training set.
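    The test-set preparation described above (random sampling of 50,000 sequences per species, removal of duplicates, and a check for overlap with the human training set) can be sketched as below. The fixed seed is an assumption, and the overlap step is written as a check that returns shared sequences, since the description only says overlap was checked, not how it was resolved.

```python
import random


def build_test_set(sequences: list[str], n: int = 50_000, seed: int = 0) -> list[str]:
    """Randomly sample up to n sequences, then drop duplicates.

    Deduplication happens after sampling, mirroring the described order,
    so the result can contain fewer than n sequences.
    """
    rng = random.Random(seed)
    sampled = rng.sample(sequences, min(n, len(sequences)))
    seen: set[str] = set()
    result: list[str] = []
    for seq in sampled:
        if seq not in seen:
            seen.add(seq)
            result.append(seq)
    return result


def training_overlap(test: list[str], train: list[str]) -> set[str]:
    """Sequences present in both sets; expected to be empty for human data."""
    return set(test) & set(train)
```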


    The IMGT, ADA and humanization test data were retrieved from Prihoda et al. and
    the associated [GitHub repo](https://github.com/Merck/BioPhi-2021-publication).

    See Parkinson et al. 2024 and the associated GitHub repositories for more details on how models other than
    SAM / AntPack were evaluated on this data.
