The AntiBody Sequence Database is a public dataset for antibody sequence data. It provides unique identifiers for antibody sequences, including both immunoglobulin and single-chain variable fragment sequences. These are are critical for immunological studies, and allows users to search and retrieve antibody sequences based on sequence similarity and specificity, and other biological properties.
Tracks all antibody and nanobody related therapeutics recognized by World Health Organisation, and identifies any corresponding structures in Structural Antibody Database with near exact or exact variable domain sequence matches. Synchronized with SAbDab to update weekly, reflecting new Protein Data Bank entries and availability of new sequence data published by WHO.
A database of antibody structure containing sequences from Kabat, IMGT and the Protein Databank (PDB), as well as structure data from the PDB. It provides search of the sequence data on various criteria and display of results in different formats. For data from the PDB, sequence searches can be combined with structural constraints. For example, one can ask for all the antibodies with a 10-residue Kabat CDR-L1 with a serine at H23 and an arginine within 10A of H36. The site also has software for structure analysis and other information on antibody structure available.
Database containing all antibody structures available in the PDB, annotated and presented in consistent fashion.Each structure is annotated with number of properties including experimental details, antibody nomenclature (e.g. heavy-light pairings), curated affinity data and sequence annotations. You can use the database to inspect individual structures, create and download datasets for analysis, search the database for structures with similar sequences to your query, monitor the known structural repetoire of antibodies.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structural region to sequence mapping for RosettaAntibody.
The Kabat Database determines the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information of biological macromolecules. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools and much more. The searching and analysis tools are custom made for the aligned data sets contained in both the SQL Server and ASCII text flat file formats. The searching and analysis tools may be run on a single PC workstation or in a distributed environment. The analysis tools are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats. The Kabat Database was initially started in 1970 to determine the combining site of antibodies based on the available amino acid sequences at that time. Bence Jones proteins, mostly from human, were aligned, using the now-known Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the complementarity determining regions (CDR) of light chains. Subsequently, antibody heavy chain amino acid sequences were also aligned using a different numbering system, since the locations of their CDRs (31-35B, 50-65 and 95-102) are different from those of the light chains. CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains. During the past 30 years, the Kabat database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules and other proteins of immunological interest. It has been used extensively by immunologists to derive useful structural and functional information from the primary sequences of these proteins.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Serum antibodies are valuable source of information on the health state of an organism. The profiles of serum antibody reactivity can be generated by using a high throughput sequencing of peptide-coding DNA from combinatorial random peptide phage display libraries selected for binding to serum antibodies. Here we demonstrate that the targets of immune response, which are recognized by serum antibodies directed against sequential epitopes, can be identified using the serum antibody repertoire profiles generated by high throughput sequencing. We developed an algorithm to filter the results of the protein database BLAST search for selected peptides to distinguish real antigens recognized by serum antibodies from irrelevant proteins retrieved randomly. When we used this algorithm to analyze serum antibodies from mice immunized with human protein, we were able to identify the protein used for immunizations among the top candidate antigens. When we analyzed human serum sample from the metastatic melanoma patient, the recombinant protein, corresponding to the top candidate from the list generated using the algorithm, was recognized by antibodies from metastatic melanoma serum on the western blot, thus confirming that the method can identify autoantigens recognized by serum antibodies. We demonstrated also that our unbiased method of looking at the repertoire of serum antibodies reveals quantitative information on the epitope composition of the targets of immune response. A method for deciphering information contained in the serum antibody repertoire profiles may help to identify autoantibodies that can be used for diagnosing and monitoring autoimmune diseases or malignancies.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model weights of the AbMPNN model (arXiv:2310.19513) presented at the 2023 ICML Workshop on Computational Biology, and csv files with the split between train, test and validation across the SAbDab and ImmuneBuilder datasets.
This model is based on ProteinMPNN and can be run using the corresponding code: https://github.com/dauparas/ProteinMPNN.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Human antibody light and heavey chain, IgG1 light and heavy chain digested by Asp-N, Chymotrypsin, Trypsin
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Antibody sequences can be determined at 99% accuracy directly from the polypeptide product by using bottom-up proteomics techniques. Sequencing accuracy at the peptide level is limited by the isobaric residues leucine and isoleucine, incomplete fragmentation spectra in which the order of two or more residues remains ambiguous due to lacking fragment ions for the intermediate positions, and isobaric combinations of amino acids, of potentially different lengths, for example, GG = N and GA = Q. Here, we present several updates to Stitch (v1.5), which performs template-based assembly of de novo peptides to reconstruct antibody sequences. This version introduces a mass-based alignment algorithm that explicitly accounts for mass coincidence errors. In addition, it incorporates a postprocessing procedure to assign I/L residues based on secondary fragments (satellite ions, i.e., w-ions). Moreover, evidence for sequence assignments can now be directly evaluated with the addition of an integrated spectrum viewer. Lastly, input data from a wider selection of de novo peptide sequencing algorithms are allowed, now including Casanovo, PEAKS, Novor.Cloud, pNovo, and MaxNovo, in addition to flat text and FASTA. Combined, these changes make Stitch compatible with a larger range of data processing pipelines and improve its tolerance to peptide-level sequencing errors.
Files, folders, tabular data and some raw data used in the publication: AB-SR reconstructs polyclonal antibody Fv domains after bottom-up proteomic de-novo sequencing (N. Maillet & B. Saunier). The AB-SR software reconstructs the sequences of most pairs of heavy and light chain variable regions from (in silico) pools containing up to 500 immunoglobulins in just a few minutes. For each Figure, the data before and after AB-SR software are available (see README.md for detailed explanations). Data presented here are used to benchmark AB-SR. More precisely, each experiment consists in IgGs coming from public databases being in silico digested using RPG software. Resulting peptides are then fed to AB-SR that reconstructs most initial IgGs.
Ig-Base is a collection of antibody data compiled from Protein Data Bank with associated annotations by manual inspection. Each entry contains PBD ID, antibody name, length and sequence of parts, etc. It allows searches using PDB ID, antibody names, length of CDRs, sequence or canonical structures.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
More templates are available for all structural regions in the new database.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In multiple myeloma diseases, monoclonal immunoglobulin light chains (LCs) are abundantly produced, with, as a consequence in some cases, the formation of deposits affecting various organs, such as the kidney, while in other cases remaining soluble up to concentrations of several g·L–1 in plasma. The exact factors crucial for the solubility of LCs are poorly understood, but it can be hypothesized that their amino acid sequence plays an important role. Determining the precise sequences of patient-derived LCs is therefore highly desirable. We establish here a novel de novo sequencing workflow for patient-derived LCs, based on the combination of bottom-up and top-down proteomics without database search. PEAKS is used for the de novo sequencing of peptides that are further assembled into full length LC sequences using ALPS. Top-down proteomics provides the molecular masses of proteoforms and allows the exact determination of the amino acid sequence including all posttranslational modifications. This pipeline is then used for the complete de novo sequencing of LCs extracted from the urine of 10 patients with multiple myeloma. We show that for the bottom-up part, digestions with trypsin and Nepenthes digestive fluid are sufficient to produce overlapping peptides able to generate the best sequence candidates. Top-down proteomics is absolutely required to achieve 100% final sequence coverage and characterize clinical samples containing several LCs. Our work highlights an unexpected range of modifications.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains pre-computed mmseqs databases for the antiref fasta files created by Briney et al.
Please cite the original work if you use any of the databases provided here.
Sources:
The mmseqs databases were created as follows:
```
aria2x -x16 -s16 --input-file antiref_links.txt
snakemake -s antiref_mmseqs.smk --jobs 1 --cores 1 --local-cores 250
```
Please check the summary repository for the fasta files and snakemake files. In this sub repo we only store the antiref files matching the title of the repo.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The accurate germline gene assignment and assessment of somatic hypermutation in antibodies induced by immunization or infection are important in immunological studies. Here, we illustrate issues specific to the construction of comprehensive immunoglobulin (IG) germline gene reference databases for outbred animal species using rhesus macaques, a frequently used non-human primate model, as a model test case. We demonstrate that the genotypic variation found in macaque germline inference studies is reflected in similar levels of gene diversity in genomic assemblies. We show that the high frequency of IG heavy chain V (IGHV) region structural and gene copy number variation between subjects means that individual animals lack genes that are present in other animals. Therefore, gene databases compiled from a single or too few animals will inevitably result in inaccurate gene assignment and erroneous SHM level assessment for those genes it lacks. We demonstrate this by assigning a test macaque IgG library to the KIMDB, a database compiled of germline IGHV sequences from 27 rhesus macaques, and, alternatively, to the IMGT rhesus macaque database, based on IGHV genes inferred primarily from the genomic sequence of the rheMac10 reference assembly, supplemented with 10 genes from the Mmul_051212 assembly. We found that the use of a gene-restricted database led to overestimations of SHM by up to 5% due to misassignments. The principles described in the current study provide a model for the creation of comprehensive immunoglobulin reference databases from outbred species to ensure accurate gene assignment, lineage tracing and SHM calculations.
A high-quality integrated knowledge resource specialized in the immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility complex (MHC) of human and other vertebrate species, and in the immunoglobulin superfamily (IgSF), MHC superfamily (MhcSF) and related proteins of the immune system (RPI) of vertebrates and invertebrates, serving as the global reference in immunogenetics and immunoinformatics. IMGT provides a common access to sequence, genome and structure Immunogenetics data, based on the concepts of IMGT-ONTOLOGY and on the IMGT Scientific chart rules. IMGT works in close collaboration with EBI (Europe), DDBJ (Japan) and NCBI (USA). IMGT consists of sequence databases, genome database, structure database, and monoclonal antibodies database, Web resources and interactive tools.
Epitome is a database of structurally inferred antigenic epitopes in proteins. It includes all known antigenic residues and the antibodies that interact with them, including a detailed description of residues involved in the interaction and their sequence/structure environments. Additionally, Interactions can be visualized using an interface into Jmol. The website also contains specialized software, NLProt, to enable users to extract protein names and sequences from natural language text, and links to several other databases involved in antibody/antigen interactions. antibody/antigen interactions, antigen epitope
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OASis human 9-mer peptide database, generated from 118 million human antibody sequences from the Observed Antibody Space database.
Attached is a gzipped SQLite database containing two tables: "peptides" and "subjects".
Links:
BioPhi codebase and documentation: https://github.com/Merck/BioPhi
Public BioPhi server: https://biophi.dichlab.org
OAS Database: http://opig.stats.ox.ac.uk/webapps/oas/
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Antibody sequence characteristics.
The AntiBody Sequence Database is a public dataset for antibody sequence data. It provides unique identifiers for antibody sequences, including both immunoglobulin and single-chain variable fragment sequences. These are are critical for immunological studies, and allows users to search and retrieve antibody sequences based on sequence similarity and specificity, and other biological properties.