100+ datasets found
  1. Data from: AntiBody Sequence Database

    • bioregistry.io
    Updated Jan 23, 2025
    Cite
    (2025). AntiBody Sequence Database [Dataset]. https://bioregistry.io/absd
    Description

    The AntiBody Sequence Database is a public dataset of antibody sequence data. It provides unique identifiers for antibody sequences, including both immunoglobulin and single-chain variable fragment (scFv) sequences, which are critical for immunological studies. It allows users to search and retrieve antibody sequences based on sequence similarity, specificity, and other biological properties.

  2. DIG IT - Database of Immunoglobulins and Integrated Tools

    • dknet.org
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). DIG IT - Database of Immunoglobulins and Integrated Tools [Dataset]. http://identifiers.org/RRID:SCR_005924/resolver?q=&i=rrid
    Description

    The Database of Immunoglobulins and Integrated Tools (DIG IT) is an integrated resource that stores annotated immunoglobulin variable-domain sequences from the NCBI database, together with tools for searching and analyzing them. It contains 145,759 heavy chain sequences and 71,404 light chain sequences (47,168 kappa and 24,236 lambda) with assigned canonical structures for the hypervariable loops, data on the antigen type, and heavy-light chain pairing information (9,672 pairs in total). Users can input an immunoglobulin variable-domain sequence of interest (amino acid or nucleotide; heavy chain, light chain, or both) to retrieve the closest sequences, sorted by e-value, with complete annotation. Users can also query the database directly by antigen type, canonical structure, or germline family, according to their requirements.
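
    DIG IT returns the closest sequences sorted by e-value. The sketch below shows only that ranking step, applied to hits that a BLAST-style search has already produced. The `Hit` record, its field names, the chain labels, and the default cutoff of 1e-3 are hypothetical choices for illustration, not DIG IT's actual data model or settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    """A hypothetical search hit; field names are illustrative, not DIG IT's."""
    seq_id: str
    chain: str      # "heavy", "kappa" or "lambda"
    e_value: float


def closest_hits(hits: List[Hit], chain: Optional[str] = None,
                 max_e_value: float = 1e-3) -> List[Hit]:
    """Keep hits at or below an e-value cutoff (optionally one chain type), best first."""
    kept = [h for h in hits
            if h.e_value <= max_e_value and (chain is None or h.chain == chain)]
    # A smaller e-value means the match is less likely to have arisen by chance.
    return sorted(kept, key=lambda h: h.e_value)


# Made-up example hits.
EXAMPLE_HITS = [
    Hit("a", "heavy", 1e-30),
    Hit("b", "kappa", 1e-50),
    Hit("c", "heavy", 0.5),
    Hit("d", "heavy", 1e-40),
]

if __name__ == "__main__":
    for hit in closest_hits(EXAMPLE_HITS, chain="heavy"):
        print(hit.seq_id, hit.e_value)
```

    Filtering by chain type before sorting mirrors the option of querying with a heavy chain, a light chain, or both.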

  3. Raw data from external antibody databases and scripts to homogenize and...

    • entrepot.recherche.data.gouv.fr
    application/x-gzip +1
    Updated Feb 4, 2025
    Cite
    Nicolas MAILLET; Simon MALESYS (2025). Raw data from external antibody databases and scripts to homogenize and standardize them used to build AntiBody Sequence Database (for reproducibility) [Dataset]. http://doi.org/10.57745/DDLHWU
    Available download formats: application/x-gzip (620431), application/x-gzip (163643), application/x-gzip (6833391387), text/markdown (12475), application/x-gzip (80726198), application/x-gzip (65497009)
    Dataset provided by
    Recherche Data Gouv
    Authors
    Nicolas MAILLET; Simon MALESYS
    License

    https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.57745/DDLHWU

    Description

    Reproducibility data for the AntiBody Sequence Database (ABSD) article. This dataset contains the raw data (antibody sequences) extracted on June 20, 2024, from various databases, as well as several scripts, to ensure the reproducibility of our results. External databases used: ABDB, AbPDB, CoV-AbDab, Genbank, IMGT, PDB, SACS, SAbDab, TheraSAbDab, UniProt, KABAT. Scripts usage: each external database has a corresponding script that formats all antibody sequences extracted from it. A final script merges all extracted antibody sequences while removing redundancy and standardizing and cleaning the data.
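
    The final merge step described above (removing redundancy while standardizing and cleaning) can be illustrated with a small sketch. This is not the ABSD script: the cleaning rule (strip whitespace, upper-case) and exact-match deduplication are assumptions, and `standardize` and `merge_sources` are hypothetical names. Each unique sequence keeps the list of databases that reported it.

```python
from typing import Dict, Iterable, List


def standardize(raw: str) -> str:
    """Hypothetical cleaning rule: drop all whitespace and upper-case the residues."""
    return "".join(raw.split()).upper()


def merge_sources(sources: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Merge per-database sequence lists into one non-redundant mapping.

    Returns {sequence: [databases that reported it]}, so a chain found in
    several databases is kept once while its provenance is preserved.
    """
    merged: Dict[str, set] = {}
    for database, sequences in sources.items():
        for raw in sequences:
            seq = standardize(raw)
            if not seq:
                continue  # discard empty records
            merged.setdefault(seq, set()).add(database)
    return {seq: sorted(dbs) for seq, dbs in merged.items()}


if __name__ == "__main__":
    example = {  # made-up fragments, not real database content
        "IMGT": ["evqlv es", "QVQLQ"],
        "SAbDab": ["EVQLVES", ""],
        "UniProt": ["qvqlq"],
    }
    print(merge_sources(example))
```

    Keeping provenance rather than discarding duplicates outright makes it possible to trace each merged sequence back to its source databases.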

  4. Table_1_High-Quality Library Preparation for NGS-Based Immunoglobulin...

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    + more versions
    Cite
    Néstor Vázquez Bernat; Martin Corcoran; Uta Hardt; Mateusz Kaduk; Ganesh E. Phad; Marcel Martin; Gunilla B. Karlsson Hedestam (2023). Table_1_High-Quality Library Preparation for NGS-Based Immunoglobulin Germline Gene Inference and Repertoire Expression Analysis.pdf [Dataset]. http://doi.org/10.3389/fimmu.2019.00660.s004
    Available download formats: pdf
    Dataset provided by
    Frontiers
    Authors
    Néstor Vázquez Bernat; Martin Corcoran; Uta Hardt; Mateusz Kaduk; Ganesh E. Phad; Marcel Martin; Gunilla B. Karlsson Hedestam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Next generation sequencing (NGS) of immunoglobulin (Ig) repertoires (Rep-seq) enables examination of the adaptive immune system at an unprecedented level. Applications include studies of expressed repertoires, gene usage, somatic hypermutation levels, Ig lineage tracing and identification of genetic variation within the Ig loci through inference methods. All these applications require starting libraries that allow the generation of sequence data with low error rate and optimal representation of the expressed repertoire. Here, we provide detailed protocols for the production of libraries suitable for human Ig germline gene inference and Ig repertoire studies. Various parameters used in the process were tested in order to demonstrate factors that are critical to obtain high quality libraries. We demonstrate an improved 5′RACE technique that reduces the length constraints of Illumina MiSeq based Rep-seq analysis but allows for the acquisition of sequences upstream of Ig V genes, useful for primer design. We then describe a 5′ multiplex method for library preparation, which yields full length V(D)J sequences suitable for genotype identification and novel gene inference. We provide comprehensive sets of primers targeting IGHV, IGKV, and IGLV genes. Using the optimized protocol, we produced IgM, IgG, IgK, and IgL libraries and analyzed them using the germline inference tool IgDiscover to identify expressed germline V alleles. This process additionally uncovered three IGHV, one IGKV, and six IGLV novel alleles in a single individual, which are absent from the IMGT reference database, highlighting the need for further study of Ig genetic variation. The library generation protocols presented here enable a robust means of analyzing expressed Ig repertoires, identifying novel alleles and producing individualized germline gene databases from humans.

  5. Structural Antibody Database

    • rrid.site
    • neuinfo.org
    • +2 more
    Updated Apr 20, 2022
    Cite
    (2022). Structural Antibody Database [Dataset]. http://identifiers.org/RRID:SCR_022096/resolver?q=*&i=rrid
    Description

    Database containing all antibody structures available in the PDB, annotated and presented in a consistent fashion. Each structure is annotated with a number of properties, including experimental details, antibody nomenclature (e.g., heavy-light pairings), curated affinity data, and sequence annotations. You can use the database to inspect individual structures, create and download datasets for analysis, search for structures with sequences similar to your query, and monitor the known structural repertoire of antibodies.

  6. Data from: Identification of Immunoglobulin Gene Sequences from a Small Read...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 28, 2016
    Cite
    Semba, Yuichiro; Kuniyoshi, Yuki; Hayashi, Masayasu; Fujita, Masatoshi; Iwasaki, Takeshi; Kimura, Hiroshi; Sato, Yuko; Maehara, Kazumitsu; Ohkawa, Yasuyuki; Harada, Akihito (2016). Identification of Immunoglobulin Gene Sequences from a Small Read Number of mRNA-Seq Using Hybridomas [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001582118
    Authors
    Semba, Yuichiro; Kuniyoshi, Yuki; Hayashi, Masayasu; Fujita, Masatoshi; Iwasaki, Takeshi; Kimura, Hiroshi; Sato, Yuko; Maehara, Kazumitsu; Ohkawa, Yasuyuki; Harada, Akihito
    Description

    Identification of immunoglobulin genes in hybridomas is essential for producing antibodies for research and clinical applications. Several methods, such as RACE and degenerate PCR, have been developed to determine the Igh and Igl/Igk coding sequences (CDSs), but it has been difficult to process many hybridomas both accurately and rapidly. Here, we propose a new strategy for antibody sequence determination by mRNA-seq of hybridomas. We demonstrated that hybridomas highly express the Igh and Igl/Igk genes and that de novo transcriptome assembly of mRNA-seq data accurately identified the CDSs of both Igh and Igl/Igk. Furthermore, we estimated that only 30,000 sequenced reads are required to identify the immunoglobulin sequences from four different hybridoma clones. Thus, our approach could greatly facilitate the determination of variable-region CDSs.

  7. IMGT/LIGM-DB

    • neuinfo.org
    • scicrunch.org
    • +1 more
    Updated Jan 29, 2022
    Cite
    (2022). IMGT/LIGM-DB [Dataset]. http://identifiers.org/RRID:SCR_006931
    Description

    IMGT/LIGM-DB is a comprehensive database of immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences from human and other vertebrate species (270). IMGT/LIGM-DB includes all germline (non-rearranged) and rearranged IG and TR genomic DNA (gDNA) and complementary DNA (cDNA) sequences published in generalist databases. IMGT/LIGM-DB allows searches from the Web interface according to biological and immunogenetic criteria through five distinct modules, depending on the user's interest. Users can search the catalogue by accession number, mnemonic, definition, creation date, length, or annotation level. They can also search by taxonomic classification, keywords, and annotated labels. For a given entry, nine types of display are available, including the IMGT flat file, the translation of the coding regions, and the analysis by the IMGT/V-QUEST tool. IMGT/LIGM-DB distributes expertly annotated sequences. The annotations greatly enhance the quality and accuracy of the distributed information. They include the sequence identification, the gene and allele classification, the constitutive and specific motif description, the codon and amino acid numbering, and information on how the sequence was obtained, according to the main concepts of IMGT-ONTOLOGY. They represent the main source of IG and TR gene and allele knowledge stored in IMGT/GENE-DB and in the IMGT reference directory. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.

  8. IMGT - the international ImMunoGeneTics information system

    • dknet.org
    Updated Dec 20, 2024
    Cite
    (2024). IMGT - the international ImMunoGeneTics information system [Dataset]. http://identifiers.org/RRID:SCR_012780/resolver/mentions?q=&i=rrid
    Description

    A high-quality integrated knowledge resource specialized in the immunoglobulins (IG) or antibodies, T cell receptors (TR), and major histocompatibility complex (MHC) of human and other vertebrate species, and in the immunoglobulin superfamily (IgSF), MHC superfamily (MhcSF), and related proteins of the immune system (RPI) of vertebrates and invertebrates, serving as the global reference in immunogenetics and immunoinformatics. IMGT provides common access to sequence, genome, and structure immunogenetics data, based on the concepts of IMGT-ONTOLOGY and on the IMGT Scientific chart rules. IMGT works in close collaboration with EBI (Europe), DDBJ (Japan), and NCBI (USA). IMGT consists of sequence databases, a genome database, a structure database, a monoclonal antibody database, Web resources, and interactive tools.

  9. Antibody and Nanobody Design Dataset (ANDD)

    • zenodo.org
    zip
    Updated Sep 26, 2025
    Cite
    Yikai Wu (2025). Antibody and Nanobody Design Dataset (ANDD) [Dataset]. http://doi.org/10.5281/zenodo.16894086
    Available download formats: zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yikai Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title: Antibody and Nanobody Design Dataset (ANDD): A Comprehensive Resource with Sequence, Structure, and Binding Affinity Data

    DOI: 10.5281/zenodo.16894086

    Resource Type: Dataset

    Publisher: Zenodo

    Publication Year: 2025

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

    Overview (Abstract):

    The Antibody and Nanobody Design Dataset (ANDD) is a unified, large-scale dataset created to overcome the limitations of data fragmentation and incompleteness in antibody and nanobody research. It integrates sequence, structure, antigen information, and binding affinity data from 15 diverse sources, including OAS, PDB, SAbDab, and others. ANDD comprises 48,800 antibody/nanobody sequences, structural data for 25,158 entries, antigen sequences for 12,617 entries, and a total of 9,569 binding affinity values for antibody/nanobody-antigen pairs. A key innovation is the augmentation of experimental affinity data with 5,218 high-quality predictions generated by the ANTIPASTI model. This makes ANDD the largest available dataset of its kind, providing a robust foundation for training and validating deep learning models in therapeutic antibody and nanobody design.

    Keywords: Dataset, Antibody Design, Nanobody Design, VHH, Deep Learning, Protein Engineering, Binding Affinity, Therapeutic Antibodies, Computational Biology

    Methods (Data Curation and Processing):

    The ANDD was constructed through a rigorous multi-step process:

    1. Data Collection: Data was aggregated from 15 primary sources, including both antibody/nanobody-specific databases (e.g., OAS, SAbDab, INDI, sdAb-DB) and general protein databases (e.g., PDB, UNIPROT, PDBbind).
    2. Integration and Standardization: Data from disparate sources was consolidated into a consistent format, addressing challenges of format inconsistency. Entries were manually validated to exclude non-relevant data (e.g., T-cell receptors).
    3. Affinity Data Augmentation: The ANTIPASTI deep learning model was used to predict and add binding affinity values for entries that had structural data but lacked experimental affinity measurements.
    4. Manual Curation: Web-based data and information from publicly available patents targeting key antigens (HER2, IL-6, CD45, SARS-CoV-2 RBD) were manually extracted to enhance completeness.
    5. Hierarchical Organization: Data is organized in a hierarchical structure, offering four progressively detailed levels: Sequence-only, Sequence+Structure, Sequence+Structure+Antigen, and Sequence+Structure+Antigen+Affinity.

    Data Specifications and Format:

    The dataset is distributed in two parts:

    1. ANDD.csv: A comprehensive spreadsheet containing all annotated metadata for each entry.
    2. All_structures/ folder: A directory containing the corresponding PDB structure files for entries with structural data.

    The ANDD.csv file includes the following key fields (a full description is available in the Data Record section of the paper):

    • General Info: Source, Update_Date, PDB_ID, Experimental_Method, Ab_or_Nano, Source_Organism.
    • Chain Details: Entity IDs, Asym IDs, Database Accession Codes, and Macromolecule Names for Heavy (H) and Light (L) chains.
    • Antigen Details: Ag_Name, Ag_Seq, Ag_Source Organism, and relevant database identifiers.
    • Sequence Data: Full amino acid sequences for H/L chains and individual CDR regions (H1-H3, L1-L3).
    • Affinity Data: Experimentally measured or predicted Affinity_Kd(M), ∆Gbinding(kJ), and the Affinity_Method.
    • Mutation Data: Annotation of any amino acid mutations (Ab/Nano_mutation).
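
    As an illustration of the four-level hierarchy described under Methods, the sketch below reads ANDD.csv-style rows with Python's csv module and assigns each row the most detailed level it supports. Only three columns named in the field list are consulted (PDB_ID, Ag_Seq, Affinity_Kd(M)); treating each level as requiring the previous one, and using a non-empty cell as "present", are assumptions. The sample rows and PDB placeholders are made up.

```python
import csv
import io
from typing import Dict, Mapping, TextIO

LEVELS = (
    "Sequence-only",
    "Sequence+Structure",
    "Sequence+Structure+Antigen",
    "Sequence+Structure+Antigen+Affinity",
)

# Columns from the ANDD.csv field list; the cumulative level rule is an assumption.
LEVEL_COLUMNS = ("PDB_ID", "Ag_Seq", "Affinity_Kd(M)")


def andd_level(row: Mapping[str, str]) -> str:
    """Return the most detailed level whose columns (in order) are all non-empty."""
    level = 0
    for column in LEVEL_COLUMNS:
        if not (row.get(column) or "").strip():
            break
        level += 1
    return LEVELS[level]


def count_levels(csv_file: TextIO) -> Dict[str, int]:
    """Count how many rows fall into each hierarchy level."""
    counts = dict.fromkeys(LEVELS, 0)
    for row in csv.DictReader(csv_file):
        counts[andd_level(row)] += 1
    return counts


# A tiny made-up stand-in for ANDD.csv (the real file has many more columns).
SAMPLE_CSV = "PDB_ID,Ag_Seq,Affinity_Kd(M)\n,,\nPDB1,MKTAYIAK,\nPDB2,MKTAYIAK,1e-9\n"

if __name__ == "__main__":
    print(count_levels(io.StringIO(SAMPLE_CSV)))
```

    Against the real file, `count_levels(open("ANDD.csv", newline=""))` would give per-level counts, provided the column headers match the names listed above exactly.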

    Technical Validation:

    The quality of ANDD has been ensured through extensive validation:

    1. Manual Curation: A rigorous manual review process was conducted to check for accuracy and consistency between sequence, structure, and affinity data across randomly selected entries.
    2. Affinity Validation with AlphaBind: The experimental Kd values were validated by comparing them against enrichment ratios predicted by the AlphaBind model, showing a significant correlation (Pearson’s r = 0.750).
    3. Cross-Mapping Validation: The internal consistency between Kd and ∆Gbinding values within the dataset was confirmed, showing a perfect correlation (Pearson’s r = 1.000) as per thermodynamic principles.
    4. Proof-of-Concept Application: The dataset's utility was demonstrated by fine-tuning the Diffab generative model on a subset of ANDD. The fine-tuned model showed significant improvements in generating nanobodies with better predicted binding affinity, structural diversity, and developability metrics.
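
    The cross-mapping check in item 3 rests on the standard thermodynamic relation ΔG = RT ln(Kd / c°), with c° = 1 M. The sketch below converts between the two quantities and computes a plain Pearson coefficient, the statistic quoted in items 2 and 3. The temperature of 298.15 K is an assumption (the description does not state one), and the function names are hypothetical. Because ΔG is an exact linear function of ln Kd, correlating ΔG with log Kd gives r = 1 up to rounding.

```python
import math
from typing import Sequence

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kJ/mol: dG = R*T*ln(Kd / 1 M). Negative when Kd < 1 M."""
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


def kd_from_delta_g(dg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Inverse of delta_g_from_kd: returns Kd in M."""
    return math.exp(dg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    kds = [1e-6, 1e-8, 1e-9, 1e-11]  # made-up affinities in M
    dgs = [delta_g_from_kd(kd) for kd in kds]
    print([round(dg, 2) for dg in dgs])
    print(pearson_r([math.log10(kd) for kd in kds], dgs))
```

    At 298.15 K, RT is about 2.48 kJ/mol, so a 1 nM binder corresponds to roughly -51.4 kJ/mol.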

    Potential Uses:

    ANDD is designed to accelerate research in computational biology and drug discovery, including:

    • Training and benchmarking deep learning models for de novo antibody/nanobody sequence and structure generation.
    • Developing and validating predictive models for antibody-antigen binding affinity.
    • Studying structure-function relationships in antibody-antigen interactions.
    • Facilitating the design of optimized therapeutic antibodies and nanobodies with improved specificity and efficacy.

    Access and License:

    The ANDD dataset is publicly available for download under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share and adapt the material for any purpose, even commercially, provided appropriate credit is given to the original authors and this data descriptor is cited.

  10. IMGT/GENE-DB

    • neuinfo.org
    • rrid.site
    • +2 more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). IMGT/GENE-DB [Dataset]. http://identifiers.org/RRID:SCR_006964
    Description

    IMGT/GENE-DB is the comprehensive IMGT genome database for immunoglobulin (IG) and T cell receptor (TR) genes from human and mouse and, in development, from other vertebrates. IMGT/GENE-DB is the international reference for IG and TR gene nomenclature and works in close collaboration with the HUGO Nomenclature Committee, the Mouse Genome Database, and genome committees for other species. IMGT/GENE-DB allows a search of IG and TR genes by locus, group, and subgroup, which are CLASSIFICATION concepts of IMGT-ONTOLOGY. Shortcuts allow retrieval of gene information by gene name or clone name. Direct links with configurable URLs give access to information usable by humans or programs. An IMGT/GENE-DB entry displays accurate gene data related to the genome (gene localization), allelic polymorphisms (number of alleles, IMGT reference sequences, functionality, etc.), gene expression (known cDNAs), and proteins and structures (Protein displays, IMGT Colliers de Perles). It provides internal links to the IMGT sequence databases and to the IMGT Repertoire Web resources, and external links to genome and generalist sequence databases. IMGT/GENE-DB manages the IMGT reference directory used by the IMGT tools for IG and TR gene and allele comparison and assignment, and by the IMGT databases for gene data annotation. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
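
    IMGT/GENE-DB is searched by locus, group, and subgroup, and gene information can be retrieved by gene name. To show how an IMGT-style name such as IGHV1-2*02 encodes those levels, the sketch below splits it into locus (IGH), group (IGHV), subgroup (IGHV1), gene (IGHV1-2), and allele (02). It is a hypothetical, simplified parser, not IMGT code: it handles only V, D, and J genes, treats the first number as the subgroup for every gene type, and returns None for constant genes, orphons, and other name forms.

```python
import re
from dataclasses import dataclass
from typing import Optional

_NAME_RE = re.compile(
    r"^(?P<locus>IG[HKL]|TR[ABGD])"  # locus, e.g. IGH, IGK, TRB
    r"(?P<segment>[VDJ])"            # gene type: variable, diversity or joining
    r"(?P<number>\d+)"               # subgroup number
    r"(?P<suffix>D?(?:-\d+)*)"       # optional 'D' (as in IGKV1D-39) and gene numbers
    r"(?:\*(?P<allele>\d+))?$"       # optional allele after '*'
)


@dataclass(frozen=True)
class ImgtName:
    locus: str
    group: str
    subgroup: str
    gene: str
    allele: Optional[str]


def parse_imgt_name(name: str) -> Optional[ImgtName]:
    """Split a simple IMGT-style V/D/J gene or allele name; None if unsupported."""
    m = _NAME_RE.match(name.strip())
    if m is None:
        return None
    group = m["locus"] + m["segment"]
    subgroup = group + m["number"]
    gene = subgroup + m["suffix"]
    return ImgtName(m["locus"], group, subgroup, gene, m["allele"])


if __name__ == "__main__":
    for name in ("IGHV1-2*02", "IGKV1D-39*01", "TRBV20-1", "IGHG1*01"):
        print(name, parse_imgt_name(name))
```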

  11. Data from: Kabat Database of Sequences of Proteins of Immunological Interest...

    • neuinfo.org
    • dknet.org
    • +2 more
    Updated Jun 27, 2024
    + more versions
    Cite
    (2024). Kabat Database of Sequences of Proteins of Immunological Interest [Dataset]. http://identifiers.org/RRID:SCR_006465
    Description

    The Kabat Database determines the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information of biological macromolecules. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools and much more. The searching and analysis tools are custom made for the aligned data sets contained in both the SQL Server and ASCII text flat file formats. The searching and analysis tools may be run on a single PC workstation or in a distributed environment. The analysis tools are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats. The Kabat Database was initially started in 1970 to determine the combining site of antibodies based on the available amino acid sequences at that time. Bence Jones proteins, mostly from human, were aligned, using the now-known Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the complementarity determining regions (CDR) of light chains. Subsequently, antibody heavy chain amino acid sequences were also aligned using a different numbering system, since the locations of their CDRs (31-35B, 50-65 and 95-102) are different from those of the light chains. 
    CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains. During the past 30 years, the Kabat database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. It has been used extensively by immunologists to derive useful structural and functional information from the primary sequences of these proteins.
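
    The per-position variability described above is commonly known as the Wu-Kabat variability: the number of different amino acids observed at a position divided by the frequency of the most common amino acid there, so an invariant position scores 1. The sketch below computes it for already aligned, equal-length sequences. Skipping gap characters ('-') is an assumption of this sketch, and real Kabat alignments also use insertion positions such as 35A and 35B, which would need to be mapped to columns first.

```python
from collections import Counter
from typing import List, Sequence


def wu_kabat_variability(aligned: Sequence[str]) -> List[float]:
    """Per-position variability for equal-length aligned sequences.

    variability = (number of different residues at the position)
                  / (frequency of the most common residue at that position)
    Gap characters ('-') are skipped; a column with no residues scores 0.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        residues = [s[i] for s in aligned if s[i] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        top_frequency = counts.most_common(1)[0][1] / len(residues)
        scores.append(len(counts) / top_frequency)
    return scores


if __name__ == "__main__":
    # Made-up three-residue alignment: invariant, mildly and highly variable columns.
    print(wu_kabat_variability(["ACD", "ACE", "AGF", "ACF"]))
```

    Plotting these scores along the alignment is what reveals hypervariable peaks such as the light-chain CDRs at positions 24-34, 50-56, and 89-97.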

  12. Therapeutic Structural Antibody Database

    • dknet.org
    • neuinfo.org
    • +2 more
    Updated Apr 20, 2022
    + more versions
    Cite
    (2022). Therapeutic Structural Antibody Database [Dataset]. http://identifiers.org/RRID:SCR_022093
    Description

    Tracks all antibody- and nanobody-related therapeutics recognized by the World Health Organisation and identifies any corresponding structures in the Structural Antibody Database with exact or near-exact variable domain sequence matches. It is synchronized with SAbDab and updated weekly, reflecting new Protein Data Bank entries and newly published WHO sequence data.
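
    The exact versus near-exact distinction can be pictured as a threshold on sequence identity. The sketch below compares two pre-aligned, equal-length variable-domain sequences by simple percent identity. The 95% cutoff, the labels, and the function names are illustrative assumptions, not Thera-SAbDab's actual criteria. Real comparisons would also need an alignment step for sequences of different lengths.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) between two pre-aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty, pre-aligned sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def classify_match(therapeutic_vd: str, structure_vd: str,
                   near_exact_min: float = 95.0) -> str:
    """Label a therapeutic/structure pair; the cutoff is a hypothetical choice."""
    identity = percent_identity(therapeutic_vd, structure_vd)
    if identity == 100.0:
        return "exact"
    if identity >= near_exact_min:
        return "near exact"
    return "no match"


if __name__ == "__main__":
    query = "EVQLVESGGGLVQPGGSLRL"  # made-up 20-residue fragment
    print(classify_match(query, "EVQLVESGGGLVQPGGSLRK"))  # one mismatch -> 95 %
```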

  13. Serum Antibody Repertoire Profiling Using In Silico Antigen Screen

    • plos.figshare.com
    doc
    Updated Jun 1, 2023
    Cite
    Xinyue Liu; Qiang Hu; Song Liu; Luke J. Tallo; Lisa Sadzewicz; Cassandra A. Schettine; Mikhail Nikiforov; Elena N. Klyushnenkova; Yurij Ionov (2023). Serum Antibody Repertoire Profiling Using In Silico Antigen Screen [Dataset]. http://doi.org/10.1371/journal.pone.0067181
    Available download formats: doc
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xinyue Liu; Qiang Hu; Song Liu; Luke J. Tallo; Lisa Sadzewicz; Cassandra A. Schettine; Mikhail Nikiforov; Elena N. Klyushnenkova; Yurij Ionov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Serum antibodies are a valuable source of information on the health state of an organism. Profiles of serum antibody reactivity can be generated by high-throughput sequencing of peptide-coding DNA from combinatorial random peptide phage display libraries selected for binding to serum antibodies. Here we demonstrate that the targets of the immune response that are recognized by serum antibodies directed against sequential epitopes can be identified using serum antibody repertoire profiles generated by high-throughput sequencing. We developed an algorithm that filters the results of protein database BLAST searches for selected peptides to distinguish real antigens recognized by serum antibodies from irrelevant proteins retrieved randomly. When we used this algorithm to analyze serum antibodies from mice immunized with a human protein, we were able to identify the protein used for immunization among the top candidate antigens. When we analyzed a human serum sample from a metastatic melanoma patient, the recombinant protein corresponding to the top candidate on the list generated by the algorithm was recognized by antibodies from the metastatic melanoma serum on a western blot, confirming that the method can identify autoantigens recognized by serum antibodies. We also demonstrated that our unbiased method of examining the serum antibody repertoire reveals quantitative information on the epitope composition of the targets of the immune response. A method for deciphering the information contained in serum antibody repertoire profiles may help to identify autoantibodies that can be used for diagnosing and monitoring autoimmune diseases or malignancies.

  14. Pre-processed B cell receptor repertoire sequencing data from BioProject...

    • zenodo.org
    application/gzip
    Updated Jan 24, 2020
    Cite
    Marie Ghraichy; Johannes Trück (2020). Pre-processed B cell receptor repertoire sequencing data from BioProject PRJNA527941 [Dataset]. http://doi.org/10.5281/zenodo.2640393
    Available download formats: application/gzip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marie Ghraichy; Johannes Trück
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Processing

    Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1, 2). Raw FASTQ files were filtered based on a quality score threshold of 20. Paired reads were joined if they had a minimum length of 10 nt, a maximum error rate of 0.3, and a significance threshold of 0.0001. Reads with identical UMIs were collapsed to a consensus sequence. Reads with identical full-length sequences and identical constant-region primers but differing UMIs were further collapsed. Sequences were then submitted to IgBLAST (3) for V(D)J assignment and sequence annotation. Constant region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).

    software_versions pRESTO 0.5.3, Change-O 0.3.4, IgBlast 1.6.1, Stampy 1.0.21, shazam 0.1.8

    quality_thresholds FilterSeq.py pRESTO Q>20

    paired_reads_assembly AssemblePairs.py pRESTO minlen 10 maxerror 0.3 alpha 0.0001

    primer_match_cutoffs MaskPrimers.py pRESTO C primer & V primer maxerror 0.2

    consensus_building BuildConsensus.py pRESTO maxerror 0.1 maxgap 0.5

    collapsing_method CollapseSeq.py pRESTO

    germline_database IMGT
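
    Two of the pRESTO steps listed above, the quality filter (FilterSeq.py, threshold 20) and UMI consensus building (BuildConsensus.py, maxerror 0.1), can be illustrated in plain Python. This is a simplified stand-in, not pRESTO's implementation: it assumes Phred+33 quality encoding, keeps a read when its mean quality is at least the threshold, builds a per-position majority consensus from equal-length reads, and drops a UMI group whose mismatch rate against its own consensus exceeds `max_error`. The function names are hypothetical; for real data, use pRESTO itself.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(c) - offset for c in quality) / len(quality)


def passes_quality(quality: str, min_q: float = 20) -> bool:
    return mean_phred(quality) >= min_q


def umi_consensus(reads: Iterable[Tuple[str, str]],
                  max_error: float = 0.1) -> Dict[str, Tuple[str, int]]:
    """reads: (umi, sequence) pairs; sequences sharing a UMI must be equal length.

    Returns {umi: (consensus, read_count)} for groups within max_error.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    result = {}
    for umi, seqs in groups.items():
        # Majority base per column; ties go to the base seen first.
        consensus = "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
        mismatches = sum(a != b for s in seqs for a, b in zip(s, consensus))
        error = mismatches / (len(seqs) * len(consensus))
        if error <= max_error:
            result[umi] = (consensus, len(seqs))
    return result


if __name__ == "__main__":
    reads = [("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"),
             ("CCC", "GGGG"), ("CCC", "TTTT")]  # made-up UMIs and reads
    print(umi_consensus(reads))
```

    The read count kept alongside each consensus corresponds loosely to the CONSCOUNT field described below.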

    Format

    Processed sequences are provided in a tab delimited file format, including the following annotations:

    C_CALL Isotype subclass

    SEQUENCE_ID Sequence identifier

    V_CALL V segment gene and allele

    D_CALL D segment gene and allele

    J_CALL J segment gene and allele

    JUNCTION_LENGTH Junction length

    CONSCOUNT Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.

    DUPCOUNT UMI count for the given unique sequence

    ISOTYPE Constant region primer (isotype)

    MU_COUNT_CDR_R Number of replacement mutations in CDR region

    MU_COUNT_CDR_S Number of silent mutations in CDR region

    MU_COUNT_FWR_R Number of replacement mutations in FWR region

    MU_COUNT_FWR_S Number of silent mutations in FWR region

    MUT_TOTAL Total number of mutations in V gene

    SEQUENCE_INPUT Full length sequence

    SEQUENCE_IMGT Gapped IMGT sequence

    V_GERM_START_VDJ Position of the first nucleotide in the ungapped V germline sequence alignment

    JUNCTION Junction nucleotide sequence

    GERMLINE_IMGT_D_MASK IMGT-gapped germline nucleotide sequence with Ns masking the NP1-D-NP2 regions

    Run ID of sequencing run

    Sample_type The tissue sampled (e.g., peripheral blood, bone marrow)

    Sex Sex of the subject

    Age Age of the subject

    UNIQUE_ID Subject identifier

    SAMPLE_ID Sample identifier, linking back to raw data

    Subset Defined B cell subset

    Repertoire Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)

    R_SCDR R/S ratio in CDR region

    R_SFWR R/S ratio in FWR region

    V_FAM V family gene

    V_GENE V segment gene

    D_GENE D segment gene

    J_GENE J segment gene

    Clust_Rank Cluster rank

    Clust_REPRES Cluster representative

    Clust_SIZE Cluster size

    Clust_MAXFREQ Cluster maximum frequency

    Clust_SHAREDNESS Cluster sharedness

    CDR3_AA_GRAVY CDR3 hydrophobicity index

    CDR3_AA_CHARGE CDR3 charge

    CDRH3PDB CDRH3 PDB (Structure) code

    H1Canon H1 Canonical class

    H2Canon H2 Canonical class

    H1_GERMLINE H1 Germline Canonical class

    H2_GERMLINE H2 Germline Canonical class
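
    The R/S ratio columns can be related to the mutation-count columns listed above. The sketch below reads a tab-delimited file with Python's csv module and divides replacement by silent counts for the CDR and FWR regions. It assumes the ratio is simply R divided by S; how the authors handled rows with zero silent mutations is not stated, so returning None there is an assumption. The sample rows are made up.

```python
import csv
import io
from typing import List, Optional, TextIO, Tuple


def rs_ratio(replacement: float, silent: float) -> Optional[float]:
    """Replacement-to-silent ratio; None when there are no silent mutations."""
    return replacement / silent if silent else None


def rs_ratios(tsv_file: TextIO) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Return (SEQUENCE_ID, CDR R/S, FWR R/S) for each row of a tab-delimited file."""
    results = []
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        cdr = rs_ratio(float(row["MU_COUNT_CDR_R"]), float(row["MU_COUNT_CDR_S"]))
        fwr = rs_ratio(float(row["MU_COUNT_FWR_R"]), float(row["MU_COUNT_FWR_S"]))
        results.append((row["SEQUENCE_ID"], cdr, fwr))
    return results


# Made-up rows using the column names listed above.
SAMPLE_TSV = (
    "SEQUENCE_ID\tMU_COUNT_CDR_R\tMU_COUNT_CDR_S\tMU_COUNT_FWR_R\tMU_COUNT_FWR_S\n"
    "seq1\t6\t2\t4\t4\n"
    "seq2\t3\t0\t1\t2\n"
)

if __name__ == "__main__":
    for seq_id, cdr, fwr in rs_ratios(io.StringIO(SAMPLE_TSV)):
        print(seq_id, cdr, fwr)
```

    Comparing these recomputed values with the provided R_SCDR and R_SFWR columns is one way to check how zero silent counts were treated.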

    References

    1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O’Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.

    2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.

    3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res.41.

    4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res.21: 936–939.

  15. Validation of Methods to Assess the Immunoglobulin Gene Repertoire in...

    • data.nasa.gov
    Updated Apr 1, 2025
    nasa.gov (2025). Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station [Dataset]. https://data.nasa.gov/dataset/validation-of-methods-to-assess-the-immunoglobulin-gene-repertoire-in-tissues-obtained-fro-e1070
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Spaceflight is known to affect immune cell populations. In particular, splenic B-cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after spaceflight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next-generation sequencing and bioinformatic tools are now available to assess antibody repertoires, making it possible to identify immunoglobulin gene-segment usage, junctional regions, and modifications that contribute to specificity and diversity. Because of limitations on the International Space Station, alternative sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57BL/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared how sequencing techniques and a bioinformatic workflow affected the data output when assessing IgH and Igκ variable gene usage. Our bioinformatic workflow has been optimized for Illumina HiSeq and MiSeq datasets and is designed specifically to reduce bias, capture the most information from Ig sequences, and produce a data set that supports other data-mining options.
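    The study compares IgH and Igκ variable gene usage across sample preparation methods. A minimal sketch of one common way to summarize usage, as the fraction of sequences assigned to each V gene. It assumes one V call per sequence and counts sequences rather than clones (the description does not specify which), and the gene names are hypothetical placeholders, not data from this study.

```python
from collections import Counter

def v_gene_usage(v_calls):
    """Return {V gene: fraction of sequences}, most frequent gene first."""
    counts = Counter(v_calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n / total for gene, n in counts.most_common()}

# Hypothetical V calls (placeholder names, not results from this dataset)
calls = ["IGHV1-a", "IGHV1-a", "IGHV2-b", "IGHV1-a"]
print(v_gene_usage(calls))  # {'IGHV1-a': 0.75, 'IGHV2-b': 0.25}
```

    Comparing two preparation methods would then amount to comparing these per-gene fractions between the two resulting dictionaries.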

  16. Abysis Database

    • neuinfo.org
    • rrid.site
    Updated Jan 29, 2022
    (2022). Abysis Database [Dataset]. http://identifiers.org/RRID:SCR_000756
    Dataset updated
    Jan 29, 2022
    Description

    A database of antibody structure containing sequences from Kabat, IMGT, and the Protein Data Bank (PDB), as well as structure data from the PDB. It supports searching the sequence data on various criteria and displaying results in different formats. For data from the PDB, sequence searches can be combined with structural constraints. For example, one can ask for all antibodies with a 10-residue Kabat CDR-L1, a serine at H23, and an arginine within 10 Å of H36. The site also provides software for structure analysis and further information on antibody structure.
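    The example query combines a sequence criterion (CDR-L1 length), a residue criterion (serine at H23), and a structural criterion (an arginine near H36). A minimal sketch of the first two as a local filter over a hypothetical record type. This is not abYsis's interface, the sequences are invented, and the distance constraint is omitted because it needs 3D coordinates.

```python
from dataclasses import dataclass, field

@dataclass
class AntibodyRecord:
    """Hypothetical record: CDR-L1 sequence plus residues keyed by Kabat position label."""
    name: str
    cdr_l1: str                                   # CDR-L1 amino acids (Kabat definition)
    residues: dict = field(default_factory=dict)  # e.g. {"H23": "S"}

def matches(record, cdr_l1_length, position, residue):
    """True if CDR-L1 has the requested length and `position` holds `residue`."""
    return len(record.cdr_l1) == cdr_l1_length and record.residues.get(position) == residue

# Invented records for illustration only
records = [
    AntibodyRecord("ab1", "RASQSISSYL", {"H23": "S"}),   # 10 residues, Ser at H23
    AntibodyRecord("ab2", "RASQSISSYLN", {"H23": "S"}),  # 11 residues
    AntibodyRecord("ab3", "RASQDISNYL", {"H23": "A"}),   # 10 residues, Ala at H23
]
hits = [r.name for r in records if matches(r, 10, "H23", "S")]
print(hits)  # ['ab1']
```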

  17. Integrative database of germ-line V genes from the immunoglobulin loci of...

    • bioregistry.io
    Updated Apr 24, 2021
    (2021). Integrative database of germ-line V genes from the immunoglobulin loci of human and mouse [Dataset]. https://bioregistry.io/vbase2
    Dataset updated
    Apr 24, 2021
    Description

    The database VBASE2 provides germ-line sequences of human and mouse immunoglobulin variable (V) genes.

  18. Pre-processed IgH receptor repertoire data from MS patients after aHSCT from...

    • zenodo.org
    application/gzip
    Updated Oct 1, 2021
    Valentin von Niederhäusern; Marie Ghraichy; Johannes Trück (2021). Pre-processed IgH receptor repertoire data from MS patients after aHSCT from BioProject PRJNA763367 [Dataset]. http://doi.org/10.5281/zenodo.5513967
    Available download formats: application/gzip
    Dataset updated
    Oct 1, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Valentin von Niederhäusern; Marie Ghraichy; Johannes Trück
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Processing

    Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1, 2). Raw FASTQ files were filtered using a quality score threshold of 20. Paired reads were joined if they had a minimum overlap of 10 nt, a maximum error rate of 0.3, and a significance threshold of 0.0001. Reads with identical UMIs were collapsed to a consensus sequence. Reads with identical full-length sequences and identical constant-region primers but differing UMIs were further collapsed. Sequences were then submitted to IgBLAST (3) for VDJ assignment and sequence annotation. Constant-region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).
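    The two collapsing steps described above (UMI consensus, then merging identical sequences with the same constant primer) can be illustrated with a simplified sketch. It is not pRESTO's BuildConsensus or CollapseSeq: it assumes reads are already aligned to equal length, ignores quality scores and the maxerror/maxgap filters, and uses a hypothetical `(umi, primer, sequence)` tuple as input. The output tallies mirror the CONSCOUNT and DUPCOUNT column definitions in the Format list.

```python
from collections import Counter, defaultdict

def majority_consensus(seqs):
    """Per-position majority base across equal-length reads.

    Ties go to the base seen first (Counter.most_common ordering).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("reads in a UMI group must be aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def collapse(reads):
    """Collapse (umi, primer, sequence) reads in two steps.

    1. Reads sharing a UMI (and primer) become one consensus sequence.
    2. Identical consensus sequences with the same primer but different UMIs
       become one unique sequence: CONSCOUNT sums the reads behind it and
       DUPCOUNT counts the UMIs.
    """
    by_umi = defaultdict(list)
    for umi, primer, seq in reads:
        by_umi[(umi, primer)].append(seq)

    unique = {}
    for (_umi, primer), seqs in by_umi.items():
        cons = majority_consensus(seqs)
        entry = unique.setdefault(
            (cons, primer),
            {"SEQUENCE": cons, "PRIMER": primer, "CONSCOUNT": 0, "DUPCOUNT": 0},
        )
        entry["CONSCOUNT"] += len(seqs)
        entry["DUPCOUNT"] += 1
    return list(unique.values())

# Hypothetical reads: (UMI, constant-region primer, aligned sequence)
reads = [
    ("AAAA", "IGHG", "ACGT"),
    ("AAAA", "IGHG", "ACGA"),  # one read carries a sequencing error
    ("AAAA", "IGHG", "ACGT"),
    ("CCCC", "IGHG", "ACGT"),
    ("GGGG", "IGHM", "ACGT"),
]
for row in collapse(reads):
    print(row)
# {'SEQUENCE': 'ACGT', 'PRIMER': 'IGHG', 'CONSCOUNT': 4, 'DUPCOUNT': 2}
# {'SEQUENCE': 'ACGT', 'PRIMER': 'IGHM', 'CONSCOUNT': 1, 'DUPCOUNT': 1}
```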

    software_versions pRESTO 0.5.3, Change-O 0.3.4, IgBLAST 1.6.1, Stampy 1.0.21, shazam 0.1.8

    quality_thresholds FilterSeq.py pRESTO Q>20

    paired_reads_assembly AssemblePairs.py pRESTO minlen 10 maxerror 0.3 alpha 0.0001

    primer_match_cutoffs MaskPrimers.py pRESTO C primer & V primer maxerror 0.2

    consensus_building BuildConsensus.py pRESTO maxerror 0.1 maxgap 0.5

    collapsing_method CollapseSeq.py pRESTO

    germline_database IMGT
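    The quality threshold listed above (FilterSeq.py, Q>20) removes low-quality reads. A minimal sketch of a mean-quality filter in plain Python, assuming Phred+33 encoded FASTQ and assuming reads with a mean score of at least 20 are kept. Whether pRESTO's boundary is inclusive is not stated here, and this is not pRESTO's own code.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def filter_fastq(lines, threshold=20.0):
    """Yield (header, sequence, quality) for 4-line FASTQ records whose mean quality >= threshold."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if mean_phred(qual) >= threshold:
            yield header, seq, qual

# Two hypothetical records: 'I' encodes Q40, '+' encodes Q10
lines = ["@r1", "ACGT", "+", "IIII",
         "@r2", "ACGT", "+", "++++"]
print([h for h, _, _ in filter_fastq(lines)])  # ['@r1']
```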

    Format

    Processed sequences are provided in a tab-delimited file format, including the following annotations:

    ISOTYPE_SUBCLASS Isotype subclass

    SEQUENCE_ID Sequence identifier

    JUNCTION_LENGTH Junction length

    CONSCOUNT Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.

    DUPCOUNT UMI count for the given unique sequence

    ISOTYPE Constant region primer (isotype)

    MUT_TOTAL Total number of mutations in V gene

    SAMPLE Sample identifier, linking back to raw data

    JUNCTION Junction nucleotide sequence

    Protein_seq Amino acid sequence

    CDR3_AA_GRAVY CDR3 hydrophobicity index

    CDR3_AA_BULK CDR3 bulkiness

    CDR3_AA_ALIPHATIC CDR3 aliphatic index

    CDR3_AA_POLARITY CDR3 polarity

    CDR3_AA_CHARGE CDR3 normalized net charge

    CDR3_AA_BASIC CDR3 basic side chain residue content

    CDR3_AA_ACIDIC CDR3 acidic side chain residue content

    CDR3_AA_AROMATIC CDR3 aromatic side chain content

    Subset Defined B cell subset

    Repertoire Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)

    R_SCDR R/S ratio in CDR region

    R_SFWR R/S ratio in FWR region

    V_GENE V segment gene

    D_GENE D segment gene

    J_GENE J segment gene

    V_FAM V gene family

    Clust_REPRES Cluster representative

    Clust_SIZE Cluster size

    Sex Sex of the subject

    UNIQUE_ID Sample identifier

    Bcellno Input B cell number

    Days_posttx Sampling time point relative to transplantation

    Age_at_tx Age of the subject (at aHSCT)

    Disease MS subtype

    Last_therapy Last therapy prior to aHSCT

    Disease_duration Disease duration

    CMV_reactivation Cytomegalovirus reactivation

    Month_label Month post-aHSCT interval bin

    Patient_label Subject identifier
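    CDR3_AA_GRAVY is described only as a hydrophobicity index. A GRAVY score is conventionally the mean hydropathy of the residues in a sequence; the sketch below assumes the Kyte-Doolittle scale, which the dataset does not confirm, and skipping non-standard symbols such as stop codons or X is also an assumption.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the dataset does not name one)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(aa_seq: str) -> float:
    """Mean hydropathy over standard residues; other symbols are skipped."""
    residues = [aa for aa in aa_seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)

# Hypothetical CDR3 fragment
print(round(gravy("ARDY"), 3))  # -1.875
```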

     

    References

    1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O’Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.

    2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.

    3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.

    4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.

  19. Data from: Consistency of VDJ rearrangement and substitution parameters...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jan 27, 2016
    Duncan K. Ralph; Frederick A. Matsen IV; Frederick A. Matsen (2016). Consistency of VDJ rearrangement and substitution parameters enables accurate B cell receptor sequence annotation [Dataset]. http://doi.org/10.5061/dryad.149m8
    Available download formats: zip
    Dataset updated
    Jan 27, 2016
    Dataset provided by
    Fred Hutch Cancer Center
    Authors
    Duncan K. Ralph; Frederick A. Matsen IV; Frederick A. Matsen
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific V, D, or J gene, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In our paper, we find that large modern data sets do indeed suggest a model that uses parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM “factorization” strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
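    The per-base annotation problem described above is an HMM decoding task. A toy sketch of the general idea, labeling each base with its most probable segment via Viterbi decoding. This is not partis: the three states (V, N-addition, J), the left-to-right transitions, and all probabilities are invented for illustration, whereas real models condition emissions on germline alleles and position.

```python
import math

# Toy left-to-right HMM: V -> N (non-templated insertion) -> J.
# All probabilities are invented for illustration only.
STATES = ("V", "N", "J")
START = {"V": 1.0, "N": 0.0, "J": 0.0}
TRANS = {
    "V": {"V": 0.9, "N": 0.1},
    "N": {"N": 0.8, "J": 0.2},
    "J": {"J": 1.0},
}
EMIT = {
    "V": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    "N": {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    "J": {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
}

def log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state label for each base in `obs`."""
    if not obs:
        return ""
    # score[t][s]: best log-probability of any path ending in state s at position t
    score = [{s: log(start[s]) + log(emit[s].get(obs[0], 0.0)) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((r, score[t - 1][r] + log(trans[r].get(s, 0.0))) for r in states),
                key=lambda item: item[1],
            )
            score[t][s] = best + log(emit[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Trace back from the best-scoring final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path))

print(viterbi("AAAGGTTT"))  # VVVNNJJJ
```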

  20. Table_2_Addressing IGHV Gene Structural Diversity Enhances Immunoglobulin...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 14, 2023
    Mateusz Kaduk; Martin Corcoran; Gunilla B. Karlsson Hedestam (2023). Table_2_Addressing IGHV Gene Structural Diversity Enhances Immunoglobulin Repertoire Analysis: Lessons From Rhesus Macaque.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2022.818440.s005
    Available download formats: xlsx
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Frontiers
    Authors
    Mateusz Kaduk; Martin Corcoran; Gunilla B. Karlsson Hedestam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate germline gene assignment and assessment of somatic hypermutation (SHM) in antibodies induced by immunization or infection are important in immunological studies. Here, we illustrate issues specific to the construction of comprehensive immunoglobulin (IG) germline gene reference databases for outbred animal species, using rhesus macaques, a frequently used non-human primate model, as a test case. We demonstrate that the genotypic variation found in macaque germline inference studies is reflected in similar levels of gene diversity in genomic assemblies. We show that the high frequency of IG heavy chain V (IGHV) region structural and gene copy number variation between subjects means that individual animals lack genes that are present in other animals. Therefore, gene databases compiled from a single animal or too few animals will inevitably produce inaccurate gene assignments and erroneous SHM estimates for the genes they lack. We demonstrate this by assigning a test macaque IgG library to the KIMDB, a database of germline IGHV sequences compiled from 27 rhesus macaques, and, alternatively, to the IMGT rhesus macaque database, which is based on IGHV genes inferred primarily from the genomic sequence of the rheMac10 reference assembly, supplemented with 10 genes from the Mmul_051212 assembly. We found that using a gene-restricted database led to overestimation of SHM by up to 5% due to misassignments. The principles described in this study provide a model for creating comprehensive immunoglobulin reference databases for outbred species to ensure accurate gene assignment, lineage tracing, and SHM calculations.
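    The mechanism behind the SHM overestimate can be shown with a toy example: when the true germline is missing from the database, a read is assigned to the closest remaining gene and its differences from that gene are counted as mutations. A minimal sketch, assuming pre-aligned equal-length sequences and plain Hamming distance; real pipelines align with tools such as IgBLAST, and the gene names and sequences below are hypothetical, not real IGHV alleles.

```python
def mismatches(a, b):
    """Count differing positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_and_shm(read, database):
    """Assign `read` to its closest germline and return (gene, percent SHM)."""
    if not database:
        raise ValueError("empty germline database")
    gene = min(database, key=lambda g: mismatches(read, database[g]))
    shm = 100.0 * mismatches(read, database[gene]) / len(read)
    return gene, shm

# Hypothetical, pre-aligned toy germlines (not real IGHV sequences)
full_db = {"IGHV_x": "ACGTACGTAC", "IGHV_y": "ACGTTCGTAA"}
restricted_db = {"IGHV_x": full_db["IGHV_x"]}  # lacks IGHV_y

read = "ACGTTCGTAA"  # unmutated copy of IGHV_y
print(assign_and_shm(read, full_db))        # ('IGHV_y', 0.0)
print(assign_and_shm(read, restricted_db))  # ('IGHV_x', 20.0)
```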
