92 datasets found
  1. Table_1_UcTCRdb: An unconventional T cell receptor sequence database with...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 21, 2023
    Cite
    Yunsheng Dou; Shiwen Shan; Jian Zhang (2023). Table_1_UcTCRdb: An unconventional T cell receptor sequence database with online analysis functions.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2023.1158295.s002
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Yunsheng Dou; Shiwen Shan; Jian Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Unlike conventional T cells, which react to major histocompatibility complex (MHC) class I and II molecules, unconventional T cell subpopulations recognize various non-polymorphic antigen-presenting molecules and are typically characterized by simplified patterns of T cell receptors (TCRs), rapid effector responses and ‘public’ antigen specificities. Dissecting how unconventional TCRs recognize non-MHC antigens can further our understanding of unconventional T cell immunity. The released unconventional TCR sequence sets are small and irregular, and their quality is too low to support systematic analysis of the unconventional TCR repertoire. Here we present UcTCRdb, a database that contains 669,900 unconventional TCRs collected from 34 corresponding studies in humans, mice, and cattle. In UcTCRdb, users can interactively browse TCR features of different unconventional T cell subsets in different species, and search and download sequences under different conditions. Additionally, basic and advanced online TCR analysis tools have been integrated into the database, which will facilitate the study of unconventional TCR patterns for users with different backgrounds. UcTCRdb is freely available at http://uctcrdb.cn/.

  2. Data to support TCRa and TCRb repertoire analysis of human thymocyte subsets...

    • bridges.monash.edu
    • researchdata.edu.au
    txt
    Updated Jan 23, 2019
    Cite
    Stephen Daley (2019). Data to support TCRa and TCRb repertoire analysis of human thymocyte subsets (samples fk_167, fk_168 and fk_172) [Dataset]. http://doi.org/10.26180/5c484681591c0
    Available download formats: txt
    Dataset updated
    Jan 23, 2019
    Dataset provided by
    Monash University
    Authors
    Stephen Daley
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The file contains human T cell receptor (TCR) sequences obtained by multiplex PCR amplification of cDNA molecules followed by Illumina sequencing. Sequences were aligned to the human genome using MIGEC software (see doi: 10.1038/nmeth.2960 for details). Except for the header row, each row contains information about a unique TCR nucleotide sequence. Column 1 stores the TCR chain (a, alpha; b, beta). Column 2 stores the T cell subset. Column 3 is an identifier for the thymus sample of origin. Columns 4 and 5 store the nucleotide sequence and amino acid sequence, respectively, of the complementarity-determining region 3 (CDR3). Columns 6 and 7 store the TCR variable (v) and joining (j) gene segment information.
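
    A minimal sketch of reading a file with this layout, assuming it is tab-delimited and that the seven columns appear in the order described. The column names, the function name and the sample rows are hypothetical and only illustrate the structure.

```python
import csv
import io
from collections import Counter

# Hypothetical column names, in the order given in the description.
COLUMNS = ["chain", "subset", "sample", "cdr3_nt", "cdr3_aa", "v", "j"]

def read_tcr_rows(handle, delimiter="\t"):
    """Yield one dict per TCR row, skipping the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # the first row is the header
    for row in reader:
        if len(row) >= len(COLUMNS):
            yield dict(zip(COLUMNS, row))

# Made-up rows for illustration; they are not taken from the dataset.
SAMPLE = (
    "chain\tsubset\tsample\tcdr3nt\tcdr3aa\tv\tj\n"
    "a\tsubset_1\tfk_167\tTGTGCTTTC\tCAF\tTRAV1-2\tTRAJ33\n"
    "b\tsubset_1\tfk_167\tTGTGCCAGCTTC\tCASF\tTRBV7-9\tTRBJ2-1\n"
    "b\tsubset_2\tfk_168\tTGTGCCAGCTTC\tCASF\tTRBV7-9\tTRBJ2-1\n"
)

rows = list(read_tcr_rows(io.StringIO(SAMPLE)))
chain_counts = Counter(row["chain"] for row in rows)
print(chain_counts)
```

    For the real file, the same function could be given an open text handle instead of the in-memory sample; the delimiter should be checked against the file first.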

  3. ESM-2 embeddings for TCR-Epitope Binding Affinity Prediction Task

    • data.niaid.nih.gov
    Updated Jun 17, 2024
    Cite
    Tony Reina (2024). ESM-2 embeddings for TCR-Epitope Binding Affinity Prediction Task [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7502653
    Dataset updated
    Jun 17, 2024
    Authors
    Tony Reina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying dataset that was generated by the GitHub project: https://github.com/tonyreina/tdc-tcr-epitope-antibody-binding. In that repository I show how to create machine learning models for predicting whether a T-cell receptor (TCR) and a protein epitope will bind to each other.

    A model that can predict how well a TCR binds to an epitope can lead to more effective immunotherapy treatments. For example, in anti-cancer therapies it is important for the T-cell receptor to bind to the protein marker in the cancer cell so that the T-cell (actually the T-cell's friends in the immune system) can kill the cancer cell.

    HuggingFace provides a "one-stop shop" to train and deploy AI models. In this case, we use Facebook's open-source Evolutionary Scale Model (ESM-2). These embeddings turn the protein sequences into a vector of numbers that the computer can use in a mathematical model.

    To load them into Python use the Pandas library:

    import pandas as pd

    train_data = pd.read_pickle("train_data.pkl")
    validation_data = pd.read_pickle("validation_data.pkl")
    test_data = pd.read_pickle("test_data.pkl")

    The epitope_aa and the tcr_full columns are the protein (peptide) sequences for the epitope and the T-cell receptor, respectively. The letters correspond to the standard amino acid codes.

    The epitope_smi column is the SMILES notation for the chemical structure of the epitope. We won't use this information. Instead, the ESM-2 embedder should be sufficient for the input to our binary classification model.

    The tcr column is the hypervariable CDR3 loop. It's the part of the TCR that actually binds (assuming it binds) to the epitope.

    The label column is whether the two proteins bind. 0 = No. 1 = Yes.

    The tcr_vector and epitope_vector columns are the bio-embeddings of the TCR and epitope sequences generated by the Facebook ESM-2 model. These two vectors can be used to create a machine learning model that predicts whether the combination will produce a successful protein binding.
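
    One simple way to combine the two vectors, sketched here under the assumption that each row exposes tcr_vector, epitope_vector and label as described above, is to concatenate them into a single feature vector per pair. The function name and the toy rows are hypothetical, and real embeddings are far longer than these.

```python
def build_features(rows):
    """Concatenate the TCR and epitope embeddings of each row into one feature vector."""
    features, labels = [], []
    for row in rows:
        features.append(list(row["tcr_vector"]) + list(row["epitope_vector"]))
        labels.append(int(row["label"]))
    return features, labels

# Toy rows with tiny made-up vectors, for illustration only.
toy_rows = [
    {"tcr_vector": [0.1, 0.2], "epitope_vector": [0.3], "label": 1},
    {"tcr_vector": [0.4, 0.5], "epitope_vector": [0.6], "label": 0},
]

X, y = build_features(toy_rows)
print(X, y)
```

    With the pickled DataFrames, the rows could be obtained with train_data.to_dict("records"); the resulting X and y can then be passed to any binary classifier.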

    From the TDC website:

    T-cells are an integral part of the adaptive immune system, whose survival, proliferation, activation and function are all governed by the interaction of their T-cell receptor (TCR) with immunogenic peptides (epitopes). A large repertoire of T-cell receptors with different specificity is needed to provide protection against a wide range of pathogens. This new task aims to predict the binding affinity given a pair of TCR sequence and epitope sequence.

    Weber et al.

    Dataset Description: The dataset is from Weber et al., who assembled a large and diverse dataset from the VDJ database and the ImmuneCODE project. It uses human TCR-beta chain sequences. Since this dataset is highly imbalanced, the authors exclude epitopes with fewer than 15 associated TCR sequences and downsample to a limit of 400 TCRs per epitope. The dataset contains amino acid sequences either for the entire TCR or only for the hypervariable CDR3 loop. Epitopes are available as amino acid sequences. Since Weber et al. proposed to represent the peptides as SMILES strings (which reformulates the problem as protein-ligand binding prediction), the SMILES strings of the epitopes are also included. 50% negative samples were generated by shuffling the pairs, i.e. associating TCR sequences with epitopes they have not been shown to bind.
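
    The filtering, downsampling and negative-sampling steps described above can be sketched as follows. This is an illustration of the idea under stated assumptions, not the code used by Weber et al. or TDC: the function names, the random seed handling and the toy pairs are all hypothetical.

```python
import random
from collections import defaultdict

def filter_and_downsample(pairs, min_tcrs=15, max_tcrs=400, seed=0):
    """Drop epitopes with fewer than min_tcrs TCRs and cap each epitope at max_tcrs."""
    rng = random.Random(seed)
    by_epitope = defaultdict(list)
    for tcr, epitope in pairs:
        by_epitope[epitope].append(tcr)
    kept = []
    for epitope, tcrs in by_epitope.items():
        if len(tcrs) < min_tcrs:
            continue
        if len(tcrs) > max_tcrs:
            tcrs = rng.sample(tcrs, max_tcrs)
        kept.extend((tcr, epitope) for tcr in tcrs)
    return kept

def add_shuffled_negatives(positives, seed=0):
    """Add one negative per positive by pairing its TCR with an epitope it is not known to bind."""
    rng = random.Random(seed)
    known = set(positives)
    epitopes = sorted({epitope for _, epitope in positives})
    labelled = [(tcr, epitope, 1) for tcr, epitope in positives]
    for tcr, _ in positives:
        candidates = [e for e in epitopes if (tcr, e) not in known]
        if candidates:
            labelled.append((tcr, rng.choice(candidates), 0))
    return labelled

# Toy data: one small, one oversized and one too-rare epitope (made-up names).
toy_pairs = (
    [(f"tcr_a{i}", "EPITOPE_A") for i in range(20)]
    + [(f"tcr_b{i}", "EPITOPE_B") for i in range(500)]
    + [(f"tcr_c{i}", "EPITOPE_C") for i in range(3)]
)
positives = filter_and_downsample(toy_pairs)
labelled = add_shuffled_negatives(positives)
print(len(positives), len(labelled))
```

    Generating one negative per positive gives the 50% negative fraction mentioned in the description.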

    Task Description: Binary classification. Given the epitope (a peptide, either represented as amino acid sequence or as SMILES) and a T-cell receptor (amino acid sequence, either of the full protein complex or only of the hypervariable CDR3 loop), predict whether the epitope binds to the TCR.

    Dataset Statistics: 47,182 TCR-Epitope pairs between 192 epitopes and 23,139 TCRs.

    References:

    Weber, Anna, Jannis Born, and María Rodriguez Martínez. “TITAN: T-cell receptor specificity prediction with bimodal attention networks.” Bioinformatics 37.Supplement_1 (2021): i237-i244.

    Bagaev, Dmitry V., et al. “VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium.” Nucleic Acids Research 48.D1 (2020): D1057-D1062.

    Dines, Jennifer N., et al. “The immunerace study: A prospective multicohort study of immune response action to covid-19 events with the immunecode™ open access database.” medRxiv (2020).

    Dataset License: CC BY 4.0.

    Contributed by: Anna Weber and Jannis Born.

    The Facebook ESM-2 model has the MIT license and was published in:

    HuggingFace has several versions of the trained model.

    Checkpoint name       Number of layers   Number of parameters
    esm2_t48_15B_UR50D    48                 15B
    esm2_t36_3B_UR50D     36                 3B
    esm2_t33_650M_UR50D   33                 650M
    esm2_t30_150M_UR50D   30                 150M
    esm2_t12_35M_UR50D    12                 35M
    esm2_t6_8M_UR50D      6                  8M

  4. Control T-cell receptor (TCR) alpha and beta chain nucleotide and amino acid...

    • data.niaid.nih.gov
    Updated Mar 10, 2022
    Cite
    Mikhail Shugay (2022). Control T-cell receptor (TCR) alpha and beta chain nucleotide and amino acid sequences from human and mouse [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1318985
    Dataset updated
    Mar 10, 2022
    Dataset provided by
    Institute of Bioorganic Chemistry, Russian Academy of Sciences
    Authors
    Mikhail Shugay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset of pooled T-cell receptor (TCR) sequences for TCR alpha and beta chains of human and mouse.

    Sequences are obtained from various samples of healthy individuals/mice using our conventional protocols: see for example [Britanova et al "Dynamics of individual T cell repertoires: from cord blood to centenarians" The Journal of Immunology 2016] and [Izraelson et al. "Comparative analysis of murine T‐cell receptor repertoires." Immunology 2018].

    The sequences are stored as gzipped clonotype tables in VDJtools format, see [https://vdjtools-doc.readthedocs.io/en/master/input.html#vdjtools-format].

    This control dataset can be used as a proxy for a generative VDJ rearrangement model to estimate the expected frequency distribution of TCRs and check for enrichment of rare TCR clonotypes and groups of similar TCR sequences. For the implementation of the enrichment analysis, please see CalcDegreeStats routine from VDJtools software, see [https://vdjtools-doc.readthedocs.io/en/master/annotate.html#calcdegreestats].

    Files named "human.tra.strict.txt.gz", etc. are pools of random/naive TCR clonotypes containing the unique V/J/CDR3 nucleotide sequence combinations observed in the data. The pools.zip file is used for TCR motif inference in the VDJdb database [https://github.com/antigenomics/vdjdb-motifs]; it contains files such as human.tra.aa.txt that hold random/naive TCR clonotypes grouped by CDR3 amino acid sequence, each with its most frequent representative V and J.
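
    A minimal sketch of reading one of these gzipped clonotype tables with the Python standard library. It assumes a tab-delimited table with a header row; the column names used below (count, freq, cdr3nt, cdr3aa, v, d, j) follow the VDJtools format as documented at the link above and should be checked against the actual files. The in-memory table is made up.

```python
import csv
import gzip
import io

def read_clonotypes(binary_handle):
    """Yield clonotype rows from a gzipped, tab-delimited table with a header row."""
    with gzip.open(binary_handle, mode="rt", newline="") as text:
        yield from csv.DictReader(text, delimiter="\t")

# Made-up table, compressed in memory so the example runs without any download.
table = (
    "count\tfreq\tcdr3nt\tcdr3aa\tv\td\tj\n"
    "2\t0.6667\tTGTGCCAGCTTC\tCASF\tTRBV7-9\t.\tTRBJ2-1\n"
    "1\t0.3333\tTGTGCTTTC\tCAF\tTRBV7-9\t.\tTRBJ2-1\n"
)
clonotypes = list(read_clonotypes(io.BytesIO(gzip.compress(table.encode("utf-8")))))
total_count = sum(int(row["count"]) for row in clonotypes)
print(len(clonotypes), total_count)
```

    For a downloaded file, an open binary handle such as open("human.tra.strict.txt.gz", "rb") can be passed in place of the in-memory buffer.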

  5. Pre-processed B-cell receptor amplicon sequencing data from SRR1842411

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    application/gzip
    Updated Jan 24, 2020
    Cite
    Adaptive Immunity Group (2020). Pre-processed B-cell receptor amplicon sequencing data from SRR1842411 [Dataset]. http://doi.org/10.5281/zenodo.806864
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Adaptive Immunity Group
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example dataset containing B-cell receptor (BCR) gene sequences. This dataset is intended to be used for testing software tools developed to annotate (i.e. map Variable, Diversity and Joining segments) and perform clonal analysis of BCR sequencing data.

    Sequencing:

    Libraries were prepared using 5'RACE from PBMCs of a healthy donor. Input molecules were tagged with unique molecular identifiers (UMIs). Sequencing was run on a MiSeq with 300+300 bp reads.

    Contents:

    The dataset contains both raw sequencing reads and high-quality consensus sequences assembled using the unique molecular identifier (UMI) approach. Consensus assembly corrects for sequencing errors and eliminates sequencing artifacts.

    • age_ig_s7_R1.fastq.gz and age_ig_s7_R2.fastq.gz contain raw reads
    • age_ig_s7_R1.t10.cf.fastq.gz and age_ig_s7_R2.t10.cf.fastq.gz contain consensus sequences

    All files contain a UMI tag sequence in their header, in the form UMI:NNNN:QQQQ, where N is the base character and Q is the quality character (for assembled consensuses the total number of reads is given instead of the Q string).
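
    The header convention can be parsed with a small regular expression. This is a sketch that assumes the tag appears as a whitespace-delimited token in the header line; the example headers are made up, and the function name is hypothetical.

```python
import re

# UMI bases, then either a quality string (raw reads) or a read count (consensuses).
# The second field is matched as "non-whitespace" because ':' is itself a valid
# Phred+33 quality character.
UMI_TAG = re.compile(r"UMI:([ACGTN]+):(\S+)")

def parse_umi_tag(header):
    """Return (umi, trailing_field) from a FASTQ header, or None if no tag is found."""
    match = UMI_TAG.search(header)
    if match is None:
        return None
    return match.group(1), match.group(2)

raw_tag = parse_umi_tag("@read_1 UMI:ACGTACGTACGT:IIIIHH:HGGGG")
consensus_tag = parse_umi_tag("@consensus_1 UMI:ACGTACGTACGT:25")
print(raw_tag, consensus_tag)
```

    For consensus headers the second field can be converted with int() to recover the number of reads behind the consensus.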

    Note that consensus sequences were assembled using only raw sequences that correspond to UMI tags supported by at least 10 sequencing reads. This means that the consensus sequence files contain a subset of all UMI tags found in the raw sequences. Thus, if one wants to assess software performance on raw sequencing reads using assembled consensus sequences as a high-quality data standard, the raw sequencing reads should be filtered to contain only those UMI tags that are present in the consensus sequence file.
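
    The filtering step recommended above can be sketched as follows, assuming four-line FASTQ records and the UMI:NNNN:... header tag. The helper names and toy records are hypothetical; real files would be read line by line, for example via gzip.open(path, "rt").

```python
import re

UMI_RE = re.compile(r"UMI:([ACGTN]+):")

def umi_of(header):
    """Extract the UMI bases from a FASTQ header, or None if no tag is present."""
    match = UMI_RE.search(header)
    return match.group(1) if match else None

def iter_fastq(lines):
    """Yield (header, sequence, quality) from well-formed four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        sequence = next(it)
        next(it)  # the '+' separator line
        quality = next(it)
        yield header.rstrip("\n"), sequence.rstrip("\n"), quality.rstrip("\n")

def filter_raw_by_consensus(raw_lines, consensus_lines):
    """Keep only raw reads whose UMI also occurs in the consensus file."""
    consensus_umis = {umi_of(header) for header, _, _ in iter_fastq(consensus_lines)}
    consensus_umis.discard(None)
    return [rec for rec in iter_fastq(raw_lines) if umi_of(rec[0]) in consensus_umis]

# Toy records for illustration only.
raw_lines = [
    "@r1 UMI:AAAA:IIII", "ACGT", "+", "IIII",
    "@r2 UMI:AAAA:IIII", "ACGA", "+", "IIII",
    "@r3 UMI:CCCC:IIII", "TTTT", "+", "IIII",
]
consensus_lines = ["@c1 UMI:AAAA:2", "ACGT", "+", "IIII"]

kept = filter_raw_by_consensus(raw_lines, consensus_lines)
print([header for header, _, _ in kept])
```

    Building the UMI set from the consensus file first keeps the filter to a single pass over the much larger raw file.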

    Citations:

    The whole dataset was used to benchmark MiXCR software and was originally referenced in Bolotin DA, et al. MiXCR: software for comprehensive adaptive immunity profiling. Nature Methods 12(5):380-381, 2015.

    Data pre-processing was carried out using MIGEC software: Shugay M, et al. Towards error-free profiling of immune repertoires. Nature Methods 11(6):653-655, 2014.

    Contributors:

    The dataset was generated in Prof. Chudakov's lab (Adaptive Immunity Group in Masaryk University, Brno and Genomics of Adaptive Immunity Lab in Institute of Bioorganic Chemistry, Moscow). Sample preparation and sequencing were performed by Dr. Olga Britanova and Dr. Maria Turchaninova. Raw sequencing reads were pre-processed and uploaded by Dr. Mikhail Shugay.

  6. Data_Sheet_1_Detection of Enriched T Cell Epitope Specificity in Full T Cell...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    • +1more
    Updated Nov 29, 2019
    Cite
    Gielis, Sofie; Laukens, Kris; Meysman, Pieter; De Neuter, Nicolas; Moris, Pieter; Bittremieux, Wout; Ogunjimi, Benson (2019). Data_Sheet_1_Detection of Enriched T Cell Epitope Specificity in Full T Cell Receptor Sequence Repertoires.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000078136
    Dataset updated
    Nov 29, 2019
    Authors
    Gielis, Sofie; Laukens, Kris; Meysman, Pieter; De Neuter, Nicolas; Moris, Pieter; Bittremieux, Wout; Ogunjimi, Benson
    Description

    High-throughput T cell receptor (TCR) sequencing allows the characterization of an individual's TCR repertoire and directly queries their immune state. However, it remains a non-trivial task to couple these sequenced TCRs to their antigenic targets. In this paper, we present a novel strategy to annotate full TCR sequence repertoires with their epitope specificities. The strategy is based on a machine learning algorithm to learn the TCR patterns common to the recognition of a specific epitope. These results are then combined with a statistical analysis to evaluate the occurrence of specific epitope-reactive TCR sequences per epitope in repertoire data. In this manner, we can directly study the capacity of full TCR repertoires to target specific epitopes of the relevant vaccines or pathogens. We demonstrate the usability of this approach on three independent datasets related to vaccine monitoring and infectious disease diagnostics by independently identifying the epitopes that are targeted by the TCR repertoire. The developed method is freely available as a web tool for academic use at tcrex.biodatamining.be.

  7. Data from: Persistent T Cell Repertoire Perturbation and T Cell Activation...

    • rdr.ucl.ac.uk
    • datasetcatalog.nlm.nih.gov
    application/gzip
    Updated May 31, 2023
    Cite
    Carolin Turner; James Brown; Emily Shaw-Wise; Imran Uddin; Evi Tsaliki; Jennifer Roe; G Pollara; Yuxin Sun; James Heather; Marc Lipman; Benny Chain; Mahdad Noursadeghi (2023). Persistent T Cell Repertoire Perturbation and T Cell Activation in HIV After Long Term Treatment [Dataset]. http://doi.org/10.5522/04/14931870.v1
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    University College London
    Authors
    Carolin Turner; James Brown; Emily Shaw-Wise; Imran Uddin; Evi Tsaliki; Jennifer Roe; G Pollara; Yuxin Sun; James Heather; Marc Lipman; Benny Chain; Mahdad Noursadeghi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    T cell receptor sequence data of 26 people living with HIV on long-term anti-retroviral therapy, and 12 HIV-negative healthy controls, produced using the UCL Chain lab protocol. All participants were Caucasian male adults recruited from London, UK. People living with HIV were on anti-retroviral therapy for a median of 8.5 years (interquartile range 3-16 years). They had undetectable plasma HIV viral load (

  8. Table 1_Patterns of restricted TCR usage following SARS-CoV-2 vaccination...

    • figshare.com
    xlsx
    Updated Oct 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Emily Parsons; Zhongyan Lu; Stephanie A. Richard; Amanda Zelkoski; Janifer Le; Naraen Palanikumar; Phuong Nguyen; Camille Alba; Gauthaman Sukumar; John Rosenberger; Xijun Zhang; Timothy H. Burgess; Rhonda Colombo; Katrin Mende; Catherine Berjohn; Nursat Epsi; Brian K. Agan; David Tribble; David A. Lindholm; Clifton L. Dalgard; Simon D. Pollett; Allison M. W. Malloy; EPICC COVID-19 Cohort Study Group (2025). Table 1_Patterns of restricted TCR usage following SARS-CoV-2 vaccination and severe disease.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2025.1576903.s001
    Available download formats: xlsx
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Frontiers
    Authors
    Emily Parsons; Zhongyan Lu; Stephanie A. Richard; Amanda Zelkoski; Janifer Le; Naraen Palanikumar; Phuong Nguyen; Camille Alba; Gauthaman Sukumar; John Rosenberger; Xijun Zhang; Timothy H. Burgess; Rhonda Colombo; Katrin Mende; Catherine Berjohn; Nursat Epsi; Brian K. Agan; David Tribble; David A. Lindholm; Clifton L. Dalgard; Simon D. Pollett; Allison M. W. Malloy; EPICC COVID-19 Cohort Study Group
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: T cells influence COVID-19 severity and establish long-lasting immune memory in response to vaccination and infection. The diversity of the T cell repertoire, and complexity of T cell epitope recognition, make it challenging to define protective epitope-specific T cells. In this study, we created a highly specific TCR meta-database to identify T cell epitopes from the nearly complete SARS-CoV-2 proteome and determine whether vaccination with mRNA vaccines influenced the TCR repertoire.

    Methods: Using this meta-database, we analyzed immunosequencing data of genomic DNA to define the variable region of T cell receptor (TCR) beta chain (TCRB) sequences among participants in a longitudinal COVID-19 cohort study. The TCR repertoire was compared between participants who were vaccinated or unvaccinated against SARS-CoV-2 and stratified by disease severity. TCR diversity was measured using clonality, an index defined as the inverted normalized Shannon entropy.

    Results: Highly clonal TCR repertoires correlated with age and comorbidities. Using our meta-database approach, we found that vaccinated participants hospitalized with infection had the most restricted SARS-CoV-2-specific CD8 TCR repertoire. However, TCRB with predicted specificity to non-spike SARS-CoV-2 proteins dominated the response, even in vaccinated participants. We identified a peptide sequence in the ORF10 accessory protein that was more frequently recognized in study participants with mild disease. Conversely, CD8 T cell recognition of a peptide sequence in ORF1ab more closely correlated with severe disease.

    Discussion: Overarchingly, TCR repertoire analysis revealed that CD8 T cells responding to SARS-CoV-2 broadly recognize epitopes across the SARS-CoV-2 proteome, and provided opportunities to identify epitopes associated with disease.
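
    The clonality index mentioned in this description is commonly computed as one minus the Shannon entropy of the clone frequencies divided by the logarithm of the number of clones. The sketch below follows that common reading; the exact formula used in the study, and the handling of a single-clone repertoire, are assumptions.

```python
import math

def clonality(clone_counts):
    """Return 1 - H / log(N): H is the Shannon entropy of clone frequencies, N the number of clones."""
    counts = [c for c in clone_counts if c > 0]
    n = len(counts)
    if n == 0:
        return float("nan")
    if n == 1:
        return 1.0  # convention assumed here: a single clone is maximally clonal
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(n)

even_repertoire = clonality([10, 10, 10, 10])  # all clones equally frequent
skewed_repertoire = clonality([97, 1, 1, 1])   # one dominant clone
print(even_repertoire, skewed_repertoire)
```

    Values near 0 indicate an even repertoire, and values near 1 indicate a repertoire dominated by a few expanded clones.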

  9. Table 1_Comprehensive analysis of αβT-cell receptor repertoires reveals...

    • frontiersin.figshare.com
    docx
    Updated Sep 19, 2025
    Cite
    Daniil V. Luppov; Elizaveta K. Vlasova; Dmitry M. Chudakov; Mikhail Shugay (2025). Table 1_Comprehensive analysis of αβT-cell receptor repertoires reveals signatures of thymic selection.docx [Dataset]. http://doi.org/10.3389/fimmu.2025.1605170.s003
    Available download formats: docx
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Frontiers Media: http://www.frontiersin.org/
    Authors
    Daniil V. Luppov; Elizaveta K. Vlasova; Dmitry M. Chudakov; Mikhail Shugay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Thymic selection is crucial for forming a pool of T-cells that can efficiently discriminate self from non-self using their T-cell receptors (TCRs) to develop adaptive immunity. In the present study we analyzed how a diverse set of physicochemical and sequence features of a TCR can affect the chances of successfully passing the selection. On a global scale we identified differences in selection probabilities based on CDR3 loop length, hydrophobicity, and residue sizes depending on variable genes and TCR chain context. We also observed a substantial decrease in N-glycosylation sites and other short sequence motifs for both alpha and beta chains. At the local scale we used dedicated statistical and machine learning methods coupled with a probabilistic model of the V(D)J rearrangement process to infer patterns in the CDR3 region that are either enriched or depleted during the course of selection. While the abundance of patterns containing poly-Glycines can improve CDR3 flexibility in selected TCRs, the “holes” in the TCR repertoire induced by negative selection can be related to Arginines in the (N)-Diversity (D)-N-region (NDN) region. Corresponding patterns were stored by us in a database available online. We demonstrated how TCR sequence composition affects lineage commitment during thymic selection. Structural modeling reveals that TCRs with “flat” and “bulged” CDR3 loops are more likely to commit T-cells to the CD4+ and CD8+ lineage respectively. Finally, we highlighted the effect of an individual MHC haplotype on the selection process, suggesting that those “holes” can be donor-specific. Our results can be further applied to identify potentially self-reactive TCRs in donor repertoires and aid in TCR selection for immunotherapies.

  10. Human T cell scRNAseq

    • figshare.scilifelab.se
    • demo.researchdata.se
    • +2more
    Updated Jan 15, 2025
    Cite
    Joanna Hård; Jakob Michaelsson (2025). Human T cell scRNAseq [Dataset]. http://doi.org/10.17044/scilifelab.14376104.v1
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Karolinska Institutet
    Authors
    Joanna Hård; Jakob Michaelsson
    License

    https://www.scilifelab.se/data/restricted-access/

    Description

    This dataset contains genomic TCR beta sequences from single cell DNA samples amplified by multiple displacement amplification (MDA) and subjected to nested PCR targeting the genomic TCR beta locus. The individual files contain raw data representing nucleotide sequences including both productive and non-productive rearrangements of the TCR beta sequence (with dropout in some cases). The dataset also includes FASTQ files corresponding to single cell RNAseq data from single CD8+ T cells prepared by the smart-seq2 method, and FASTQ files for 25-cell ‘mini-bulk’ RNAseq for CD8+ T cells prepared according to the smart-seq2 protocol.

  11. Data from: Kabat Database of Sequences of Proteins of Immunological Interest...

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Jun 27, 2024
    Cite
    (2024). Kabat Database of Sequences of Proteins of Immunological Interest [Dataset]. http://identifiers.org/RRID:SCR_006465
    Dataset updated
    Jun 27, 2024
    Description

    The Kabat Database determines the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information of biological macromolecules. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools and much more. The searching and analysis tools are custom made for the aligned data sets contained in both the SQL Server and ASCII text flat file formats. The searching and analysis tools may be run on a single PC workstation or in a distributed environment. The analysis tools are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats. The Kabat Database was initially started in 1970 to determine the combining site of antibodies based on the available amino acid sequences at that time. Bence Jones proteins, mostly from human, were aligned, using the now-known Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the complementarity determining regions (CDR) of light chains. Subsequently, antibody heavy chain amino acid sequences were also aligned using a different numbering system, since the locations of their CDRs (31-35B, 50-65 and 95-102) are different from those of the light chains. 
    CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains. During the past 30 years, the Kabat database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules and other proteins of immunological interest. It has been used extensively by immunologists to derive useful structural and functional information from the primary sequences of these proteins.
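
    The CDR boundaries quoted in this description can be turned into a small lookup. This sketch only encodes the position ranges given above (light chain 24-34, 50-56, 89-97; heavy chain 31-35B, 50-65, 95-102) and treats insertion letters such as the B in 35B as belonging to their numeric position; the function and dictionary names are hypothetical.

```python
import re

# CDR boundaries in Kabat numbering, as quoted in the description.
KABAT_CDRS = {
    "light": {"CDR1": (24, 34), "CDR2": (50, 56), "CDR3": (89, 97)},
    "heavy": {"CDR1": (31, 35), "CDR2": (50, 65), "CDR3": (95, 102)},
}

def kabat_region(chain, position):
    """Return the CDR name for a Kabat position such as 52 or "35B", or "non-CDR"."""
    match = re.fullmatch(r"(\d+)([A-Z]?)", str(position))
    if match is None:
        raise ValueError(f"not a Kabat position: {position!r}")
    number = int(match.group(1))
    for name, (start, end) in KABAT_CDRS[chain].items():
        if start <= number <= end:
            return name
    return "non-CDR"

print(kabat_region("heavy", "35B"), kabat_region("light", 23))
```

    A lookup like this only works on sequences that have already been numbered in the Kabat scheme; it does not perform the alignment itself.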

  12. Data from: The clonal structure and dynamics of the human T cell response to...

    • rdr.ucl.ac.uk
    application/gzip
    Updated May 31, 2023
    Cite
    Tahel Ronel; Matthew Harries; Kate Wicks; Theres Oakes; Helen Singleton; Rebecca Dearman; Gavin Maxwell; Benny Chain (2023). The clonal structure and dynamics of the human T cell response to an organic chemical hapten - Dataset [Dataset]. http://doi.org/10.5522/04/14199809
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    University College London
    Authors
    Tahel Ronel; Matthew Harries; Kate Wicks; Theres Oakes; Helen Singleton; Rebecca Dearman; Gavin Maxwell; Benny Chain
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    T cell receptor sequence data of alopecia patients before and during sensitisation with diphenylcyclopropenone and healthy volunteers at equivalent timepoints, using the UCL Chain lab protocol. Details of the study are provided in Ronel et al, eLife 2021 (10.7554/eLife.54747). The processed data files have been generated using Decombinator V4 (https://github.com/innate2adaptive/Decombinator). The raw data files are available at the NCBI Sequence Read Archive, accession number PRJNA592875.

  13. Pre-processed B cell receptor repertoire sequencing data from BioProject...

    • zenodo.org
    application/gzip
    Updated Jan 24, 2020
    Cite
    Marie Ghraichy; Johannes Trück (2020). Pre-processed B cell receptor repertoire sequencing data from BioProject PRJNA527941 [Dataset]. http://doi.org/10.5281/zenodo.2640393
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Marie Ghraichy; Johannes Trück
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Processing

    Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1,2). Raw fastq files were filtered based on a quality score threshold of 20. Paired reads were joined if they had a minimum length of 10 nt, maximum error rate of 0.3 and a significance threshold of 0.0001. Reads with identical UMI were collapsed to a consensus sequence. Reads with identical full-length sequence and identical constant primer but differing UMI were further collapsed. Sequences were then submitted to IgBlast (3) for VDJ assignment and sequence annotation. Constant region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).
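
    To illustrate the quality-filtering step, the sketch below keeps a read when its mean Phred score reaches the threshold of 20. It assumes Phred+33 encoded quality strings and a mean-quality criterion; it is not pRESTO's implementation, and the function names are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def passes_quality(quality, threshold=20, offset=33):
    """True when the read is non-empty and its mean Phred score is at least the threshold."""
    return len(quality) > 0 and mean_phred(quality, offset) >= threshold

# 'I' encodes Q40, '5' encodes Q20 and '#' encodes Q2 in Phred+33.
results = {q: passes_quality(q) for q in ("IIII", "5555", "####")}
print(results)
```

    Whether a read exactly at the threshold is kept or dropped is a design choice made here for illustration; the tool's own documentation should be consulted for its behavior.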

    software_versions pRESTO:0.5.3,Change-O:0.3.4,IgBlast 1.6.1, stampy1.0.21. shazam0.1.8

    quality_thresholds FilterSeq.py pRESTO Q>20

    paired_reads_assembly AssemblePairs.py pRESTO minlen 10 maxerror 0.3 alpha 0.0001

    primer_match_cutoffs MaskPrimers.py pRESTO C primer & V primer maxerror 0.2

    consensus_building BuildConsensus.py pRESTO maxerror 0.1 maxgap 0.5

    collapsing_method CollapseSeq.py pRESTO

    germline_database IMGT
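
    The UMI-collapsing step described above can be illustrated with a simplified sketch. This is not the algorithm used by pRESTO's BuildConsensus.py, which additionally applies the maxerror and maxgap thresholds listed above; it is a plain per-position majority vote. The function name `umi_consensus` and the example reads are hypothetical, and reads sharing a UMI are assumed to be already aligned.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads sharing a UMI into one majority-vote consensus.

    `reads` is an iterable of (umi, sequence) pairs. Reads in a UMI group
    are assumed to be pre-aligned; the consensus is truncated to the
    shortest read in the group. Ties go to the base seen first.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        length = min(len(s) for s in seqs)
        consensus[umi] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Made-up example reads: three share one UMI, one of them with a mismatch.
reads = [
    ("AACGT", "ACGTACGT"),
    ("AACGT", "ACGTACGT"),
    ("AACGT", "ACGAACGT"),
    ("GGTCA", "TTGACCAA"),
]
result = umi_consensus(reads)
```

    In this toy input the single mismatching base is outvoted, so each UMI yields one sequence. The CONSCOUNT and DUPCOUNT annotations described in the format section record how many raw reads and UMIs, respectively, support each unique sequence.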

    Format

    Processed sequences are provided in a tab delimited file format, including the following annotations:

    C_CALL Isotype subclass

    SEQUENCE_ID Sequence identifier

    V_CALL V segment gene and allele

    D_CALL D segment gene and allele

    J_CALL J segment gene and allele

    JUNCTION_LENGTH Junction length

    CONSCOUNT Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.

    DUPCOUNT UMI count for the given unique sequence

    ISOTYPE Constant region primer (isotype)

    MU_COUNT_CDR_R Number of replacement mutations in CDR region

    MU_COUNT_CDR_S Number of silent mutations in CDR region

    MU_COUNT_FWR_R Number of replacement mutations in FWR region

    MU_COUNT_FWR_S Number of silent mutations in FWR region

    MUT_TOTAL Total number of mutations in V gene

    SEQUENCE_INPUT Full length sequence

    SEQUENCE_IMGT Gapped IMGT sequence

    V_GERM_START_VDJ Position of the first nucleotide in ungapped V germline sequence alignment

    JUNCTION Junction nucleotide sequence

    GERMLINE_IMGT_D_MASK IMGT-gapped germline nucleotide sequence with ns masking the NP1-D-NP2 regions

    Run ID of sequencing run

    Sample_type The tissue sampled (e.g., peripheral blood, bone marrow)

    Sex Sex of the subject

    Age Age of the subject

    UNIQUE_ID Subject identifier

    SAMPLE_ID Sample identifier, linking back to raw data

    Subset Defined B cell subset

    Repertoire Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)

    R_SCDR R/S ratio in CDR region

    R_SFWR R/S ratio in FWR region

    V_FAM V family gene

    V_GENE V segment gene

    D_GENE D segment gene

    J_GENE J segment gene

    Clust_Rank Cluster rank

    Clust_REPRES Cluster representative

    Clust_SIZE Cluster size

    Clust_MAXFREQ Cluster maximum frequency

    Clust_SHAREDNESS Cluster sharedness

    CDR3_AA_GRAVY CDR3 hydrophobicity index

    CDR3_AA_CHARGE CDR3 charge

    CDRH3PDB CDRH3 PDB (Structure) code

    H1Canon H1 Canonical class

    H2Canon H2 Canonical class

    H1_GERMLINE H1 Germline Canonical class

    H2_GERMLINE H2 Germline Canonical class
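
    As a minimal sketch of how a file in this tab-delimited layout might be read, the example below uses Python's standard csv module and a few of the column names listed above. The sample rows are made up for illustration, and the helper names `load_rows` and `cdr_rs_ratio` are hypothetical. How the dataset's own R_SCDR column treats a silent-mutation count of zero is not stated in the description, so returning None in that case is an assumption.

```python
import csv
import io

# Made-up rows using a subset of the documented columns.
SAMPLE_TSV = (
    "SEQUENCE_ID\tV_CALL\tJUNCTION_LENGTH\tDUPCOUNT\t"
    "MU_COUNT_CDR_R\tMU_COUNT_CDR_S\n"
    "seq1\tIGHV3-23*01\t48\t3\t4\t2\n"
    "seq2\tIGHV1-69*01\t51\t1\t0\t0\n"
)


def load_rows(handle):
    """Read a tab-delimited annotation table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def cdr_rs_ratio(row):
    """Replacement/silent mutation ratio in the CDR; None if no silent mutations."""
    replacement = int(row["MU_COUNT_CDR_R"])
    silent = int(row["MU_COUNT_CDR_S"])
    return replacement / silent if silent else None


rows = load_rows(io.StringIO(SAMPLE_TSV))
ratios = {row["SEQUENCE_ID"]: cdr_rs_ratio(row) for row in rows}
```

    For the real download, the io.StringIO object would be replaced by an opened file handle (for a gzip-compressed file, one opened in text mode with the gzip module).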

    References

    1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O’Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.

    2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.

    3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.

    4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.

  14. AIRRSHIP: Example synthetic B cell receptor repertoire data

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, csv +1
    Updated Jan 26, 2023
    Cite
    Catherine Sutherland; Graeme J M Cowan (2023). AIRRSHIP: Example synthetic B cell receptor repertoire data [Dataset]. http://doi.org/10.5281/zenodo.7568252
    Explore at:
    Available download formats
    application/gzip, csv, txt
    Dataset updated
    Jan 26, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Catherine Sutherland; Graeme J M Cowan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example repertoire data generated by AIRRSHIP (https://github.com/Cowanlab/airrship). Four repertoires are available (two with SHM, two without), each of which contains 100,000 sequences produced using the default AIRRSHIP parameters. Sequence data are contained in the FASTA files; the TSV files give details of each step in the generation process; the summary file shows the command given to AIRRSHIP; and the locus file contains the alleles used in the repertoire. See https://airrship.readthedocs.io/en/latest/output/ for more information on file formats.

    Repertoires were created using version 0.1.2 of AIRRSHIP.

  15. Single-cell RNA and TCR sequencing data from 20 tumors

    • datasetcatalog.nlm.nih.gov
    Updated Jul 15, 2025
    Cite
    Vazquez, Ines Luque; Ocon, Maria-del-Mar; Pasquier, Andrea; Rodriguez, Maria; Yelensky, Roman; Stephens, Dennis; Nahas, Michelle; Champagne, Devin; Lochab, Amaneet; Seijo, Luis Miguel; Froburg, Kate; Borgia, Jeffrey A.; Korle, Stephanie L.; Kivlehan, Sophie; Moudgalya, Hita; Braun, Jasper; Seder, Christopher W.; Montuenga, Luis M.; Lizotte, Patrick H.; Brown, Markus; Hintz, Emma; Li, Yilong; Gjeci, Iliana; Bueno, Raphael; Campo, Arantza; Fortuno, Maria Antonia (2025). Single-cell RNA and TCR sequencing data from 20 tumors [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002055550
    Explore at:
    Dataset updated
    Jul 15, 2025
    Authors
    Vazquez, Ines Luque; Ocon, Maria-del-Mar; Pasquier, Andrea; Rodriguez, Maria; Yelensky, Roman; Stephens, Dennis; Nahas, Michelle; Champagne, Devin; Lochab, Amaneet; Seijo, Luis Miguel; Froburg, Kate; Borgia, Jeffrey A.; Korle, Stephanie L.; Kivlehan, Sophie; Moudgalya, Hita; Braun, Jasper; Seder, Christopher W.; Montuenga, Luis M.; Lizotte, Patrick H.; Brown, Markus; Hintz, Emma; Li, Yilong; Gjeci, Iliana; Bueno, Raphael; Campo, Arantza; Fortuno, Maria Antonia
    Description

    Liquid biopsy is a promising non-invasive technology that is capable of diagnosing cancer. However, current ctDNA-based approaches detect only a minority of early-stage disease. We set out to improve the sensitivity of liquid biopsy by harnessing tumor recognition by T cells through the sequencing of the circulating T-cell receptor repertoire. We studied a cohort of 463 patients with lung cancer (86% stage I) and 587 subjects without cancer using gDNA extracted from blood buffy coats. We performed TCR β chain sequencing to yield a median of 113,571 TCR clonotypes per sample and built a TCR sequence similarity graph to cluster clonotypes into TCR repertoire functional units (RFUs). The TCR frequencies of RFUs were tested for association with cancer status, and RFUs with a statistically significant association were combined into a cancer score using a support vector machine model. The model was evaluated by 10-fold cross-validation and compared with a ctDNA panel of 237 mutation hotspots in 154 lung cancer driver genes and 17 cancer-related protein biomarkers in 85 subjects. We identified 327 cancer-associated TCR RFUs with a false discovery rate (FDR) ≤ 0.1, including 157 enriched in cancer samples and 170 enriched in controls. Levels of 247/327 (76%) RFUs were correlated with the presence of an HLA allele at FDR ≤ 0.1, and tumor-infiltrating lymphocyte TCRs from multiple RFUs bound HLA-presented tumor antigen peptides, suggesting antigen recognition as a driver of the cancer-RFU associations found. The RFU cancer score detected nearly 50% of stage I lung cancers at a specificity of 80% and boosted the sensitivity by up to 20 percentage points when added to ctDNA and circulating proteins in a multi-analyte cancer screening test. Overall, we show that circulating TCR repertoire functional unit analysis can complement established analytes to improve liquid biopsy sensitivity for early-stage cancer.

    This dataset contains the CellRanger output for 20 cancer patients. Please refer to https://www.10xgenomics.com/support/software/cell-ranger/latest for documentation. For details on how the data was generated, please see Li Y. et al. 2025: Circulating T-cell Receptor Repertoire for Cancer Early Detection.

  16. data_sheet_1_The CAIRR Pipeline for Submitting Standards-Compliant B and T...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Syed Ahmad Chan Bukhari; Martin J. O’Connor; Marcos Martínez-Romero; Attila L. Egyedi; Debra Willrett; John Graybeal; Mark A. Musen; Florian Rubelt; Kei-Hoi Cheung; Steven H. Kleinstein (2023). data_sheet_1_The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.PDF [Dataset]. http://doi.org/10.3389/fimmu.2018.01877.s001
    Explore at:
    Available download formats
    pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Syed Ahmad Chan Bukhari; Martin J. O’Connor; Marcos Martínez-Romero; Attila L. Egyedi; Debra Willrett; John Graybeal; Mark A. Musen; Florian Rubelt; Kei-Hoi Cheung; Steven H. Kleinstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists’ ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. 
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.

  17. B Cell Receptor Sequencing Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). B Cell Receptor Sequencing Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/b-cell-receptor-sequencing-market
    Explore at:
    Available download formats
    csv, pdf, pptx
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    B Cell Receptor Sequencing Market Outlook

    According to our latest research, the global B Cell Receptor Sequencing market size reached USD 382.4 million in 2024, demonstrating robust momentum driven by technological advancements and the growing demand for precision medicine. The market is expected to expand at a CAGR of 16.2% during the forecast period, reaching a projected value of USD 1,346.7 million by 2033. This substantial growth is propelled by increasing applications in immunology, oncology, and vaccine development, alongside the widespread adoption of next-generation sequencing technologies.

    One of the most significant growth factors for the B Cell Receptor Sequencing market is the surging focus on personalized medicine and immunotherapy. The ability to sequence B cell receptors at a high resolution provides researchers and clinicians with deep insights into the adaptive immune system, enabling the identification of disease-specific antibodies and the development of targeted therapies. The rise of chronic diseases, including various types of cancers and autoimmune conditions, has further fueled the need for advanced immunoprofiling techniques. As a result, pharmaceutical and biotechnology companies are increasingly investing in B cell receptor sequencing technologies to accelerate drug discovery and enhance the efficacy of immunotherapeutic interventions, thereby driving market expansion.

    Another major driver is the technological evolution in sequencing platforms, particularly the adoption of next-generation sequencing (NGS). NGS has revolutionized the field by allowing high-throughput, cost-effective, and accurate sequencing of B cell receptors, surpassing the limitations of traditional methods like Sanger sequencing. The integration of bioinformatics and advanced data analysis tools has further streamlined the process, making it more accessible for both research and clinical applications. Continuous improvements in sequencing accuracy, speed, and scalability are encouraging a broader range of end-users, including academic institutes, hospitals, and pharmaceutical companies, to integrate B cell receptor sequencing into their workflows, which is anticipated to further boost market growth.

    Regulatory support and increasing investments in biomedical research have also played a pivotal role in market development. Governments and funding agencies worldwide are prioritizing immunology research, infectious disease monitoring, and vaccine development, especially in the wake of recent global health crises. Collaborative initiatives between public and private sectors have led to the establishment of research consortia and biobanks, fostering the adoption of advanced sequencing technologies. The expansion of clinical trials involving immunotherapies and monoclonal antibodies has further emphasized the importance of comprehensive B cell receptor profiling, thereby creating a conducive environment for market growth over the coming years.

    From a regional perspective, North America continues to dominate the B Cell Receptor Sequencing market, accounting for the largest share due to its well-established healthcare infrastructure, high research and development spending, and presence of leading biotechnology firms. Europe follows closely, supported by strong academic research and government initiatives. The Asia Pacific region is witnessing the fastest growth, attributed to increasing investments in healthcare, rising awareness about precision medicine, and the rapid expansion of research facilities. As global collaborations intensify and technological adoption accelerates, the market is poised for significant growth across all major regions during the forecast period.

    Product Type Analysis

    The B Cell Receptor Sequencing market is segmented by product type into Reagents & Kits, Instruments, and Software & Services, each playing a distinct role in the overall ecosystem. Reagents & Kits represent the largest and most dynamic segment, driven by their recurring demand in sequencing

  18. Supporting data for “B Cell Receptor Sequencing Guided Screening and...

    • datahub.hku.hk
    Updated Sep 9, 2025
    Cite
    Bohao Chen (2025). Supporting data for “B Cell Receptor Sequencing Guided Screening and Optimization of Broadly Neutralizing Antibodies against SARS-CoV-2” [Dataset]. http://doi.org/10.25442/hku.30000919.v1
    Explore at:
    Dataset updated
    Sep 9, 2025
    Dataset provided by
    HKU Data Repository
    Authors
    Bohao Chen
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The COVID-19 pandemic, driven by the continuous evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the emergence of immune-evasive variants, remains a global threat, elevating reinfection risks and challenging existing therapeutics. I therefore conducted a comprehensive study on SARS-CoV-2 antibodies, under the hypothesis that antibody engineering strategies such as bispecific antibodies could overcome such immune evasion and that integrating B cell receptor (BCR) sequencing with functional screening assays would enable efficient antibody discovery. This study focused on two main aspects: the engineering of broadly neutralizing antibodies to combat immune evasion and the development of efficient strategies for screening potent antibodies.

    For the first aspect, following the fifth wave of COVID-19 in Hong Kong, I conducted a serological survey (n=36) to assess herd immunity against emerging variants. Using neutralization assays, I demonstrated that convalescents from the third and fourth waves, infected by B.1.1.63 and B.1.36, showed significantly weaker responses to Omicron sublineages than those infected with BA.2/BA.5 during the fifth wave. These results indicated a higher susceptibility to reinfection among patients exposed in earlier waves. Moreover, I found that breakthrough infections elicited stronger neutralizing responses than infection alone. This finding underscored the role of hybrid immunity in providing better protection. Subsequently, to overcome the immune escape of BA.4/5 against the previously identified broadly neutralizing antibody (bnAb) ZCB11, I engineered bispecific antibodies in DVD-Ig format by fusing the class I ZCB11 with the class III neutralizing antibodies P2D9/P3E6. My results showed that these bispecific antibodies successfully restored neutralization activity against BA.4/5, although with reduced potency. I found higher IC50 values (ZCB11-P2D9: 0.5746 μg/mL; ZCB11-P3E6: 0.1639 μg/mL) than those of the parental monoclonal antibodies (P2D9: 0.0753 μg/mL; P3E6: 0.0743 μg/mL) against BA.4/5. Structure-guided design targeting the F486V-driven disruption of a hydrophobic interface failed to yield functional gain-of-binding mutants, underscoring the challenges of rational affinity maturation. These results indicated that pairing class I and class III neutralizing antibodies is unlikely to be a good strategy for constructing potent bispecific broadly neutralizing antibodies, probably due to structural hindrance.

    For the second aspect, I tried to optimize antibody screening by integrating BCR sequencing with functional validation. A total of 146 BCR sequences were selected using phylogenetic and similarity-based criteria, and then tested, from the total BCR repertoire obtained by sequencing 3395 single B cell clones from a well-defined bnAb donor. None of them, however, showed neutralizing activity. Concurrently, several ultrapotent broadly neutralizing antibodies were isolated from this donor using a conventional single B cell sorting method. Unexpectedly, identical BCR clones were not found in the sequenced repertoire. This result indicated the low frequency of ultrapotent bnAbs in the donor. Lastly, I adopted a method of linking B cell receptor to antigen specificity through sequencing (LIBRA-seq). I successfully identified 20 cross-reactive antibodies from memory B cells, with the top candidate showing broad but weak neutralization. In conclusion, my findings not only revealed polyclonal antibody responses against SARS-CoV-2 but also indicated useful technology platforms for engineering bispecific antibodies and a promising sequence-guided screening framework for rapid antibody discovery.

  19. Data from: Systematic profiling of full-length immunoglobulin and T-cell...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated May 8, 2020
    Cite
    Hayden Brochu; Elizabeth Tseng; Elise Smith; Matthew Thomas; Aiden Jones; Kayleigh Diveley; Lynn Law; Scott Hansen; Louis Picker; Michael Gale; Xinxia Peng (2020). Systematic profiling of full-length immunoglobulin and T-cell receptor repertoire diversity in rhesus macaque through long read transcriptome sequencing [Dataset]. http://doi.org/10.5281/zenodo.3634899
    Explore at:
    Available download formats
    application/gzip
    Dataset updated
    May 8, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hayden Brochu; Elizabeth Tseng; Elise Smith; Matthew Thomas; Aiden Jones; Kayleigh Diveley; Lynn Law; Scott Hansen; Louis Picker; Michael Gale; Xinxia Peng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using long read sequencing, we sequenced four Indian-origin rhesus macaque tissues. From raw full-length, non-chimeric circular consensus sequencing (CCS) reads, we obtained high quality, full-length sequences for over 6,000 unique immunoglobulin and T-cell receptor transcripts, without the need for sequence assembly.

  20. Data from: Pre-processed B cell receptor sequences from BioProject...

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gupta, Namita; Laserson, Uri; Vander Heiden, Jason (2020). Pre-processed B cell receptor sequences from BioProject PRJNA349143 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_802383
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Yale
    Mt. Sinai
    Authors
    Gupta, Namita; Laserson, Uri; Vander Heiden, Jason
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed sequencing data from BioProject PRJNA349143.

    Study Design

    Samples were collected from human volunteers as described in Laserson and Vigneault et al, 2014 (1). Briefly, blood samples were collected from three individuals both pre- and post-vaccination for seasonal influenza. Samples were collected for sequencing at time points -8 days, -2 days, -1 hour, +1 hour, +1 day, +3 days, +7 days, +14 days, +21 days and +28 days relative to injection with seasonal influenza vaccine.

    Library Preparation and Sequencing

    The original samples from Laserson and Vigneault et al, 2014 (1) were re-sequenced as described in Gupta et al, 2017 (2). Briefly, sequencing libraries were prepared from mRNA using 5'RACE with addition of 17-nucleotide unique molecular identifiers (UMIs). Amplification was performed using constant region primers specific to IGHA, IGHD, IGHE, IGHG, IGHM, IGKC and IGLC. Sequencing was conducted on the Illumina MiSeq platform using the 600 cycle kit with 325 cycles for read 1 and 275 cycles for read 2. A 10% PhiX spike-in was added for sequencing.

    Data Processing

    Sequences were processed using the pRESTO (3) and Change-O (4) toolkits as described in Gupta et al, 2017 (2).

    Note, the provided data has been filtered significantly, including the removal of sequences that fail V(D)J alignment and the exclusion of non-functional sequences.

    Format

    Processed sequences are provided in FASTA format annotated using the pRESTO scheme.

    Annotations included are as follows:

    CONSCOUNT: Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.

    DUPCOUNT: UMI count for the given unique sequence.

    PRCONS: Constant region primer (isotype).

    SUBJECT: Subject identifier.

    TIME_POINT: Time point label.
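
    A minimal sketch of reading these annotations from a FASTA header, assuming the usual pRESTO layout in which the sequence identifier is followed by pipe-separated KEY=VALUE pairs. The function name `parse_presto_header` and the example header are hypothetical; the annotation values shown are made up.

```python
def parse_presto_header(header):
    """Split a pRESTO-style FASTA header into an ID and its annotations.

    Assumed layout: >SEQUENCE_ID|KEY1=VALUE1|KEY2=VALUE2
    """
    fields = header.strip().lstrip(">").split("|")
    annotations = {"ID": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        annotations[key] = value
    return annotations


# Made-up header using the annotation names listed above.
example = ">SEQ001|CONSCOUNT=12|DUPCOUNT=3|PRCONS=IGHM|SUBJECT=S1|TIME_POINT=+7d"
parsed = parse_presto_header(example)
umi_count = int(parsed["DUPCOUNT"])
```

    All values are returned as strings, so numeric fields such as CONSCOUNT and DUPCOUNT need an explicit conversion before use.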

    Citations

    1. Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. Proc Natl Acad Sci USA 111, 4928-33 (2014).

    2. Gupta NT, et al. Hierarchical Clustering Can Identify B Cell Clones with High Confidence in Ig Repertoire Sequencing Data. J Immunol 1601850 (2017).

    3. Vander Heiden JA and Yaari G, et al. pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30, 1930–2 (2014).

    4. Gupta NT and Vander Heiden JA, et al. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31, 3356–8 (2015).

