92 datasets found

Table_1_UcTCRdb: An unconventional T cell receptor sequence database with...
frontiersin.figshare.com
xlsx
Updated Jun 21, 2023
Cite
Yunsheng Dou; Shiwen Shan; Jian Zhang (2023). Table_1_UcTCRdb: An unconventional T cell receptor sequence database with online analysis functions.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2023.1158295.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fimmu.2023.1158295.s002
Dataset updated
Jun 21, 2023
Dataset provided by
Frontiers
Authors
Yunsheng Dou; Shiwen Shan; Jian Zhang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Unlike conventional major histocompatibility complex (MHC) class I and II molecules reactive T cells, the unconventional T cell subpopulations recognize various non-polymorphic antigen-presenting molecules and are typically characterized by simplified patterns of T cell receptors (TCRs), rapid effector responses and ‘public’ antigen specificities. Dissecting the recognition patterns of the non-MHC antigens by unconventional TCRs can help us further our understanding of the unconventional T cell immunity. The small size and irregularities of the released unconventional TCR sequences are far from high-quality to support systemic analysis of unconventional TCR repertoire. Here we present UcTCRdb, a database that contains 669,900 unconventional TCRs collected from 34 corresponding studies in humans, mice, and cattle. In UcTCRdb, users can interactively browse TCR features of different unconventional T cell subsets in different species, search and download sequences under different conditions. Additionally, basic and advanced online TCR analysis tools have been integrated into the database, which will facilitate the study of unconventional TCR patterns for users with different backgrounds. UcTCRdb is freely available at http://uctcrdb.cn/.
Data to support TCRa and TCRb repertoire analysis of human thymocyte subsets...
bridges.monash.edu
researchdata.edu.au
txt
Updated Jan 23, 2019
Cite
Stephen Daley (2019). Data to support TCRa and TCRb repertoire analysis of human thymocyte subsets (samples fk_167, fk_168 and fk_172) [Dataset]. http://doi.org/10.26180/5c484681591c0
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.26180/5c484681591c0
Dataset updated
Jan 23, 2019
Dataset provided by
Monash University
Authors
Stephen Daley
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The file contains human T cell receptor (TCR) sequences obtained by multiplex PCR amplification of cDNA molecules followed by Illumina sequencing. Sequences were aligned to the human genome using MIGEC software (see doi: 10.1038/nmeth.2960 for details). Except for the header row, each row contains information about a unique TCR nucleotide sequence. Column 1 stores the TCR chain (a, alpha; b, beta). Column 2 stores the T cell subset. Column 3 is an identifier for the thymus sample of origin. Columns 4 and 5 store the nucleotide sequence and amino acid sequence, respectively, of the complementarity-determining region 3 (CDR3). Columns 6 and 7 store the TCR variable (v) and joining (j) gene segment information.
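The column layout described above can be read with Python's standard csv module. This is a minimal sketch that assumes the file is tab-delimited; the short column labels, the subset names, and the two sample rows are illustrative placeholders, not values taken from the dataset.

```python
import csv
import io

# Column order as described above. These short labels are our own
# (hypothetical) names, not necessarily the header names in the file.
COLUMNS = ["chain", "subset", "sample", "cdr3_nt", "cdr3_aa", "v", "j"]

def read_tcr_table(handle, delimiter="\t"):
    """Yield one dict per TCR row, skipping the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # the first row is the header
    for row in reader:
        if len(row) < len(COLUMNS):
            continue  # ignore blank or truncated lines
        yield dict(zip(COLUMNS, row))

# Made-up two-row sample standing in for the real file.
sample = io.StringIO(
    "chain\tsubset\tsample\tcdr3nt\tcdr3aa\tv\tj\n"
    "a\tsubsetA\tfk_167\tTGTGCTGTG\tCAV\tTRAV1\tTRAJ1\n"
    "b\tsubsetB\tfk_168\tTGTGCCAGC\tCAS\tTRBV1\tTRBJ1\n"
)
rows = list(read_tcr_table(sample))
beta_rows = [r for r in rows if r["chain"] == "b"]
```

For the real data, replace the in-memory sample with an opened file handle and check the delimiter against the first few lines of the file.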
ESM-2 embeddings for TCR-Epitope Binding Affinity Prediction Task
data.niaid.nih.gov
Updated Jun 17, 2024
Cite
Tony Reina (2024). ESM-2 embeddings for TCR-Epitope Binding Affinity Prediction Task [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7502653
Explore at:
Dataset updated
Jun 17, 2024
Authors
Tony Reina
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the accompanying dataset that was generated by the GitHub project: https://github.com/tonyreina/tdc-tcr-epitope-antibody-binding. In that repository I show how to create machine learning models for predicting whether a T-cell receptor (TCR) and a protein epitope will bind to each other.

A model that can predict how well a TCR binds to an epitope can lead to more effective immunotherapy treatments. For example, in anti-cancer therapies it is important for the T-cell receptor to bind to the protein marker in the cancer cell so that the T-cell (actually the T-cell's friends in the immune system) can kill the cancer cell.

HuggingFace provides a "one-stop shop" to train and deploy AI models. In this case, we use Facebook's open-source Evolutionary Scale Model (ESM-2). These embeddings turn the protein sequences into a vector of numbers that the computer can use in a mathematical model.

To load them into Python use the Pandas library:

import pandas as pd

train_data = pd.read_pickle("train_data.pkl")
validation_data = pd.read_pickle("validation_data.pkl")
test_data = pd.read_pickle("test_data.pkl")

The epitope_aa and the tcr_full columns are the protein (peptide) sequences for the epitope and the T-cell receptor, respectively. The letters correspond to the standard amino acid codes.

The epitope_smi column is the SMILES notation for the chemical structure of the epitope. We won't use this information. Instead, the ESM-1b embedder should be sufficient for the input to our binary classification model.

The tcr column is the hypervariable CDR3 loop. It's the part of the TCR that actually binds (assuming it binds) to the epitope.

The label column is whether the two proteins bind. 0 = No. 1 = Yes.

The tcr_vector and epitope_vector columns are the bio-embeddings of the TCR and epitope sequences generated by the Facebook ESM-1b model. These two vectors can be used to create a machine learning model that predicts whether the combination will produce a successful protein binding.

From the TDC website:

T-cells are an integral part of the adaptive immune system, whose survival, proliferation, activation and function are all governed by the interaction of their T-cell receptor (TCR) with immunogenic peptides (epitopes). A large repertoire of T-cell receptors with different specificity is needed to provide protection against a wide range of pathogens. This new task aims to predict the binding affinity given a pair of TCR sequence and epitope sequence.

Weber et al.

Dataset Description: The dataset is from Weber et al., who assembled a large and diverse dataset from the VDJ database and the ImmuneCODE project. It uses human TCR-beta chain sequences. Since this dataset is highly imbalanced, the authors exclude epitopes with fewer than 15 associated TCR sequences and downsample to a limit of 400 TCRs per epitope. The dataset contains amino acid sequences either for the entire TCR or only for the hypervariable CDR3 loop. Epitopes are available as amino acid sequences. Since Weber et al. proposed to represent the peptides as SMILES strings (which reformulates the problem as protein-ligand binding prediction), the SMILES strings of the epitopes are also included. 50% negative samples were generated by shuffling the pairs, i.e. associating TCR sequences with epitopes they have not been shown to bind.
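The filtering, downsampling, and negative-sampling steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure used by Weber et al.: the function name build_pairs, the seed handling, and the toy identifiers are hypothetical, and the exact shuffling scheme of the original work may differ.

```python
import random
from collections import defaultdict

def build_pairs(positives, min_tcrs=15, max_tcrs=400, seed=0):
    """Return (tcr, epitope, label) triples from known binding pairs."""
    positives = list(positives)
    rng = random.Random(seed)

    # Group the TCRs observed for each epitope.
    by_epitope = defaultdict(set)
    for tcr, epitope in positives:
        by_epitope[epitope].add(tcr)

    # Drop rare epitopes, downsample over-represented ones.
    kept = []
    for epitope, tcrs in sorted(by_epitope.items()):
        if len(tcrs) < min_tcrs:
            continue
        tcrs = sorted(tcrs)
        if len(tcrs) > max_tcrs:
            tcrs = rng.sample(tcrs, max_tcrs)
        kept.extend((tcr, epitope) for tcr in tcrs)

    # One negative per positive: pair the TCR with an epitope it has
    # not been shown to bind (checked against all known positives).
    known = set(positives)
    epitopes = sorted({e for _, e in kept})
    negatives = []
    for tcr, _ in kept:
        candidates = [e for e in epitopes if (tcr, e) not in known]
        if candidates:
            negatives.append((tcr, rng.choice(candidates)))

    labelled = [(t, e, 1) for t, e in kept] + [(t, e, 0) for t, e in negatives]
    rng.shuffle(labelled)
    return labelled

# Toy input: E1 has 20 TCRs, E2 has 500, E3 has only 3 (made-up IDs).
toy = ([(f"T{i}", "E1") for i in range(20)]
       + [(f"U{i}", "E2") for i in range(500)]
       + [(f"V{i}", "E3") for i in range(3)])
pairs = build_pairs(toy)
```

With the toy input, E3 is dropped, E2 is capped at 400 TCRs, and each remaining positive receives one shuffled negative, giving a balanced set.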

Task Description: Binary classification. Given the epitope (a peptide, either represented as amino acid sequence or as SMILES) and a T-cell receptor (amino acid sequence, either of the full protein complex or only of the hypervariable CDR3 loop), predict whether the epitope binds to the TCR.

Dataset Statistics: 47,182 TCR-Epitope pairs between 192 epitopes and 23,139 TCRs.

References:

Weber, Anna, Jannis Born, and María Rodriguez Martínez. “TITAN: T-cell receptor specificity prediction with bimodal attention networks.” Bioinformatics 37.Supplement_1 (2021): i237-i244.

Bagaev, Dmitry V., et al. “VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium.” Nucleic Acids Research 48.D1 (2020): D1057-D1062.

Dines, Jennifer N., et al. “The immunerace study: A prospective multicohort study of immune response action to covid-19 events with the immunecode™ open access database.” medRxiv (2020).

Dataset License: CC BY 4.0.

Contributed by: Anna Weber and Jannis Born.

The Facebook ESM-2 model has the MIT license and was published in:

Zeming Lin et al, Evolutionary-scale prediction of atomic-level protein structure with a language model, Science (2023). DOI: 10.1126/science.ade2574 https://www.science.org/doi/10.1126/science.ade2574

HuggingFace has several versions of the trained model.

Checkpoint name        Number of layers    Number of parameters
esm2_t48_15B_UR50D     48                  15B
esm2_t36_3B_UR50D      36                  3B
esm2_t33_650M_UR50D    33                  650M
esm2_t30_150M_UR50D    30                  150M
esm2_t12_35M_UR50D     12                  35M
esm2_t6_8M_UR50D       6                   8M
Control T-cell receptor (TCR) alpha and beta chain nucleotide and amino acid...
data.niaid.nih.gov
Updated Mar 10, 2022
Cite
Mikhail Shugay (2022). Control T-cell receptor (TCR) alpha and beta chain nucleotide and amino acid sequences from human and mouse [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1318985
Explore at:
Dataset updated
Mar 10, 2022
Dataset provided by
Institute of Bioorganic Chemistry, Russian Academy of Sciences
Authors
Mikhail Shugay
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A dataset of pooled T-cell receptor (TCR) sequences for TCR alpha and beta chains of human and mouse.

Sequences are obtained from various samples of healthy individuals/mice using our conventional protocols: see for example [Britanova et al "Dynamics of individual T cell repertoires: from cord blood to centenarians" The Journal of Immunology 2016] and [Izraelson et al. "Comparative analysis of murine T‐cell receptor repertoires." Immunology 2018].

The sequences are stored as gzipped clonotype tables in VDJtools format, see [https://vdjtools-doc.readthedocs.io/en/master/input.html#vdjtools-format].

This control dataset can be used as a proxy for a generative VDJ rearrangement model to estimate the expected frequency distribution of TCRs and check for enrichment of rare TCR clonotypes and groups of similar TCR sequences. For the implementation of the enrichment analysis, please see CalcDegreeStats routine from VDJtools software, see [https://vdjtools-doc.readthedocs.io/en/master/annotate.html#calcdegreestats].

Files named "human.tra.strict.txt.gz", etc are pools of random/naive TCR clonotypes containing unique V/J/CDR3 nucleotide sequence combinations observed in data. The pools.zip file is used for TCR motif inference in VDJdb database [https://github.com/antigenomics/vdjdb-motifs], it contains human.tra.aa.txt, etc files that contain random/naive TCR clonotypes grouped by CDR3 amino acid sequence with the most frequent representative V and J.
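A gzipped clonotype table of this kind can be read without unpacking it first. The sketch below assumes a tab-separated file whose header uses the VDJtools column names count, freq, cdr3nt, cdr3aa, v, d, j; check the header of the actual file before relying on them. The two sample rows are made up.

```python
import csv
import gzip
import io
from collections import Counter

def read_clonotypes(path_or_file):
    """Yield rows of a gzipped, tab-separated clonotype table as dicts."""
    with gzip.open(path_or_file, mode="rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")

# In-memory stand-in for a file such as "human.tra.strict.txt.gz".
table = (
    "count\tfreq\tcdr3nt\tcdr3aa\tv\td\tj\n"
    "3\t0.6\tTGTGCCAGC\tCAS\tTRBV1\t.\tTRBJ1\n"
    "2\t0.4\tTGTGCTGTG\tCAV\tTRBV2\t.\tTRBJ1\n"
)
buffer = io.BytesIO(gzip.compress(table.encode("utf-8")))

# Sum read counts per V segment as a simple usage summary.
v_usage = Counter()
for row in read_clonotypes(buffer):
    v_usage[row["v"]] += int(row["count"])
```

Passing a file path instead of the in-memory buffer works the same way, since gzip.open accepts either.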
Pre-processed B-cell receptor amplicon sequencing data from SRR1842411
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 24, 2020
Cite
Adaptive Immunity Group (2020). Pre-processed B-cell receptor amplicon sequencing data from SRR1842411 [Dataset]. http://doi.org/10.5281/zenodo.806864
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.806864
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Adaptive Immunity Group
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An example dataset containing B-cell receptor (BCR) gene sequences. This dataset is intended to be used for testing software tools developed to annotate (i.e. map Variable, Diversity and Joining segments) and perform clonal analysis of BCR sequencing data.

Sequencing:

Libraries were prepared using 5'RACE from PBMCs of a healthy donor. Input molecules were tagged with unique molecular identifiers (UMIs). Sequencing was run on MiSeq, 300+300 bp reads.

Contents:

The dataset contains both raw sequencing reads and high-quality consensus sequences assembled using the unique molecular identifier (UMI) tagging approach. Consensus assembly corrects for sequencing errors and eliminates sequencing artifacts.

age_ig_s7_R1.fastq.gz and age_ig_s7_R2.fastq.gz contain raw reads

age_ig_s7_R1.t10.cf.fastq.gz and age_ig_s7_R2.t10.cf.fastq.gz contain consensus sequences

All files contain a UMI tag sequence in their headers, in the form UMI:NNNN:QQQQ, where N is the base character and Q is the quality character (for assembled consensuses, the total number of reads is given instead of the Q string).

Note that consensus sequences were assembled using only raw sequences that correspond to UMI tags supported by at least 10 sequencing reads. That means that the consensus sequence files contain a subset of all UMI tags found in the raw sequences. Thus, to assess software performance on raw sequencing reads using the assembled consensus sequences as a high-quality data standard, the raw sequencing reads should be filtered to contain only those UMI tags that are present in the consensus sequence file.
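The filtering step recommended above can be sketched as follows. This assumes standard four-line FASTQ records and that the UMI tag appears somewhere in the header line in the UMI:NNNN:QQQQ form; the helper names and the sample records are hypothetical.

```python
import re

# Captures the base string of a "UMI:NNNN:QQQQ" tag in a FASTQ header.
UMI_PATTERN = re.compile(r"UMI:([ACGTN]+):(\S+)")

def umi_from_header(header):
    """Return the UMI base string, or None if the header has no tag."""
    match = UMI_PATTERN.search(header)
    return match.group(1) if match else None

def fastq_records(lines):
    """Group FASTQ lines into (header, sequence, quality) tuples."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.strip(), seq.strip(), qual.strip()

def filter_raw_by_consensus(raw_lines, consensus_lines):
    """Keep only raw reads whose UMI also occurs in the consensus file."""
    keep = {umi_from_header(h) for h, _, _ in fastq_records(consensus_lines)}
    keep.discard(None)
    return [rec for rec in fastq_records(raw_lines)
            if umi_from_header(rec[0]) in keep]

# Made-up records: only the ACGT tag has a consensus sequence.
raw = ["@read1 UMI:ACGT:IIII", "AAAA", "+", "IIII",
       "@read2 UMI:TTTT:IIII", "CCCC", "+", "IIII"]
consensus = ["@cons1 UMI:ACGT:12", "AAAA", "+", "IIII"]
filtered = filter_raw_by_consensus(raw, consensus)
```

For the real files, the line iterables would come from gzip.open(..., "rt") handles on the raw and consensus FASTQ files.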

Citations:

The whole dataset was used to benchmark MiXCR software and was originally referenced in Bolotin DA, et al. MiXCR: software for comprehensive adaptive immunity profiling. Nature Methods 12(5):380-381, 2015.

Data pre-processing was carried out using MIGEC software, Shugay M et al. Towards error-free profiling of immune repertoires. Nature Methods 11(6):653-655, 2014.

Contributors:

The dataset was generated in Prof. Chudakov's lab (Adaptive Immunity Group at Masaryk University, Brno, and Genomics of Adaptive Immunity Lab at the Institute of Bioorganic Chemistry, Moscow). Sample preparation and sequencing were performed by Dr. Olga Britanova and Dr. Maria Turchaninova. Raw sequencing reads were pre-processed and uploaded by Dr. Mikhail Shugay.
Data_Sheet_1_Detection of Enriched T Cell Epitope Specificity in Full T Cell...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Nov 29, 2019
Cite
Gielis, Sofie; Laukens, Kris; Meysman, Pieter; De Neuter, Nicolas; Moris, Pieter; Bittremieux, Wout; Ogunjimi, Benson (2019). Data_Sheet_1_Detection of Enriched T Cell Epitope Specificity in Full T Cell Receptor Sequence Repertoires.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000078136
Explore at:
Dataset updated
Nov 29, 2019
Authors
Gielis, Sofie; Laukens, Kris; Meysman, Pieter; De Neuter, Nicolas; Moris, Pieter; Bittremieux, Wout; Ogunjimi, Benson
Description
High-throughput T cell receptor (TCR) sequencing allows the characterization of an individual's TCR repertoire and directly queries their immune state. However, it remains a non-trivial task to couple these sequenced TCRs to their antigenic targets. In this paper, we present a novel strategy to annotate full TCR sequence repertoires with their epitope specificities. The strategy is based on a machine learning algorithm to learn the TCR patterns common to the recognition of a specific epitope. These results are then combined with a statistical analysis to evaluate the occurrence of specific epitope-reactive TCR sequences per epitope in repertoire data. In this manner, we can directly study the capacity of full TCR repertoires to target specific epitopes of the relevant vaccines or pathogens. We demonstrate the usability of this approach on three independent datasets related to vaccine monitoring and infectious disease diagnostics by independently identifying the epitopes that are targeted by the TCR repertoire. The developed method is freely available as a web tool for academic use at tcrex.biodatamining.be.
Data from: Persistent T Cell Repertoire Perturbation and T Cell Activation...
rdr.ucl.ac.uk
datasetcatalog.nlm.nih.gov
application/gzip
Updated May 31, 2023
Cite
Carolin Turner; James Brown; Emily Shaw-Wise; Imran Uddin; Evi Tsaliki; Jennifer Roe; G Pollara; Yuxin Sun; James Heather; Marc Lipman; Benny Chain; Mahdad Noursadeghi (2023). Persistent T Cell Repertoire Perturbation and T Cell Activation in HIV After Long Term Treatment [Dataset]. http://doi.org/10.5522/04/14931870.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5522/04/14931870.v1
Dataset updated
May 31, 2023
Dataset provided by
University College London
Authors
Carolin Turner; James Brown; Emily Shaw-Wise; Imran Uddin; Evi Tsaliki; Jennifer Roe; G Pollara; Yuxin Sun; James Heather; Marc Lipman; Benny Chain; Mahdad Noursadeghi
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
T cell receptor sequence data of 26 people living with HIV on long-term anti-retroviral therapy, and 12 HIV-negative healthy controls, produced using the UCL Chain lab protocol. All participants were Caucasian male adults recruited from London, UK. People living with HIV were on anti-retroviral therapy for a median of 8.5 years (interquartile range 3-16 years). They had undetectable plasma HIV viral load (
Table 1_Patterns of restricted TCR usage following SARS-CoV-2 vaccination...
figshare.com
xlsx
Updated Oct 2, 2025
Cite
Emily Parsons; Zhongyan Lu; Stephanie A. Richard; Amanda Zelkoski; Janifer Le; Naraen Palanikumar; Phuong Nguyen; Camille Alba; Gauthaman Sukumar; John Rosenberger; Xijun Zhang; Timothy H. Burgess; Rhonda Colombo; Katrin Mende; Catherine Berjohn; Nursat Epsi; Brian K. Agan; David Tribble; David A. Lindholm; Clifton L. Dalgard; Simon D. Pollett; Allison M. W. Malloy; EPICC COVID-19 Cohort Study Group (2025). Table 1_Patterns of restricted TCR usage following SARS-CoV-2 vaccination and severe disease.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2025.1576903.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fimmu.2025.1576903.s001
Dataset updated
Oct 2, 2025
Dataset provided by
Frontiers
Authors
Emily Parsons; Zhongyan Lu; Stephanie A. Richard; Amanda Zelkoski; Janifer Le; Naraen Palanikumar; Phuong Nguyen; Camille Alba; Gauthaman Sukumar; John Rosenberger; Xijun Zhang; Timothy H. Burgess; Rhonda Colombo; Katrin Mende; Catherine Berjohn; Nursat Epsi; Brian K. Agan; David Tribble; David A. Lindholm; Clifton L. Dalgard; Simon D. Pollett; Allison M. W. Malloy; EPICC COVID-19 Cohort Study Group
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: T cells influence COVID-19 severity and establish long-lasting immune memory in response to vaccination and infection. The diversity of the T cell repertoire, and complexity of T cell epitope recognition, make it challenging to define protective epitope-specific T cells. In this study, we created a highly specific TCR meta-database to identify T cell epitopes from the nearly complete SARS-CoV-2 proteome and determine whether vaccination with mRNA vaccines influenced the TCR repertoire.

Methods: Using this meta-database, we analyzed immunosequencing data of genomic DNA to define the variable region of T cell receptor (TCR) b chain (TCRB) sequences among participants in a longitudinal COVID-19 cohort study. The TCR repertoire was compared between participants who were vaccinated or unvaccinated against SARS-CoV-2 and stratified by disease severity. TCR diversity was measured using clonality, an index defined as the inverted normalized Shannon entropy.

Results: Highly clonal TCR repertoires correlated with age and comorbidities. Using our meta-database approach, we found that vaccinated participants hospitalized with infection had the most restricted SARS-CoV-2-specific CD8 TCR repertoire. However, TCRB with predicted specificity to non-spike SARS-CoV-2 proteins dominated the response, even in vaccinated participants. We identified a peptide sequence in the ORF10 accessory protein that was more frequently recognized in study participants with mild disease. Conversely, CD8 T cell recognition of a peptide sequence in ORF1ab more closely correlated with severe disease.

Discussion: Overarchingly, TCR repertoire analysis revealed that CD8 T cells responding to SARS-CoV-2 broadly recognize epitopes across the SARS-CoV-2 proteome, and provided opportunities to identify epitopes associated with disease.
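The clonality index mentioned in the description can be illustrated with a short calculation. This sketch assumes the common reading of "inverted normalized Shannon entropy", namely 1 - H / log(N) for N clonotypes with entropy H; the study's exact implementation is not given here, and the handling of a single-clonotype repertoire is our own convention.

```python
import math

def clonality(clone_counts):
    """Return 1 - H/log(N), where H is the Shannon entropy of clone frequencies."""
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("at least one clonotype with a positive count is required")
    if len(counts) == 1:
        return 1.0  # assumed convention: a single clonotype is fully clonal
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))

even = clonality([10, 10, 10, 10])   # evenly distributed repertoire
skewed = clonality([97, 1, 1, 1])    # one dominant clone
```

Values near 0 indicate an even repertoire, and values near 1 indicate dominance by a few clones.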
Table 1_Comprehensive analysis of αβT-cell receptor repertoires reveals...
frontiersin.figshare.com
docx
Updated Sep 19, 2025
Cite
Daniil V. Luppov; Elizaveta K. Vlasova; Dmitry M. Chudakov; Mikhail Shugay (2025). Table 1_Comprehensive analysis of αβT-cell receptor repertoires reveals signatures of thymic selection.docx [Dataset]. http://doi.org/10.3389/fimmu.2025.1605170.s003
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fimmu.2025.1605170.s003
Dataset updated
Sep 19, 2025
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Daniil V. Luppov; Elizaveta K. Vlasova; Dmitry M. Chudakov; Mikhail Shugay
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Thymic selection is crucial for forming a pool of T-cells that can efficiently discriminate self from non-self using their T-cell receptors (TCRs) to develop adaptive immunity. In the present study we analyzed how a diverse set of physicochemical and sequence features of a TCR can affect the chances of successfully passing the selection. On a global scale we identified differences in selection probabilities based on CDR3 loop length, hydrophobicity, and residue sizes depending on variable genes and TCR chain context. We also observed a substantial decrease in N-glycosylation sites and other short sequence motifs for both alpha and beta chains. At the local scale we used dedicated statistical and machine learning methods coupled with a probabilistic model of the V(D)J rearrangement process to infer patterns in the CDR3 region that are either enriched or depleted during the course of selection. While the abundance of patterns containing poly-Glycines can improve CDR3 flexibility in selected TCRs, the “holes” in the TCR repertoire induced by negative selection can be related to Arginines in the (N)-Diversity (D)-N-region (NDN) region. Corresponding patterns were stored by us in a database available online. We demonstrated how TCR sequence composition affects lineage commitment during thymic selection. Structural modeling reveals that TCRs with “flat” and “bulged” CDR3 loops are more likely to commit T-cells to the CD4+ and CD8+ lineage respectively. Finally, we highlighted the effect of an individual MHC haplotype on the selection process, suggesting that those “holes” can be donor-specific. Our results can be further applied to identify potentially self-reactive TCRs in donor repertoires and aid in TCR selection for immunotherapies.
Human T cell scRNAseq
figshare.scilifelab.se
demo.researchdata.se
Updated Jan 15, 2025
Cite
Joanna Hård; Jakob Michaelsson (2025). Human T cell scRNAseq [Dataset]. http://doi.org/10.17044/scilifelab.14376104.v1
Explore at:
Unique identifier
https://doi.org/10.17044/scilifelab.14376104.v1
Dataset updated
Jan 15, 2025
Dataset provided by
Karolinska Institutet
Authors
Joanna Hård; Jakob Michaelsson
License
https://www.scilifelab.se/data/restricted-access/
Description
This dataset contains genomic TCR beta sequences from single cell DNA samples amplified by multiple displacement amplification (MDA) and subjected to nested PCR targeting the genomic TCR beta locus. The individual files contain raw data representing nucleotide sequences including both productive and non-productive rearrangements of the TCR beta sequence (with dropout in some cases). FASTQ files corresponding to single cell RNAseq data from single CD8+ T cells prepared by the smart-seq2 method. FASTQ files for 25-cell ‘mini-bulk’ RNAseq for CD8+ T cells prepared according to the smart-seq2 protocol.
Data from: Kabat Database of Sequences of Proteins of Immunological Interest...
neuinfo.org
dknet.org
Updated Jun 27, 2024
Cite
(2024). Kabat Database of Sequences of Proteins of Immunological Interest [Dataset]. http://identifiers.org/RRID:SCR_006465
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_006465
Dataset updated
Jun 27, 2024
Description
The Kabat Database determines the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information of biological macromolecules. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools and much more. The searching and analysis tools are custom made for the aligned data sets contained in both the SQL Server and ASCII text flat file formats. The searching and analysis tools may be run on a single PC workstation or in a distributed environment. The analysis tools are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats. The Kabat Database was initially started in 1970 to determine the combining site of antibodies based on the available amino acid sequences at that time. Bence Jones proteins, mostly from human, were aligned, using the now-known Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the complementarity determining regions (CDR) of light chains. Subsequently, antibody heavy chain amino acid sequences were also aligned using a different numbering system, since the locations of their CDRs (31-35B, 50-65 and 95-102) are different from those of the light chains. 
CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains. During the past 30 years, the Kabat database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules and other proteins of immunological interest. It has been used extensively by immunologists to derive useful structural and functional information from the primary sequences of these proteins.
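The CDR boundaries quoted above can be captured in a small lookup. This is an illustrative sketch, not part of the Kabat tools package: the function name is hypothetical, and insertion letters are handled in a simplified way, so any lettered position whose number falls in a range (for example 35A or 35B in the heavy chain) is treated as inside the CDR.

```python
import re

# Kabat CDR ranges as quoted in the description (inclusive bounds).
KABAT_CDRS = {
    "light": {"CDR1": (24, 34), "CDR2": (50, 56), "CDR3": (89, 97)},
    "heavy": {"CDR1": (31, 35), "CDR2": (50, 65), "CDR3": (95, 102)},
}

def kabat_region(chain, position):
    """Return the CDR name for a Kabat position label such as '35B', or None."""
    match = re.fullmatch(r"(\d+)([A-Z]?)", str(position))
    if match is None:
        raise ValueError(f"not a Kabat position label: {position!r}")
    number = int(match.group(1))  # the insertion letter is ignored here
    for name, (start, end) in KABAT_CDRS[chain].items():
        if start <= number <= end:
            return name
    return None  # framework position

region_h = kabat_region("heavy", "35B")
region_l = kabat_region("light", 23)
```

Positions outside every range, such as the invariant Cys 23 of light chains, are reported as None, meaning framework.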
Data from: The clonal structure and dynamics of the human T cell response to...
rdr.ucl.ac.uk
application/gzip
Updated May 31, 2023
Cite
Tahel Ronel; Matthew Harries; Kate Wicks; Theres Oakes; Helen Singleton; Rebecca Dearman; Gavin Maxwell; Benny Chain (2023). The clonal structure and dynamics of the human T cell response to an organic chemical hapten - Dataset [Dataset]. http://doi.org/10.5522/04/14199809
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5522/04/14199809
Dataset updated
May 31, 2023
Dataset provided by
University College London
Authors
Tahel Ronel; Matthew Harries; Kate Wicks; Theres Oakes; Helen Singleton; Rebecca Dearman; Gavin Maxwell; Benny Chain
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
T cell receptor sequence data of alopecia patients before and during sensitisation with diphenylcyclopropenone and healthy volunteers at equivalent timepoints, using the UCL Chain lab protocol. Details of the study are provided in Ronel et al, eLife 2021 (10.7554/eLife.54747). The processed data files have been generated using Decombinator V4 (https://github.com/innate2adaptive/Decombinator). The raw data files are available at the NCBI Sequence Read Archive, accession number PRJNA592875.
Pre-processed B cell receptor repertoire sequencing data from BioProject...
zenodo.org
application/gzip
Updated Jan 24, 2020
Cite
Marie Ghraichy; Johannes Trück (2020). Pre-processed B cell receptor repertoire sequencing data from BioProject PRJNA527941 [Dataset]. http://doi.org/10.5281/zenodo.2640393
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.2640393
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Marie Ghraichy; Johannes Trück
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Processing

Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1,2). Raw fastq files were filtered based on a quality score threshold of 20. Paired reads were joined if they had a minimum length of 10 nt, a maximum error rate of 0.3 and a significance threshold of 0.0001. Reads with identical UMI were collapsed to a consensus sequence. Reads with identical full-length sequence and identical constant primer but differing UMI were further collapsed. Sequences were then submitted to IgBlast (3) for VDJ assignment and sequence annotation. Constant region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).

software_versions: pRESTO 0.5.3, Change-O 0.3.4, IgBlast 1.6.1, Stampy 1.0.21, shazam 0.1.8
quality_thresholds: FilterSeq.py (pRESTO), Q>20
paired_reads_assembly: AssemblePairs.py (pRESTO), minlen 10, maxerror 0.3, alpha 0.0001
primer_match_cutoffs: MaskPrimers.py (pRESTO), C primer & V primer, maxerror 0.2
consensus_building: BuildConsensus.py (pRESTO), maxerror 0.1, maxgap 0.5
collapsing_method: CollapseSeq.py (pRESTO)
germline_database: IMGT
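To make the Q>20 quality threshold concrete, the sketch below computes a read's mean Phred score from its FASTQ quality string. It is an illustration only and not pRESTO's code: it assumes Phred+33 encoding, that the threshold applies to the mean score of the read, and that the boundary is exclusive as written in "Q>20".

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def passes_quality(quality, threshold=20):
    """True if the read's mean quality exceeds the threshold (assumed exclusive)."""
    return bool(quality) and mean_phred(quality) > threshold

high = passes_quality("IIII")   # 'I' encodes Q40
low = passes_quality("####")    # '#' encodes Q2
edge = passes_quality("5555")   # '5' encodes Q20, exactly at the threshold
```

Reads that fail the check would be dropped before paired-end assembly.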

Format

Processed sequences are provided in a tab delimited file format, including the following annotations:

C_CALL Isotype subclass

SEQUENCE_ID Sequence identifier

V_CALL V segment gene and allele

D_CALL D segment gene and allele

J_CALL J segment gene and allele

JUNCTION_LENGTH Junction length

CONSCOUNT Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.

DUPCOUNT UMI count for the given unique sequence

ISOTYPE Constant region primer (isotype)

MU_COUNT_CDR_R Number of replacement mutations in CDR region

MU_COUNT_CDR_S Number of silent mutations in CDR region

MU_COUNT_FWR_R Number of replacement mutations in FWR region

MU_COUNT_FWR_S Number of silent mutations in FWR region

MUT_TOTAL Total number of mutations in V gene

SEQUENCE_INPUT Full length sequence

SEQUENCE_IMGT Gapped IMGT sequence

V_GERM_START_VDJ position of the first nucleotide in ungapped V germline sequence alignment

JUNCTION Junction nucleotide sequence

GERMLINE_IMGT_D_MASK IMGT-gapped germline nucleotide sequence with ns masking the NP1-D-NP2 regions

Run ID of sequencing run

Sample_type The tissue sampled (e.g. peripheral blood, bone marrow)

Sex Sex of the Subject

Age Age of the subject

UNIQUE_ID Subject identifier

SAMPLE_ID Sample identifier, linking back to raw data

Subset Defined B cell subset

Repertoire Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)

R_SCDR R/S ratio in CDR region

R_SFWR R/S ratio in FWR region

V_FAM V family gene

V_GENE V segment gene

D_GENE D segment gene

J_GENE J segment gene

Clust_Rank Cluster rank

Clust_REPRES Cluster representative

Clust_SIZE Cluster size

Clust_MAXFREQ Cluster maximum frequency

Clust_SHAREDNESS Cluster sharedness

CDR3_AA_GRAVY CDR3 hydrophobicity index

CDR3_AA_CHARGE CDR3 charge

CDRH3PDB CDRH3 PDB (Structure) code

H1Canon H1 Canonical class

H2Canon H2 Canonical class

H1_GERMLINE H1 Germline Canonical class

H2_GERMLINE H2 Germline Canonical class
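
A minimal sketch of loading such a tab-delimited file with Python's standard `csv` module, assuming the column names listed above appear as a header row. The two sample rows and the `IGHG1`/`IGHM` values for C_CALL are made up for illustration. The R/S ratio is recomputed from the mutation counts rather than read from R_SCDR, since the exact definition behind that column is not stated here.

```python
import csv
import io

# Made-up two-row sample using a subset of the documented columns.
SAMPLE_TSV = (
    "SEQUENCE_ID\tV_CALL\tC_CALL\tDUPCOUNT\tMU_COUNT_CDR_R\tMU_COUNT_CDR_S\n"
    "seq1\tIGHV3-23*01\tIGHG1\t4\t6\t2\n"
    "seq2\tIGHV1-69*01\tIGHM\t1\t0\t0\n"
)


def load_rows(handle):
    """Parse the tab-delimited table into dicts, converting count columns to int."""
    int_columns = ("DUPCOUNT", "MU_COUNT_CDR_R", "MU_COUNT_CDR_S")
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in int_columns:
            row[col] = int(row[col])
        rows.append(row)
    return rows


def cdr_rs_ratio(row):
    """Replacement/silent ratio in the CDR; None when there are no silent mutations."""
    silent = row["MU_COUNT_CDR_S"]
    return row["MU_COUNT_CDR_R"] / silent if silent else None


rows = load_rows(io.StringIO(SAMPLE_TSV))
# Keep only sequences whose constant region call starts with IGHG (assumed naming).
igg_rows = [r for r in rows if r["C_CALL"].startswith("IGHG")]
ratios = {r["SEQUENCE_ID"]: cdr_rs_ratio(r) for r in rows}
```

For a real file, `io.StringIO(SAMPLE_TSV)` would be replaced by an opened file handle, for example `open(path, newline="")`.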

References

1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O’Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.

2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.

3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.

4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.
AIRRSHIP: Example synthetic B cell receptor repertoire data
zenodo.org
data.niaid.nih.gov
application/gzip, csv +1
Updated Jan 26, 2023
Catherine Sutherland; Graeme J M Cowan (2023). AIRRSHIP: Example synthetic B cell receptor repertoire data [Dataset]. http://doi.org/10.5281/zenodo.7568252
Explore at:
Available download formats: application/gzip, csv, txt
Unique identifier
https://doi.org/10.5281/zenodo.7568252
Dataset updated
Jan 26, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Catherine Sutherland; Graeme J M Cowan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example repertoire data generated by AIRRSHIP (https://github.com/Cowanlab/airrship). Four repertoires are available (two with SHM, two without), each containing 100,000 sequences produced using the default AIRRSHIP parameters. The FASTA files contain the sequence data, the TSV files give details of each step in the generation process, the summary file shows the command given to AIRRSHIP, and the locus file lists the alleles used in the repertoire. See https://airrship.readthedocs.io/en/latest/output/ for more information on the file formats.

Repertoires were created using version 0.1.2 of AIRRSHIP.
Single-cell RNA and TCR sequencing data from 20 tumors
datasetcatalog.nlm.nih.gov
Updated Jul 15, 2025
Vazquez, Ines Luque; Ocon, Maria-del-Mar; Pasquier, Andrea; Rodriguez, Maria; Yelensky, Roman; Stephens, Dennis; Nahas, Michelle; Champagne, Devin; Lochab, Amaneet; Seijo, Luis Miguel; Froburg, Kate; Borgia, Jeffrey A.; Korle, Stephanie L.; Kivlehan, Sophie; Moudgalya, Hita; Braun, Jasper; Seder, Christopher W.; Montuenga, Luis M.; Lizotte, Patrick H.; Brown, Markus; Hintz, Emma; Li, Yilong; Gjeci, Iliana; Bueno, Raphael; Campo, Arantza; Fortuno, Maria Antonia (2025). Single-cell RNA and TCR sequencing data from 20 tumors [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002055550
Dataset updated
Jul 15, 2025
Authors
Vazquez, Ines Luque; Ocon, Maria-del-Mar; Pasquier, Andrea; Rodriguez, Maria; Yelensky, Roman; Stephens, Dennis; Nahas, Michelle; Champagne, Devin; Lochab, Amaneet; Seijo, Luis Miguel; Froburg, Kate; Borgia, Jeffrey A.; Korle, Stephanie L.; Kivlehan, Sophie; Moudgalya, Hita; Braun, Jasper; Seder, Christopher W.; Montuenga, Luis M.; Lizotte, Patrick H.; Brown, Markus; Hintz, Emma; Li, Yilong; Gjeci, Iliana; Bueno, Raphael; Campo, Arantza; Fortuno, Maria Antonia
Description
Liquid biopsy is a promising non-invasive technology that is capable of diagnosing cancer. However, current ctDNA-based approaches detect only a minority of early-stage disease. We set out to improve the sensitivity of liquid biopsy by harnessing tumor recognition by T cells through the sequencing of the circulating T-cell receptor repertoire. We studied a cohort of 463 patients with lung cancer (86% stage I) and 587 subjects without cancer using gDNA extracted from blood buffy coats. We performed TCR β chain sequencing to yield a median of 113,571 TCR clonotypes per sample and built a TCR sequence similarity graph to cluster clonotypes into TCR repertoire functional units (RFUs). The TCR frequencies of RFUs were tested for association with cancer status and RFUs with a statistically significant association were combined into a cancer score using a support vector machine model. The model was evaluated by 10-fold cross-validation and compared with a ctDNA panel of 237 mutation hotspots in 154 lung cancer driver genes and 17 cancer-related protein biomarkers in 85 subjects. We identified 327 cancer-associated TCR RFUs with a false discovery rate (FDR) ≤ 0.1, including 157 enriched in cancer samples and 170 enriched in controls. Levels of 247/327 (76%) RFUs were correlated with the presence of an HLA allele at FDR ≤ 0.1 and tumor-infiltrating lymphocyte TCRs from multiple RFUs bound HLA-presented tumor antigen peptides, suggesting antigen recognition as a driver of the cancer-RFU associations found. The RFU cancer score detected nearly 50% of stage I lung cancers at a specificity of 80% and boosted the sensitivity by up to 20 percentage points when added to ctDNA and circulating proteins in a multi-analyte cancer screening test. Overall, we show that circulating TCR repertoire functional unit analysis can complement established analytes to improve liquid biopsy sensitivity for early-stage cancer.

This dataset contains the CellRanger output for 20 cancer patients. Please refer to https://www.10xgenomics.com/support/software/cell-ranger/latest for documentation.

For details on how the data was generated, please see Li Y. et al. 2025: Circulating T-cell Receptor Repertoire for Cancer Early Detection.
data_sheet_1_The CAIRR Pipeline for Submitting Standards-Compliant B and T...
frontiersin.figshare.com
pdf
Updated Jun 3, 2023
Syed Ahmad Chan Bukhari; Martin J. O’Connor; Marcos Martínez-Romero; Attila L. Egyedi; Debra Willrett; John Graybeal; Mark A. Musen; Florian Rubelt; Kei-Hoi Cheung; Steven H. Kleinstein (2023). data_sheet_1_The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.PDF [Dataset]. http://doi.org/10.3389/fimmu.2018.01877.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fimmu.2018.01877.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Syed Ahmad Chan Bukhari; Martin J. O’Connor; Marcos Martínez-Romero; Attila L. Egyedi; Debra Willrett; John Graybeal; Mark A. Musen; Florian Rubelt; Kei-Hoi Cheung; Steven H. Kleinstein
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists’ ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. 
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
B Cell Receptor Sequencing Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 22, 2025
Growth Market Reports (2025). B Cell Receptor Sequencing Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/b-cell-receptor-sequencing-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Aug 22, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
B Cell Receptor Sequencing Market Outlook

According to our latest research, the global B Cell Receptor Sequencing market size reached USD 382.4 million in 2024, demonstrating robust momentum driven by technological advancements and the growing demand for precision medicine. The market is expected to expand at a CAGR of 16.2% during the forecast period, reaching a projected value of USD 1,346.7 million by 2033. This substantial growth is propelled by increasing applications in immunology, oncology, and vaccine development, alongside the widespread adoption of next-generation sequencing technologies.
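
The growth figures can be cross-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes nine compounding years between 2024 and 2033. That period count is an assumption, since the report does not state its base-year convention, and a different convention would give a different rate.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding start_value at a fixed annual rate."""
    return start_value * (1 + rate) ** years


# Figures quoted in the report, in USD million.
START_2024 = 382.4
END_2033 = 1346.7
YEARS = 9  # assumed number of compounding periods

implied_rate = cagr(START_2024, END_2033, YEARS)
projected_at_quoted_rate = project(START_2024, 0.162, YEARS)

print(f"Implied CAGR over {YEARS} years: {implied_rate:.1%}")
print(f"Value at the quoted 16.2% rate: USD {projected_at_quoted_rate:,.1f} million")
```

Comparing the two printed values shows how sensitive the projection is to the assumed number of compounding periods.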

One of the most significant growth factors for the B Cell Receptor Sequencing market is the surging focus on personalized medicine and immunotherapy. The ability to sequence B cell receptors at a high resolution provides researchers and clinicians with deep insights into the adaptive immune system, enabling the identification of disease-specific antibodies and the development of targeted therapies. The rise of chronic diseases, including various types of cancers and autoimmune conditions, has further fueled the need for advanced immunoprofiling techniques. As a result, pharmaceutical and biotechnology companies are increasingly investing in B cell receptor sequencing technologies to accelerate drug discovery and enhance the efficacy of immunotherapeutic interventions, thereby driving market expansion.

Another major driver is the technological evolution in sequencing platforms, particularly the adoption of next-generation sequencing (NGS). NGS has revolutionized the field by allowing high-throughput, cost-effective, and accurate sequencing of B cell receptors, surpassing the limitations of traditional methods like Sanger sequencing. The integration of bioinformatics and advanced data analysis tools has further streamlined the process, making it more accessible for both research and clinical applications. Continuous improvements in sequencing accuracy, speed, and scalability are encouraging a broader range of end-users, including academic institutes, hospitals, and pharmaceutical companies, to integrate B cell receptor sequencing into their workflows, which is anticipated to further boost market growth.

Regulatory support and increasing investments in biomedical research have also played a pivotal role in market development. Governments and funding agencies worldwide are prioritizing immunology research, infectious disease monitoring, and vaccine development, especially in the wake of recent global health crises. Collaborative initiatives between public and private sectors have led to the establishment of research consortia and biobanks, fostering the adoption of advanced sequencing technologies. The expansion of clinical trials involving immunotherapies and monoclonal antibodies has further emphasized the importance of comprehensive B cell receptor profiling, thereby creating a conducive environment for market growth over the coming years.

From a regional perspective, North America continues to dominate the B Cell Receptor Sequencing market, accounting for the largest share due to its well-established healthcare infrastructure, high research and development spending, and presence of leading biotechnology firms. Europe follows closely, supported by strong academic research and government initiatives. The Asia Pacific region is witnessing the fastest growth, attributed to increasing investments in healthcare, rising awareness about precision medicine, and the rapid expansion of research facilities. As global collaborations intensify and technological adoption accelerates, the market is poised for significant growth across all major regions during the forecast period.

Product Type Analysis

The B Cell Receptor Sequencing market is segmented by product type into Reagents & Kits, Instruments, and Software & Services, each playing a distinct role in the overall ecosystem. Reagents & Kits represent the largest and most dynamic segment, driven by their recurring demand in sequencing
Supporting data for “B Cell Receptor Sequencing Guided Screening and...
datahub.hku.hk
Updated Sep 9, 2025
Bohao Chen (2025). Supporting data for “B Cell Receptor Sequencing Guided Screening and Optimization of Broadly Neutralizing Antibodies against SARS-CoV-2” [Dataset]. http://doi.org/10.25442/hku.30000919.v1
Unique identifier
https://doi.org/10.25442/hku.30000919.v1
Dataset updated
Sep 9, 2025
Dataset provided by
HKU Data Repository
Authors
Bohao Chen
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The COVID-19 pandemic, driven by the continuous evolution of severe acute respiratory syndrome coronavirus 2 (SARS‑CoV‑2) and the emergence of immune-evasive variants, remains a global threat, elevating reinfection risks and challenging existing therapeutics. I therefore conducted a comprehensive study on SARS-CoV-2 antibodies, under the hypothesis that antibody engineering strategies such as bispecific antibodies could overcome such immune evasion and that integrating B cell receptor (BCR) sequencing with functional screening assays would enable efficient antibody discovery. This study focused on two main aspects: the engineering of broadly neutralizing antibodies to combat immune evasion and the development of efficient strategies for screening potent antibodies.

For the first aspect, following the fifth wave of COVID-19 in Hong Kong, I conducted a serological survey (n=36) to assess herd immunity against emerging variants. Using neutralization assays, I demonstrated that convalescents from the third and fourth waves, infected by B.1.1.63 and B.1.36, showed significantly weaker responses to Omicron sublineages than those infected with BA.2/BA.5 during the fifth wave. These results indicated a higher susceptibility to reinfection among patients previously exposed to earlier waves. Moreover, I found that breakthrough infections elicited stronger neutralizing responses than infection alone. This finding underscored the role of hybrid immunity in better protection. Subsequently, to overcome the immune escape of BA.4/5 against the previously identified broadly neutralizing antibody (bnAb) ZCB11, I engineered bispecific antibodies in DVD-Ig format by fusing the class I ZCB11 with the class III neutralizing antibodies P2D9/P3E6. My results showed that these bispecific antibodies successfully restored neutralization activity against BA.4/5, although with reduced potency: the IC50 values of the bispecific antibodies (ZCB11-P2D9: 0.5746 μg/mL; ZCB11-P3E6: 0.1639 μg/mL) were higher than those of the parental monoclonal antibodies (P2D9: 0.0753 μg/mL; P3E6: 0.0743 μg/mL) against BA.4/5. Structure-guided design targeting the F486V-driven disruption of a hydrophobic interface failed to yield functional gain-of-binding mutants, underscoring the challenges of rational affinity maturation. These results indicated that pairing class I and class III neutralizing antibodies is unlikely to be a good strategy for constructing potent bispecific broadly neutralizing antibodies, probably due to structural hindrance.

For the second aspect, I sought to optimize antibody screening by integrating BCR sequencing with functional validation. A total of 146 BCR sequences were selected using phylogenetic and similarity-based criteria from the total BCR repertoire, which was derived from a well-defined bnAb donor by sequencing 3395 single B cell clones, and were then tested. None of them, however, showed neutralization activity. Concurrently, several ultrapotent broadly neutralizing antibodies were isolated from this donor using the conventional single B cell sorting method. Unexpectedly, identical BCR clones were not found in the sequenced repertoire. This result indicated the low frequency of ultrapotent bnAbs in the donor. Lastly, I adopted LIBRA-seq, a method of linking B cell receptor to antigen specificity through sequencing. I successfully identified 20 cross-reactive antibodies from memory B cells, with the top candidate showing broad but weak neutralization. In conclusion, my findings not only revealed polyclonal antibody responses against SARS-CoV-2 but also pointed to useful technology platforms for engineering bispecific antibodies and a promising sequence-guided screening framework for rapid antibody discovery.
Data from: Systematic profiling of full-length immunoglobulin and T-cell...
zenodo.org
data.niaid.nih.gov
application/gzip
Updated May 8, 2020
Hayden Brochu; Elizabeth Tseng; Elise Smith; Matthew Thomas; Aiden Jones; Kayleigh Diveley; Lynn Law; Scott Hansen; Louis Picker; Michael Gale; Xinxia Peng (2020). Systematic profiling of full-length immunoglobulin and T-cell receptor repertoire diversity in rhesus macaque through long read transcriptome sequencing [Dataset]. http://doi.org/10.5281/zenodo.3634899
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3634899
Dataset updated
May 8, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Hayden Brochu; Elizabeth Tseng; Elise Smith; Matthew Thomas; Aiden Jones; Kayleigh Diveley; Lynn Law; Scott Hansen; Louis Picker; Michael Gale; Xinxia Peng
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Using long read sequencing, we sequenced four Indian-origin rhesus macaque tissues. From raw full-length, non-chimeric circular consensus sequencing (CCS) reads, we obtained high quality, full-length sequences for over 6,000 unique immunoglobulin and T-cell receptor transcripts, without the need for sequence assembly.
Data from: Pre-processed B cell receptor sequences from BioProject...
data.niaid.nih.gov
Updated Jan 24, 2020
Gupta, Namita; Laserson, Uri; Vander Heiden, Jason (2020). Pre-processed B cell receptor sequences from BioProject PRJNA349143 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_802383
Dataset updated
Jan 24, 2020
Dataset provided by
Yale
Mt. Sinai
Authors
Gupta, Namita; Laserson, Uri; Vander Heiden, Jason
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed sequencing data from BioProject PRJNA349143.

Study Design

Samples were collected from human volunteers as described in Laserson and Vigneault et al, 2014 (1). Briefly, blood samples were collected from three individuals both pre- and post-vaccination for seasonal influenza. Samples were collected for sequencing at time points -8 days, -2 days, -1 hour, +1 hour, +1 day, +3 days, +7 days, +14 days, +21 days and +28 days relative to injection with seasonal influenza vaccine.

Library Preparation and Sequencing

The original samples from Laserson and Vigneault et al, 2014 (1) were re-sequenced as described in Gupta et al, 2017 (2). Briefly, sequencing libraries were prepared from mRNA using 5'RACE with addition of 17-nucleotide unique molecular identifiers (UMIs). Amplification was performed using constant region primers specific to IGHA, IGHD, IGHE, IGHG, IGHM, IGKC and IGLC. Sequencing was conducted on the Illumina MiSeq platform using the 600 cycle kit with 325 cycles for read 1 and 275 cycles for read 2. A 10% PhiX spike-in was added for sequencing.

Data Processing

Sequences were processed using the pRESTO (3) and Change-O (4) toolkits as described in Gupta et al, 2017 (2).

Note, the provided data has been filtered significantly, including the removal of sequences that fail V(D)J alignment and the exclusion of non-functional sequences.

Format

Processed sequences are provided in FASTA format annotated using the pRESTO scheme.

Annotations included are as follows:

CONSCOUNT: Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.

DUPCOUNT: UMI count for the given unique sequence.

PRCONS: Constant region primer (isotype).

SUBJECT: Subject identifier.

TIME_POINT: Time point label.
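
pRESTO-style FASTA headers take the form `>ID|FIELD=value|FIELD=value`. A minimal sketch of pulling the annotations listed above out of such a header follows; the function name `parse_presto_header` and the example header values are made up for illustration.

```python
def parse_presto_header(header):
    """Split a pRESTO-style header into (sequence_id, {field: value})."""
    header = header.lstrip(">").strip()
    # The first pipe-delimited token is the sequence ID; the rest are FIELD=value.
    seq_id, *fields = header.split("|")
    annotations = {}
    for field in fields:
        key, _, value = field.partition("=")
        annotations[key] = value
    return seq_id, annotations


# Hypothetical header using the annotation names documented above.
example = ">SEQ0001|CONSCOUNT=12|DUPCOUNT=3|PRCONS=IGHM|SUBJECT=S1|TIME_POINT=+7d"
seq_id, ann = parse_presto_header(example)

# Values are kept as strings; convert counts explicitly where needed.
umi_count = int(ann["DUPCOUNT"])
read_count = int(ann["CONSCOUNT"])
```

Keeping the values as strings avoids guessing at types for fields such as TIME_POINT, whose label format is not specified here.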

Citations

1. Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. Proc Natl Acad Sci USA 111, 4928-33 (2014).

2. Gupta NT, et al. Hierarchical Clustering Can Identify B Cell Clones with High Confidence in Ig Repertoire Sequencing Data. J Immunol 1601850 (2017).

3. Vander Heiden JA and Yaari G, et al. pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30, 1930–2 (2014).

4. Gupta NT and Vander Heiden JA, et al. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31, 3356–8 (2015).
