29 datasets found
  1. Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 3, 2023
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s004
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.
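    The spike-in idea in the description above can be illustrated with a minimal sketch. It is not one of the seven methods compared in the survey. It only assumes that every cell received the same amount of spike-in molecules, so a cell's total spike-in count reflects its technical capture efficiency. The function names and the toy counts are hypothetical.

```python
def spike_in_size_factors(spike_totals):
    """Return one size factor per cell from total spike-in counts.

    Assumption: each cell received the same quantity of spike-in
    molecules, so differences in spike-in totals are technical.
    """
    if not spike_totals or any(t <= 0 for t in spike_totals):
        raise ValueError("every cell needs a positive spike-in total")
    mean_total = sum(spike_totals) / len(spike_totals)
    # Factors are centred so that their mean is 1.
    return [t / mean_total for t in spike_totals]


def normalize_cells(endogenous_counts, size_factors):
    """Divide each cell's endogenous gene counts by its size factor."""
    return [
        [count / factor for count in cell]
        for cell, factor in zip(endogenous_counts, size_factors)
    ]


# Toy data (hypothetical): three cells, two endogenous genes each.
spike_totals = [100, 200, 300]
endogenous = [[10, 0], [20, 4], [30, 6]]

factors = spike_in_size_factors(spike_totals)
normalized = normalize_cells(endogenous, factors)
```

    In this toy case the first gene ends up with the same normalized value in all three cells, because its raw counts scale exactly with the spike-in totals.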

  2. Table_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s002
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.

  3. Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust...

    • data.niaid.nih.gov
    Updated May 15, 2019
    Cite
    Bacher R; Chu L; Kendziorski C; Swanson S (2019). Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust normalization of single-cell rna-seq data [Dataset]. https://data.niaid.nih.gov/resources?id=gse85917
    Dataset updated
    May 15, 2019
    Dataset provided by
    University of Florida
    Authors
    Bacher R; Chu L; Kendziorski C; Swanson S
    Description

    Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), each sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA was pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.

  4. Table_5_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_5_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s006
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.

  5. Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics, or different datasets under the same metric, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another metric, or a method evaluated as the best using one dataset is evaluated as the poorest using another dataset. This raises an open question: which principles should guide the evaluation of normalization methods? In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.

  6. pbmc single cell RNA-seq matrix

    • zenodo.org
    csv
    Updated May 4, 2021
    Cite
    Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux (2021). pbmc single cell RNA-seq matrix [Dataset]. http://doi.org/10.5281/zenodo.4730807
    Dataset updated
    May 4, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-cell RNA-sequencing dataset of peripheral blood mononuclear cells (PBMCs: T cells, B cells, NK cells and monocytes) extracted from two healthy donors.

    Cells labeled C26 come from a 30-year-old female, and cells labeled C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using the standard 10x Genomics 3' v3 chemistry protocol. Cell Ranger v4.0.0 was used for processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by Cell Ranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell, or more than 20% of counts from mitochondrial genes were discarded.
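    The three QC thresholds in the paragraph above can be read as a per-cell filter. The sketch below is a plain-Python illustration of that reading, not the authors' pipeline (which used Cell Ranger and Seurat). It assumes the mitochondrial cutoff applies to the fraction of a cell's counts, and the function name, gene names and toy counts are hypothetical.

```python
def passes_qc(cell_counts, mito_genes, min_genes=3, min_reads=500, max_mito_frac=0.20):
    """Return True if a cell survives the three QC filters.

    cell_counts: dict mapping gene name -> count for one cell.
    mito_genes: set of gene names treated as mitochondrial.
    Thresholds default to the values quoted in the description.
    """
    detected_genes = sum(1 for count in cell_counts.values() if count > 0)
    total_reads = sum(cell_counts.values())
    if detected_genes < min_genes or total_reads < min_reads:
        return False
    if total_reads == 0:
        # Only reachable when min_reads is set to 0; avoids division by zero.
        return False
    mito_reads = sum(c for gene, c in cell_counts.items() if gene in mito_genes)
    return mito_reads / total_reads <= max_mito_frac


# Toy cells (hypothetical counts and gene names).
mito = {"MT-CO1", "MT-ND1"}
healthy_cell = {"CD3E": 300, "CD19": 200, "NKG7": 100, "MT-CO1": 50}
stressed_cell = {"CD3E": 300, "CD19": 100, "NKG7": 100, "MT-CO1": 400}
shallow_cell = {"CD3E": 100, "CD19": 100, "NKG7": 100}

kept = [
    name
    for name, cell in [
        ("healthy", healthy_cell),
        ("stressed", stressed_cell),
        ("shallow", shallow_cell),
    ]
    if passes_qc(cell, mito)
]
```

    Only the first toy cell is kept: the second has too large a mitochondrial fraction and the third has too few reads.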

    The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. The SCTransform method was used for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.

    Files content:

    - raw_dataset.csv: raw gene counts

    - normalized_dataset.csv: normalized gene counts (single cell matrix)

    - cell_types.csv: cell types identified from annotated cell clusters

    - cell_types_macro.csv: cell macro types

    - UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat

  7. Data_Sheet_1_Non-linear Normalization for Non-UMI Single Cell RNA-Seq.PDF

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Zhijin Wu; Kenong Su; Hao Wu (2023). Data_Sheet_1_Non-linear Normalization for Non-UMI Single Cell RNA-Seq.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.612670.s001
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhijin Wu; Kenong Su; Hao Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-cell RNA-seq data, like data from other sequencing technologies, contain systematic technical noise. Such noise results from a combined effect of unequal efficiencies in the capturing and counting of mRNA molecules, such as extraction/amplification efficiency and sequencing depth. We show that such technical effects are not only cell-specific but also affect genes differently, so a simple cell-wise size factor adjustment may not be sufficient. We present a non-linear normalization approach that provides a cell- and gene-specific normalization factor for each gene in each cell. We show that the proposed normalization method (implemented in the "SC2P" package) reduces more technical variation than competing methods, without reducing biological variation. When technical effects such as sequencing depths are not balanced between cell populations, SC2P normalization also removes the bias due to uneven technical noise. This method is applicable to scRNA-seq experiments that do not use unique molecular identifiers (UMIs) and thus retain amplification biases.

  8. NGS-Based RNA-Seq Market Analysis North America, Europe, Asia, Rest of World...

    • technavio.com
    Cite
    Technavio, NGS-Based RNA-Seq Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, Singapore, China - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ngs-based-rna-seq-market-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global
    Description

    NGS-Based RNA-Seq Market Size 2024-2028

    The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
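    The two figures in the forecast sentence above are linked by the compound-growth formula, and a short worked calculation shows what they imply together. This is only arithmetic on the quoted numbers, not a figure taken from the report. It assumes five annual compounding periods (2023 to 2028) and that the quoted increase is the end-of-period size minus the 2023 size. The function and variable names are hypothetical.

```python
def implied_base_size(increase, cagr, years):
    """Back out the starting market size from an absolute increase and a CAGR.

    end = base * (1 + cagr) ** years, and increase = end - base,
    so base = increase / ((1 + cagr) ** years - 1).
    """
    growth_multiple = (1 + cagr) ** years
    return increase / (growth_multiple - 1)


increase_usd_bn = 6.66   # quoted increase, USD billion
cagr = 0.2052            # quoted CAGR of 20.52%
years = 5                # assumption: 2023 -> 2028 is five compounding periods

base_2023 = implied_base_size(increase_usd_bn, cagr, years)
size_2028 = base_2023 + increase_usd_bn
print(f"implied 2023 size: {base_2023:.2f} bn, implied 2028 size: {size_2028:.2f} bn")
```

    If the report counts the periods or defines the increase differently, the implied sizes change accordingly.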

    The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
    The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
    

    What will be the Size of the NGS-based RNA-Seq market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.

    The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.

    Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.

    The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.

    How is this NGS-based RNA-Seq industry segmented?

    The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
      Academic and research centers
      Clinical research
      Pharma companies
      Hospitals

    Technology
      Sequencing by synthesis
      Ion semiconductor sequencing
      Single-molecule real-time sequencing
      Others

    Geography
      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        Singapore
      Rest of World (ROW)

    By End-user Insights

    The academic and research centers segment is estimated to witness significant growth during the forecast period.

    The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification

  9. Single cell RNA sequencing (scRNAseq) of transplanted mT3 tumors

    • explore.openaire.eu
    • search.dataone.org
    • +1more
    Updated Sep 23, 2020
    Cite
    Ivana Peran (2020). Single cell RNA sequencing (scRNAseq) of transplanted mT3 tumors [Dataset]. http://doi.org/10.6071/m3hm32
    Dataset updated
    Sep 23, 2020
    Authors
    Ivana Peran
    Description

    Background & Aims: Pancreatic ductal adenocarcinomas (PDAC) are characterized by fibrosis and an abundance of cancer-associated fibroblasts (CAFs). We investigated strategies to disrupt interactions among CAFs, the immune system, and cancer cells, focusing on adhesion molecule cadherin 11 (CDH11), which has been associated with other fibrotic disorders and is expressed by activated fibroblasts.

    Methods: We compared levels of CDH11 mRNA in human pancreatitis and pancreatic cancer tissues and cells with those in normal pancreas, and measured levels of CDH11 protein in human and mouse pancreatic lesions and normal tissues. We crossed p48-Cre;LSL-KrasG12D/+;LSL-Trp53R172H/+ (KPC) mice with CDH11-knockout mice and measured survival times of offspring. Pancreata were collected and analyzed by histology, immunohistochemistry, and (single-cell) RNA sequencing; RNA and proteins were identified by imaging mass cytometry. Some mice were given injections of PD1 antibody or gemcitabine and survival was monitored. Pancreatic cancer cells from KPC mice were subcutaneously injected into Cdh11+/+ and Cdh11–/– mice and tumor growth was monitored. Pancreatic cancer cells (mT3) from KPC mice (C57BL/6) were subcutaneously injected into Cdh11+/+ (C57BL/6J) mice, the mice were given injections of antibody against CDH11, gemcitabine, or a small molecule inhibitor of CDH11 (SD133), and tumor growth was monitored.

    Results: Levels of CDH11 mRNA and protein were significantly higher in CAFs than in pancreatic cancer epithelial cells, human or mouse pancreatic cancer cell lines, or immune cells. KPC/Cdh11+/– and KPC/Cdh11–/– mice survived significantly longer than KPC/Cdh11+/+ mice. Markers of stromal activation entirely surrounded pancreatic intraepithelial neoplasias in KPC/Cdh11+/+ mice and incompletely in KPC/Cdh11+/– and KPC/Cdh11–/– mice, whose lesions also contained fewer FOXP3+ cells in the tumor center. Compared with pancreatic tumors in KPC/Cdh11+/+ mice, tumors of KPC/Cdh11+/– mice had increased markers of antigen processing and presentation; more lymphocytes and associated cytokines; decreased extracellular matrix components; and reductions in markers and cytokines associated with immunosuppression. Administration of the PD1 antibody did not prolong survival of KPC mice with 0, 1, or 2 alleles of Cdh11. Gemcitabine extended survival only of KPC/Cdh11+/– and KPC/Cdh11–/– mice or reduced subcutaneous tumor growth in mT3-engrafted Cdh11+/+ mice when given in combination with the CDH11 antibody. A small molecule inhibitor of CDH11 reduced growth of pre-established mT3 subcutaneous tumors only if T and B cells were present in mice.

    Conclusions: Knockout or inhibition of CDH11, which is expressed by CAFs in the pancreatic tumor stroma, reduces growth of pancreatic tumors, increases their response to gemcitabine, and significantly extends survival of mice. CDH11 promotes immunosuppression and extracellular matrix deposition, and might be developed as a therapeutic target for pancreatic cancer.

    The mT3 tumor was generated by injecting 25,000 mT3 cells (derived from a PDAC of a KPC C57BL/6 mouse) subcutaneously into the back flank of 10-week-old female C57BL/6 mice in a 1:1 suspension of Matrigel (Cat# 354234, Corning) and PBS. At 3 weeks post injection, the tumor was dissected and processed as described before to obtain single cell suspensions. Subsequently, immune cells and blood cells were removed by CD45+ magnetic bead-based depletion (Cat# 130-052-301, Miltenyi Biotech) and ACK lysis buffer (Cat# A1049201, Gibco), respectively, following the manufacturer's guidelines. Remaining cells were prepared for single cell sequencing using the Chromium Single Cell 3ʹ GEM, Library & Gel Bead Kit v3 (Cat# 1000075, 10X Genomics) on a 10X Genomics Chromium Controller following the manufacturer's protocol and sequenced on an Illumina NextSeq 500 sequencer.

    The Cell Ranger Single-Cell Software Suite (10X Genomics) was used to perform sample demultiplexing, barcode processing, and single-cell 3′ gene counting. Sequencing data was aligned to the mouse reference genome (mm10) using “cellranger mkfastq” with default parameters. Unique molecular identifier (UMI) counts were generated using “cellranger count”. Further analysis was performed in R using the Seurat package. Briefly, cells with fewer than 500 detected genes per cell and genes that were expressed by fewer than 5 cells were filtered out. Subsequently, cells with >7800 genes were filtered out to remove noise from droplets containing more than one cell. Dead cells were excluded by retaining cells with <5% mitochondrial reads. The data was subsequently normalized by employing the global-scaling normalization method “LogNormalize”, followed by identification of the 2,000 most variable genes in the dataset, data scaling, and dimensionality reduction by principal component analysis (PCA) using the 2,000 variable genes. Then, a gra...
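    The global-scaling "LogNormalize" step named in the description can be sketched for a single cell as follows. In Seurat, this method divides each gene's count by the cell's total count, multiplies by a scale factor, and applies a natural-log transform with log1p. The scale factor of 10,000 used here is Seurat's default and is an assumption, since the description does not state the value used. The Python function name is hypothetical, and this is an illustration of the formula rather than the authors' code.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Global-scaling log normalization of one cell's gene counts.

    Each count is divided by the cell's total count, multiplied by
    scale_factor, and transformed with log1p (natural log of 1 + x).
    """
    total = sum(cell_counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return [0.0 for _ in cell_counts]
    return [math.log1p(count / total * scale_factor) for count in cell_counts]


# Two toy cells (hypothetical) with the same composition but different depth.
shallow = log_normalize([1, 1, 2])
deep = log_normalize([10, 10, 20])
```

    Because both toy cells have the same relative composition, their normalized values coincide, which is the effect global scaling is meant to achieve for sequencing depth.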

  10. COVID-19 vaccination single cell datasets

    • zenodo.org
    application/gzip, bin
    Updated Sep 21, 2023
    Cite
    Bingjie Zhang; Rabi Upadhyay; Yuhan Hao; Marie I. Samanovic; Ramin S. Herati; John Blair; Jordan Axelrad; Mark J. Mulligan; Dan R. Littman; Rahul Satija (2023). COVID-19 vaccination single cell datasets [Dataset]. http://doi.org/10.5281/zenodo.7555405
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bingjie Zhang; Rabi Upadhyay; Yuhan Hao; Marie I. Samanovic; Ramin S. Herati; John Blair; Jordan Axelrad; Mark J. Mulligan; Dan R. Littman; Rahul Satija
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PBMC samples for CITE-seq and ASAP-seq were collected at four time points: immediately before (Day 0) vaccination, after primary vaccination (Day 2, Day 10), and seven days after boost vaccination (Day 28).

    The datasets uploaded here are three processed single-cell datasets:

    1. PBMC_vaccine_CITE.rds: 3' RNA and surface proteins (173 TotalSeq-A antibodies)

    2. PBMC_vaccine_ASAP.rds: Chromatin accessibility and surface proteins (173 TotalSeq-A antibodies)

    3. PBMC_vaccine_ECCITE_TCR.rds: 5' RNA, surface proteins (137 TotalSeq-C antibodies), TCR and dextramer loaded with peptides of SARS-CoV-2 spike protein.

    antigen_module_genes.rds: This file contains the vaccine-induced gene sets.

    antigen_module_peaks.rds: This file contains the DE peaks specific for vaccine-induced cells.

    To map the scRNA-seq query dataset onto our CITE-seq reference:

    library(Seurat)
    
    PBMC_CITE <- readRDS("/zenedo/PBMC_vaccine_CITE.rds")
    query_scRNA <- readRDS("/home/xx/your_own_data.rds")
    
    anchors <- FindTransferAnchors(
      reference = PBMC_CITE,
      query = query_scRNA,
      normalization.method = "SCT",
      k.anchor = 5,
      reference.reduction = "spca",
      dims = 1:50)
     
    query_scRNA <- MapQuery(
      anchorset = anchors,
      query = query_scRNA,
      reference = PBMC_CITE,
      refdata = list(
       l1 = "celltypel1",
       l2 = "celltypel2",
       l3 = "celltypel3"),
      reference.reduction = "spca",
      reduction.model = "wnn.umap") 
    
    
    

    To use the scATAC-seq data, please run the commands below to update the path of the fragment file for the object.

    Vaccine_ASAP <- readRDS("PBMC_vaccine_ASAP.rds")
    # remove fragment file information
    Fragments(Vaccine_ASAP) <- NULL
    # Update the path of the fragment file 
    Fragments(Vaccine_ASAP) <- CreateFragmentObject(path = "download/PBMC_vaccine_ASAP_fragments.tsv.gz", cells = Cells(Vaccine_ASAP))

  11. Data for the training and testing of ccAFv2

    • zenodo.org
    application/gzip
    Updated Sep 18, 2024
    Cite
    Christopher Plaisier; Samantha O'Connor (2024). Data for the training and testing of ccAFv2 [Dataset]. http://doi.org/10.5281/zenodo.13786547
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Plaisier; Samantha O'Connor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-cell transcriptomics has unveiled a vast landscape of cellular heterogeneity in which the cell cycle is a significant component. We trained a high-resolution cell cycle classifier (ccAFv2) using human neural stem cells characterized by single-cell RNA-seq (scRNA-seq). The classifier distinguishes six cell cycle states (G1, Late G1, S, S/G2, G2/M, and M/Early G1) and a quiescent-like G0 state, and it incorporates a tunable parameter to filter out less certain classifications. The ccAFv2 classifier performed better than or equivalent to other state-of-the-art methods even while classifying more cell cycle states, including G0. We showcased the versatility of ccAFv2 by successfully applying it to classify cells, nuclei, and spatial transcriptomics data in humans and mice, using various normalization methods and gene identifiers. We provide methods to regress the cell cycle expression patterns out of single cell or nuclei data to uncover underlying biological signals. The classifier can be used either as an R package integrated with Seurat (https://github.com/plaisier-lab/ccafv2_R) or a PyPI package integrated with scanpy (https://pypi.org/project/ccAF/). We show that ccAFv2 has enhanced accuracy, flexibility, and adaptability across various experimental conditions, establishing ccAFv2 as a powerful tool for dissecting complex biological systems, unraveling cellular heterogeneity, and deciphering the molecular mechanisms by which proliferation and quiescence affect cellular processes.
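    The "tunable parameter to filter out less certain classifications" can be pictured as a cutoff on the classifier's per-state likelihoods. The sketch below is a generic illustration of that idea and does not use or reproduce the ccAFv2 package API. The function name, the 0.5 cutoff, the "Unknown" label and the toy likelihoods are all assumptions. Only the seven state names come from the description.

```python
# State names as listed in the dataset description.
STATES = ["G0", "G1", "Late G1", "S", "S/G2", "G2/M", "M/Early G1"]


def call_state(likelihoods, threshold=0.5, unknown_label="Unknown"):
    """Pick the most likely state, or unknown_label if it is not certain enough.

    likelihoods: dict mapping state name -> likelihood for one cell.
    threshold: hypothetical cutoff; raising it rejects more cells.
    """
    state, best = max(likelihoods.items(), key=lambda item: item[1])
    return state if best >= threshold else unknown_label


# Toy likelihood vectors (hypothetical values that each sum to 1).
confident_cell = dict(zip(STATES, [0.05, 0.80, 0.05, 0.04, 0.03, 0.02, 0.01]))
ambiguous_cell = dict(zip(STATES, [0.20, 0.25, 0.20, 0.15, 0.10, 0.05, 0.05]))

calls = [call_state(confident_cell), call_state(ambiguous_cell)]
strict_calls = [call_state(confident_cell, threshold=0.9)]
```

    With the default cutoff, the confident cell is called G1 and the ambiguous cell is left unclassified. Raising the cutoff to 0.9 also rejects the confident cell, which shows the trade-off between coverage and certainty.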

  12. Data from: ETV4 mediates dosage-dependent prostate tumor initiation and...

    • data.niaid.nih.gov
    • datacatalog.mskcc.org
    • +1more
    zip
    Updated Mar 24, 2023
    Cite
    Dan Li; Yu Chen; Ping Chi; Yu Zhan; Naitao Wang; Fanying Tang; Cindy Lee; Gabriella Bayshtok; Amanda Moore; Elissa Wong; Mohini Pachai; Yuanyuan Xie; Jessica Sher; Jimmy Zhao; Anuradha Gopalan; Joseph Chan; Ekta Khurana; Peter Shepherd; Nora Navone; Makhzuna Khudoynazarova (2023). ETV4 mediates dosage-dependent prostate tumor initiation and cooperates with p53 loss to generate prostate cancer [Dataset]. http://doi.org/10.5061/dryad.v41ns1s0s
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    Weill Cornell Medicine
    Memorial Sloan Kettering Cancer Center
    The University of Texas MD Anderson Cancer Center
    Authors
    Dan Li; Yu Chen; Ping Chi; Yu Zhan; Naitao Wang; Fanying Tang; Cindy Lee; Gabriella Bayshtok; Amanda Moore; Elissa Wong; Mohini Pachai; Yuanyuan Xie; Jessica Sher; Jimmy Zhao; Anuradha Gopalan; Joseph Chan; Ekta Khurana; Peter Shepherd; Nora Navone; Makhzuna Khudoynazarova
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The mechanisms underlying ETS-driven prostate cancer initiation and progression remain poorly understood due to a lack of model systems that recapitulate this phenotype. We generated a genetically engineered mouse with prostate-specific expression of the ETS factor, ETV4, at lower and higher protein dosages through mutation of its degron. Lower-level expression of ETV4 caused mild luminal cell expansion without histologic abnormalities and higher-level expression of stabilized ETV4 caused prostatic intraepithelial neoplasia (mPIN) with 100% penetrance within 1 week. Tumor progression was limited by p53-mediated senescence and Trp53 deletion cooperated with stabilized ETV4. The neoplastic cells expressed differentiation markers such as Nkx3.1 recapitulating luminal gene expression features of untreated human prostate cancer. Single-cell and bulk RNA-sequencing showed stabilized ETV4 induced a novel luminal-derived expression cluster with signatures of the cell cycle, senescence, and epithelial to mesenchymal transition. These data suggest that ETS overexpression alone, at sufficient dosage, can initiate prostate neoplasia.

    Methods

    Mouse prostate digestion: Intraperitoneal injection of tamoxifen was administered in 8-week-old mice. Two weeks after tamoxifen treatment, the mouse prostate was digested for 1 hour with Collagenase/Hyaluronidase (STEMCELL, #07912) and then for 30 minutes with TrypLE™ Express Enzyme (Thermo Fisher, #12605028) at 37°C to isolate single prostate cells. The prostate cells were stained with PE/Cy7-conjugated anti-mouse CD326 (EpCAM) antibody (BioLegend, 118216), and CD326 and EYFP double-positive cells, which are luminal cells mainly from the anterior and dorsal prostate, were sorted by flow cytometry. mRNA or genomic DNA was extracted from these double-positive cells and used for ATAC-sequencing and RNA-sequencing analysis.

    ATAC-seq and primary data processing: ATAC-seq was performed as previously described.
Primary data processing and peak calling were performed using the ENCODE ATAC-seq pipeline (https://github.com/kundajelab/atac_dnase_pipelines). Briefly, paired-end reads were trimmed, filtered, and aligned against mm9 using Bowtie2. PCR duplicates and reads mapped to the mitochondrial chromosome or repeated regions were removed. Mapped reads were shifted +4/-5 to correct for the Tn5 transposase insertion. Peak calling was performed using MACS2, with p-value < 0.01 as the cutoff. Reproducible peaks from two biological replicates were defined as peaks that overlapped by more than 50%. On average, 25 million uniquely mapped read pairs remained after filtering. The distribution of inserted fragment length shows a typical nucleosome banding pattern, and the TSS enrichment score (reads that are enriched around TSS against background) ranges between 28 and 33, suggesting the libraries have high quality and were able to capture the majority of regions of interest.

    Differential peak accessibility: Reads aligned to peak regions were counted using the R package GenomicAlignments_v1.12.2. Read counts were normalized and differentially accessible peaks were called with DESeq2_v1.16.1 in R 3.4.1. Differential peaks were defined as peaks with adjusted p-value < 0.01 and |log2(FC)| > 2. For visualization, coverage bigwig files were generated using the bamCoverage command from deepTools2, normalizing using the size factor generated by DESeq2. The differential ATAC-seq peak density plot was generated with deepTools2, using regions that were significantly more or less accessible in ETV4AAA samples relative to EYFP samples.

    Motif analysis: Motif enrichment analysis was performed using MEME-ChIP 5.0.0 with differentially accessible regions in ETV4AAA relative to EYFP. ATAC-seq footprinting was performed using TOBIAS. First, ATACorrect was run to correct Tn5 bias, followed by ScoreBigwig to calculate footprint scores, and finally BindDetect to generate differential footprints across regions.

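
    Two of the numeric rules in the ATAC-seq methods above, the +4/-5 Tn5 shift and the differential-peak cutoffs, can be written down as a short sketch. The description gives only the offsets and thresholds; which read end is moved (here, the 5' end of each read, as is commonly done), the 0-based half-open coordinate convention, and both function names are assumptions for illustration.

```python
def shift_tn5(start, end, strand):
    """Shift an aligned read toward the Tn5 insertion site.

    Assumes 0-based, half-open [start, end) coordinates. Plus-strand reads
    have their 5' end (start) moved by +4 bp; minus-strand reads have their
    5' end (end) moved by -5 bp.
    """
    if strand == "+":
        return start + 4, end
    if strand == "-":
        return start, end - 5
    raise ValueError(f"unknown strand: {strand!r}")


def is_differential_peak(padj, log2_fc, padj_cutoff=0.01, lfc_cutoff=2.0):
    """Apply the stated cutoffs: adjusted p-value < 0.01 and |log2(FC)| > 2."""
    return padj < padj_cutoff and abs(log2_fc) > lfc_cutoff


print(shift_tn5(100, 150, "+"))
print(shift_tn5(100, 150, "-"))
print(is_differential_peak(0.001, -2.5))
```
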
    RNA-seq analysis: The extracted RNA was processed for RNA-sequencing by the Integrated Genomics Core Facility at MSKCC. The libraries were sequenced on an Illumina HiSeq-2500 platform with 51 bp paired-end reads to obtain a minimum yield of 40 million reads per sample. The sequenced data were aligned using STAR v2.3 with GRCm38.p6 as annotation. DESeq2_v1.16.1 was subsequently applied on read counts for normalization and the identification of differentially expressed genes between ETV4AAA and EYFP groups, with an adjusted p-value < 0.05 as the threshold. Genes were ranked by sign(log2(FC)) * (-log(p-value)) as input for GSEA analysis using ‘Run GSEA Pre-ranked’ with 1000 permutations (48). The custom gene sets used in GSEA analysis are shown in Table S2.

    Unsupervised hierarchical clustering: To get an overall sample clustering as part of QC, hierarchical clustering was performed using the pheatmap_v1.0.10 package in R on normalized ATAC-seq or RNA-seq data. It was done using all peaks or all genes, with Spearman or Pearson correlation as the distance metric. To have an overview of the differential gene expression from the RNA-seq data, unsupervised clustering was also performed on a matrix with all samples as columns and scaled normalized read counts of differentially expressed genes between ETV4AAA and EYFP as rows.

    Integrative analysis of ATAC-seq, RNA-seq, and ChIP-seq data: ERG ChIP-seq peaks were called using MACS 2.1, with an FDR cutoff of q < 10^-3 and the removal of peaks mapped to blacklist regions. Reproducible peaks between two biological replicates were identified as ETV4AAA ATAC-seq peaks. ERG ChIP-seq peaks and ETV4AAA ATAC-seq peaks were considered overlapping if their peak summits were within 250 bp.
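
    The summit-distance rule just stated can be sketched as follows. This is an illustrative implementation only: the function name, the (chromosome, position) tuple layout, and treating "within 250 bp" as an inclusive bound are assumptions, and the authors' actual tooling is not specified here.

```python
from bisect import bisect_left


def overlapping_summits(chip_summits, atac_summits, max_dist=250):
    """Return ChIP summits with an ATAC summit on the same chromosome within max_dist bp.

    Both inputs are lists of (chromosome, position) tuples.
    """
    # Group ATAC summit positions by chromosome and sort them for binary search.
    atac_by_chrom = {}
    for chrom, pos in atac_summits:
        atac_by_chrom.setdefault(chrom, []).append(pos)
    for positions in atac_by_chrom.values():
        positions.sort()

    hits = []
    for chrom, pos in chip_summits:
        positions = atac_by_chrom.get(chrom, [])
        # First ATAC summit at or after (pos - max_dist); it overlaps if it is
        # also no further right than (pos + max_dist).
        i = bisect_left(positions, pos - max_dist)
        if i < len(positions) and positions[i] <= pos + max_dist:
            hits.append((chrom, pos))
    return hits


chip = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
atac = [("chr1", 1200), ("chr1", 5251), ("chr3", 300)]
print(overlapping_summits(chip, atac))
```
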
To determine whether the overlap was significant, enrichment analysis was done using regioneR_v1.8.1 in R, which counted the number of overlapping peaks between a set of randomly selected regions in the genome (excluding blacklist regions) and the ERG ChIP-seq peaks or ETV4AAA ATAC-seq peaks. A null distribution was formed using 1000 permutation tests to compute the p-value and z-score of the original evaluation. To assign ATAC-seq peaks to genes, ChIPseeker_v1.12.1 in R was used. Each peak was unambiguously assigned to the one gene with a TSS or 3’ end closest to that peak. Differential gene expression between ETV4AAA and EYFP was evaluated using log2(FC) calculated by DESeq2. p-values were estimated with the Wilcoxon rank test and Student's t-test.

    scRNA-sequencing: Tmprss2-CreERT2, EYFP; Tmprss2-CreERT2, ETV4WT; Tmprss2-CreERT2, ETV4AAA; and Tmprss2-CreERT2, ETV4AAA; Trp53L/L mice were euthanized 2 weeks or 4 months after tamoxifen treatment (n=3 mice for each genotype and time point). After euthanasia, the prostates were dissected out and minced with a scalpel, and then processed for 1 h digestion with collagenase/hyaluronidase (#07912, STEMCELL Technologies) and 30 min digestion with TrypLE (#12605010, Gibco). Live single prostate cells were sorted by flow cytometry as DAPI-negative. For each mouse, 5,000 cells were directly processed with the 10x Genomics Chromium Single Cell 3’ GEM, Library & Gel Bead Kit v3 according to the manufacturer’s specifications. For each sample, 200 million reads were acquired on a NovaSeq platform S4 flow cell. Reads obtained from the 10x Genomics scRNA-seq platform were mapped to the mouse genome (mm9) using the Cell Ranger package (10x Genomics). True cells were distinguished from empty droplets using the scCB2 package. The levels of mitochondrial reads and numbers of unique molecular identifiers (UMIs) were similar among the samples, which indicates that there were no systematic biases in the libraries from mice with different genotypes.
Cells were removed if they expressed fewer than 600 unique genes, fewer than 1,500 total counts, more than 50,000 total counts, or greater than 20% mitochondrial reads. Genes detected in fewer than 10 cells and all mitochondrial genes were removed for subsequent analyses. Putative doublets were removed using the DoubletDetection package. The average gene detection in each cell type was similar among the samples. Combining samples in the entire cohort yielded a filtered count matrix of 48,926 cells by 19,854 genes, with a median of 6,944 counts and a median of 1,973 genes per cell, and a median of 2,039 cells per sample. The count matrix was then normalized to CPM (counts per million) and log2(X+1) transformed for analysis of the combined dataset. The top 1,000 highly variable genes were found using SCANPY (version 1.6.1) (77). Principal Component Analysis (PCA) was performed on the 1,000 most variable genes, with the top 50 principal components (PCs) retained, explaining 29% of the variance. To visualize single cells of the global atlas, we used UMAP projections (https://arxiv.org/abs/1802.03426). We then performed Leiden clustering. Marker genes for each cluster were found with scanpy.tl.rank_genes_groups. Cell types were determined using the SCSA package, an automatic tool based on a score annotation model combining differentially expressed genes (DEGs) and confidence levels of cell markers from both known and user-defined information. Heatmaps were generated for single cells based on log-normalized and scaled expression values of marker genes curated from the literature or identified as highly differentially expressed. Differentially expressed genes between clusters were found using the MAST package and are shown in the heatmap. The logFC of the MAST output was used for the ranked gene list in GSEA analysis (48). The custom gene sets used in GSEA analysis are shown in Table S2.
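
    The CPM normalization and log2(X+1) transform described above amount to the following per-cell computation. This is a minimal sketch in plain Python for one cell's count vector; the function name is hypothetical, and returning zeros for an empty cell is an assumption (such cells would already have been removed by the filters above).

```python
import math


def log2_cpm(counts):
    """Convert one cell's raw gene counts to log2(CPM + 1).

    counts: list of raw counts, one entry per gene.
    """
    total = sum(counts)
    if total == 0:
        # Assumption: an all-zero cell maps to all zeros instead of raising.
        return [0.0 for _ in counts]
    return [math.log2(c / total * 1_000_000 + 1) for c in counts]


# A toy cell with two genes: all reads come from the second gene.
print(log2_cpm([0, 5]))
```

    Because each cell is scaled by its own total, the transform removes differences in sequencing depth before the values are compared across cells.
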
Gene imputation was performed using the MAGIC (Markov affinity-based graph imputation of cells) package, and imputed gene expression values were used in the heatmap.

    Analysis of public human gene expression datasets: To analyze TP53 RNA expression in human prostate cancer samples, we obtained normalized RNA-seq data from prostate cancer TCGA (www.firebrowse.org) (3). To assess the role of TP53 loss on

  13. n

    Data from: Extraocular muscle stem cells exhibit distinct cellular...

    • data.niaid.nih.gov
    zip
    Updated Jan 25, 2024
    Cite
    Daniela Di Girolamo; Maria Benavente-Diaz; Melania Murolo; Alexandre Grimaldi; Priscilla Thomas Lopes; Brendan Evano; Mao Kuriki; Stamatia Gioftsidi; Vincent Laville; Jean-Yves Tinevez; Gaëlle Letort; Sebastien Mella; Shahragim Tajbakhsh; Glenda Comai (2024). Extraocular muscle stem cells exhibit distinct cellular properties associated with non-muscle molecular signatures [Dataset]. http://doi.org/10.5061/dryad.b8gtht7k0
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Délégation Ile-de-France Ouest et Nord
    Institut Pasteur
    Authors
    Daniela Di Girolamo; Maria Benavente-Diaz; Melania Murolo; Alexandre Grimaldi; Priscilla Thomas Lopes; Brendan Evano; Mao Kuriki; Stamatia Gioftsidi; Vincent Laville; Jean-Yves Tinevez; Gaëlle Letort; Sebastien Mella; Shahragim Tajbakhsh; Glenda Comai
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The muscle stem cell (MuSC) population is recognized as functionally heterogeneous. Cranial muscle stem cells, which originate from head mesoderm, can have greater proliferative capacity in culture and higher regenerative potential in transplantation assays when compared to those in the limb. The existence of such functional differences in phenotypic outputs remains unresolved as a comprehensive understanding of the underlying mechanisms is lacking. We addressed this issue using a combination of clonal analysis, live imaging, and scRNA-seq, identifying critical biological features that distinguish extraocular (EOM) and limb (Tibialis anterior, TA) MuSC populations. Time-lapse studies using a Myogenin-tdTomato reporter showed that the increased proliferation capacity of EOM MuSCs is accompanied by a differentiation delay in vitro. Unexpectedly, in vitro activated EOM MuSCs expressed a large array of distinct extracellular matrix (ECM) components, growth factors, and signaling molecules that are typically associated with mesenchymal non-muscle cells. These unique features are regulated by a specific set of transcription factors that constitute a coregulating module. This transcription factor network, which includes Foxc1 as one of the major players, appears to be hardwired to EOM identity as it is present in quiescent adult MuSCs, in the activated counterparts during growth and retained upon passages in vitro. These findings provide insights into how high-performing MuSCs regulate myogenic commitment by active remodeling of their local environment.

    Methods

    scRNAseq data generation

    MuSCs were isolated on a BD FACSAria™ III based on GFP fluorescence and cell viability from Tg:Pax7-nGFP mice (Sambasivan et al., 2009). Quiescent MuSCs were manually counted using a hemocytometer and immediately processed for scRNA-seq. For activated samples, MuSCs were cultured in vitro as described above for four days. Activated MuSCs were subsequently trypsinized and washed in DMEM/F12 2% FBS. Live cells were re-sorted, manually counted using a hemocytometer and processed for scRNA-seq. Prior to scRNAseq, RNA integrity was assessed using an Agilent Bioanalyzer 2100 to validate the isolation protocol (RIN>8 was considered acceptable). 10X Genomics Chromium microfluidic chips were loaded with around 9000 cells and cDNA libraries were generated following the manufacturer’s protocol. Concentrations and fragment sizes were determined using Agilent Bioanalyzer and Invitrogen Qubit. cDNA libraries were sequenced using NextSeq 500 and High Output v2.5 (75 cycles) kits. Count matrices were subsequently generated following the 10X Genomics Cell Ranger pipeline. Following normalisation and quality control, we obtained an average of 5792 ± 1415 cells/condition.

    Seurat preprocessing

    scRNAseq datasets were processed using Seurat (https://satijalab.org/seurat/) (Butler et al., 2018). Cells with a mitochondrial gene fraction above 10% were discarded. On average, 4000-5000 genes were detected across all 4 datasets. Dimensionality reduction and UMAPs were generated following the Seurat workflow. The top 100 DEGs were determined using the Seurat "FindAllMarkers" function with default parameters. When processed independently (scvelo), the datasets were first regressed on cell cycle genes, mitochondrial fraction, number of genes, and number of UMI following the dedicated Seurat vignette, and doublets were removed using DoubletFinder v3 (McGinnis et al., 2019).
A "StressIndex" score was generated for each cell based on the list of stress genes previously reported (Machado et al., 2021) using the “AddModuleScore” Seurat function. 94 out of 98 genes were detected in the combined datasets. UMAPs were generated (1) after StressIndex regression and (2) after complete removal of the detected stress genes from the gene expression matrix before normalization. In both cases, the overall aspect of the UMAP did not change significantly (Figure S5). Although immeasurable confounding effects of cell stress following isolation cannot be ruled out, we reasoned that our datasets did not show a significant effect of stress with respect to the conclusions of our study.

    Matrisome analysis

    After subsetting for the features of the Matrisome database (Naba et al., 2015) present in our single-cell dataset, the matrisome score was calculated by assessing the overall expression of its constituents using the "AddModuleScore" function from Seurat (Butler et al., 2018).
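
    The idea behind a module score such as the StressIndex or matrisome score can be sketched in a simplified form: the mean expression of the module genes minus the mean expression of a set of control genes. This is not Seurat's implementation. Seurat draws its control genes from expression-matched bins, whereas this sketch takes them as an explicit argument. The function name and the example genes and values are hypothetical.

```python
def module_score(expression, module_genes, control_genes):
    """Simplified module score for one cell.

    expression: dict mapping gene -> normalized expression in this cell.
    Genes absent from the expression dict are skipped, mirroring the case
    where only part of a published gene list is detected in a dataset.
    """
    present = [g for g in module_genes if g in expression]
    controls = [g for g in control_genes if g in expression]
    if not present or not controls:
        raise ValueError("need at least one detected module gene and one control gene")
    module_mean = sum(expression[g] for g in present) / len(present)
    control_mean = sum(expression[g] for g in controls) / len(controls)
    return module_mean - control_mean


# Hypothetical cell; "Atf3" is in the module list but not detected.
cell = {"Fos": 3.0, "Jun": 1.0, "Actb": 1.5, "Gapdh": 0.5}
print(module_score(cell, ["Fos", "Jun", "Atf3"], ["Actb", "Gapdh"]))
```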

    RNA velocity and driver genes

    Scvelo was used to calculate RNA velocities (Bergen et al., 2020). Unspliced and spliced transcript matrices were generated using the velocyto (Manno et al., 2018) command line function. Seurat-generated filtering, annotations and cell embeddings (UMAP, tSNE, PCA) were then added to the output objects. These datasets were then processed following the scvelo online guide and documentation. Velocity was calculated based on the dynamical model (using scv.tl.recover_dynamics(adata) and scv.tl.velocity(adata, mode='dynamical')) and differential kinetics calculations were added to the model (using scv.tl.velocity(adata, diff_kinetics=True)). Specific driver genes were identified by determining the top likelihood genes in the selected cluster. The lists of the top 100 drivers for EOM and TA progenitors are given in Suppl Tables 10 and 11.

    Gene regulatory network inference and transcription factor modules

    Gene regulatory networks were inferred using pySCENIC (Aibar et al., 2017; Sande et al., 2020). This algorithm regroups sets of correlated genes into regulons (i.e. a transcription factor and its targets) based on binding motifs and co-expression patterns. The top 35 regulons for each cluster were determined using the scanpy "scanpy.tl.rank_genes_groups" function (method=t-test). Note that this function can yield fewer than 35 results depending on the cluster. UMAP and heatmap were generated using the regulon AUC (Area Under Curve) matrix, which refers to the activity level of each regulon in a given cell. Visualizations were performed using scanpy (Wolf et al., 2018). The output list of each regulon and its targets was subsequently used to create a transcription factor network. To do so, only target genes that are themselves regulon transcription factors were kept. This results in a visual representation where each node is an active transcription factor and each edge is an inferred regulation between 2 transcription factors.
When placed in a force-directed environment, these nodes aggregate based on the number of shared edges. This operation greatly reduced the number of genes involved, while highlighting co-regulating transcriptional modules. Visualization of this network was performed in a force-directed graph using Gephi “Force-Atlas2” algorithm (https://gephi.org/).
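
    The reduction from regulons to a transcription factor network described above can be sketched in a few lines. The sketch assumes regulons are available as a mapping from each transcription factor to its target genes; the function name, the exclusion of self-loops, and the example genes other than Foxc1 are illustrative assumptions and do not reproduce the pySCENIC output format.

```python
def tf_network(regulons):
    """Keep only edges whose target is itself the transcription factor of a regulon.

    regulons: dict mapping transcription factor -> iterable of target genes.
    Returns a sorted list of (regulator, target) edges between transcription factors.
    """
    tfs = set(regulons)
    edges = []
    for tf, targets in regulons.items():
        for target in targets:
            # Assumption: self-regulation edges are dropped for readability.
            if target in tfs and target != tf:
                edges.append((tf, target))
    return sorted(edges)


# Hypothetical regulons: non-TF targets (Col1a1, Fn1, Myog) disappear from the network.
regulons = {
    "Foxc1": ["Ebf1", "Col1a1", "Foxc1"],
    "Ebf1": ["Foxc1", "Fn1"],
    "Myod1": ["Myog"],
}
print(tf_network(regulons))
```

    An edge list of this shape can be exported to a graph tool for a force-directed layout, where transcription factors that share many edges end up close together.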

  14. Raw and processed (filtered and annotated) scRNAseq data

    • figshare.com
    zip
    Updated Jun 12, 2023
    Cite
    Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac (2023). Raw and processed (filtered and annotated) scRNAseq data [Dataset]. http://doi.org/10.6084/m9.figshare.23499192.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:

    1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange (MEX) format.
    2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
    3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.

    scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1×10^6 cells from each sample were resuspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then resuspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40,000-50,000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tag libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit.
Sequencing was performed on a NovaSeq 6000 (Illumina) in paired-end mode (64-8-58) with NovaSeq 6000 S2 v1 or NovaSeq 6000 SP v1.5 reagent kits (100 cycles).

    scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0, https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis.
After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signatures, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
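
    The permissive cell filter and the multiplet exclusion described above can be expressed as a single classification rule. This is a minimal sketch using the thresholds stated in the description; the function name and the three labels are assumptions, and the besca pipeline applies its filters in its own way.

```python
def qc_flag(n_genes, n_counts, pct_mito):
    """Classify one cell using the stated thresholds.

    Low quality: fewer than 200 genes, fewer than 500 counts, or more than 30% mitochondrial reads.
    Potential multiplet: more than 5,000 genes or more than 20,000 counts.
    """
    if n_genes < 200 or n_counts < 500 or pct_mito > 30:
        return "low_quality"
    if n_genes > 5000 or n_counts > 20000:
        return "multiplet"
    return "keep"


# Hypothetical cells: (genes detected, total counts, % mitochondrial reads).
for cell in [(150, 1000, 5), (6000, 15000, 5), (1200, 3000, 12), (300, 600, 35)]:
    print(cell, qc_flag(*cell))
```

    The low lower bounds are what keep cells with little RNA, such as neutrophils, in the dataset; stricter gene or count minimums would remove them.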

  15. Z

    Oncogenic signalling is coupled to colorectal cancer cell differentiation...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Apr 11, 2023
    Cite
    Sell, Thomas (2023). Oncogenic signalling is coupled to colorectal cancer cell differentiation state [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6400082
    Explore at:
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Sell, Thomas
    Fischer, Matthias M.
    Astaburuaga-García, Rosario
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mass cytometry and single-cell RNA-sequencing data as well as R Markdown reports to reproduce the figures of our publication.

    Raw MC data were saved post de-convolution, spillover-compensation, and removal of calibration bead events. Gates for singlets and non-dead cells (low_Pt) are included as logical columns and should be applied prior to usage.

    As we performed random sampling to equalise cell numbers across conditions, batch normalisation, and used non-linear dimensionality reduction techniques (UMAP and Diffusion Maps), resulting plots may differ slightly from the published figures, yet still support the drawn conclusions. Already normalised and/or sampled data as well as pre-computed UMAP and Diffusion Map coordinates are included in this data set to reproduce the manuscript figures exactly, as shown in the included report “figures_only”. For all details on the batch normalisation and data analysis steps performed, please consult the report “data_analysis” instead.

  16. f

    DataSheet_1_A Tool for Visualization and Analysis of Single-Cell RNA-Seq...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Gennaro Gambardella; Diego di Bernardo (2023). DataSheet_1_A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining.pdf [Dataset]. http://doi.org/10.3389/fgene.2019.00734.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Gennaro Gambardella; Diego di Bernardo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
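
    The TF-IDF analogy behind gf-icf can be illustrated with a textbook formulation: gene frequency is a gene's share of the counts in a cell, and inverse cell frequency is the log of the number of cells divided by the number of cells in which the gene is detected. This is a sketch of plain TF-IDF applied to a count matrix, not the gf-icf package's implementation, whose exact weighting and smoothing may differ; the function name is hypothetical.

```python
import math


def gf_icf(counts):
    """Textbook TF-IDF on a cells x genes count matrix.

    counts: list of cells, each a list of raw gene counts in the same gene order.
    Returns a matrix of the same shape holding gene frequency * inverse cell frequency.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    # Number of cells in which each gene is detected at least once.
    detected = [sum(1 for cell in counts if cell[g] > 0) for g in range(n_genes)]
    # Genes never detected get weight 0 to avoid division by zero.
    icf = [math.log(n_cells / d) if d else 0.0 for d in detected]

    scores = []
    for cell in counts:
        total = sum(cell)
        scores.append([(c / total if total else 0.0) * icf[g] for g, c in enumerate(cell)])
    return scores


# Gene 0 is detected in every cell, so its weight is log(1) = 0.
print(gf_icf([[2, 2], [4, 0]]))
```

    Genes expressed in every cell are weighted down to zero, while genes restricted to few cells are weighted up, which is the property that makes the transform useful for separating cell types.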

  17. z

    Data from: A multidimensional analysis reveals distinct immune phenotypes...

    • zenodo.org
    Updated Jun 1, 2025
    Cite
    Joost Koedijk; Olaf Heidenreich (2025). A multidimensional analysis reveals distinct immune phenotypes and the composition of immune aggregates in pediatric acute myeloid leukemia [Dataset]. http://doi.org/10.5281/zenodo.15276882
    Explore at:
    Dataset updated
    Jun 1, 2025
    Dataset provided by
    Zenodo
    Authors
    Joost Koedijk; Olaf Heidenreich
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw GeoMx Data generated for
    https://www.nature.com/articles/s41375-024-02381-w

    Paper abstract:
    Because of the low mutational burden and consequently, fewer potential neoantigens, children with acute myeloid leukemia (AML) are thought to have a T cell-depleted or ‘cold’ tumor microenvironment and may have a low likelihood of response to T cell-directed immunotherapies. Understanding the composition, phenotype, and spatial organization of T cells and other microenvironmental populations in the pediatric AML bone marrow (BM) is essential for informing future immunotherapeutic trials about targetable immune-evasion mechanisms specific to pediatric AML. Here, we conducted a multidimensional analysis of the tumor immune microenvironment in pediatric AML and non-leukemic controls. We demonstrated that nearly one-third of pediatric AML cases has an immune-infiltrated BM, which is characterized by a decreased ratio of M2- to M1-like macrophages. Furthermore, we detected the presence of large T cell networks, both with and without colocalizing B cells, in the BM and dissected the cellular composition of T- and B cell-rich aggregates using spatial transcriptomics. These analyses revealed that these aggregates are hotspots of CD8+ T cells, memory B cells, plasma cells and/or plasmablasts, and M1-like macrophages. Collectively, our study provides a multidimensional characterization of the BM immune microenvironment in pediatric AML and indicates starting points for further investigations into immunomodulatory mechanisms in this devastating disease.

    GeoMx methods:
    5 μm thick FFPE BM biopsy sections from six pediatric AML cases with an immune-infiltrated BM and two non-leukemic controls were put on three different slides and prepared for GeoMx Digital Spatial Profiling (DSP; NanoString), as previously described. Slides were simultaneously incubated with immunofluorescent antibodies and GeoMx Whole Transcriptome Atlas profiling reagents. SYTO13 (S7575, Thermo Fisher) was used for identification of nuclei, CD45 (NBP2-34528, Novus) for leukocytes, and CD3 (NBP2-54392AF647, Novus) for T cells. Stained slides were loaded onto the GeoMx instrument and scanned. ROIs were selected using the above-mentioned antibodies in combination with overlaid images of CD20, CD34, CD3-CD4 (duplex), and CD117 (IHC). Then, UV-photocleaved oligonucleotides were collected in separate wells and sequenced on the NextSeq 2000 (Illumina, San Diego, CA, USA).

    Raw data were normalized using quartile 3 count (Q3) normalization in R (v4.2.1), as per NanoString’s recommendations (code is available in the vignette of the GeomxTools package:
    https://bioconductor.org/packages/release/workflows/vignettes/GeoMxWorkflows/inst/doc/GeomxTools_RNA-NGS_Analysis.html). Adjusted code is available with the download.
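As a rough illustration of what Q3 normalization does, the sketch below rescales each ROI so that its 75th-percentile count matches the geometric mean of the 75th percentiles across all ROIs. This is one common formulation and is only assumed to correspond to the authors' adjusted R code; the function name `q3_normalize` and the toy counts are hypothetical, and the authors worked in R rather than Python.

```python
import math
import statistics


def q3_normalize(counts):
    """Upper-quartile (Q3) normalization sketch.

    counts: dict mapping ROI name -> list of raw gene counts (same gene order).
    Assumes every ROI has a strictly positive 75th-percentile count.
    """
    # 75th percentile of each ROI (linear interpolation between data points).
    q3 = {
        roi: statistics.quantiles(values, n=4, method="inclusive")[2]
        for roi, values in counts.items()
    }
    # Geometric mean of the per-ROI Q3 values.
    geo_mean = math.exp(sum(math.log(q) for q in q3.values()) / len(q3))
    # Dividing by (Q3 / geometric mean) brings every ROI to the same Q3.
    return {
        roi: [c * geo_mean / q3[roi] for c in values]
        for roi, values in counts.items()
    }


# Hypothetical toy data: roi_b was sequenced twice as deeply as roi_a.
raw = {
    "roi_a": [0, 10, 20, 30, 40],
    "roi_b": [0, 20, 40, 60, 80],
}
normalized = q3_normalize(raw)
```

After normalization, the two toy ROIs have identical profiles, because their only difference was a global scaling factor.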

    Batch correction was performed using ComBat-seq [17]. Spatial deconvolution was performed using the safeTME reference (SpatialDecon package; cell reference profiles are available via https://github.com/Nanostring-Biostats/CellProfileLibrary/blob/archive/safeTME-for-tumor-immune.csv).

    Furthermore, we retrieved single-cell RNA-sequencing data from pediatric tonsillar B cells (sample BC005 was chosen since it had the highest number of cells, as done previously) and adult AML bone marrow CD8+ T cells (all patients), and converted these into two additional reference profiles using R (v4.2.1).

    Deconvoluted abundance scores were normalized for ROI size and, in the case of immune aggregates, for the ROI area covered by these aggregates. The ‘12chem’, ‘Tfh’, ‘TLS imprint’, and M2-predominance signatures were applied to Q3-normalized data and further normalized as mentioned above.

    For questions please contact j.b.koedijk-2@umcutrecht.nl or joostbenjaminkoedijk@gmail.com.

  18. f

    Table6_Mouse Oocytes, A Complex Single Cell Transcriptome.XLSX

    • frontiersin.figshare.com
    xlsx
    Updated Jun 16, 2023
    Cite
    Di Wu (2023). Table6_Mouse Oocytes, A Complex Single Cell Transcriptome.XLSX [Dataset]. http://doi.org/10.3389/fcell.2022.827937.s012
    Explore at:
    xlsx Available download formats
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Frontiers
    Authors
    Di Wu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The germinal vesicle (GV) stage is a critical transition point from growth to maturation in mammalian oocyte development. During the following meiotic maturation, active RNA degradation and the absence of transcription significantly reprofile the oocyte transcriptome to determine oocyte quality. Oocyte RNA-seq has revealed transcriptome differences between two defined phases of the GV stage, namely the non-surrounded nucleolus (NSN) and surrounded nucleolus (SN) phases. In addition, oocyte RNA-seq has identified a variety of dysregulated genes upon genetic mutation or environmental perturbation. Historically, due to the low amount of RNA per oocyte, a few (20–200) oocytes were needed for a regular library construction in bulk RNA-seq. In recent years, the development of single-cell sequencing has made it possible to detail the transcriptome of individual oocytes. In this study, different RNA-seq datasets from single and bulk mouse oocytes are compared, and single oocyte RNA-seq (soRNA-seq) shows higher reproducibility. In addition, soRNA-seq better illustrates the developmental progression of GV oocytes, revealing more complex gene changes than traditional views. Specifically, an elevated level of ribosomal RNA 5′-ETS (5′ external transcribed spacer) has been shown to highly correlate with the SN property. This study further demonstrates that UMI (unique molecular identifier) based and other deduplication methods are limited in their ability to improve the precision of the soRNA-seq datasets. Finally, this study proposes that external spike-in molecules are useful for normalizing samples of different transcriptome sizes. A list of stable genes has been identified during oocyte maturation that are comparable to external spike-in molecules. These findings highlight the advantage of soRNA-seq, and have established ways for better clustering and cross-stage normalization, which can provide more insight into the biological features of oocyte maturation.
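To make the idea of spike-in normalization concrete, the following minimal sketch scales each sample by a size factor derived from its total spike-in counts, under the assumption that the same amount of spike-in was added to every sample. The function name, the data layout, and the toy numbers are hypothetical and are not taken from the dataset; the study's actual procedure may differ.

```python
def spike_in_normalize(endogenous, spike_in):
    """Scale endogenous counts by spike-in-derived size factors.

    endogenous: dict sample -> list of endogenous gene counts
    spike_in:   dict sample -> list of spike-in counts (same amount added per sample)
    """
    spike_totals = {sample: sum(values) for sample, values in spike_in.items()}
    mean_total = sum(spike_totals.values()) / len(spike_totals)
    # A sample that recovered more spike-in reads was captured or sequenced
    # more efficiently, so its endogenous counts are scaled down accordingly.
    size_factors = {s: total / mean_total for s, total in spike_totals.items()}
    return {
        sample: [c / size_factors[sample] for c in values]
        for sample, values in endogenous.items()
    }


# Hypothetical example: sample "B" recovered twice as many spike-in reads.
endogenous_counts = {"A": [50, 10], "B": [100, 20]}
spike_counts = {"A": [60, 40], "B": [120, 80]}
spike_normalized = spike_in_normalize(endogenous_counts, spike_counts)
```

Because the size factors come only from the spike-ins, genuine differences in total transcriptome size between samples are preserved rather than normalized away, which is the property the description highlights.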

  19. n

    Supporting data for: Three-dimensional genome re-wiring in loci with Human...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Jan 31, 2023
    Cite
    Kathleen Keough (2023). Supporting data for: Three-dimensional genome re-wiring in loci with Human Accelerated Regions [Dataset]. http://doi.org/10.7272/Q6057D5N
    Explore at:
    zip Available download formats
    Dataset updated
    Jan 31, 2023
    Dataset provided by
    Gladstone Institutes
    Authors
    Kathleen Keough
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Human Accelerated Regions (HARs) are conserved genomic loci that evolved at an accelerated rate in the human lineage and may underlie human-specific traits. We generated HARs and chimpanzee accelerated regions with an automated pipeline and an alignment of 241 mammalian genomes. Combining deep learning with chromatin capture experiments in human and chimpanzee neural progenitor cells, we discovered a significant enrichment of HARs in topologically associating domains (TADs) containing human-specific genomic variants that change three-dimensional (3D) genome organization. Differential gene expression between humans and chimpanzees at these loci suggests rewiring of regulatory interactions between HARs and neurodevelopmental genes. Thus, comparative genomics together with models of 3D genome folding revealed enhancer hijacking as an explanation for the rapid evolution of HARs.

    Methods

    Lentivirus-based massively parallel reporter assay (lentiMPRA) library design and synthesis:
    Tiles of 270 bp in length were generated from all 312 zooHARs. Multiple tiles were generated with a sliding window of 20 bp if the zooHAR was longer than 270 bp. In total, 549 oligos were designed to cover all zooHARs. We also included 143 oligos centered on active chromatin marks as positive controls. This oligo pool was synthesized by Twist Bioscience.

    Primary cortical cell culture for lentiMPRA:
    De-identified tissue samples were collected with consent in strict observance of legal and institutional ethical regulations. Protocols were approved by the Human Gamete, Embryo, and Stem Cell Research Committee (institutional review board) at the University of California, San Francisco. Gestational week 18 cortical tissue was dissociated into a single-cell suspension using papain (LK003150, Worthington Biochemical) and plated on 15 cm dishes coated with poly-O-lysine, laminin, and fibronectin. DMEM culture medium (Gibco) with B27 (Gibco) and PennStrep (Gibco) was changed every 24 hours.

    Construction and sequencing of plasmid libraries for lentiMPRA:
    LentiMPRA was performed as previously described, with all modifications noted here. A 31 bp minimal promoter and a 15 bp random barcode were added to each lentiMPRA oligo through two rounds of PCR. The amplicon was then cloned into the empty reporter backbone pLS-SceI (plasmid #137725, Addgene). Recombination products were amplified in electrocompetent cells (C3020, NEB) and grown on LB agar plates (100217-214, VWR) at 37 ℃ overnight. We harvested ~3.5M colonies by Midiprep (12945, Qiagen), which yielded around 70 barcodes per oligo. A 477 bp region in the plasmid containing the oligo and barcode was amplified and sequenced in one lane of Illumina NextSeq Mid-Output to identify the barcodes associated with each oligo.

    Lentivirus packaging and infection:
    Lentivirus production was performed following the manufacturer’s protocol (LT002, Genecopoeia). To achieve high titer, the crude solution was concentrated using Lenti-X concentrator (631232, Takara Bio). The concentrated virus was immediately stored at -80 ℃ in single-use aliquots. For each replicate, a cell counter was used to estimate cell density, and then 20 million primary cortical cells were plated and cultured in a 10 cm dish for 2 days before infection. Each dish was infected with 500 μl of lentivirus to achieve a multiplicity of infection (MOI) of 85. Each barcode was estimated to be integrated into random loci over 200 times. Medium was refreshed the next day and cells were incubated for 2 days before harvesting DNA and RNA.

    DNA/RNA harvest and sequencing:
    DNA and RNA were simultaneously extracted from infected samples using the Allprep kit (80204, Qiagen). To prepare sequencing libraries, 8 μg RNA was reverse transcribed to generate cDNA using Superscript IV RT (Invitrogen; 18090200). The integrated barcodes in cDNA and 15 μg gDNA were PCR amplified to add a unique molecular identifier (UMI), an index, and Illumina P5/P7 sequences. DNA and RNA barcode libraries were pooled at a 1:3 molar ratio and sequenced with NextSeq High-Output.

    LentiMPRA computational analyses:
    Sequencing libraries were batch corrected to account for differences between samples from different donors. Oligos were required to have at least 10 unique barcodes and an exact match to the designed sequence. UMI-normalized reads per oligo were summed over all barcodes, and oligos with fewer than 40 total DNA reads were discarded. Out of 312 zooHARs, 276 passed these quality control steps. For each of these zooHARs, depth normalization was performed using counts per million reads sequenced (CPM), and an RNA CPM / DNA CPM ratio was calculated for each oligo in each replicate. A zooHAR was determined to be active if its maximally active tile had an average (over replicates) normalized RNA CPM / DNA CPM value exceeding the median value of this statistic for a set of positive control sequences with enhancer-associated epigenetic marks in neurodevelopment (median = 1.06). To compare machine learning predictions to lentiMPRA measurements, the 276 zooHARs passing lentiMPRA quality control were evaluated for whether they had activity above 1.06 (139 active zooHARs) and/or a machine learning score > 0.3 (175 predicted zooHARs), resulting in 88 high-confidence zooHAR neurodevelopmental enhancers validated by both approaches.
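The activity call described in the computational analyses can be sketched as follows: counts are depth-normalized to CPM, an RNA CPM / DNA CPM ratio is computed per oligo, and a zooHAR is called active when the replicate-averaged ratio of its best tile exceeds the positive-control median of 1.06. The function names, data layout, and toy counts are hypothetical, and barcode filtering, UMI handling, and batch correction are omitted.

```python
import statistics


def cpm(counts):
    """Counts per million: scale each oligo's count by the library total."""
    total = sum(counts.values())
    return {oligo: n / total * 1e6 for oligo, n in counts.items()}


def rna_dna_ratio(rna_counts, dna_counts):
    """RNA CPM / DNA CPM per oligo for one replicate."""
    rna_cpm, dna_cpm = cpm(rna_counts), cpm(dna_counts)
    return {
        oligo: rna_cpm.get(oligo, 0.0) / dna_cpm[oligo]
        for oligo in dna_counts
        if dna_cpm[oligo] > 0
    }


def is_active(replicate_ratios, tiles, threshold=1.06):
    """True if the best tile's replicate-averaged ratio exceeds the threshold.

    replicate_ratios: list of per-replicate dicts (oligo -> ratio)
    tiles: oligo names belonging to one zooHAR
    threshold: 1.06 is the positive-control median quoted in the description
    """
    tile_means = [
        statistics.mean([rep[tile] for rep in replicate_ratios]) for tile in tiles
    ]
    return max(tile_means) > threshold


# Hypothetical toy library with two tiles of one zooHAR plus a control oligo.
dna = {"t1": 100, "t2": 100, "ctrl": 200}
rna = {"t1": 300, "t2": 50, "ctrl": 250}
ratios = rna_dna_ratio(rna, dna)
active = is_active([ratios, ratios], ["t1", "t2"])
```

Taking the maximum over tiles means a long zooHAR is judged by its most active 270 bp window, which matches the "maximally active tile" wording in the description.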

  20. f

    DataSheet2_Dimensionality Reduction and Louvain Agglomerative Hierarchical...

    • frontiersin.figshare.com
    txt
    Updated Jun 9, 2023
    Cite
    Soumita Seth; Saurav Mallik; Tapas Bhadra; Zhongming Zhao (2023). DataSheet2_Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data.CSV [Dataset]. http://doi.org/10.3389/fgene.2022.828479.s002
    Explore at:
    txt Available download formats
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Frontiers
    Authors
    Soumita Seth; Saurav Mallik; Tapas Bhadra; Zhongming Zhao
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The major areas of interest in single-cell RNA sequencing analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role in addressing these questions. Cluster identification in high-dimensional single-cell sequencing data faces some challenges due to the nature of the data. Dimensionality reduction models can solve the problem. Here, we introduce a potential cluster-specified frequent biomarker discovery framework using dimensionality reduction and hierarchical agglomerative clustering Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered features detected in too few cells and cells with too few features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low-quality or dying cells. Afterwards, we applied the global-scaling normalization method “LogNormalize” for data normalization. Next, we computed cell-to-cell highly variable features from our dataset. Then, we applied a linear transformation and a linear dimensionality reduction technique, Principal Component Analysis (PCA), to project high-dimensional data to an optimal low-dimensional space. After identifying fifty “significant” principal components (PCs) based on strong enrichment of low p-value features, we implemented a graph-based clustering algorithm, Louvain, for the cell clustering of the 10 top significant PCs. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270; 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.8851.
    After detecting the cell clusters, we found 3871 cluster-specific biomarkers using an expression feature extraction statistical tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST), with a log2FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found the 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
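The “LogNormalize” step mentioned above can be illustrated with a small sketch: each cell's counts are divided by that cell's total, multiplied by a scale factor (10,000 is Seurat's default; whether the authors kept the default is an assumption), and transformed with the natural log of one plus the value. The function name and the toy counts are hypothetical, and the authors used Seurat in R rather than this Python code.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Global-scaling normalization of one cell's raw counts.

    Each count is divided by the cell's total, multiplied by scale_factor,
    and transformed with log1p (natural log of 1 + x).
    """
    total = sum(cell_counts)
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero.
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


# Hypothetical cell with four features and ten counts in total.
normalized_cell = log_normalize([1, 3, 0, 6])
```

Because every cell is rescaled to the same nominal total before the log transform, differences in sequencing depth between cells are removed while zero counts remain exactly zero.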
