Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Seurat object (.rds format) for a single-cell ATAC-seq dataset of hematopoietic stem and progenitor cells. It includes 4 samples:
- control
- DKO (Reg1–/–, Reg3–/–)
- Nfkbiz–/–
- TKO (Reg1–/–, Reg3–/–, Nfkbiz–/–)
Data were processed using Seurat and Signac. For more details, see the accompanying GitHub repository. In brief, we normalized the data, conducted linear and non-linear dimensionality reduction, clustered cells, calculated "gene activities", and added motif information to the Seurat object.
A link to the accompanying paper will be added here after publication.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time course scATAC-seq for in vitro hematopoiesis along with single-cell lineage tracing using the CellTag-multi barcode library.
Metadata information: Cell types are stored in the metadata column cell_type2. The RNA UMAP is stored in the pal reduction and the clone-cell embedding is stored in the cce reduction.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the context of the Human Cell Atlas, we have created a single-cell-driven taxonomy of cell types and states in human tonsils. This repository contains the Seurat objects derived from this effort. In particular, we have datasets for each modality (scRNA-seq, scATAC-seq, CITE-seq, spatial transcriptomics), as well as cell type-specific datasets. Most importantly, this is the input that we used to create the HCATonsilData package, which allows programmatic access to all these datasets within R.
Version 2 of this repository includes cells from 7 additional donors, which we used as a validation cohort for the cell types and states defined in the atlas. In addition, this version provides the Seurat object associated with the spatial transcriptomics data (10X Visium), as well as the fragments files for scATAC-seq and Multiome.
CD8 T cell exhaustion is a major barrier limiting anti-tumor therapy. Though checkpoint blockade temporarily improves exhausted CD8 T cell (Tex) function, the underlying epigenetic landscape of Tex remains largely unchanged, preventing their durable “reinvigoration.” Whereas the transcription factor (TF) TOX has been identified as a critical initiator of Tex epigenetic programming, it remains unclear whether TOX plays an ongoing role in preserving Tex biology after cells commit to exhaustion. Here, we decoupled the role of TOX in the initiation versus maintenance of CD8 T cell exhaustion by temporally deleting TOX in established Tex. Induced TOX ablation in committed Tex resulted in apoptotic-driven loss of Tex, reduced expression of inhibitory receptors including PD-1, and a pronounced decrease in terminally differentiated subsets of Tex cells. Simultaneous gene expression and epigenetic profiling revealed a critical role for TOX in ensuring ongoing chromatin accessibility and transcri...
Cells from inducible-Cre (Rosa26CreERT2/+Toxfl/fl P14) mice, in which TOX was temporally deleted from mature populations of LCMV-specific exhausted T cells after establishment of chronic LCMV infection 5 days post infection, were subjected to a scRNA-seq and scATAC-seq coassay; naive cells and WT cells were used as controls. The analysis followed a pipeline developed by Josephine Giles and vignettes published by the Satija and Stuart labs. Transcript count and peak accessibility matrices are deposited in GSE255042 and GSE255043. Seurat/Signac was used to process the scRNA-seq and scATAC-seq coassay data. The processed Seurat/Signac object was subsequently used for downstream RNA and ATAC analyses as described below.
DEGs between TOX WT and iKO cells within each subset were identified using FindMarkers (Seurat, Signac) on the SCT assay, with a log2-fold-change threshold of 0. DACRs were identified using FindMarkers with the "LR" test, a log2-fold-change threshold of 0.1, a min.pct of 0.05, and included the number of c...
# Continuous expression of TOX safeguards exhausted CD8 T cell epigenetic fate
https://doi.org/10.5061/dryad.8kprr4xx9
Seurat/Signac pipeline for multiomic scRNA-seq and scATAC-seq dataset, generated following inducible TOX deletion in LCMV-Cl13
Author
Yinghui Jane Huang
Purpose: Generate and process Seurat/Signac object for downstream analyses
Written: Nov 2021 through Oct 2022
Adapted from: Analysis pipeline developed by Josephine Giles and vignettes published by Satija and Stuart labs
Input dataset: Transcript count and peak accessibility matrices deposited in GSE255042, GSE255043
1) Create individual Signac objects for each sample from the raw 10x Cell Ranger output.
2) Merge the individual objects to create one Seurat object.
3) Add metadata to the merged Seurat object.
Following are the steps in the attached html file for analysis of the paired data (ATAC+RNA)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As single-cell chromatin accessibility profiling methods advance, scATAC-seq has become ever more important in the study of candidate regulatory genomic regions and their roles underlying developmental, evolutionary, and disease processes. At the same time, cell type annotation is critical in understanding the cellular composition of complex tissues and identifying potential novel cell types. However, most existing methods that can perform automated cell type annotation are designed to transfer labels from an annotated scRNA-seq data set to another scRNA-seq data set, and it is not clear whether these methods are adaptable to annotate scATAC-seq data. Several methods have been recently proposed for label transfer from scRNA-seq data to scATAC-seq data, but benchmarking studies of their performance are lacking. Here, we evaluated the performance of five scATAC-seq annotation methods on both their classification accuracy and scalability using publicly available single-cell datasets from mouse and human tissues including brain, lung, kidney, PBMC, and BMMC. Using the BMMC data as a basis, we further investigated the performance of these methods across different data sizes, mislabeling rates, sequencing depths and the number of cell types unique to scATAC-seq. Bridge integration, which is the only method that requires additional multimodal data and does not need gene activity calculation, was overall the best method and robust to changes in data size, mislabeling rate and sequencing depth. Conos was the most time and memory efficient method but performed the worst in terms of prediction accuracy. scJoint tended to assign cells to similar cell types and performed relatively poorly for complex datasets with deep annotations but performed better for datasets with only major label annotations. 
The performance of scGCN and Seurat v3 was moderate, but scGCN was the most time-consuming method and had the most similar performance to random classifiers for cell types unique to scATAC-seq.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in the directories, which are summarized below.
The record contains the following files:
`clusters.tsv`: contains the cluster id, name and colour of clusters in the paper
scATAC.zip
Analysis products for the single-cell ATAC-seq data. Contains:
- `cells.tsv`: list of barcodes that pass QC. Columns include:
  - `barcode`
  - `sample`: (time point)
  - `umap1`
  - `umap2`
  - `cluster`
  - `dpt_pseudotime_fibr_root`: pseudotime values treating a fibroblast cell as root
  - `dpt_pseudotime_xOSK_root`: pseudotime values treating xOSK cell as root
- `peaks.bed`: list of peaks of 500bp across all cell states. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
- `features.tsv`: 50 dimensional representation of each cell
- `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`
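The loading step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `cells.tsv` is assumed to have a header row containing the `sample` and `barcode` columns listed above, and the `_` separator used to combine them is a guess that should be checked against the actual files. The helper names `load_cell_ids` and `load_peak_matrix` are hypothetical.

```python
import csv

from scipy.io import mmread
from scipy.sparse import csr_matrix


def load_cell_ids(cells_tsv, sep="_"):
    """Build one identifier per cell as sample + sep + barcode.

    Assumes a tab-separated file with a header row that contains
    'sample' and 'barcode' columns; `sep` is an illustrative choice.
    """
    with open(cells_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [f"{row['sample']}{sep}{row['barcode']}" for row in reader]


def load_peak_matrix(mtx_path, cells_tsv):
    """Read the Matrix Market file and pair its columns with cell identifiers."""
    # mmread returns a COO sparse matrix; CSR is more convenient for slicing.
    matrix = csr_matrix(mmread(mtx_path))
    cell_ids = load_cell_ids(cells_tsv)
    # Rows are peaks and columns are cells, so the column count must match.
    if matrix.shape[1] != len(cell_ids):
        raise ValueError(
            f"{matrix.shape[1]} matrix columns but {len(cell_ids)} cells listed"
        )
    return matrix, cell_ids
```

A call such as `load_peak_matrix("scATAC/cell_x_peak.mtx.gz", "scATAC/cells.tsv")` would then return the peak-by-cell matrix together with its column labels; the paths shown are placeholders for wherever the archive was extracted.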
scATAC_clusters.zip
Analysis products corresponding to cluster pseudo-bulks of the single-cell ATAC-seq data.
- `clusters.tsv`: contains the cluster id, name and colour used in the paper
- `peaks`: contains `overlap_reproducibilty/overlap.optimal_peak` peaks called using ENCODE bulk ATAC-seq pipeline in the narrowPeak format.
- `fragments`: contains per cluster fragment files
scATAC_scRNA_integration.zip
Analysis products from the integration of scATAC with scRNA. Contains:
- `peak_gene_links_fdr1e-4.tsv`: file with peak gene links passing FDR 1e-4. For analyses in the paper, we filter to peaks with absolute correlation >0.45.
- `harmony.cca.30.feat.tsv`: 30 dimensional co-embedding for scATAC and scRNA cells obtained by CCA followed by applying Harmony over assay type.
- `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. First column contains barcode.
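The correlation filter mentioned for `peak_gene_links_fdr1e-4.tsv` could be applied along these lines. This is a sketch only: the column name `correlation` is a hypothetical placeholder, since the record does not list the file's columns, so the actual header should be inspected first. The function name `filter_links` is likewise illustrative.

```python
import csv


def filter_links(path, column="correlation", min_abs=0.45):
    """Keep peak-gene links whose absolute correlation exceeds `min_abs`.

    `column` is an assumed header name; replace it with the name
    actually used in the file.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row for row in reader if abs(float(row[column])) > min_abs]
```

Both strongly positive and strongly negative links pass this filter, because the threshold is applied to the absolute value.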
scRNA.zip
Analysis products for the single-cell RNA-seq data. Contains:
- `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), knn graphs, and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples D0, D2, ..., D14, iPSC.
- `genes.txt`: list of all genes
- `cells.tsv`: list of barcodes that pass QC across samples. Contains:
  - `barcode_sample`: barcode with index of sample (1-9 corresponding to D0, D2, ..., D14, iPSC)
  - `sample`: sample name (D0, D2, ..., D14, iPSC)
  - `umap1`
  - `umap2`
  - `nCount_RNA`
  - `nFeature_RNA`
  - `cluster`
  - `percent.mt`: percent of mitochondrial transcripts in cell
  - `percent.oskm`: percent of OSKM transcripts in cell
- `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
- `pca.tsv`: first 50 principal components (PCs) for each cell
- `oskm_endo_sendai.tsv`: estimated raw counts (cts, may not be integers) and log(1+ tp10k) normalized expression (norm) for endogenous and exogenous (Sendai derived) counts of POU5F1 (OCT4), SOX2, KLF4 and MYC genes. Rows are consistent with `seurat.rds` and `cells.tsv`
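The "log(1 + tp10k)" normalization named above can be illustrated with a short worked example. It assumes that tp10k means transcripts per 10,000 total counts in the cell and that the logarithm is natural, which matches the common default for log-normalization in Seurat but is not confirmed by this record. The function name `log1p_tp10k` is hypothetical.

```python
import math


def log1p_tp10k(count, total_counts):
    """Return log(1 + count scaled to 10,000 total counts per cell).

    Assumes natural log and a per-cell scale factor of 10,000.
    """
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    tp10k = count * 10_000 / total_counts
    return math.log1p(tp10k)
```

Because the estimated counts in `oskm_endo_sendai.tsv` may be non-integer, the function accepts floats as well as integers.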
multiome.zip
multiome/snATAC:
These files are derived from the integration of nuclei from multiome (D1M and D2M), with cells from day 2 of scATAC-seq (labeled D2).
- `cells.tsv`: This is the list of nuclei barcodes that pass QC from multiome AND also cell barcodes from D2 of scATAC-seq. Includes:
  - `barcode`
  - `umap1`: These are the coordinates used for the figures involving multiome in the paper.
  - `umap2`: Same as for `umap1`.
  - `sample`: D1M and D2M correspond to multiome, D2 corresponds to day 2 of scATAC-seq
  - `cluster`: For multiome barcodes, these are labels transferred from scATAC-seq. For D2 scATAC-seq, these are the original cluster labels.
- `peaks.bed`: This is the same file as scATAC/peaks.bed. List of peaks of 500bp. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
- `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
- `features.no.harmony.50d.tsv`: 50 dimensional representation of each cell prior to running Harmony (to correct for batch effect between D2 scATAC and D1M,D2M snMultiome). Rows correspond to cells from `cells.tsv`.
- `features.harmony.10d.tsv`: 10 dimensional representation of each cell after running Harmony. Rows correspond to cells from `cells.tsv`.
multiome/snRNA:
- `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples D1M, D2M. Please use the UMAP/features from snATAC/ for consistency.
- `genes.txt`: list of all genes (this is different from the list in scRNA analysis)
- `cells.tsv`: list of barcodes that pass QC across samples. Contains:
  - `barcode_sample`: barcode with index of sample (1,2 corresponding to D1M, D2M respectively)
  - `sample`: sample name (D1M, D2M)
  - `nCount_RNA`
  - `nFeature_RNA`
  - `percent.oskm`: percent of OSKM genes in cell
- `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data repository contains several datasets supplementing the paper “Dissecting the cellular architecture of neuroblastoma bone marrow metastasis using single-cell transcriptomics and epigenomics unravels the role of monocytes at the metastatic niche” by Fetahu, Esser-Skala, Dnyansagar et al. (2023).
HOMER_Results.zip: detailed results of the HOMER analysis
nblast_scopen_gene_activity_normalized_motifs_added.rds: Seurat object with scATAC-seq data
snp_array.tgz: SNP array data
R_data_generated.tgz: Files generated by the scRNA-seq analysis scripts in the GitHub repository associated with the publication.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for ShinyCell2 Example Applications, which include:
spatial_brain.rds: Example spatial transcriptomics dataset of sagittal mouse brain slices generated using the 10x Visium v1 chemistry, processed using the Seurat spatial pipeline (https://satijalab.org/seurat/articles/spatial_vignette)
multimodal_pbmc.rds: Example CITE-seq dataset of PBMC reference containing 162,000 PBMC cells measured with 228 antibodies (https://satijalab.org/seurat/articles/multimodal_reference_mapping.html)
ArchR-ProjHeme.tar.gz: Example scATAC-seq dataset of bone marrow and peripheral blood mononuclear cells, which is used as the tutorial dataset for the ArchR pipeline (https://www.archrproject.com/articles/Articles/tutorial.html). As ArchR objects are stored in a directory containing many files, the entire folder is tarred and compressed here.
signac_pbmc.rds: Example scATAC-seq dataset of PBMC provided by 10x Genomics, which is used as the tutorial dataset for the Signac pipeline (https://stuartlab.org/signac/articles/pbmc_vignette.html). Signac objects store the full list of all unique fragments across all single cells in a separate fragment file, uploaded here as signac_pbmc_fragments.tsv.gz.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset generated by our Microwell-seq 3.0 technique.
Files: To save space, the data are packaged as tar.gz archives. Please extract the archives after downloading.
RNA_WT_RData.tar.gz: Seurat object along with metadata including cell barcodes, tissue source and cell type annotation; can be loaded into R and used directly.
RNA_Tumor_RData.tar.gz: Seurat object along with metadata including cell barcodes, tissue source, cell type annotation and potential cell state prediction (neoplastic, intermediate and non-neoplastic); can be loaded into R and used directly.
RNA_WT_Dge.tar.gz: Digital expression data (in .csv format) generated by Drop-seq tools, with batch effects removed by custom scripts.
RNA_Tumor_Dge.tar.gz: Digital expression data (in .csv format) generated by Drop-seq tools, with batch effects removed by custom scripts.
ATAC_WT_SparseMatrix.tar.gz: scATAC-seq data in 10X-like format (matrix.mtx, barcodes.csv, features.csv), along with metadata including cell barcodes, tissue source and cell type annotation.
ATAC_Tumor_SparseMatrix.tar.gz: scATAC-seq data in 10X-like format (matrix.mtx, barcodes.csv, features.csv), along with metadata including cell barcodes, tissue source, cell type annotation and potential cell state prediction (neoplastic, intermediate and non-neoplastic).
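For readers who prefer to extract the archives programmatically, a minimal Python sketch using only the standard library is given below. The function name `unpack_archive` and the destination directory are illustrative assumptions; the internal layout of each archive is not described in this record, so the returned member list should be inspected before relying on specific paths.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return members
```

For example, `unpack_archive("ATAC_WT_SparseMatrix.tar.gz", "data/ATAC_WT")` would place the extracted files under `data/ATAC_WT`, where the `matrix.mtx` file can then be read with `scipy.io.mmread`.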
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PBMC samples for CITE-seq and ASAP-seq were collected at four time points: immediately before (Day 0) vaccination, after primary vaccination (Day 2, Day 10), and seven days after boost vaccination (Day 28).
The datasets uploaded here are three processed single-cell datasets:
1. PBMC_vaccine_CITE.rds: 3' RNA and surface proteins (173 TotalSeq-A antibodies)
2. PBMC_vaccine_ASAP.rds: Chromatin accessibility and surface proteins (173 TotalSeq-A antibodies)
3. PBMC_vaccine_ECCITE_TCR.rds: 5' RNA, surface proteins (137 TotalSeq-C antibodies), TCR and dextramer loaded with peptides of SARS-CoV-2 spike protein.
antigen_module_genes.rds: This file contains the vaccine-induced gene sets.
antigen_module_peaks.rds: This file contains the DE peaks specific for vaccine-induced cells.
To map the scRNA-seq query dataset onto our CITE-seq reference:
library(Seurat)
# Load the CITE-seq reference and your own query object
PBMC_CITE <- readRDS("/zenedo/PBMC_vaccine_CITE.rds")
query_scRNA <- readRDS("/home/xx/your_own_data.rds")
# Find anchors between the reference and the query
anchors <- FindTransferAnchors(
  reference = PBMC_CITE,
  query = query_scRNA,
  normalization.method = "SCT",
  k.anchor = 5,
  reference.reduction = "spca",
  dims = 1:50)
# Transfer the three levels of cell type labels and project onto the reference UMAP
query_scRNA <- MapQuery(
  anchorset = anchors,
  query = query_scRNA,
  reference = PBMC_CITE,
  refdata = list(
    l1 = "celltypel1",
    l2 = "celltypel2",
    l3 = "celltypel3"),
  reference.reduction = "spca",
  reduction.model = "wnn.umap")
To use the scATAC-seq data, please run the commands below to update the path of the fragment file for the object.
library(Signac)  # provides Fragments() and CreateFragmentObject()
Vaccine_ASAP <- readRDS("PBMC_vaccine_ASAP.rds")
# remove fragment file information
Fragments(Vaccine_ASAP) <- NULL
# Update the path of the fragment file
Fragments(Vaccine_ASAP) <- CreateFragmentObject(
  path = "download/PBMC_vaccine_ASAP_fragments.tsv.gz",
  cells = Cells(Vaccine_ASAP))
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary data for the manuscript: Inferring and perturbing cell fate regulomes in human cerebral organoids.
Note: Due to the size limit of the repository, this update doesn't include all the relevant data. The following data are available via the same repository but different versions: