48 datasets found

Seurat object with cell type annotation and UMAP coordinates for zebrafish...
figshare.com
application/gzip
Updated Nov 28, 2024
Gangcai Xie (2024). Seurat object with cell type annotation and UMAP coordinates for zebrafish testis single cell RNA sequencing datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27922725.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.27922725.v1
Dataset updated
Nov 28, 2024
Dataset provided by
figshare
Authors
Gangcai Xie
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the Seurat object in .rds format with the raw matrix information (after filtering), cell type annotation information and the UMAP coordinates. Users can load this .rds file with R's readRDS function. If you are using this dataset, please cite our paper: Qian, Peipei, Jiahui Kang, Dong Liu, and Gangcai Xie. "Single cell transcriptome sequencing of Zebrafish testis revealed novel spermatogenesis marker genes and stronger Leydig-germ cell paracrine interactions." Frontiers in Genetics 13 (2022): 851719.
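A minimal sketch of loading the object, assuming the download has been saved locally. The helper `load_rds_checked()` and the file name are hypothetical. `readRDS()` opens files through `gzfile()`, so a gzip-compressed .rds file (the default output of `saveRDS()`) loads without manual decompression; if the download is instead a tar archive or a separately gzipped file, extract it first. Working with the loaded object requires the Seurat package, and the reduction name "umap" is an assumption.

```r
# Hypothetical helper: load an .rds file and report the class of its contents.
load_rds_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  obj <- readRDS(path)  # gzip-compressed .rds files are read transparently
  message("Loaded object of class: ", paste(class(obj), collapse = ", "))
  obj
}

# Usage (hypothetical file name; Seurat must be installed to use the object):
# library(Seurat)
# testis <- load_rds_checked("zebrafish_testis_seurat.rds")
# head(testis[[]])                               # per-cell metadata, incl. cell types
# head(Embeddings(testis, reduction = "umap"))   # assumes the reduction is "umap"
```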
Data used in SeuratIntegrate paper
zenodo.org
application/gzip, bin +2
Updated May 23, 2025
Florian Specque; Macha Nikolski; Domitille Chalopin (2025). Data used in SeuratIntegrate paper [Dataset]. http://doi.org/10.5281/zenodo.15496601
Available download formats: bin, pdf, txt, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.15496601
Dataset updated
May 23, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Florian Specque; Macha Nikolski; Domitille Chalopin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository gathers the data and code used to generate hepatocellular carcinoma analyses in the paper presenting SeuratIntegrate. It contains the scripts to reproduce the figures presented in the article. Some figures are also available as pdf files.

To be able to fully reproduce the results from the paper, one should:

download all the files

install R 4.3.3, with the corresponding base R packages (stats, graphics, grDevices, utils, datasets, methods and base)

install R packages listed in the file sessionInfo.txt

install the provided version of SeuratIntegrate. In an R session, run:

remotes::install_local("path/to/SeuratIntegrate_0.4.1.tar.gz")

install (mini)conda if necessary (we used miniconda version 23.11.0)

install the conda environments (if it fails with the *package-list.yml files, use the *package-list-from-history.yml files instead):

conda env create --file SeuratIntegrate_bbknn_package-list.yml
conda env create --file SeuratIntegrate_scanorama_package-list.yml
conda env create --file SeuratIntegrate_scvi-tools_package-list.yml
conda env create --file SeuratIntegrate_trvae_package-list.yml

open an R session to make the conda environments usable by SeuratIntegrate:

library(SeuratIntegrate)
UpdateEnvCache("bbknn", conda.env = "SeuratIntegrate_bbknn", conda.env.is.path = FALSE)
UpdateEnvCache("scanorama", conda.env = "SeuratIntegrate_scanorama", conda.env.is.path = FALSE)
UpdateEnvCache("scvi", conda.env = "SeuratIntegrate_scvi-tools", conda.env.is.path = FALSE)
UpdateEnvCache("trvae", conda.env = "SeuratIntegrate_trvae", conda.env.is.path = FALSE)
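One setup step asks for the R packages listed in sessionInfo.txt. Assuming that file holds the printed output of `sessionInfo()`, where attached and loaded packages appear as `name_version` tokens, a rough base-R sketch can extract them for comparison with what is installed. The helper `parse_session_info()` is hypothetical and its token pattern is a heuristic, not a full parser of that output.

```r
# Hypothetical helper: pull "package_version" tokens out of printed
# sessionInfo() text. Base packages (printed without versions) are skipped.
parse_session_info <- function(lines) {
  tokens <- unlist(strsplit(trimws(lines), "[[:space:]]+"))
  pattern <- "^([A-Za-z][A-Za-z0-9.]*)_([0-9][0-9.-]*)$"
  hits <- tokens[grepl(pattern, tokens)]
  pkgs <- data.frame(
    package = sub(pattern, "\\1", hits),
    version = sub(pattern, "\\2", hits),
    stringsAsFactors = FALSE
  )
  unique(pkgs)
}

# Usage (assumes sessionInfo.txt is in the working directory):
# pkgs <- parse_session_info(readLines("sessionInfo.txt"))
# setdiff(pkgs$package, rownames(installed.packages()))  # still to install
```

CRAN packages can then be pinned with `remotes::install_version()`; Bioconductor packages would instead go through `BiocManager::install()`, whose available versions depend on the Bioconductor release matching R 4.3.3.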

Once done, running the code in integrate.R should produce reproducible results. Note that lines 3 to 6 from integrate.R should be adapted to the user's setup.
integrate.R is subdivided into six main parts:

Preparation: lines 1-56

Preprocessing: lines 58-74

Integration: lines 76-121

Processing of integration outputs: lines 126-267

Scoring of integration outputs: lines 269-353

Plotting: lines 380-507

Intermediate SeuratObjects were saved between steps 3 and 4 and between steps 5 and 6 (liver10k_integrated_object.RDS and liver10k_integrated_scored_object.RDS, respectively). It is possible to start from these intermediate SeuratObjects and skip the preceding steps, provided that the Preparation step is always run first.
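Because starting from an intermediate object still requires the Preparation step, one option is to parse and evaluate only lines 1-56 of integrate.R. A minimal sketch with a hypothetical helper `source_lines()`; it assumes the selected slice is syntactically complete on its own. The object name that later sections of integrate.R expect is not stated here, so check the script before assigning the loaded object.

```r
# Hypothetical helper: evaluate only lines `from`..`to` of an R script.
# Assumes the selected slice is syntactically complete on its own.
source_lines <- function(file, from, to, envir = globalenv()) {
  code <- readLines(file, warn = FALSE)
  if (to > length(code)) {
    stop("Script has only ", length(code), " lines")
  }
  exprs <- parse(text = code[from:to])
  for (e in exprs) eval(e, envir = envir)
  invisible(NULL)
}

# Usage sketch (run from the directory holding integrate.R and the .RDS files):
# source_lines("integrate.R", 1, 56)                       # Preparation step
# integrated <- readRDS("liver10k_integrated_object.RDS")  # state after step 3
# # then continue from the Processing step (line 126 onward), renaming
# # `integrated` to whatever object name integrate.R uses there.
```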
cellCounts
opal.latrobe.edu.au
researchdata.edu.au
bin
Updated Dec 19, 2022
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi (2022). cellCounts [Dataset]. http://doi.org/10.26181/21588276.v3
Available download formats: bin
Unique identifier
https://doi.org/10.26181/21588276.v3
Dataset updated
Dec 19, 2022
Dataset provided by
La Trobe
Authors
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review. A Linux computer running CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis, and the directories containing their executables should be included in the $PATH environment variable.

R (v4.0.0 or newer) https://www.r-project.org/
Rsubread (v2.12.2 or newer) http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html
CellRanger (v6.0.1) https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
STARsolo (v2.7.10a) https://github.com/alexdobin/STAR
sra-tools (v2.10.0 or newer) https://github.com/ncbi/sra-tools
Seurat (v3.0.0 or newer) https://satijalab.org/seurat/
edgeR (v3.30.0 or newer) https://bioconductor.org/packages/edgeR/
limma (v3.44.0 or newer) https://bioconductor.org/packages/limma/
mltools (v0.3.5 or newer) https://cran.r-project.org/web/packages/mltools/index.html

Reference packages generated by 10x Genomics are also required for this analysis. They can be downloaded from the following link (select the 2020-A versions of the individual human and mouse reference packages): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest

After all of this is done, run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. The script automatically downloads the mixture scRNA-seq data from the SRA database and writes a text file called ‘test-all.log’ that contains all the screen output and the speed/accuracy results of CellRanger, STARsolo and cellCounts.
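Before launching the script, a quick pre-flight check can confirm that the listed tools are on $PATH and that the R packages meet their minimum versions. A minimal sketch with hypothetical helpers; the executable names (`cellranger`, `STAR` for STARsolo, `fasterq-dump` from sra-tools) are assumptions about which binaries the script calls, and exact-version requirements (CellRanger v6.0.1, STARsolo v2.7.10a) still need checking by hand.

```r
# Hypothetical pre-flight check before running 'test-all-new.bash'.
# Executable names below are assumptions about how each tool installs itself.

check_executables <- function(exes) {
  paths <- Sys.which(exes)  # "" when an executable is not on $PATH
  data.frame(executable = exes, found = nzchar(paths),
             path = unname(paths), stringsAsFactors = FALSE)
}

check_r_packages <- function(min_versions) {
  pkgs <- names(min_versions)
  ok <- vapply(pkgs, function(p) {
    if (!requireNamespace(p, quietly = TRUE)) return(FALSE)
    utils::packageVersion(p) >= min_versions[[p]]
  }, logical(1))
  data.frame(package = pkgs, meets_minimum = unname(ok),
             stringsAsFactors = FALSE)
}

# Usage with the minimum versions listed above:
# getRversion() >= "4.0.0"
# check_executables(c("cellranger", "STAR", "fasterq-dump"))
# check_r_packages(c(Rsubread = "2.12.2", Seurat = "3.0.0", edgeR = "3.30.0",
#                    limma = "3.44.0", mltools = "0.3.5"))
```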
Data from: A Single-Cell Tumor Immune Atlas for Precision Oncology
explore.openaire.eu
Updated Sep 21, 2020
Paula Nieto (2020). A Single-Cell Tumor Immune Atlas for Precision Oncology [Dataset]. http://doi.org/10.5281/zenodo.4036019
Unique identifier
https://doi.org/10.5281/zenodo.4036019
Dataset updated
Sep 21, 2020
Authors
Paula Nieto
Description
Publication version of the Single-Cell Tumor Immune Atlas. This upload contains:
TICAtlas.rds: an rds file containing a Seurat object with the whole Atlas
TICAtlas.h5ad: an h5ad file with the whole Atlas
TICAtlas_downsampled.rds: an rds file containing a downsampled version of the Seurat object of the whole Atlas
TICAtlas_downsampled.h5ad: an h5ad file containing a downsampled version of the whole Atlas
TICAtlas_metadata.csv: a comma-separated text file with the metadata for each of the cells

All the files contain the following patient/sample metadata variables:
patient: assigned patient identifiers
nCount_RNA and nFeature_RNA: number of UMIs and genes per cell
percent.mt: percentage of mitochondrial gene counts
gender: the patient's gender (male/female/unknown)
source: dataset of origin
subtype: cancer type (abbreviations as indicated in the preprint)
kmeans_cluster: patient clusters, NA if filtered out before clustering
lv1 and lv2: annotated cell type for each of the cells, a two-level annotation (lv2 has more cell types)

If you have any issues with the metadata (e.g. unexpected factors, NA values...) you can use the TICAtlas_metadata.csv file. For more information, read our paper and check our GitHub and our ShinyApp. h5ad files can be read with Python using Scanpy; rds files can be read in R using Seurat. For format conversion between AnnData and Seurat we recommend SeuratDisk. For other single-cell data formats you can use sceasy.
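Because the metadata CSV can be read without loading the Atlas itself, a per-subtype summary of annotated cell types needs only base R. A minimal sketch, assuming the CSV has a header row and uses the column names listed above (`subtype`, `lv1`, `lv2`); `summarise_celltypes()` is hypothetical, and `row.names = 1` assumes the first column holds cell barcodes.

```r
# Hypothetical sketch: count annotated cell types per cancer subtype.
summarise_celltypes <- function(meta, level = c("lv1", "lv2")) {
  level <- match.arg(level)
  table(subtype = meta$subtype, cell_type = meta[[level]], useNA = "ifany")
}

# Usage (assumes one row per cell and a header row):
# meta <- read.csv("TICAtlas_metadata.csv", row.names = 1)
# counts <- summarise_celltypes(meta, "lv1")
# prop.table(counts, margin = 1)   # fraction of each cell type within a subtype
```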
Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset
zenodo.org
data.niaid.nih.gov
bin, txt
Updated Nov 20, 2023
Jonathan Hsu; Allart Stoop (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. http://doi.org/10.5281/zenodo.10011622
Available download formats: bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.10011622
Dataset updated
Nov 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jonathan Hsu; Allart Stoop
Description
Table of Contents
Main Description
File Descriptions
Linked Files
Installation and Instructions

1. Main Description
---------------------------
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity". The code in the file `marengo_code_for_paper_jan_2023.R` was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat
scRepertoire
ggplot2
stringr
dplyr
ggridges
ggrepel
ComplexHeatmap

File Descriptions
---------------------------
The code can be downloaded and opened in RStudio.
The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper.
The "Marengo_newID_March242023.rds" file is available from Zenodo at https://zenodo.org/record/7566113 (Zenodo DOI: 10.5281/zenodo.7566113).
The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis and was also used to create the DGE heatmap and the volcano plots.
The "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in figure 5F.

Linked Files
---------------------

This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset includes raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:

Gene Expression Omnibus (GEO) ID: GSE223311(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the **raw sequencing** or `.fastq.gz` files, which are gzip-compressed FASTQ text files (four lines per read: identifier, sequence, separator, and base quality scores).
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113(https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code.
Submission type: Restricted Access. To gain access to the repository, you must contact the author.
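The GEO files are described as per-replicate matrix.mtx, barcodes.tsv and genes.tsv triplets. In practice `Seurat::Read10X()` reads this layout directly; the sketch below, using only the Matrix package shipped with R, shows what such a reader assembles. It assumes uncompressed files named exactly as above and a two-column genes.tsv (ID, symbol) as written by older Cell Ranger versions; `read_10x_dir()` and the directory name are hypothetical.

```r
library(Matrix)

# Hypothetical sketch: assemble one replicate's matrix.mtx / barcodes.tsv /
# genes.tsv into a sparse gene x cell count matrix.
read_10x_dir <- function(dir) {
  counts <- Matrix::readMM(file.path(dir, "matrix.mtx"))
  barcodes <- readLines(file.path(dir, "barcodes.tsv"))
  genes <- read.delim(file.path(dir, "genes.tsv"), header = FALSE,
                      stringsAsFactors = FALSE)
  # Prefer the symbol column (2nd) when present; make duplicates unique.
  gene_names <- if (ncol(genes) >= 2) genes[[2]] else genes[[1]]
  counts <- as(counts, "CsparseMatrix")
  dimnames(counts) <- list(make.unique(gene_names), barcodes)
  counts
}

# Usage (hypothetical directory for one replicate/condition):
# counts <- read_10x_dir("GSE223311/replicate1")
# seu <- Seurat::CreateSeuratObject(counts = counts)
```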

Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

> Ensure you have R version 4.1.2 or higher for compatibility.

> Although it is not essential, you can use RStudio (Version 2022.12.0+353) to access and execute the code.

1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:
> setwd(directory)

4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies in order to set up an environment where the code in "marengo_code_for_paper_jan_2023.R" can be executed.
5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.
6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
7. Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
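Before step 7, it can help to confirm that the working directory actually contains the five files listed in step 3. A minimal base-R sketch; `missing_files()` is a hypothetical helper, and running the analysis with `source()` instead of executing it interactively is an assumption about how the script behaves end to end.

```r
# Files listed in step 3 of the instructions.
required_files <- c(
  "marengo_code_for_paper_jan_2023.R",
  "Install_Packages.R",
  "Marengo_newID_March242023.rds",
  "genes_for_heatmap_fig5F.xlsx",
  "all_res_deg_for_heat_updated_march2023.txt"
)

# Hypothetical helper: return the required files not present in `dir`.
missing_files <- function(files = required_files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

# Usage:
# setwd("path/to/analysis")   # hypothetical path
# miss <- missing_files()
# if (length(miss) > 0) stop("Missing: ", paste(miss, collapse = ", "))
```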
Uehata et al. single-cell ATAC-seq dataset of hematopoietic stem and...
figshare.com
application/gzip
Updated Aug 28, 2023
Alexis Vandenbon (2023). Uehata et al. single-cell ATAC-seq dataset of hematopoietic stem and progenitor cells [Dataset]. http://doi.org/10.6084/m9.figshare.24040575.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.24040575.v1
Dataset updated
Aug 28, 2023
Dataset provided by
figshare
Authors
Alexis Vandenbon
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A Seurat object (.rds format) for a single-cell ATAC-seq dataset of hematopoietic stem and progenitor cells. It includes 4 samples: control, DKO (Reg1–/–, Reg3–/–), Nfkbiz–/–, and TKO (Reg1–/–, Reg3–/–, Nfkbiz–/–). Data was processed using Seurat and Signac. For more details we refer to the accompanying GitHub repository. In brief, we normalized the data, conducted linear and non-linear dimensionality reduction, clustered cells, calculated "gene activities", and added motif information to the Seurat object. A link to the accompanying paper will be added here after publication.
Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug...
data.niaid.nih.gov
explore.openaire.eu
Updated Sep 7, 2023
Meera Makheja (2023). Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug Resistance in Chronic Myeloid Leukemia [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5118610
Dataset updated
Sep 7, 2023
Dataset provided by
Vaidehi Krishnan
Meera Makheja
Charles Chuah
Chan Zhu En
Pavanish Kumar
Lee Kian Leong
John Ouyang
Prasanna Nori Venkatesh
Alice Man Sze Cheung
Shyam Prabhakar
Zahid Nawaz
Owen Rackham
Sudipto Bari
Salvatore Albani
William Ying Khee Hwang
Sin Tiong Ong
Ahmad Lajam
Florian Schmidt
Description
This archive contains data of scRNAseq and CyTOF in form of Seurat objects, txt and csv files as well as R scripts for data analysis and Figure generation.

A summary of the content is provided in the following.

R scripts

Script to run machine learning models predicting group-specific marker genes: CML_Find_Markers_Zenodo.R

Script to reproduce the majority of Main and Supplementary Figures shown in the manuscript: CML_Paper_Figures_Zenodo.R

Script to run inferCNV analysis: inferCNV_Zenodo.R

Script to plot NATMI analysis results: NATMI_CvsA_FC0.32_Updown_Column_plot_Zenodo.R

Script to conduct sub-clustering and filtering of NK cells: NK_Marker_Detection_Zenodo.R

Helper scripts for plotting and DEG calculation: ComputePairWiseDE_v2.R, Seurat_DE_Heatmap_RCA_Style.R

RDS files

General scRNA-seq Seurat objects:

scRNA-seq Seurat object after QC and cell type annotation, used for most analyses in the manuscript: DUKE_DataSet_Doublets_Removed_Relabeled.RDS

scRNA-seq object including additional findings (e.g. from the NK analysis), used in the Shiny app: DUKE_final_for_Shiny_App.rds

Neighborhood enrichment score computed for group A across all HSPCs: Enrichment_score_global_groupA.RDS

UMAP coordinates used in the article: Layout_2D_nNeighbours_25_Metric_cosine_TCU_removed.RDS

SCENIC files:

Regulon set used in SCENIC: 2.6_regulons_asGeneSet.Rds

AUC values computed for regulons: 3.4_regulonAUC.Rds

Metadata used in SCENIC: cellInfo.Rds

Group-specific regulons for LSC: groupSpecificRegulonsBCRAblP.RDS

Patient specific regulons for LSC: patientSpecificRegulonsBCRAblP.RDS

Patient specificity score for LSC: PatientSpecificRegulonSpecificityScoreBCRAblP.RDS

Regulon specificity score for LSC: RegulonSpecificityScoreBCRAblP.RDS

BCR-ABL1 inference:

HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label.RDS

UMAP for HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label_UMAP.RDS

HSPCs with BCR-ABL1 module scores: HSPC_metacluster_74K_with_modscore_27thmay.RDS

NK sub-clustering and filtering:

NK object with module scores: NK_8617cells_with_modscore_1stjune.RDS

Feature genes for NK cells computed with DubStepR: NK_Cells_DubStepR

NK cells Seurat object excluding contaminating T and B cells: NK_cells_T_B_17_removed.RDS

NK Seurat object including neighbourhood enrichment score calculations: NK_seurat_object_with_enrichment_labels_V2.RDS

txt and csv files:

Proportions per cluster calculated from CyTOF: CyTOF_Proportions.txt

Correlation between scRNAseq and CyTOF cell type abundance: scRNAseq_Cor_Cytof.txt

Correlation between manual gating and FlowSOM clustering: Manual_vs_FlowSOM.txt

GSEA results:

HSPC, HSC and LSC results: FINAL_GSEA_DATA_For_GGPLOT.txt

NK: NK_For_Plotting.txt

TFRC and HLA expression: TFRC_and_HLA_Values.txt

NATMI result files:

UP-regulated_mean.csv

DOWN-regulated_mean.csv

Gene position file used in inferCNV: inferCNV_gene_positions_hg38.txt

Module scores for NK subclusters per cell: NK_Supplementary_Module_Scores.csv

Compressed folders:

All CyTOF raw data files: CyTOF_Data_raw.zip

Results of the patient-based classifier: PatientwiseClassifier.zip

Results of the single-cell based classifier: SingleCellClassifierResults.zip

For new analyses in general, we recommend that readers use the Seurat object stored in DUKE_final_for_Shiny_App.rds or the Shiny app (http://scdbm.ddnetbio.com/) and perform further analysis from there.

RAW data is available at EGA upon request using Study ID: EGAS00001005509

Revision

The for_CML_manuscript_revision.tar.gz folder contains scripts and data for the paper revision including 1) Detection of the BCR-ABL fusion with long read sequencing; 2) Identification of BCR-ABL junction reads with scRNAseq; 3) Detection of expressed mutations using scRNAseq.
Data from: A single-cell atlas characterizes dysregulation of the bone...
zenodo.org
Updated Jan 14, 2025
William Pilcher (2025). A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma [Dataset]. http://doi.org/10.5281/zenodo.14624955
Unique identifier
https://doi.org/10.5281/zenodo.14624955
Dataset updated
Jan 14, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
William Pilcher
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
May 8, 2024
Description
This repository contains R Seurat objects associated with our study titled "A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma".

Single cell data contained within this object comes from MMRF Immune Atlas Consortium work.

Each .rds file contains a Seurat object saved with Seurat version 4.3. It can be loaded in R with the readRDS function.

Two .RDS files are included in this version of the release.

Discovery object: MMRF_ImmuneAtlas_Full_With_Corrected_Censored_Metadata.rds contains all aliquots belonging to the 'discovery' cohort as used in the initial paper. This represents the dataset used for initial clustering, cell annotation, and analysis.

Discovery + Validation object: COMBINED_VALIDATION_MMRF_ImmuneAtlas_Full_Censored_Metadata.rds contains both aliquots belonging to the initial 'discovery' cohort, and aliquots belonging to the 'validation' cohort. The group each cell is derived from is listed under the 'cohort' variable. Labels related to cell annotation, including doublet status, are derived from a label transfer process as described in the paper. Labels for the original 'discovery' cohort are unchanged. UMAPs have been reconstructed with both the discovery and validation cohorts integrated.

--

The discovery object contains two assays:

"RNA" - The raw count matrix

"RNA_Batch_Corrected" - Counts adjusted for the combination of 'Study_Site' and 'Batch'.

Analyses should prefer the original RNA assay, unless they use pipelines that do not support adjusting for technical covariates.

Currently, the validation object only includes the uncorrected RNA assay.

--

The object contains two UMAPs in the reduction slot:

umap - will render the UMAP for the full object with all cells.

umap.sub - contains the UMAP embeddings for individual 'compartments', as indicated by 'subcluster_V03072023_compartment'

--

Each sample has three different identifiers:

public_id

Indicates a specific patient (n=263).

MMRF_####

This is a standard identifier which is used across all MMRF CoMMpass datasets

public_ids can map to multiple d_visit_specimen_ids and aliquot_ids

As of now, all public_ids have a single sample collected at Baseline.

This can be accessed by filtering for 'collection_event' %in% c("Baseline", "Screening") or VJ_INTERVAL == 'Baseline'

d_visit_specimen_id

Indicates a specific visit by a patient (n=358)

MMRF_####_Y

Y is a number indicating that this is the Y-th sample obtained from said patient. This does not correspond to a specific timepoint.

This is a standard identifier, which is used across all MMRF CoMMpass datasets

The purpose of the visit is indicated in 'collection_event' (Baseline, Relapse, Remission, etc.). The approximate interval the visit corresponds to is in "VJ_INTERVAL".

d_visit_specimen_id uniquely maps to one public_id

d_visit_specimen_id can map to multiple aliquot_ids

aliquot_id

Refers to the specific bone marrow aliquot sample processed (n=361)

MMRFA-######

This is a unique identifier for each processed scRNA-seq sample.

As of now, this uniquely maps to a combination of d_visit_specimen_id, Study_Site, and Batch

As of now, this is an identifier specific to the MMRF ImmuneAtlas
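The identifier relationships above can be checked mechanically on the object's metadata. A minimal base-R sketch with hypothetical helpers; it assumes each '#' stands for one digit and that Y is an integer visit counter. It also selects baseline samples using the rule given for public_id ('collection_event' in Baseline/Screening, or VJ_INTERVAL == 'Baseline').

```r
# Hypothetical helpers for the MMRF identifier scheme.
# Assumption: each '#' is one digit and Y is an integer visit counter.
is_public_id  <- function(x) grepl("^MMRF_[0-9]{4}$", x)
is_visit_id   <- function(x) grepl("^MMRF_[0-9]{4}_[0-9]+$", x)
is_aliquot_id <- function(x) grepl("^MMRFA-[0-9]{6}$", x)

# A d_visit_specimen_id maps to exactly one public_id: drop the "_Y" suffix.
visit_to_public_id <- function(visit_id) {
  stopifnot(all(is_visit_id(visit_id)))
  sub("_[0-9]+$", "", visit_id)
}

# Baseline visits: collection_event in Baseline/Screening,
# or VJ_INTERVAL == "Baseline".
baseline_visits <- function(meta) {
  keep <- meta$collection_event %in% c("Baseline", "Screening") |
    (!is.na(meta$VJ_INTERVAL) & meta$VJ_INTERVAL == "Baseline")
  unique(meta$d_visit_specimen_id[keep])
}

# Usage with a loaded object (requires Seurat; columns as documented):
# meta <- obj[[]]
# all(visit_to_public_id(meta$d_visit_specimen_id) == meta$public_id)
# cells <- colnames(obj)[meta$d_visit_specimen_id %in% baseline_visits(meta)]
# baseline_obj <- subset(obj, cells = cells)
```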

Each cell has the following annotation information:

subcluster_V03072023

These refer to an individual cluster derived from 'Seurat'.

Format is 'Compartment'.'Compartment-cluster'.'Compartment-subcluster'

'NkT.2.2' indicates that this cell is in the 'Natural Killer + T Cell' compartment, was originally part of 'Cluster 2', and was then further separated into the refined subcluster '2.2'

If a parent cluster did not need to be further separated, the 'Compartment-subcluster' part is omitted (e.g., 'NkT.6')

As of now, this uniquely maps to a specific cellID_short annotation.

Clustering was done on a per compartment basis

For most immune cell types, clustering was based on embeddings corrected for 'siteXbatch'. For Plasma, clustering was performed on embeddings corrected on a per-sample basis.

In the combined validation object, DISCOVERY.subcluster_V03072023 will contain values only for the discovery cohort, and have NA values for validation samples.
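The label format described above can be split into its parts with base R. A minimal sketch with a hypothetical helper `parse_subcluster()`; it assumes labels contain no dots other than the separators, which holds for the examples 'NkT.2.2', 'NkT.6' and 'Full.23'. Cluster and subcluster are kept as character to avoid assuming they are always numeric.

```r
# Hypothetical parser for subcluster_V03072023 labels of the form
# Compartment.Cluster[.Subcluster]; missing subclusters become NA.
parse_subcluster <- function(labels) {
  parts <- strsplit(labels, ".", fixed = TRUE)
  data.frame(
    label       = labels,
    compartment = vapply(parts, `[`, character(1), 1),
    cluster     = vapply(parts, `[`, character(1), 2),
    subcluster  = vapply(parts, function(p) if (length(p) >= 3) p[3] else NA_character_,
                         character(1)),
    stringsAsFactors = FALSE
  )
}

# Usage with a loaded object (requires Seurat):
# parsed <- parse_subcluster(as.character(obj$subcluster_V03072023))
# table(parsed$compartment)
```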

subcluster_V03072023_compartment

These refer to one of five major compartments as identified roughly on the original UMAP. Clustering was performed on a per-compartment basis following a first pass rough annotation.

The possible compartments are

NkT (T cell + Natural Killer Cells)

Myeloid (Monocytes, Macrophages, Dendritic cells, Neutrophil/Granulocyte populations)

BEry (B Cell, Erythroblasts, bone marrow progenitor populations, pDCs)

Ery (Erythrocyte population)

Plasma (Plasma cell populations)

Each compartment has its own UMAP, which can be accessed in the 'umap.sub' reduction

One cluster was isolated from all other populations, and was not assigned to a compartment. This cluster is labeled as 'Full.23'.

In the combined validation object, DISCOVERY.subcluster_V03072023_compartment will contain values only for the discovery cohort, and have NA values for validation samples.

cellID_short

This is the individual annotation for each cluster.

Please see the 'Cell Population Annotation Dictionary' for further details.

If different Seurat clusters were assigned similar annotations, the cell type annotation is appended with a distinguishing cluster gene, or with '_b', '_c', etc.

lineage_group

This is an annotation driven grouping of clusters into major immune populations, as shown in Figure 2.

This includes "CD8", "CD4", "M" (Myeloid), "B" (B cell), "E" (Erythroid), "P" (Plasma), "Other" (HSC, Fibro, pDC_a), "LQ" (Doublet)

isDoublet

This is a binary 'True' or 'False' derived from manual review of clusters following doublet analysis, as described in the paper.

True indicates the cluster was determined to be a doublet population.

This is derived from 'doublet_pred', in which 'dblet_cluster' and 'poss_dblet_cluster' were flagged as doublet populations for subsequent analysis.

In the validation object, the doublet status of new samples was inferred from whether label transfer from the discovery cohort mapped a cell from a new sample to one of the previously identified doublet populations. The raw doublet scores from DoubletFinder, pegasus, or Scrublet are not included in this release.

--

Each sample has the following information indicating shipment batches, for batch correction

Study_Site

The center which processed a specific aliquot_id

EMORY, MSSM, WashU, MAYO

Batch

The shipment batch the sample was associated with

Values range from 1 to 3 for EMORY, MSSM, and MAYO, and from 1 to 4 for WashU

siteXbatch

A combination of the two variables above, to be used for batch correction

(Combined Validation Object only): cohort

Indicates if the sample was involved in the 'discovery' cohort, or 'validation' cohort. Samples in the 'validation' cohort will have labels inferred from label mapping

--

Each public_id has limited demographic information based on publicly available information in the MMRF CoMMpass study.

d_pt_sex

Patient sex (not self-identified). Male or Female

d_pt_race_1

Patient self-identified race

d_pt_ethnicity

Patient self-identified ethnicity

d_dx_amm_age

Patient age at diagnosis.

Not reported for patients above 90 at diagnosis

d_dx_amm_bmi

Patient BMI at diagnosis

d_pt_height_cm

Patient height at diagnosis, in centimeters.

d_dx_amm_weight_kg

Patient weight at diagnosis, in kilograms

d_visit_specimen_id has two associated data points providing limited information about the visit

collection_event

Description of why the sample was collected

e.g., 'Baseline' and 'Screening' indicate that the sample was obtained prior to therapy

'Relapse/Progression' indicates the sample was collected due to disease progression based on clinical assessment

'Remission/Response' indicates the sample was collected due to patient entering remission based on clinical assessment

Samples may be collected for reasons independent of the above, such as 'Pre' or 'Post' ASCT, or for other unspecified reasons

VJ_INTERVAL

Indicates the rough interval following start of therapy the sample is assigned to

"Baseline", "Month 3", "Year 2", etc.

All the single-cell raw data, along with outcome and cytogenetic information, is available at MMRF’s VLAB shared resource. Requests to access these data will be reviewed by data access committee at MMRF and any data shared will be released under a data transfer agreement that will protect the identities of patients involved in the study. Other information from the CoMMpass trial can also generally be
Droplet-based, high-throughput single cell transcriptional analysis of adult...
figshare.com
Updated Mar 6, 2019
Sarthak Sinha; Jo Anne Stratton (2019). Droplet-based, high-throughput single cell transcriptional analysis of adult mouse tissue using 10X Genomics' Chromium Single Cell 3' (v2) system: From tissue preparation to bioinformatic analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6626927.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.6626927.v1
Dataset updated
Mar 6, 2019
Dataset provided by
figshare
Authors
Sarthak Sinha; Jo Anne Stratton
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The attached R scripts supplement our protocol paper currently under editorial review at the Journal of Visualized Experiments. Scope of the article: This protocol describes the general processes and quality control checks necessary for preparing healthy adult single cells for droplet-based, high-throughput single-cell RNA-seq analysis using the 10X Genomics' Chromium System. We also describe sequencing parameters, alignment and downstream single-cell bioinformatic analysis.
scRNA data from: Organization of the human Intestine at single cell...
data.niaid.nih.gov
datadryad.org
zip
Updated Feb 24, 2023
Winston Becker (2023). scRNA data from: Organization of the human Intestine at single cell resolution [Dataset]. http://doi.org/10.5061/dryad.8pk0p2ns8
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.8pk0p2ns8
Dataset updated
Feb 24, 2023
Dataset provided by
Stanford University
Authors
Winston Becker
License
CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Description
The human adult intestinal system is a complex organ that is approximately 9 meters long and performs a variety of complex functions including digestion, nutrient absorption, and immune surveillance. We performed snRNA-seq on 8 regions of the human intestine (duodenum, proximal-jejunum, mid-jejunum, ileum, ascending colon, transverse colon, descending colon, and sigmoid colon) from 9 donors (B001, B004, B005, B006, B008, B009, B010, B011, and B012). In the corresponding paper, we find that cell compositions differ dramatically across regions of the intestine and demonstrate the complexity of epithelial subtypes. We map gene regulatory differences in these cells suggestive of a regulatory differentiation cascade, and associate intestinal disease heritability with specific cell types. These results describe the complexity of the cell composition, regulation, and organization in the human intestine, and serve as an important reference map for understanding human biology and disease.

Methods: For a detailed description of each of the steps used to obtain this data, see the materials and methods in the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were flash frozen. Nuclei were isolated from each sample and processed with either 10x scRNA-seq using Chromium Next GEM Single Cell 3’ Reagent Kits v3.1 (10x Genomics, 1000121) or Chromium Next GEM Chip G Single Cell Kits (10x Genomics, 1000120), or with 10x multiome sequencing using Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Kits (10x Genomics, 1000283). Initial processing of snRNA-seq data was done with the Cell Ranger pipeline (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) by first running cellranger mkfastq to demultiplex the bcl files and then running cellranger count. Since nuclear RNA was sequenced, data were aligned to a pre-mRNA reference.
Initial processing of the mutiome data, including alignment and generation of fragments files and expression matrices, was performed with the Cell Ranger ARC Pipeline. The raw expression matrices from these pipelines are included here. Downstream processing was performed in R, using the Seurat package.
scRNA-seq Trajectory inference.
kaggle.com
Updated Aug 9, 2022
Alexander Chervov (2022). scRNA-seq Trajectory inference. [Dataset]. https://www.kaggle.com/datasets/alexandervc/trajectory-inference-single-cell-rna-seq/discussion
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 9, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alexander Chervov
Description
Remark: for a discussion of trajectory inference on this dataset, see the paper https://www.mdpi.com/1099-4300/22/11/1274, "Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets", Alexander Chervov, Jonathan Bac and Andrei Zinovyev.

For cell cycle analysis see: https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

Data and Context

Data: results of single-cell RNA sequencing, i.e., a matrix whose rows correspond to cells and columns to genes (or vice versa). Each value shows how strongly the corresponding gene is "expressed" in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
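As a minimal illustration of how such a matrix is read, the Python sketch below computes two per-cell summaries, total counts and number of detected genes (the quantities Seurat stores as nCount_RNA and nFeature_RNA), and accepts either orientation. It assumes a small dense matrix held as nested lists; real data is normally sparse and loaded with Scanpy or Seurat, and the function name is hypothetical.

```python
def per_cell_summaries(matrix, cells_are_rows=True):
    """Return (total counts, detected genes) for each cell.

    `matrix` is a list of lists of non-negative counts. If cells are stored
    as columns instead of rows, pass cells_are_rows=False and the matrix is
    transposed first.
    """
    rows = matrix if cells_are_rows else [list(col) for col in zip(*matrix)]
    total_counts = [sum(row) for row in rows]
    detected_genes = [sum(1 for value in row if value > 0) for row in rows]
    return total_counts, detected_genes


# Two cells x three genes; the same data stored genes x cells gives the same answer.
cells_by_genes = [[0, 3, 1], [2, 0, 0]]
genes_by_cells = [[0, 2], [3, 0], [1, 0]]
print(per_cell_summaries(cells_by_genes))                        # ([4, 2], [2, 1])
print(per_cell_summaries(genes_by_cells, cells_are_rows=False))  # ([4, 2], [2, 1])
```

Checking which axis holds the cells (for example, 447 cells versus 24,748 genes in the hepatoblast data below) is usually the first step before any per-cell statistic.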

See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html (Scanpy, the main Python package for working with scRNA-seq data) or https://satijalab.org/seurat/ (Seurat, an R package).

Particular data: gene expression count matrix from single-cell RNA sequencing of mouse liver hepatoblasts in vivo (447 cells, 24,748 genes).

Paper: Li Yang, Wei-Hua Wang, Wei-Lin Qiu, Zhen Guo, Erfei Bi, Cheng-Ran Xu. "A single-cell transcriptomic analysis reveals precise pathways and regulatory mechanisms underlying hepatoblast differentiation." Hepatology. 2017 Nov;66(5):1387-1401. doi: 10.1002/hep.29353. Epub 2017 Sep 29.

Data: GSE90047 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE90047 Downloaded from: https://cytotrace.stanford.edu/#shiny-tab-dataset_download

Related datasets:

Other single-cell RNA-seq datasets can be found on Kaggle: see https://www.kaggle.com/alexandervc/datasets or search Kaggle for "scRNA-seq".

Inspiration

Single-cell RNA sequencing is an important technology in modern biology; see, e.g., "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review by P. Kharchenko in Nature Methods: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

A Google Scholar search for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) returns many interesting and highly cited articles, for example:

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833

Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages 273–282 (2019)

Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)

Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12
Single cell T cell atlas
zenodo.org
bin, csv
Updated Jul 27, 2024
Kerry A Mullan (2024). Single cell T cell atlas [Dataset]. http://doi.org/10.5281/zenodo.12569981
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.12569981
Dataset updated
Jul 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kerry A Mullan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The attached dataset was created by merging 12 high-quality single-cell T cell datasets that had both TCR-seq and gene expression (GEx) data. The Seurat object (supercluster_added_ID-240531.rds) contains ~500K cells with paired TCR-seq and GEx. We also included the original identifiers in Sup_Update_labels.csv. See our documentation (https://stegor.readthedocs.io/en/latest/) for how we processed the 12 datasets and decided on the current 47 T cell annotation models using scGate.

This is the accompanying data set for the paper entitled ‘T cell receptor-centric approach to streamline multimodal single-cell data analysis.’, which is currently available as a preprint (https://www.biorxiv.org/content/10.1101/2023.09.27.559702v2). Details on the origin of the datasets, and processing steps can be found there.

The purpose of this atlas, in both its full and downsampled versions, is to improve the interpretability of other T cell datasets. This can be done by adding either the downsampled object, which contains up to 500 cells per annotation model, or all 12 datasets to your new sample. The atlas aims to improve the identification of TCR-specific signatures by ensuring a well-covered background, which improves the robustness of the FindMarkers function in the Seurat package.
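The downsampled object is described as holding up to 500 cells per annotation model. As a rough sketch of that kind of capped, per-label sampling in Python: uniform random sampling within each label is an assumption here, not the atlas's documented procedure, and downsample_per_label and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def downsample_per_label(cell_ids, labels, max_per_label=500, seed=0):
    """Keep at most `max_per_label` randomly chosen cells for each label.

    Labels with fewer cells than the cap are kept whole. A fixed seed makes
    the selection reproducible. Returns {label: [cell_id, ...]}.
    """
    if len(cell_ids) != len(labels):
        raise ValueError("cell_ids and labels must have the same length")
    cells_by_label = defaultdict(list)
    for cell, label in zip(cell_ids, labels):
        cells_by_label[label].append(cell)
    rng = random.Random(seed)
    kept = {}
    for label in sorted(cells_by_label):
        cells = cells_by_label[label]
        if len(cells) > max_per_label:
            cells = rng.sample(cells, max_per_label)
        kept[label] = cells
    return kept


# Hypothetical labels: 1,000 cells of one model and 200 of another.
cells = [f"cell{i}" for i in range(1200)]
labels = ["label_A"] * 1000 + ["label_B"] * 200
kept = downsample_per_label(cells, labels)
print({label: len(ids) for label, ids in kept.items()})  # {'label_A': 500, 'label_B': 200}
```

Capping rather than sampling a fixed fraction keeps rare annotation models fully represented while limiting the weight of abundant ones in a background set.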
seurat.wnn.peak.rds
figshare.com
application/gzip
Updated Oct 21, 2024
Liran Mao (2024). seurat.wnn.peak.rds [Dataset]. http://doi.org/10.6084/m9.figshare.27265410.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.27265410.v1
Dataset updated
Oct 21, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Liran Mao
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the data necessary to reproduce the results from the SpatialMuxSeq vignette (https://rpubs.com/LiranM/SpatialMuxSeq), featured in our paper "Multiplexed Spatial Mapping of Chromatin Features, Transcriptome, and Proteins in Tissues." To ensure full reproducibility of the results, we have provided a Seurat object that includes all omics layers. For further details and access to all relevant code, please visit our GitHub repository: https://github.com/liranmao/Spatial_multi_omics.
scRNA-seq MCF10-2A p53 on/off, CENP-A overexpress
kaggle.com
Updated Jul 25, 2022
Alexander Chervov (2022). scRNA-seq MCF10-2A p53 on/off, CENP-A overexpress [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-mcf102a-p53-onoff-cenpa-overexpress/code
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jul 25, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alexander Chervov
Description
Remark: results of cell cycle analysis on this dataset are discussed in the paper https://arxiv.org/abs/2208.05229, "Computational challenges of cell cycle analysis using single cell transcriptomics", Alexander Chervov, Andrei Zinovyev.

Data and Context

Particular data: from the paper "CENP-A overexpression promotes distinct fates in human cells, depending on p53 status", Daniel Jeffery, Alberto Gatto, Katrina Podsypanina, Charlène Renaud-Pageot, Rebeca Ponce Landete, Lorraine Bonneville, Marie Dumont, Daniele Fachinetti & Geneviève Almouzni, https://www.nature.com/articles/s42003-021-01941-5

Data: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-9861/

Seurat object created using a scRNAseq dataset derived from malignant cells...
zenodo.org
bin
Updated Jan 3, 2025
Adrián Salas-Bastos (2025). Seurat object created using a scRNAseq dataset derived from malignant cells isolated from BRAF mutant patient-derived xenograft melanoma cohorts exposed to concurrent RAF/MEK-inhibition (Rambow et al., 2018. Cell) [Dataset]. http://doi.org/10.5281/zenodo.14581399
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.14581399
Dataset updated
Jan 3, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Adrián Salas-Bastos
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The file is a Seurat object obtained by the re-analysis of the dataset published by Rambow et al., 2018, and is related to the paper "TGFβ signaling sensitizes MEKi-resistant human melanoma to targeted therapy-induced apoptosis" (Loos, Salas-Bastos, Nordin et al. Cell Death Dis 15, 925 (2024). https://doi.org/10.1038/s41419-024-07305-1)
Processed Seurat Objects for Localized Marker Detector (Cluster-Independent...
figshare.com
application/gzip
Updated Jun 10, 2025
Ruiqi Li; Peggy Myung (2025). Processed Seurat Objects for Localized Marker Detector (Cluster-Independent Multiscale Marker Identification in Single-cell RNA-seq Data using Localized Marker Detector) [Dataset]. http://doi.org/10.6084/m9.figshare.26507098.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.26507098.v2
Dataset updated
Jun 10, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ruiqi Li; Peggy Myung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are processed Seurat objects for the biological datasets in Localized Marker Detector (https://github.com/KlugerLab/LocalizedMarkerDetector):

Tabula Muris bone marrow dataset (FACS-based and Droplet-based)

We used publicly available scRNA-seq mouse bone marrow datasets (FACS- and droplet-based) from the Tabula Muris Consortium, which were already pre-processed and annotated according to their workflow. In addition, we applied ALRA imputation to generate a denoised assay, alra, and added several cell annotations: (1) cell cycle annotation using CellCycleScoring with the updated 2019 cell cycle gene set; (2) module activity scores for the gene modules listed in our paper.

Mouse embryo skin dataset

We separated dermal cell populations from newly collected mouse embryo skin samples (aligned to the mouse genome mm10 using CellRanger v6.1.2). Cells from wild-type and SmoM2YFP mutant (SmoM2) embryos on two consecutive days (embryonic days 13.5 and 14.5) were pooled for analysis. To avoid batch effects from pooling or integration, we analyzed each condition separately: E13.5 SmoM2, E13.5 WT, E14.5 SmoM2, and E14.5 WT. For each condition, we performed standard normalization, selected the top 2,000 highly variable genes, and scaled the data using the Seurat v4 R package. We then applied PCA, retaining the number of PCs determined by the elbow plot: E13.5 SmoM2 (14 PCs), E13.5 WT (12 PCs), E14.5 SmoM2 (12 PCs), and E14.5 WT (11 PCs).
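The PC counts above were read off elbow plots. As an illustration only, one common automated stand-in for visual inspection (not necessarily what the authors did) picks the point farthest from the straight line joining the first and last points of the per-PC standard-deviation curve. A minimal Python sketch; the function name and the example curve are hypothetical.

```python
import math


def elbow_index(values):
    """Return the 1-based position of the 'elbow' of a decreasing curve.

    Heuristic: the point with the largest perpendicular distance from the
    straight line joining the first and last points of the curve.
    """
    n = len(values)
    if n < 3:
        return n  # too few points to define an elbow
    x0, y0 = 1, values[0]
    x1, y1 = n, values[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    best_pos, best_dist = 1, -1.0
    for pos, y in enumerate(values, start=1):
        # Distance from (pos, y) to the line through (x0, y0) and (x1, y1).
        dist = abs(dy * pos - dx * y + x1 * y0 - y1 * x0) / length
        if dist > best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


# Hypothetical per-PC standard deviations, not taken from the datasets above.
pc_sd = [10, 6, 3, 1.5, 1.2, 1.1, 1.0, 0.9]
print(elbow_index(pc_sd))  # 4
```

Automated rules like this are sensitive to how many PCs are plotted, so the result is best treated as a starting point next to the plot itself.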
Dawnn benchmarking dataset: Simulated linear trajectories processing and...
rdr.ucl.ac.uk
application/gzip
Updated May 4, 2023
George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Simulated linear trajectories processing and label simulation [Dataset]. http://doi.org/10.5522/04/22616611.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5522/04/22616611.v1
Dataset updated
May 4, 2023
Dataset provided by
University College London
Authors
George Hall; Sergi Castellano Hereza
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated linear trajectories.

FILES: Data processing code

adapted_traj_sim_milo_paper.R: lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form linear trajectories.

generate_test_data_linear_traj_sim_milo_paper.R: R code to assign simulated labels to datasets generated by adapted_traj_sim_milo_paper.R. Seurat objects are saved as cells_sim_linear_traj_gex_seed_*.rds; simulated labels are saved as benchmark_dataset_sim_linear_traj.csv.

Resulting datasets

cells_sim_linear_traj_gex_seed_*.rds: Seurat objects generated by generate_test_data_linear_traj_sim_milo_paper.R.

benchmark_dataset_sim_linear_traj.csv: cell labels generated by generate_test_data_linear_traj_sim_milo_paper.R.
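The exact labeling scheme lives in generate_test_data_linear_traj_sim_milo_paper.R and is not described here. Purely to illustrate the general idea behind simulated differential-abundance labels, the Python sketch below assumes each cell has a position in [0, 1] along the trajectory and draws a two-condition label, with condition "A" more likely inside one window, so that region is differentially abundant while the rest is balanced. The window, the probabilities, and the function name are all assumptions.

```python
import random


def simulate_condition_labels(positions, window=(0.6, 0.8), p_in_window=0.8, seed=0):
    """Draw a condition label ('A' or 'B') for each cell.

    Cells whose trajectory position falls inside `window` get label 'A' with
    probability `p_in_window`; all other cells get 'A' with probability 0.5,
    so only the window is differentially abundant between conditions.
    """
    rng = random.Random(seed)
    low, high = window
    labels = []
    for t in positions:
        p_a = p_in_window if low <= t <= high else 0.5
        labels.append("A" if rng.random() < p_a else "B")
    return labels


# 10,000 hypothetical cells spread evenly along the trajectory.
positions = [i / 9999 for i in range(10000)]
labels = simulate_condition_labels(positions)
```

A benchmark then asks whether a method such as Dawnn flags the cells in the enriched window and leaves the rest unflagged.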
Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute...
figshare.com
bin
Updated Jun 16, 2025
Emin Onur Karakaslar (2025). Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute Myeloid Leukemia samples [Dataset]. http://doi.org/10.6084/m9.figshare.26189771.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.26189771.v1
Dataset updated
Jun 16, 2025
Dataset provided by
figshare
Authors
Emin Onur Karakaslar
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TL;DR: Seurat object of the 16 NPM1-mutated AML samples (n = 83,162 cells).

AML samples

All sixteen peripheral blood and bone marrow samples were obtained from patients with AML at diagnosis (n = 15) or at relapse after chemotherapy (n = 1), with written informed consent according to the Declaration of Helsinki. Mononuclear cells were isolated by Ficoll-Isopaque density gradient centrifugation and cryopreserved in the Leiden University Medical Center (LUMC) Biobank for Hematological Diseases after approval by the LUMC Institutional Review Board (protocol no. B18.047).

Upstream processing pipeline

CellRanger v7.0.0 was run on all samples with the human reference genome hg38. Seurat v4 was used for all QC [15]. Our QC pipeline had three steps per sample: 1) soft filtering, 2) low-quality cluster removal, and 3) doublet detection. In soft filtering, Seurat objects were created with cells expressing at least 200 genes and with genes expressed in at least 3 cells. Then, the standard Seurat command list with default parameters was run to detect low-quality clusters. Clusters with >15% mitochondrial and 15% mitochondrial mRNA. We used standard Seurat commands to scale and normalize the data on integrated features. The first 30 principal components were used to create UMAP plots. We used clustree to determine the optimal number of clusters, based on FindClusters with resolutions sweeping from 0 to 1.2. We chose res = 0.5, as clusters became stable. Next, we merged two clusters (CC5 and CC12) into one GMP-like cluster, as one of these clusters (CC12) had high expression of HSP genes yet still retained its cell-type-specific properties.

Note: The file was processed with Seurat v4, but the object is updated for v5. It is uploaded in .qs file format for faster reading. To read the file: qs::qread("path/to/data.qs"). This data is available for research use only and cannot be used for commercial purposes. For further queries, please refer to our paper:
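The soft-filtering thresholds above appear to correspond to the min.features = 200 and min.cells = 3 arguments of Seurat's CreateSeuratObject. A minimal, dependency-free Python sketch of the same two thresholds, assuming a small dense genes x cells matrix stored as a list of gene rows; the function name is hypothetical, and filtering cells before genes is this sketch's choice, so results may differ slightly from Seurat's.

```python
def soft_filter(counts, genes, cells, min_features=200, min_cells=3):
    """Soft-filter a genes x cells count matrix given as a list of gene rows.

    1. Keep cells with at least `min_features` detected (non-zero) genes.
    2. Among the kept cells, keep genes detected in at least `min_cells` cells.
    Returns the filtered (counts, genes, cells).
    """
    n_cells = len(cells)
    genes_per_cell = [sum(1 for row in counts if row[c] > 0) for c in range(n_cells)]
    keep_cells = [c for c in range(n_cells) if genes_per_cell[c] >= min_features]
    sub = [[row[c] for c in keep_cells] for row in counts]
    keep_genes = [g for g, row in enumerate(sub) if sum(1 for v in row if v > 0) >= min_cells]
    return (
        [sub[g] for g in keep_genes],
        [genes[g] for g in keep_genes],
        [cells[c] for c in keep_cells],
    )


# Toy example with lowered thresholds so the effect is visible on 3 genes x 4 cells.
counts = [[1, 0, 2, 1], [0, 0, 1, 1], [3, 0, 0, 0]]
print(soft_filter(counts, ["g1", "g2", "g3"], ["c1", "c2", "c3", "c4"],
                  min_features=2, min_cells=2))
# ([[1, 2, 1], [0, 1, 1]], ['g1', 'g2'], ['c1', 'c3', 'c4'])
```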
Analysis Products: Transcription factor stoichiometry, motif affinity and...
zenodo.org
tsv, zip
Updated Nov 11, 2023
Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje (2023). Analysis Products: Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency [Dataset]. http://doi.org/10.5281/zenodo.8313962
Available download formats: zip, tsv
Unique identifier
https://doi.org/10.5281/zenodo.8313962
Dataset updated
Nov 11, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in the directories, which are summarized below.

The record contains the following files:

`clusters.tsv`: contains the cluster id, name and colour of clusters in the paper

scATAC.zip

Analysis products for the single-cell ATAC-seq data. Contains:

- `cells.tsv`: list of barcodes that pass QC. Columns include:
  - `barcode`
  - `sample`: time point
  - `umap1`
  - `umap2`
  - `cluster`
  - `dpt_pseudotime_fibr_root`: pseudotime values treating a fibroblast cell as root
  - `dpt_pseudotime_xOSK_root`: pseudotime values treating an xOSK cell as root
- `peaks.bed`: list of 500 bp peaks across all cell states. The 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
- `features.tsv`: 50-dimensional representation of each cell
- `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in Python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
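The README recommends scipy.io.mmread or readMM for `cell_x_peak.mtx.gz`. To show what those readers parse, here is a minimal standard-library Python sketch of the Matrix Market coordinate format; per the description above, rows are peaks and columns are cells, so `entries[(peak_index, cell_index)]` would be a fragment count. It handles only "general" coordinate files, and read_mtx_gz is a hypothetical helper; for real files scipy.io.mmread remains the practical choice.

```python
import gzip


def read_mtx_gz(path):
    """Read a gzipped Matrix Market file in 'coordinate ... general' format.

    Returns (n_rows, n_cols, entries), where entries maps 0-based
    (row, col) pairs to float values.
    """
    with gzip.open(path, "rt") as fh:
        header = fh.readline().lower()
        if not header.startswith("%%matrixmarket matrix coordinate") or "general" not in header:
            raise ValueError("expected a general coordinate Matrix Market file")
        is_pattern = "pattern" in header  # pattern files store positions only
        line = fh.readline()
        while line.startswith("%"):  # skip comment lines after the banner
            line = fh.readline()
        n_rows, n_cols, nnz = (int(x) for x in line.split())
        entries = {}
        for _ in range(nnz):
            fields = fh.readline().split()
            row, col = int(fields[0]) - 1, int(fields[1]) - 1  # file indices are 1-based
            entries[(row, col)] = 1.0 if is_pattern else float(fields[2])
    return n_rows, n_cols, entries
```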

scATAC_clusters.zip

Analysis products corresponding to cluster pseudo-bulks of the single-cell ATAC-seq data.

- `clusters.tsv`: contains the cluster id, name and colour used in the paper
- `peaks`: contains `overlap_reproducibilty/overlap.optimal_peak` peaks called using ENCODE bulk ATAC-seq pipeline in the narrowPeak format.
- `fragments`: contains per cluster fragment files
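The cluster peak files are described as narrowPeak, the BED6+4 format used by ENCODE: chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue and peak (the summit offset from chromStart, or -1 when no summit was called). A minimal Python sketch that parses one line; the field meanings follow the format definition, while the class and function names are this sketch's own.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    """One record of the ENCODE narrowPeak format (BED6+4)."""
    chrom: str
    start: int           # 0-based start
    end: int             # end, exclusive
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float       # -log10, or -1 if not assigned
    q_value: float       # -log10, or -1 if not assigned
    peak: int            # summit offset from start, or -1 if not called

    def summit(self):
        """Absolute summit coordinate, or None when no summit was called."""
        return None if self.peak == -1 else self.start + self.peak


def parse_narrowpeak_line(line):
    """Parse one tab-separated narrowPeak line into a NarrowPeak record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    return NarrowPeak(
        fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
        fields[5], float(fields[6]), float(fields[7]), float(fields[8]), int(fields[9]),
    )
```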

scATAC_scRNA_integration.zip

Analysis products from the integration of scATAC with scRNA. Contains:

- `peak_gene_links_fdr1e-4.tsv`: file with peak gene links passing FDR 1e-4. For analyses in the paper, we filter to peaks with absolute correlation >0.45.
- `harmony.cca.30.feat.tsv`: 30 dimensional co-embedding for scATAC and scRNA cells obtained by CCA followed by applying Harmony over assay type.
- `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. First column contains barcode.
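The peak-gene links are further restricted to absolute correlation >0.45 for the paper's analyses. A minimal Python sketch of that filter using the standard csv module; the column name `correlation` is hypothetical, so check the actual header of `peak_gene_links_fdr1e-4.tsv` before using it.

```python
import csv


def filter_peak_gene_links(path, corr_column="correlation", threshold=0.45):
    """Return TSV rows whose absolute correlation is strictly above `threshold`."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return [row for row in reader if abs(float(row[corr_column])) > threshold]
```

Using the absolute value keeps negatively correlated links as well as positive ones, matching "absolute correlation" in the description.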

scRNA.zip

Analysis products for the single-cell RNA-seq data. Contains:

- `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), kNN graphs, and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples D0, D2, ..., D14, iPSC.
- `genes.txt`: list of all genes
- `cells.tsv`: list of barcodes that pass QC across samples. Contains:
  - `barcode_sample`: barcode with index of sample (1-9 corresponding to D0, D2, ..., D14, iPSC)
  - `sample`: sample name (D0, D2, ..., D14, iPSC)
  - `umap1`
  - `umap2`
  - `nCount_RNA`
  - `nFeature_RNA`
  - `cluster`
  - `percent.mt`: percent of mitochondrial transcripts in cell
  - `percent.oskm`: percent of OSKM transcripts in cell
- `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in Python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`.
- `pca.tsv`: first 50 PCs of each cell
- `oskm_endo_sendai.tsv`: estimated raw counts (cts; may not be integers) and log(1 + tp10k) normalized expression (norm) for endogenous and exogenous (Sendai-derived) counts of POU5F1 (OCT4), SOX2, KLF4 and MYC genes. Rows are consistent with `seurat.rds` and `cells.tsv`.
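The "log(1 + tp10k)" normalization mentioned for `oskm_endo_sendai.tsv` scales each cell to 10,000 total counts before a log transform. A minimal Python sketch for one cell, assuming the natural logarithm (as in Seurat's default LogNormalize); the authors' exact computation may differ, and the function name is hypothetical. Non-integer estimated counts are accepted.

```python
import math


def log1p_tp10k(cell_counts, scale=10_000):
    """Normalize one cell's counts to log(1 + counts per `scale`).

    Each count is divided by the cell's total, multiplied by `scale`
    (10,000 for "tp10k"), and transformed with the natural log1p.
    """
    total = sum(cell_counts)
    if total == 0:
        return [0.0] * len(cell_counts)  # empty cell: nothing to scale
    return [math.log1p(value / total * scale) for value in cell_counts]


print(log1p_tp10k([1, 1, 2]))  # log1p of 2500, 2500 and 5000
```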

multiome.zip

multiome/snATAC:

These files are derived from the integration of nuclei from multiome (D1M and D2M), with cells from day 2 of scATAC-seq (labeled D2).

- `cells.tsv`: This is the list of nuclei barcodes that pass QC from multiome AND also cell barcodes from D2 of scATAC-seq. Includes:
  - `barcode`
  - `umap1`: These are the coordinates used for the figures involving multiome in the paper.
  - `umap2`: Same as `umap1`.
  - `sample`: D1M and D2M correspond to multiome, D2 corresponds to day 2 of scATAC-seq
  - `cluster`: For multiome barcodes, these are labels transferred from scATAC-seq. For D2 scATAC-seq, these are the original cluster labels.
- `peaks.bed`: This is the same file as scATAC/peaks.bed. List of peaks of 500bp. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
- `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
- `features.no.harmony.50d.tsv`: 50 dimensional representation of each cell prior to running Harmony (to correct for batch effect between D2 scATAC and D1M,D2M snMultiome). Rows correspond to cells from `cells.tsv`.
- `features.harmony.10d.tsv`: 10 dimensional representation of each cell after running Harmony. Rows correspond to cells from `cells.tsv`.

multiome/snRNA:

- `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples D1M and D2M. Please use the UMAP/features from snATAC/ for consistency.
- `genes.txt`: list of all genes (this is different from the list in the scRNA analysis)
- `cells.tsv`: list of barcodes that pass QC across samples. Contains:
  - `barcode_sample`: barcode with index of sample (1, 2 corresponding to D1M, D2M respectively)
  - `sample`: sample name (D1M, D2M)
  - `nCount_RNA`
  - `nFeature_RNA`
  - `percent.oskm`: percent of OSKM genes in cell
- `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
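The barcode suffix is what links each multiome nucleus to its sample (1 for D1M, 2 for D2M). A minimal Python sketch of that lookup, assuming a Cell Ranger style "-" separator such as `AAACCTGAGAAACCAT-1`; the separator is an assumption, so check the real barcodes in `cells.tsv`, and the helper name is hypothetical.

```python
SUFFIX_TO_SAMPLE = {"1": "D1M", "2": "D2M"}  # mapping stated in the README


def sample_from_barcode(barcode_sample, sep="-"):
    """Map a 'barcode<sep>index' string to its multiome sample name."""
    barcode, found, suffix = barcode_sample.rpartition(sep)
    if not found or not barcode:
        raise ValueError(f"no {sep!r}-separated suffix in {barcode_sample!r}")
    if suffix not in SUFFIX_TO_SAMPLE:
        raise ValueError(f"unknown sample index {suffix!r}")
    return SUFFIX_TO_SAMPLE[suffix]


print(sample_from_barcode("AAACCTGAGAAACCAT-2"))  # D2M
```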
Single cell sequencing data of PBMC and CSF from a cohort of Multiple...
zenodo.org
Updated Aug 8, 2024
Zenodo (2024). Single cell sequencing data of PBMC and CSF from a cohort of Multiple Sclerosis patients and other neurological disease controls [Dataset]. http://doi.org/10.5281/zenodo.13253569
Unique identifier
https://doi.org/10.5281/zenodo.13253569
Dataset updated
Aug 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Neuroinflammation is often characterised by immune cell infiltrates in the cerebrospinal fluid (CSF). Here we apply single-cell RNA sequencing to explore the functional characteristics of these cells in patients with various inflammatory, infectious and non-inflammatory neurological disorders. We show that CSF is distinct from the peripheral blood in terms of both cellular composition and gene expression. We report that the cellular and transcriptional landscape of CSF is altered in neuroinflammation, but is strikingly similar across different neuroinflammatory disorders. We find clonal expansion of CSF B and T cells in all disorders but most pronounced in inflammatory diseases, and we functionally characterise the transcriptional features of these cells. Finally, we explore the genetic control of gene expression in CSF lymphocytes. Our results highlight the common features of immune cells in the CSF compartment across diverse neurological diseases and may help to identify new targets for drug development or repurposing in Multiple Sclerosis.

This dataset contains a tarball with six files:

A Seurat object with 5' single-cell gene expression data for all cells in the dataset

A Seurat object with B cells only, containing 5' single-cell gene expression data and VDJ data in the metadata

A Seurat object with T cells only, containing 5' single-cell gene expression data and VDJ data in the metadata

Separate .csv files with the metadata alone for each of the three datasets

These data have undergone very light quality control and contain only the raw, non-normalised RNA counts in the RNA assay (adjusted only for ambient RNA contamination). Details of the QC steps used in the paper are given in the GitHub repository. Please note that these data were generated across two sites and across multiple batches, and so any analysis should account for this potential source of technical variability. Metadata include the following key columns:

batch_id: the batch

source: whether the sample is from CSF or PBMC

processing_site: whether the sample was processed in Munich or Cambridge

Category: the diagnostic group (MS, Other Inflammatory Neurological Disease, Other Inflammatory Neurological Disease - Infection, and Non-inflammatory Neurological Disease)

Sex

OCB: whether the patient had CSF oligoclonal bands

fully_anonymous_pseudoid: donor ID

ann_celltypist_lowres: automated cell type assignment at low resolution

ann_celltypist_highres: automated cell type assignment at high resolution

VDJ datasets (B and T cells) contain many additional metadata columns with information on the VDJ and VJ transcripts expressed by each cell.
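Because the data span two sites and multiple batches, a quick first check is how cells are distributed across batch_id, source and processing_site. A minimal Python sketch using the standard csv module, assuming one of the metadata .csv files has one row per cell and a header containing these column names (the helper names are hypothetical). Batches that contain only one source cannot separate batch effects from CSF-versus-PBMC differences, which is what single_level_groups flags.

```python
import csv
from collections import Counter, defaultdict


def crosstab(path, row_key="batch_id", col_key="source"):
    """Count cells for every (row_key, col_key) combination in a metadata CSV."""
    with open(path, newline="") as fh:
        return Counter((row[row_key], row[col_key]) for row in csv.DictReader(fh))


def single_level_groups(table):
    """Return row values (e.g. batches) that contain only one column value."""
    levels = defaultdict(set)
    for row_value, col_value in table:
        levels[row_value].add(col_value)
    return sorted(r for r, cols in levels.items() if len(cols) == 1)
```

The same two calls with col_key="processing_site" show whether batches are nested within sites.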
