48 datasets found
  1. Seurat object with cell type annotation and UMAP coordinates for zebrafish...

    • figshare.com
    application/gzip
    Updated Nov 28, 2024
    Cite
    Gangcai Xie (2024). Seurat object with cell type annotation and UMAP coordinates for zebrafish testis single cell RNA sequencing datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27922725.v1
    Available download formats: application/gzip
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    figshare
    Authors
    Gangcai Xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the Seurat object in .rds format with the raw matrix information (after filtering), cell type annotation information, and the UMAP coordinates. Users can load this .rds file with the R readRDS function. If you use this dataset, please cite our paper: Qian, Peipei, Jiahui Kang, Dong Liu, and Gangcai Xie. "Single cell transcriptome sequencing of Zebrafish testis revealed novel spermatogenesis marker genes and stronger Leydig-germ cell paracrine interactions." Frontiers in Genetics 13 (2022): 851719.

  2. Data used in SeuratIntegrate paper

    • zenodo.org
    application/gzip, bin +2
    Updated May 23, 2025
    Cite
    Florian Specque; Macha Nikolski; Domitille Chalopin (2025). Data used in SeuratIntegrate paper [Dataset]. http://doi.org/10.5281/zenodo.15496601
    Available download formats: bin, pdf, txt, application/gzip
    Dataset updated
    May 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Florian Specque; Macha Nikolski; Domitille Chalopin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository gathers the data and code used to generate hepatocellular carcinoma analyses in the paper presenting SeuratIntegrate. It contains the scripts to reproduce the figures presented in the article. Some figures are also available as pdf files.

    To fully reproduce the results from the paper, one should:

    • download all the files
    • install R 4.3.3, with the corresponding base R packages (stats, graphics, grDevices, utils, datasets, methods and base)
    • install R packages listed in the file sessionInfo.txt
    • install the provided version of SeuratIntegrate. In an R session, run:
    remotes::install_local("path/to/SeuratIntegrate_0.4.1.tar.gz")
    • install (mini)conda if necessary (we used miniconda version 23.11.0)
    • install the conda environments (if it fails with the *package-list.yml files, use the *package-list-from-history.yml files instead):
    conda env create --file SeuratIntegrate_bbknn_package-list.yml
    conda env create --file SeuratIntegrate_scanorama_package-list.yml
    conda env create --file SeuratIntegrate_scvi-tools_package-list.yml
    conda env create --file SeuratIntegrate_trvae_package-list.yml
    • open an R session to make the conda environments usable by SeuratIntegrate:
    library(SeuratIntegrate)
    
    UpdateEnvCache("bbknn", conda.env = "SeuratIntegrate_bbknn", conda.env.is.path = FALSE)
    UpdateEnvCache("scanorama", conda.env = "SeuratIntegrate_scanorama", conda.env.is.path = FALSE)
    UpdateEnvCache("scvi", conda.env = "SeuratIntegrate_scvi-tools", conda.env.is.path = FALSE)
    UpdateEnvCache("trvae", conda.env = "SeuratIntegrate_trvae", conda.env.is.path = FALSE)

    Once done, running the code in integrate.R should produce reproducible results. Note that lines 3 to 6 from integrate.R should be adapted to the user's setup.
    integrate.R is subdivided into six main parts:

    1. Preparation: lines 1-56
    2. Preprocessing: lines 58-74
    3. Integration: lines 76-121
    4. Processing of integration outputs: lines 126-267
    5. Scoring of integration outputs: lines 269-353
    6. Plotting: lines 380-507

    Intermediate SeuratObjects have been saved between steps 3 and 4 (liver10k_integrated_object.RDS) and between steps 5 and 6 (liver10k_integrated_scored_object.RDS). It is possible to start from either intermediate SeuratObject and skip the preceding steps, provided that the Preparation step is always run first.
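
    As a rough illustration of the checkpoint logic described here, the Python sketch below lists which parts of integrate.R would still need to run when starting from one of the intermediate objects. The step names, line ranges and file names are taken from the description; the function and dictionary names are hypothetical and not part of the repository.

```python
# Sections of integrate.R as listed in the description:
# step number -> (name, first line, last line).
STEPS = {
    1: ("Preparation", 1, 56),
    2: ("Preprocessing", 58, 74),
    3: ("Integration", 76, 121),
    4: ("Processing of integration outputs", 126, 267),
    5: ("Scoring of integration outputs", 269, 353),
    6: ("Plotting", 380, 507),
}

# Last step already covered by each intermediate object (None = start from scratch).
CHECKPOINTS = {
    None: 1,
    "liver10k_integrated_object.RDS": 3,
    "liver10k_integrated_scored_object.RDS": 5,
}


def steps_to_run(checkpoint=None):
    """Return the step numbers still to run; Preparation (step 1) is always included."""
    last_done = CHECKPOINTS[checkpoint]
    return [1] + [step for step in STEPS if step > last_done]


for step in steps_to_run("liver10k_integrated_object.RDS"):
    name, first, last = STEPS[step]
    print(f"{step}. {name}: lines {first}-{last}")
```

    Starting from the scored object would leave only the Preparation and Plotting sections to run.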

  3. cellCounts

    • opal.latrobe.edu.au
    • researchdata.edu.au
    bin
    Updated Dec 19, 2022
    Cite
    Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi (2022). cellCounts [Dataset]. http://doi.org/10.26181/21588276.v3
    Available download formats: bin
    Dataset updated
    Dec 19, 2022
    Dataset provided by
    La Trobe
    Authors
    Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review. A Linux computer running an operating system of CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for running this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.

    • R (v4.0.0 or newer): https://www.r-project.org/
    • Rsubread (v2.12.2 or newer): http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html
    • CellRanger (v6.0.1): https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
    • STARsolo (v2.7.10a): https://github.com/alexdobin/STAR
    • sra-tools (v2.10.0 or newer): https://github.com/ncbi/sra-tools
    • Seurat (v3.0.0 or newer): https://satijalab.org/seurat/
    • edgeR (v3.30.0 or newer): https://bioconductor.org/packages/edgeR/
    • limma (v3.44.0 or newer): https://bioconductor.org/packages/limma/
    • mltools (v0.3.5 or newer): https://cran.r-project.org/web/packages/mltools/index.html

    Reference packages generated by 10x Genomics are also required for this analysis and can be downloaded from the following link (the 2020-A version of the individual human and mouse reference packages should be selected): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest

    Once everything above is in place, run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. This script automatically downloads the mixture scRNA-seq data from the SRA database and writes a text file called ‘test-all.log’ that contains all the screen output and the speed/accuracy results of CellRanger, STARsolo and cellCounts.

  4. Data from: A Single-Cell Tumor Immune Atlas for Precision Oncology

    • explore.openaire.eu
    Updated Sep 21, 2020
    + more versions
    Cite
    Paula Nieto (2020). A Single-Cell Tumor Immune Atlas for Precision Oncology [Dataset]. http://doi.org/10.5281/zenodo.4036019
    Dataset updated
    Sep 21, 2020
    Authors
    Paula Nieto
    Description

    Publication version of the Single-Cell Tumor Immune Atlas.

    This upload contains:

    • TICAtlas.rds: an rds file containing a Seurat object with the whole Atlas
    • TICAtlas.h5ad: an h5ad file with the whole Atlas
    • TICAtlas_downsampled.rds: an rds file containing a downsampled version of the Seurat object of the whole Atlas
    • TICAtlas_downsampled.h5ad: an h5ad file containing a downsampled version of the whole Atlas
    • TICAtlas_metadata.csv: a comma-separated text file with the metadata for each of the cells

    All the files contain the following patient/sample metadata variables:

    • patient: assigned patient identifiers
    • nCountRNA and nFeatureRNA: number of UMIs and genes per cell
    • percent.mt: percentage of mitochondrial genes
    • gender: the patient's gender (male/female/unknown)
    • source: dataset of origin
    • subtype: cancer type (abbreviations as indicated in the preprint)
    • kmeans_cluster: patient clusters, NA if filtered out before clustering
    • lv1 and lv2: annotated cell type for each of the cells, two-level annotation (lv2 has more cell types)

    If you have any issues with the metadata (e.g. unexpected factors, NA values...) you can use the TICAtlas_metadata.csv file. For more information, read our paper, check our GitHub and our ShinyApp.

    h5ad files can be read with Python using Scanpy; rds files can be read in R using Seurat. For format conversion between AnnData and Seurat we recommend SeuratDisk. For other single-cell data formats you can use sceasy.
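
    A minimal sketch of working with the metadata table outside R, using only the Python standard library. The rows below are made up for illustration, and the column names (patient, subtype, lv1, lv2) follow the variable list in the description; the exact header spelling in TICAtlas_metadata.csv is an assumption.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for TICAtlas_metadata.csv. The real file is far larger,
# and these patient IDs, subtypes and cell-type labels are invented examples.
SAMPLE = """patient,subtype,lv1,lv2
P1,TypeA,T cells,CD8 effector
P1,TypeA,T cells,CD4 naive
P2,TypeB,B cells,B cells
"""


def count_by(rows, column):
    """Count rows (one row per cell) for each distinct value of `column`."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cells_per_lv1 = count_by(rows, "lv1")
print(cells_per_lv1.most_common())
```

    To run this on the real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle for the downloaded CSV.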

  5. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, txt
    Updated Nov 20, 2023
    Cite
    Jonathan Hsu; Allart Stoop (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. http://doi.org/10.5281/zenodo.10011622
    Available download formats: bin, txt
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jonathan Hsu; Allart Stoop
    Description

    Table of Contents

    1. Main Description
    2. File Descriptions
    3. Linked Files
    4. Installation and Instructions

    1. Main Description

    ---------------------------

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled `marengo_code_for_paper_jan_2023.R` was used to generate the figures from the single-cell RNA sequencing data.

    The following libraries are required for script execution:

    • Seurat
    • scRepertoire
    • ggplot2
    • stringr
    • dplyr
    • ggridges
    • ggrepel
    • ComplexHeatmap

    File Descriptions

    ---------------------------

    • The code can be downloaded and opened in RStudio.
    • The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper.
    • The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113).
    • The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, also used to create the heatmap with DGE and volcano plots.
    • The "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in figure 5F.

    Linked Files

    ---------------------

    This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:

    Gene Expression Omnibus (GEO) ID: GSE223311(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    • Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    • Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
    • Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    • Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    • Description: This submission contains the **raw sequencing** or `.fastq.gz` files, which are gzip-compressed text files.
    • Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113(https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    • Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
    • Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code.
    • Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

    Installation and Instructions

    --------------------------------------

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    > Ensure you have R version 4.1.2 or higher for compatibility.

    > Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).

    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.

    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    > setwd(directory)

    4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.

    5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.

    6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.

    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" to generate the plots.

  6. Uehata et al. single-cell ATAC-seq dataset of hematopoietic stem and...

    • figshare.com
    application/gzip
    Updated Aug 28, 2023
    Cite
    Alexis Vandenbon (2023). Uehata et al. single-cell ATAC-seq dataset of hematopoietic stem and progenitor cells [Dataset]. http://doi.org/10.6084/m9.figshare.24040575.v1
    Available download formats: application/gzip
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    figshare
    Authors
    Alexis Vandenbon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Seurat object (.rds format) for a single-cell ATAC-seq dataset of hematopoietic stem and progenitor cells. It includes 4 samples:

    • control
    • DKO (Reg1–/–, Reg3–/–)
    • Nfkbiz–/–
    • TKO DKO (Reg1–/–, Reg3–/– Nfkbiz–/–)

    Data were processed using Seurat and Signac. For more details we refer to the accompanying GitHub repository. In brief, we normalized the data, conducted linear and non-linear dimensionality reduction, clustered cells, calculated "gene activities", and added motif information to the Seurat object.

    A link to the accompanying paper will be added here after publication.

  7. Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Sep 7, 2023
    Cite
    Meera Makheja (2023). Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug Resistance in Chronic Myeloid Leukemia [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5118610
    Dataset updated
    Sep 7, 2023
    Dataset provided by
    Vaidehi Krishnan
    Meera Makheja
    Charles Chuah
    Chan Zhu En
    Pavanish Kumar
    Lee Kian Leong
    John Ouyang
    Prasanna Nori Venkatesh
    Alice Man Sze Cheung
    Shyam Prabhakar
    Zahid Nawaz
    Owen Rackham
    Sudipto Bari
    Salvatore Albani
    William Ying Khee Hwang
    Sin Tiong Ong
    Ahmad Lajam
    Florian Schmidt
    Description

    This archive contains data of scRNAseq and CyTOF in form of Seurat objects, txt and csv files as well as R scripts for data analysis and Figure generation.

    A summary of the content is provided in the following.

    R scripts

    Script to run machine learning models predicting group-specific marker genes: CML_Find_Markers_Zenodo.R

    Script to reproduce the majority of Main and Supplementary Figures shown in the manuscript: CML_Paper_Figures_Zenodo.R

    Script to run inferCNV analysis: inferCNV_Zenodo.R

    Script to plot NATMI analysis results: NATMI_CvsA_FC0.32_Updown_Column_plot_Zenodo.R

    Script to conduct sub-clustering and filtering of NK cells: NK_Marker_Detection_Zenodo.R

    Helper scripts for plotting and DEG calculation: ComputePairWiseDE_v2.R, Seurat_DE_Heatmap_RCA_Style.R

    RDS files

    General scRNA-seq Seurat objects:

    scRNA-seq Seurat object after QC and cell type annotation, used for most analyses in the manuscript: DUKE_DataSet_Doublets_Removed_Relabeled.RDS

    scRNA-seq including findings e.g. from NK analysis used in the shiny app: DUKE_final_for_Shiny_App.rds

    Neighborhood enrichment score computed for group A across all HSPCs: Enrichment_score_global_groupA.RDS

    UMAP coordinates used in the article: Layout_2D_nNeighbours_25_Metric_cosine_TCU_removed.RDS

    SCENIC files:

    Regulon set used in SCENIC: 2.6_regulons_asGeneSet.Rds

    AUC values computed for regulons: 3.4_regulonAUC.Rds

    Metadata used in SCENIC: cellInfo.Rds

    Group specific regulons for LSC: groupSpecificRegulonsBCRAblP.RDS

    Patient specific regulons for LSC: patientSpecificRegulonsBCRAblP.RDS

    Patient specificity score for LSC: PatientSpecificRegulonSpecificityScoreBCRAblP.RDS

    Regulon specificity score for LSC: RegulonSpecificityScoreBCRAblP.RDS

    BCR-ABL1 inference:

    HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label.RDS

    UMAP for HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label_UMAP.RDS

    HSPCs with BCR-ABL1 module scores: HSPC_metacluster_74K_with_modscore_27thmay.RDS

    NK sub-clustering and filtering:

    NK object with module scores: NK_8617cells_with_modscore_1stjune.RDS

    Feature genes for NK cells computed with DubStepR: NK_Cells_DubStepR

    NK cells Seurat object excluding contaminating T and B cells: NK_cells_T_B_17_removed.RDS

    NK Seurat object including neighbourhood enrichment score calculations: NK_seurat_object_with_enrichment_labels_V2.RDS

    txt and csv files:

    Proportions per cluster calculated from CyTOF: CyTOF_Proportions.txt

    Correlation between scRNAseq and CyTOF cell type abundance: scRNAseq_Cor_Cytof.txt

    Correlation between manual gating and FlowSOM clustering: Manual_vs_FlowSOM.txt

    GSEA results:

    HSPC, HSC and LSC results: FINAL_GSEA_DATA_For_GGPLOT.txt

    NK: NK_For_Plotting.txt

    TFRC and HLA expression: TFRC_and_HLA_Values.txt

    NATMI result files:

    UP-regulated_mean.csv

    DOWN-regulated_mean.csv

    Gene position file used in inferCNV: inferCNV_gene_positions_hg38.txt

    Module scores for NK subclusters per cell: NK_Supplementary_Module_Scores.csv

    Compressed folders:

    All CyTOF raw data files: CyTOF_Data_raw.zip

    Results of the patient-based classifier: PatientwiseClassifier.zip

    Results of the single-cell based classifier: SingleCellClassifierResults.zip

    For new analyses in general, we recommend that readers use the Seurat object stored in DUKE_final_for_Shiny_App.rds, or use the shiny app (http://scdbm.ddnetbio.com/) and perform further analysis from there.

    RAW data is available at EGA upon request using Study ID: EGAS00001005509

    Revision

    The for_CML_manuscript_revision.tar.gz folder contains scripts and data for the paper revision including 1) Detection of the BCR-ABL fusion with long read sequencing; 2) Identification of BCR-ABL junction reads with scRNAseq; 3) Detection of expressed mutations using scRNAseq.

  8. Data from: A single-cell atlas characterizes dysregulation of the bone...

    • zenodo.org
    Updated Jan 14, 2025
    Cite
    William Pilcher (2025). A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma [Dataset]. http://doi.org/10.5281/zenodo.14624955
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    William Pilcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 8, 2024
    Description

    This repository contains R Seurat objects associated with our study titled "A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma".

    Single cell data contained within this object comes from MMRF Immune Atlas Consortium work.

    Each .rds file contains a Seurat object saved with version 4.3. It can be loaded in R with the readRDS command.

    Two .RDS files are included in this version of the release.

    • Discovery object: MMRF_ImmuneAtlas_Full_With_Corrected_Censored_Metadata.rds contains all aliquots belonging to the 'discovery' cohort as used in the initial paper. This represents the dataset used for initial clustering, cell annotation, and analysis.

    • Discovery + Validation object: COMBINED_VALIDATION_MMRF_ImmuneAtlas_Full_Censored_Metadata.rds contains both aliquots belonging to the initial 'discovery' cohort, and aliquots belonging to the 'validation' cohort. The group each cell is derived from is listed under the 'cohort' variable. Labels related to cell annotation, including doublet status, are derived from a label transfer process as described in the paper. Labels for the original 'discovery' cohort are unchanged. UMAPs have been reconstructed with both the discovery and validation cohorts integrated.

    --

    The discovery object contains two assays:

    • "RNA" - The raw count matrix
    • "RNA_Batch_Corrected" - Counts adjusted for the combination of 'Study_Site' and 'Batch'.
      • Analyses should prefer the original RNA assay, unless they use pipelines that do not support adjusting for technical covariates.

    Currently, the validation object only includes the uncorrected RNA assay.

    --

    The object contains two umaps in the reduction slot:

    • umap - will render the UMAP for the full object with all cells.
    • umap.sub - contains the UMAP embeddings for individual 'compartments', as indicated by 'subcluster_V03072023_compartment'

    --

    Each sample has three different identifiers:

    • public_id
      • Indicates a specific patient (n=263).
      • MMRF_####
      • This is a standard identifier which is used across all MMRF CoMMpass datasets
      • public_ids can map to multiple d_visit_specimen_ids and aliquot_ids
      • As of now, all public_ids have a single sample collected at Baseline.
        • This can be accessed by filtering for 'collection_event' %in% c("Baseline", "Screening") or VJ_INTERVAL == 'Baseline'
    • d_visit_specimen_id
      • Indicates a specific visit by a patient (n=358)
      • MMRF_####_Y
        • Y is a number indicating that this is the Yth sample obtained from that patient. It does not correspond to a specific timepoint.
      • This is a standard identifier, which is used across all MMRF CoMMpass datasets
      • The purpose of the visit is indicated in 'collection_event' (Baseline, Relapse, Remission, etc.). The approximate interval the visit corresponds to is in "VJ_INTERVAL"
      • d_visit_specimen_id uniquely maps to one public_id
      • d_visit_specimen_id can map to multiple aliquot_ids
    • aliquot_id
      • Refers to the specific bone marrow aliquot sample processed (n=361)
      • MMRFA-######
      • This is a unique identifier for each processed scRNA-seq sample.
      • As of now, this uniquely maps to a combination of d_visit_specimen_id, Study_Site, and Batch
      • As of now, is an identifier specific to the MMRF ImmuneAtlas
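
    The mapping between the first two identifiers can be sketched in a few lines of Python. This assumes the 'MMRF_####_Y' layout described in the list; the example IDs are invented, the function names are hypothetical, and the number of digits is not enforced because the '#' counts may be schematic.

```python
import re

# 'MMRF_####_Y': a public_id followed by the per-patient sample number Y.
VISIT_RE = re.compile(r"^(MMRF_\d+)_(\d+)$")


def split_visit_id(d_visit_specimen_id):
    """Split a d_visit_specimen_id into (public_id, sample_number)."""
    match = VISIT_RE.match(d_visit_specimen_id)
    if match is None:
        raise ValueError(f"unexpected d_visit_specimen_id: {d_visit_specimen_id!r}")
    return match.group(1), int(match.group(2))


def visits_per_patient(visit_ids):
    """Group visit IDs under the single public_id each one maps to."""
    grouped = {}
    for visit_id in visit_ids:
        public_id, _ = split_visit_id(visit_id)
        grouped.setdefault(public_id, []).append(visit_id)
    return grouped


example_ids = ["MMRF_0001_1", "MMRF_0001_2", "MMRF_0002_1"]  # hypothetical IDs
print(visits_per_patient(example_ids))
```

    Because a d_visit_specimen_id maps to exactly one public_id, the grouping is unambiguous; aliquot_ids follow a different pattern and would need a lookup in the object's metadata instead.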

    Each cell has the following annotation information:

    • subcluster_V03072023
      • These refer to an individual cluster derived from 'Seurat'.
      • Format is 'Compartment'.'Compartment-cluster'.'Compartment-subcluster'
        • For example, 'NkT.2.2' indicates that this cell is in the 'Natural Killer + T Cell' compartment, was originally part of 'Cluster 2', and was then further separated into a refined subcluster '2.2'.
        • If a parent cluster did not need to be further separated, the 'Compartment-subcluster' part is omitted (e.g., 'NkT.6')
      • As of now, this uniquely maps to a specific cellID_short annotation.
      • Clustering was done on a per compartment basis
        • For most immune cell types, clustering was based on embeddings corrected for 'siteXbatch'. For Plasma, clustering was performed on embeddings corrected on a per-sample basis.
      • In the combined validation object, DISCOVERY.subcluster_V03072023 will contain values only for the discovery cohort, and have NA values for validation samples.
    • subcluster_V03072023_compartment
      • These refer to one of five major compartments as identified roughly on the original UMAP. Clustering was performed on a per-compartment basis following a first pass rough annotation.
      • The possible compartments are
        • NkT (T cell + Natural Killer Cells)
        • Myeloid (Monocytes, Macrophages, Dendritic cells, Neutrophil/Granulocyte populations)
        • BEry (B Cell, Erythroblasts, bone marrow progenitor populations, pDCs)
        • Ery (Erythrocyte population)
        • Plasma (Plasma cell populations)
      • Each compartment has its own UMAP, which can be accessed in the 'umap.sub' reduction
      • One cluster was isolated from all other populations, and was not assigned to a compartment. This cluster is labeled as 'Full.23'.
      • In the combined validation object, DISCOVERY.subcluster_V03072023_compartment will contain values only for the discovery cohort, and have NA values for validation samples.
    • cellID_short
      • This is the individual annotation for each cluster.
      • Please see the 'Cell Population Annotation Dictionary' for further details.
      • If different seurat clusters were assigned similar annotations, the celltype annotation will be appended with a distinct cluster gene, or with '_b', '_c'
    • lineage_group
      • This is an annotation driven grouping of clusters into major immune populations, as shown in Figure 2.
      • This includes "CD8", "CD4", "M" (Myeloid), "B" (B cell), "E" (Erythroid), "P" (Plasma), "Other" (HSC, Fibro, pDC_a), "LQ" (Doublet)
    • isDoublet
      • This is a binary 'True' or 'False' derived from manual review of clusters following doublet analysis, as described in the paper.
      • True indicates the cluster was determined to be a doublet population.
      • This is derived from 'doublet_pred', in which 'dblet_cluster' and 'poss_dblet_cluster' were flagged as doublet populations for subsequent analysis.
      • In the validation object, the doublet status of new samples was inferred from label transfer: a cell is flagged as a doublet if label transfer from the discovery cohort mapped it to one of the previously identified doublet populations. The raw doublet scores from DoubletFinder, Pegasus, or Scrublet are not included in this release.
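
    The subcluster label format described above ('Compartment'.'Compartment-cluster'.'Compartment-subcluster') can be taken apart with a short Python sketch. It assumes that cluster and subcluster parts are always integers, which matches the examples given ('NkT.2.2', 'NkT.6', 'Full.23') but is not stated explicitly; the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Subcluster(NamedTuple):
    compartment: str
    cluster: int
    subcluster: Optional[int]  # None when the parent cluster was not split further


def parse_subcluster(label):
    """Parse a subcluster_V03072023 label such as 'NkT.2.2' or 'NkT.6'."""
    parts = label.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected subcluster label: {label!r}")
    compartment = parts[0]
    cluster = int(parts[1])
    subcluster = int(parts[2]) if len(parts) == 3 else None
    return Subcluster(compartment, cluster, subcluster)


for label in ["NkT.2.2", "NkT.6", "Full.23"]:
    print(label, parse_subcluster(label))
```

    Labels with the 'Full' prefix, such as 'Full.23', correspond to the cluster that was not assigned to any of the five compartments.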

    --

    Each sample has the following information indicating shipment batches, for batch correction

    • Study_Site
      • The center which processed a specific aliquot_id
      • EMORY, MSSM, WashU, MAYO
    • Batch
      • The shipment batch the sample was associated with
      • Valued 1 to 3 for EMORY, MSSM, MAYO, and 1 to 4 for WashU
    • siteXbatch
      • A combination of the above two variables, to be used for batch correction
    • (Combined Validation Object only): cohort
      • Indicates if the sample was involved in the 'discovery' cohort, or 'validation' cohort. Samples in the 'validation' cohort will have labels inferred from label mapping
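
    To make the size of the siteXbatch variable concrete, the Python sketch below enumerates the Study_Site/Batch combinations implied by the batch ranges listed above. The label format (site and batch joined by an underscore) is an assumption, since the description does not say how the two values are combined, and not every combination is guaranteed to occur in the data.

```python
# Number of shipment batches per site, as described in the list above:
# 1 to 3 for EMORY, MSSM and MAYO, and 1 to 4 for WashU.
BATCHES_PER_SITE = {"EMORY": 3, "MSSM": 3, "MAYO": 3, "WashU": 4}


def site_batch_levels(sep="_"):
    """Enumerate every possible Study_Site/Batch combination as one label."""
    return [
        f"{site}{sep}{batch}"
        for site, n_batches in BATCHES_PER_SITE.items()
        for batch in range(1, n_batches + 1)
    ]


levels = site_batch_levels()
print(len(levels))  # at most 3 + 3 + 3 + 4 = 13 levels
print(levels)
```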

    --

    Each public_id has limited demographic information based on publicly available information in the MMRF CoMMpass study.

    • d_pt_sex
      • Patient sex (not self-identified). Male or Female
    • d_pt_race_1
      • Patient self-identified race
    • d_pt_ethnicity
      • Patient self-identified ethnicity
    • d_dx_amm_age
      • Patient age at diagnosis.
      • Not reported for patients above 90 at diagnosis
    • d_dx_amm_bmi
      • Patient BMI at diagnosis
    • d_pt_height_cm
      • Patient height at diagnosis, in centimeters.
    • d_dx_amm_weight_kg
      • Patient weight at diagnosis, in kilograms

    Each d_visit_specimen_id has two data points providing limited information about the visit

    • collection_event
      • Description of why the sample was collected
        • e.g., 'Baseline' and 'Screening' indicates the sample was obtained prior to therapy
        • 'Relapse/Progression' indicates the sample was collected due to disease progression based on clinical assessment
        • 'Remission/Response' indicates the sample was collected due to patient entering remission based on clinical assessment
        • Samples may be collected for reasons independent of the above, such as 'Pre' or 'Post' ASCT, or for other unspecified reasons
    • VJ_INTERVAL
      • Indicates the rough interval following start of therapy the sample is assigned to
        • "Baseline", "Month 3", "Year 2", etc.

    All the single-cell raw data, along with outcome and cytogenetic information, is available at MMRF’s VLAB shared resource. Requests to access these data will be reviewed by data access committee at MMRF and any data shared will be released under a data transfer agreement that will protect the identities of patients involved in the study. Other information from the CoMMpass trial can also generally be

  9. Droplet-based, high-throughput single cell transcriptional analysis of adult...

    • figshare.com
    Updated Mar 6, 2019
    Cite
    Sarthak Sinha; Jo Anne Stratton (2019). Droplet-based, high-throughput single cell transcriptional analysis of adult mouse tissue using 10X Genomics' Chromium Single Cell 3' (v2) system: From tissue preparation to bioinformatic analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6626927.v1
    Explore at:
    Dataset updated
    Mar 6, 2019
    Dataset provided by
    figshare
    Authors
    Sarthak Sinha; Jo Anne Stratton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The attached R scripts supplement our protocol paper currently under editorial review at the Journal of Visualized Experiments. Scope of the article: This protocol describes the general processes and quality control checks necessary for preparing healthy adult single cells for droplet-based, high-throughput single cell RNA-Seq analysis using the 10X Genomics Chromium System. We also describe sequencing parameters, alignment and downstream single-cell bioinformatic analysis.

  10. scRNA data from: Organization of the human Intestine at single cell...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 24, 2023
    Cite
    Winston Becker (2023). scRNA data from: Organization of the human Intestine at single cell resolution [Dataset]. http://doi.org/10.5061/dryad.8pk0p2ns8
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 24, 2023
    Dataset provided by
    Stanford University
    Authors
    Winston Becker
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The human adult intestinal system is a complex organ that is approximately 9 meters long and performs a variety of complex functions including digestion, nutrient absorption, and immune surveillance. We performed snRNA-seq on 8 regions of the human intestine (duodenum, proximal-jejunum, mid-jejunum, ileum, ascending colon, transverse colon, descending colon, and sigmoid colon) from 9 donors (B001, B004, B005, B006, B008, B009, B010, B011, and B012). In the corresponding paper, we find cell compositions differ dramatically across regions of the intestine and demonstrate the complexity of epithelial subtypes. We map gene regulatory differences in these cells suggestive of a regulatory differentiation cascade, and associate intestinal disease heritability with specific cell types. These results describe the complexity of the cell composition, regulation, and organization in the human intestine, and serve as an important reference map for understanding human biology and disease.

    Methods

    For a detailed description of each of the steps to obtain this data see the detailed materials and methods in the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were flash frozen. Nuclei were isolated from each sample and the resulting nuclei were processed with either 10x scRNA-seq using Chromium Next GEM Single Cell 3’ Reagent Kits v3.1 (10x Genomics, 1000121) or Chromium Next GEM Chip G Single Cell Kits (10x Genomics, 1000120) or 10x multiome sequencing using Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Kits (10x Genomics, 1000283). Initial processing of snRNA-seq data was done with the Cell Ranger Pipeline (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) by first running cellranger mkfastq to demultiplex the bcl files and then running cellranger count. Since nuclear RNA was sequenced, data were aligned to a pre-mRNA reference. Initial processing of the multiome data, including alignment and generation of fragments files and expression matrices, was performed with the Cell Ranger ARC Pipeline. The raw expression matrices from these pipelines are included here. Downstream processing was performed in R, using the Seurat package.

  11. scRNA-seq Trajectory inference.

    • kaggle.com
    Updated Aug 9, 2022
    Cite
    Alexander Chervov (2022). scRNA-seq Trajectory inference. [Dataset]. https://www.kaggle.com/datasets/alexandervc/trajectory-inference-single-cell-rna-seq/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 9, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexander Chervov
    Description

    Remark: For a discussion of trajectory inference on this dataset, see the paper https://www.mdpi.com/1099-4300/22/11/1274 "Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets", Alexander Chervov, Jonathan Bac and Andrei Zinovyev

    For cell cycle analysis see: https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

    Data and Context

    Data: results of single cell RNA sequencing, i.e. a matrix whose rows correspond to cells and whose columns correspond to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

    See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy", the main Python package for working with scRNA-seq data) or https://satijalab.org/seurat/ ("Seurat", an R package)

    Particular data: Gene expression count matrix. Single cell RNA sequencing data. 447 cells, 24748 genes. Mouse Liver Hepatoblast in vivo.
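Because the description leaves open whether rows are cells or genes, a small orientation check can help before analysis. The sketch below is an assumption-laden illustration: it uses plain nested lists and a hypothetical helper `orient_cells_by_genes`, and it relies only on the stated dimensions (for this dataset, 447 cells and 24748 genes) to decide whether a transpose is needed.

```python
def orient_cells_by_genes(matrix, n_cells, n_genes):
    """Return a copy of the matrix with cells as rows and genes as columns.

    The decision is based purely on the expected shape, so it is ambiguous
    (and returns the input orientation) when n_cells == n_genes.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if (n_rows, n_cols) == (n_cells, n_genes):
        return [list(row) for row in matrix]
    if (n_rows, n_cols) == (n_genes, n_cells):
        return [list(column) for column in zip(*matrix)]  # transpose
    raise ValueError(f"unexpected shape {(n_rows, n_cols)}")


# Toy stand-in: 3 genes x 2 cells (the real matrix would be 24748 x 447 or 447 x 24748).
toy_genes_by_cells = [[0, 3], [1, 0], [7, 2]]
oriented = orient_cells_by_genes(toy_genes_by_cells, n_cells=2, n_genes=3)
```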

    Paper: Li Yang, Wei-Hua Wang, Wei-Lin Qiu, Zhen Guo, Erfei Bi, Cheng-Ran Xu. A single-cell transcriptomic analysis reveals precise pathways and regulatory mechanisms underlying hepatoblast differentiation. Hepatology. 2017 Nov;66(5):1387-1401. doi: 10.1002/hep.29353. Epub 2017 Sep 29.

    Data: GSE90047 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE90047 Downloaded from: https://cytotrace.stanford.edu/#shiny-tab-dataset_download

    Related datasets:

    Other single cell RNA-seq datasets can be found on Kaggle: see https://www.kaggle.com/alexandervc/datasets or search Kaggle for "scRNA-seq"

    Inspiration

    Single cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

    Also see the review (Nature, P. Kharchenko): "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

    Searching scholar.google for "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles:

    (Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833

    Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages 273–282 (2019)

    Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)

    Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12

  12. Single cell T cell atlas

    • zenodo.org
    bin, csv
    Updated Jul 27, 2024
    + more versions
    Cite
    Kerry A Mullan (2024). Single cell T cell atlas [Dataset]. http://doi.org/10.5281/zenodo.12569981
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Jul 27, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kerry A Mullan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    The attached data are a merge of 12 high-quality single-cell T cell datasets that each had both TCR-seq and GEx. The Seurat object (supercluster_added_ID-240531.rds) contains ~500K paired TCR-seq and GEx profiles. We also included the original identifiers in Sup_Update_labels.csv. See our documentation at https://stegor.readthedocs.io/en/latest/ for how we processed the 12 datasets and decided on the current 47 T cell annotation models using scGate.

    This is the accompanying data set for the paper entitled ‘T cell receptor-centric approach to streamline multimodal single-cell data analysis’, which is currently available as a preprint (https://www.biorxiv.org/content/10.1101/2023.09.27.559702v2). Details on the origin of the datasets and the processing steps can be found there.

    The purpose of this atlas, in both its full and down-sampled versions, is to aid in improving the interpretability of other T cell based datasets. This can be done by adding either the down-sampled object, which contains up to 500 cells per annotation model, or all 12 datasets to your new sample. The atlas aims to improve the capacity to identify TCR-specific signatures by ensuring a well-covered background, which should improve the robustness of the FindMarkers function in the Seurat package.
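The "up to 500 cells per annotation model" rule can be illustrated with a short sketch. This is not the authors' procedure: the helper `downsample_per_label`, the toy cell IDs and the annotation names are all hypothetical, and plain random sampling without replacement is assumed.

```python
import random
from collections import defaultdict


def downsample_per_label(cell_labels, max_per_label=500, seed=0):
    """Keep at most max_per_label cells for each annotation label.

    cell_labels maps a cell ID to its annotation label.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for cell_id, label in cell_labels.items():
        by_label[label].append(cell_id)
    kept = []
    for label in sorted(by_label):
        cells = sorted(by_label[label])
        if len(cells) > max_per_label:
            cells = rng.sample(cells, max_per_label)
        kept.extend(cells)
    return sorted(kept)


# Toy data: 8 cells with one hypothetical label and 2 with another.
toy = {f"cell{i}": ("label_A" if i < 8 else "label_B") for i in range(10)}
subset = downsample_per_label(toy, max_per_label=3)
```

Labels with fewer cells than the cap are kept whole, which is what "up to" implies.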

  13. seurat.wnn.peak.rds

    • figshare.com
    application/gzip
    Updated Oct 21, 2024
    Cite
    Liran Mao (2024). seurat.wnn.peak.rds [Dataset]. http://doi.org/10.6084/m9.figshare.27265410.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Liran Mao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the data necessary to reproduce the results from the SpatialMuxSeq vignette (https://rpubs.com/LiranM/SpatialMuxSeq), featured in our paper "Multiplexed Spatial Mapping of Chromatin Features, Transcriptome, and Proteins in Tissues." To ensure full reproducibility of the results, we have provided a Seurat object that includes all omics layers. For further details and access to all relevant code, please visit our GitHub repository: https://github.com/liranmao/Spatial_multi_omics.

  14. scRNA-seq MCF10-2A p53 on/off, CENP-A overexpress

    • kaggle.com
    Updated Jul 25, 2022
    Cite
    Alexander Chervov (2022). scRNA-seq MCF10-2A p53 on/off, CENP-A overexpress [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-mcf102a-p53-onoff-cenpa-overexpress/code
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 25, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexander Chervov
    Description

    Remark: See the paper https://arxiv.org/abs/2208.05229; results on cell cycle analysis for this dataset are discussed there. "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

    Data and Context

    Data: results of single cell RNA sequencing, i.e. a matrix whose rows correspond to cells and whose columns correspond to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

    See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy", the main Python package for working with scRNA-seq data) or https://satijalab.org/seurat/ ("Seurat", an R package)

    Particular data: Paper: CENP-A overexpression promotes distinct fates in human cells, depending on p53 status Daniel Jeffery, Alberto Gatto, Katrina Podsypanina, Charlène Renaud-Pageot, Rebeca Ponce Landete, Lorraine Bonneville, Marie Dumont, Daniele Fachinetti & Geneviève Almouzni https://www.nature.com/articles/s42003-021-01941-5

    Data: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-9861/

    Related datasets:

    Other single cell RNA-seq datasets can be found on Kaggle: see https://www.kaggle.com/alexandervc/datasets or search Kaggle for "scRNA-seq"

    Inspiration

    Single cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

    Also see the review (Nature, P. Kharchenko): "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

    Searching scholar.google for "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles:

    (Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833

    Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages 273–282 (2019)

    Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)

    Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12

  15. Seurat object created using a scRNAseq dataset derived from malignant cells...

    • zenodo.org
    bin
    Updated Jan 3, 2025
    Cite
    Adrián Salas-Bastos (2025). Seurat object created using a scRNAseq dataset derived from malignant cells isolated from BRAF mutant patient-derived xenograft melanoma cohorts exposed to concurrent RAF/MEK-inhibition (Rambow et al., 2018. Cell) [Dataset]. http://doi.org/10.5281/zenodo.14581399
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Adrián Salas-Bastos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file is a Seurat object obtained by the re-analysis of the dataset published by Rambow et al., 2018, and is related to the paper "TGFβ signaling sensitizes MEKi-resistant human melanoma to targeted therapy-induced apoptosis" (Loos, Salas-Bastos, Nordin et al. Cell Death Dis 15, 925 (2024). https://doi.org/10.1038/s41419-024-07305-1)

  16. Processed Seurat Objects for Localized Marker Detector (Cluster-Independent...

    • figshare.com
    application/gzip
    Updated Jun 10, 2025
    Cite
    Ruiqi Li; Peggy Myung (2025). Processed Seurat Objects for Localized Marker Detector (Cluster-Independent Multiscale Marker Identification in Single-cell RNA-seq Data using Localized Marker Detector) [Dataset]. http://doi.org/10.6084/m9.figshare.26507098.v2
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ruiqi Li; Peggy Myung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are processed Seurat objects for the biological datasets in Localized Marker Detector (https://github.com/KlugerLab/LocalizedMarkerDetector):

    Tabula Muris bone marrow dataset (FACS-based and Droplet-based)

    We used publicly available scRNA-seq mouse bone marrow datasets (FACS and Droplet-based) from the Tabula Muris Consortium, which were already pre-processed and annotated according to their workflow. In addition, we applied ALRA imputation to generate a denoised assay alra and added several cell annotations: (1) Cell cycle annotation using CellCycleScoring with the updated 2019 cell cycle gene set; (2) Module Activity Scores for the gene modules listed in our paper.

    Mouse embryo skin dataset

    We separated dermal cell populations from newly collected mouse embryo skin samples (aligned to the mouse genome mm10 using CellRanger (v.6.1.2)). Cells from the wildtype and SmoM2YFP mutant (SmoM2) for two consecutive days (embryonic day 13.5 and 14.5) were pooled for analysis. To avoid batch effects from pooling or integrating, we analyzed each condition separately: E13.5 SmoM2, E13.5 WT, E14.5 SmoM2, and E14.5 WT. For each condition, we performed standard normalization, selected the top 2,000 highly variable genes, and scaled the data using the Seurat v4 R package. We then applied PCA, retaining the number of PCs determined by the elbow plot: E13.5 SmoM2 (14 PCs), E13.5 WT (12 PCs), E14.5 SmoM2 (12 PCs), and E14.5 WT (11 PCs).

  17. Dawnn benchmarking dataset: Simulated linear trajectories processing and...

    • rdr.ucl.ac.uk
    application/gzip
    Updated May 4, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Simulated linear trajectories processing and label simulation [Dataset]. http://doi.org/10.5522/04/22616611.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 4, 2023
    Dataset provided by
    University College London
    Authors
    George Hall; Sergi Castellano Hereza
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated linear trajectories.

    FILES: Data processing code

    • adapted_traj_sim_milo_paper.R: Lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form linear trajectories.
    • generate_test_data_linear_traj_sim_milo_paper.R: R code to assign simulated labels to datasets generated from adapted_traj_sim_milo_paper.R. Seurat objects saved as cells_sim_linear_traj_gex_seed_*.rds. Simulated labels saved as benchmark_dataset_sim_linear_traj.csv.

    Resulting datasets

    • cells_sim_linear_traj_gex_seed_*.rds: Seurat objects generated by generate_test_data_linear_traj_sim_milo_paper.R.
    • benchmark_dataset_sim_linear_traj.csv: Cell labels generated by generate_test_data_linear_traj_sim_milo_paper.R.

  18. Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute...

    • figshare.com
    bin
    Updated Jun 16, 2025
    Cite
    Emin Onur Karakaslar (2025). Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute Myeloid Leukemia samples [Dataset]. http://doi.org/10.6084/m9.figshare.26189771.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    figshare
    Authors
    Emin Onur Karakaslar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TLDR

    Seurat object of the 16 NPM1-mutated AML samples (n = 83,162 cells).

    AML samples

    All sixteen peripheral blood and bone marrow samples were obtained from patients with AML at diagnosis (n=15) or relapse after chemotherapy (n=1) with written informed consent according to the Declaration of Helsinki. Mononuclear cells were isolated by Ficoll-Isopaque density gradient centrifugation and cryopreserved in the Leiden University Medical Center (LUMC) Biobank for Hematological Diseases after approval by the LUMC Institutional Review Board (protocol no. B18.047).

    Upstream processing pipeline

    CellRanger v7.0.0 was run on all samples with the human reference genome hg38. Seurat v4 was used for all QC. Our QC pipeline had three steps per sample: 1) soft filtering, 2) low quality cluster removal, and 3) doublet detection. In soft filtering, Seurat objects were created with cells expressing at least 200 genes and with genes expressed in at least 3 cells. Then, the standard Seurat command list with default parameters was run to detect low quality clusters. Clusters with >15% mitochondrial and 15% mitochondrial mRNA. We used standard Seurat commands to scale and normalize the data on integrated features. The first 30 principal components were used to create UMAP plots. We used clustree to determine the optimal cluster number, based on FindClusters with resolutions sweeping from 0 to 1.2. We chose res=0.5, as clusters became stable. Next, we merged two clusters (CC5 and CC12) into one GMP-like cluster, as one of these clusters (CC12) had high expression of HSP-genes yet still retained its cell-type specific properties.

    Note: The file was processed with Seurat v4 but the object is updated for v5. Uploaded in the .qs file format for faster reading. To read the file: qs::qread("path/to/data.qs")

    This data is available for research use only and cannot be used for commercial purposes.

    For further queries please refer to our paper:
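The soft-filtering step described above (cells expressing at least 200 genes, genes expressed in at least 3 cells) can be sketched on a toy count matrix. This is a plain-Python illustration, not the authors' Seurat code: the helper `soft_filter` is hypothetical, the matrix is a dense genes-by-cells list, and filtering cells before genes is an assumption about the order of the two steps.

```python
def soft_filter(counts, min_features=200, min_cells=3):
    """Return (kept_gene_indices, kept_cell_indices) for a genes x cells matrix."""
    n_genes = len(counts)
    n_cells = len(counts[0]) if counts else 0
    # Step 1: keep cells in which at least min_features genes are detected.
    keep_cells = [
        j for j in range(n_cells)
        if sum(1 for i in range(n_genes) if counts[i][j] > 0) >= min_features
    ]
    # Step 2: among the kept cells, keep genes detected in at least min_cells cells.
    keep_genes = [
        i for i in range(n_genes)
        if sum(1 for j in keep_cells if counts[i][j] > 0) >= min_cells
    ]
    return keep_genes, keep_cells


# Toy matrix: 3 genes x 4 cells, with thresholds scaled down for illustration.
toy_counts = [
    [1, 0, 2, 3],
    [0, 0, 1, 1],
    [5, 0, 4, 1],
]
genes, cells = soft_filter(toy_counts, min_features=2, min_cells=3)
```

With the toy thresholds, the empty second cell is dropped first, and the second gene is then dropped because it is detected in only two of the remaining cells.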

  19. Analysis Products: Transcription factor stoichiometry, motif affinity and...

    • zenodo.org
    tsv, zip
    Updated Nov 11, 2023
    Cite
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje (2023). Analysis Products: Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency [Dataset]. http://doi.org/10.5281/zenodo.8313962
    Explore at:
    Available download formats: zip, tsv
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in the directories, which are summarized below.

    The record contains the following files:

    `clusters.tsv`: contains the cluster id, name and colour of clusters in the paper

    scATAC.zip

    Analysis products for the single-cell ATAC-seq data. Contains:

    - `cells.tsv`: list of barcodes that pass QC. Columns include:
      - `barcode`
      - `sample`: time point
      - `umap1`
      - `umap2`
      - `cluster`
      - `dpt_pseudotime_fibr_root`: pseudotime values treating a fibroblast cell as root
      - `dpt_pseudotime_xOSK_root`: pseudotime values treating xOSK cell as root
    - `peaks.bed`: list of peaks of 500bp across all cell states. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `features.tsv`: 50 dimensional representation of each cell
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`
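Before loading the full matrix with scipy.io.mmread, it can be useful to confirm that its dimensions agree with `peaks.bed` (rows) and `cells.tsv` (columns). The sketch below is a standard-library stand-in that reads only the size line of a Matrix Market coordinate file; the helper `read_mtx_shape` and the toy peak and cell entries are hypothetical, not taken from the record.

```python
def read_mtx_shape(lines):
    """Return (n_rows, n_cols, n_nonzero) from Matrix Market coordinate text.

    Lines starting with '%' are the banner and comments; the first remaining
    line holds the matrix dimensions and the number of stored entries.
    """
    for line in lines:
        if line.startswith("%") or not line.strip():
            continue
        n_rows, n_cols, n_nonzero = (int(field) for field in line.split())
        return n_rows, n_cols, n_nonzero
    raise ValueError("no size line found")


# Toy file content: 3 peaks (rows) x 2 cells (columns), 4 stored counts.
toy_mtx = """%%MatrixMarket matrix coordinate integer general
% toy example
3 2 4
1 1 2
2 1 1
3 2 5
1 2 1
""".splitlines()
toy_peaks = ["peak_1", "peak_2", "peak_3"]          # stand-in for rows of peaks.bed
toy_cells = [("S1", "AAAC"), ("S2", "AAAG")]        # stand-in (sample, barcode) pairs

n_rows, n_cols, n_nonzero = read_mtx_shape(toy_mtx)
aligned = n_rows == len(toy_peaks) and n_cols == len(toy_cells)
```

For the real gzipped file, the same function could be fed the lines of `gzip.open(path, "rt")`.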

    scATAC_clusters.zip

    Analysis products corresponding to cluster pseudo-bulks of the single-cell ATAC-seq data.

    - `clusters.tsv`: contains the cluster id, name and colour used in the paper
    - `peaks`: contains `overlap_reproducibilty/overlap.optimal_peak` peaks called using ENCODE bulk ATAC-seq pipeline in the narrowPeak format.
    - `fragments`: contains per cluster fragment files

    scATAC_scRNA_integration.zip

    Analysis products from the integration of scATAC with scRNA. Contains:

    - `peak_gene_links_fdr1e-4.tsv`: file with peak gene links passing FDR 1e-4. For analyses in the paper, we filter to peaks with absolute correlation >0.45.
    - `harmony.cca.30.feat.tsv`: 30 dimensional co-embedding for scATAC and scRNA cells obtained by CCA followed by applying Harmony over assay type.
    - `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. First column contains barcode.
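The additional filter mentioned for `peak_gene_links_fdr1e-4.tsv` (absolute correlation >0.45) can be sketched as follows. The column names `peak`, `gene` and `correlation`, the toy rows and the helper `filter_links` are assumptions for illustration; the actual header of the file should be checked first.

```python
import csv
import io


def filter_links(tsv_text, min_abs_corr=0.45, corr_column="correlation"):
    """Keep peak-gene links whose absolute correlation exceeds min_abs_corr."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if abs(float(row[corr_column])) > min_abs_corr]


# Toy table with hypothetical column names and values.
toy_tsv = (
    "peak\tgene\tcorrelation\n"
    "chr1:1000-1500\tGENE_A\t0.62\n"
    "chr1:9000-9500\tGENE_B\t-0.51\n"
    "chr2:400-900\tGENE_C\t0.30\n"
)
strong_links = filter_links(toy_tsv)
```

Taking the absolute value keeps strongly negative links as well as strongly positive ones, which matches the "absolute correlation" wording.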

    scRNA.zip

    Analysis products for the single-cell RNA-seq data. Contains:

    - `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), knn graphs, and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples (D0, D2, ..., D14, iPSC)
    - `genes.txt`: list of all genes
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
      - `barcode_sample`: barcode with index of sample (1-9 corresponding to D0, D2, ..., D14, iPSC)
      - `sample`: sample name (D0, D2, .., D14, iPSC)
      - `umap1`
      - `umap2`
      - `nCount_RNA`
      - `nFeature_RNA`
      - `cluster`
      - `percent.mt`: percent of mitochondrial transcripts in cell
      - `percent.oskm`: percent of OSKM transcripts in cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
    - `pca.tsv`: first 50 PC of each cell
    - `oskm_endo_sendai.tsv`: estimated raw counts (cts, may not be integers) and log(1+ tp10k) normalized expression (norm) for endogenous and exogenous (Sendai derived) counts of POU5F1 (OCT4), SOX2, KLF4 and MYC genes. Rows are consistent with `seurat.rds` and `cells.tsv`
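The log(1 + tp10k) normalization named for `oskm_endo_sendai.tsv` can be written out as a small worked example. It assumes that tp10k means counts scaled to 10,000 total transcripts per cell and that the logarithm is natural; neither detail is stated in the record, and the helper name `log1p_tp10k` is hypothetical.

```python
import math


def log1p_tp10k(count, total_counts_in_cell):
    """Scale a count to transcripts per 10k, then apply log(1 + x)."""
    tp10k = count / total_counts_in_cell * 1e4
    return math.log1p(tp10k)  # natural log, an assumption


# A gene with 5 counts in a cell with 10,000 total counts has tp10k = 5.
example_value = log1p_tp10k(5, 10_000)
zero_value = log1p_tp10k(0, 5_000)
```

Because the estimated counts may not be integers, the function accepts floats as well.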

    multiome.zip

    multiome/snATAC:

    These files are derived from the integration of nuclei from multiome (D1M and D2M), with cells from day 2 of scATAC-seq (labeled D2).

    - `cells.tsv`: This is the list of nuclei barcodes that pass QC from multiome AND also cell barcodes from D2 of scATAC-seq. Includes:
      - `barcode`
      - `umap1`: These are the coordinates used for the figures involving multiome in the paper.
      - `umap2`: same as `umap1` above
      - `sample`: D1M and D2M correspond to multiome, D2 corresponds to day 2 of scATAC-seq
      - `cluster`: For multiome barcodes, these are labels transferred from scATAC-seq. For D2 scATAC-seq, it is the original cluster labels.
    - `peaks.bed`: This is the same file as scATAC/peaks.bed. List of peaks of 500bp. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
    - `features.no.harmony.50d.tsv`: 50 dimensional representation of each cell prior to running Harmony (to correct for batch effect between D2 scATAC and D1M,D2M snMultiome). Rows correspond to cells from `cells.tsv`.
    - `features.harmony.10d.tsv`: 10 dimensional representation of each cell after running Harmony. Rows correspond to cells from `cells.tsv`.

    multiome/snRNA:

    - `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples (D1M, D2M). Please use the UMAP/features from snATAC/ for consistency.
    - `genes.txt`: list of all genes (this is different from the list in scRNA analysis)
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
      - `barcode_sample`: barcode with index of sample (1,2 corresponding to D1M, D2M respectively)
      - `sample`: sample name (D1M, D2M)
      - `nCount_RNA`
      - `nFeature_RNA`
      - `percent.oskm`: percent of OSKM genes in cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
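Since the barcode suffix carries the sample information, the sample name can be recovered with a small lookup. The sketch below uses the multiome mapping stated above (1 and 2 corresponding to D1M and D2M); the assumption that the suffix follows a hyphen, the example barcode, and the helper name `sample_from_barcode` are hypothetical.

```python
def sample_from_barcode(barcode, suffix_to_sample):
    """Map a barcode such as 'ACGT-2' to its sample via the numeric suffix."""
    _, separator, suffix = barcode.rpartition("-")
    if not separator:
        raise ValueError(f"no suffix found in {barcode!r}")
    return suffix_to_sample[int(suffix)]


# Mapping for multiome/snRNA as stated in the record.
MULTIOME_SAMPLES = {1: "D1M", 2: "D2M"}

sample_name = sample_from_barcode("ACGTACGTACGTACGT-2", MULTIOME_SAMPLES)
```

The same function could be reused for scRNA.zip by passing a mapping for suffixes 1-9 taken from its `cells.tsv`.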

  20. Single cell sequencing data of PBMC and CSF from a cohort of Multiple...

    • zenodo.org
    Updated Aug 8, 2024
    Cite
    Zenodo (2024). Single cell sequencing data of PBMC and CSF from a cohort of Multiple Sclerosis patients and other neurological disease controls [Dataset]. http://doi.org/10.5281/zenodo.13253569
    Explore at:
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Neuroinflammation is often characterised by immune cell infiltrates in the cerebrospinal fluid (CSF). Here we apply single-cell RNA sequencing to explore the functional characteristics of these cells in patients with various inflammatory, infectious and non-inflammatory neurological disorders. We show that CSF is distinct from the peripheral blood in terms of both cellular composition and gene expression. We report that the cellular and transcriptional landscape of CSF is altered in neuroinflammation, but is strikingly similar across different neuroinflammatory disorders. We find clonal expansion of CSF B and T cells in all disorders but most pronounced in inflammatory diseases, and we functionally characterise the transcriptional features of these cells. Finally, we explore the genetic control of gene expression in CSF lymphocytes. Our results highlight the common features of immune cells in the CSF compartment across diverse neurological diseases and may help to identify new targets for drug development or repurposing in Multiple Sclerosis.

    This dataset contains a tarball with six files:

    • A Seurat object with 5' single-cell gene expression data for all cells in the dataset
    • A Seurat object with B cells only, containing 5' single-cell gene expression data and VDJ data in the metadata
    • A Seurat object with T cells only, containing 5' single-cell gene expression data and VDJ data in the metadata
    • Separate .csv files with the metadata alone for each of the three datasets

    These data have undergone very light quality control and contain only the raw, non-normalised RNA counts in the RNA assay (adjusted only for ambient RNA contamination). Details of the QC steps used in the paper are given in the GitHub repository. Please note that these data were generated at two sites and in multiple batches, so any analysis should account for this potential source of technical variability. The metadata include the following key columns:

    • batch_id: the batch
    • source: whether the sample is from CSF or PBMC
    • processing_site: whether the sample was processed in Munich or Cambridge
    • Category: the diagnostic group (MS, Other Inflammatory Neurological Disease, Other Inflammatory Neurological Disease - Infection, and Non-inflammatory Neurological Disease)
    • Sex
    • OCB: whether the patient had CSF oligoclonal bands
    • fully_anonymous_pseudoid: donor ID
    • ann_celltypist_lowres: automated cell type assignment at low resolution
    • ann_celltypist_highres: automated cell type assignment at high resolution

    VDJ datasets (B and T cells) contain many additional metadata columns with information on the VDJ and VJ transcripts expressed by each cell.
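Because the description warns about site and batch effects, one simple first step is to tally cells per site, batch and source from one of the metadata .csv files. The sketch below uses only the Python standard library. The function name `tally_cells` is hypothetical, and it assumes the CSV header uses the column names exactly as listed above (`processing_site`, `batch_id`, `source`), which the listing does not guarantee.

```python
import csv
from collections import Counter


def tally_cells(metadata_csv, columns=("processing_site", "batch_id", "source")):
    """Count cells for each combination of the given metadata columns.

    Assumption: one row per cell, with a header row naming the columns.
    """
    counts = Counter()
    with open(metadata_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in metadata: {missing}")
        for row in reader:
            counts[tuple(row[c] for c in columns)] += 1
    return counts
```

Printing `tally_cells(path).most_common()` gives a quick view of how unevenly cells are spread across sites and batches, which can inform whether batch correction or a site covariate is needed downstream.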
