100+ datasets found
  1. Scripts for Analysis

    • figshare.com
    txt
    Updated Jul 18, 2018
    Cite
    Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
    Available download formats: txt
    Dataset updated
    Jul 18, 2018
    Dataset provided by
    figshare
    Authors
    Sneddon Lab UCSF
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Scripts used for analysis of V1 and V2 Datasets.

    • seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
    • merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
    • subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
    • seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
    • clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
    • subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
    • seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
    • merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
    • prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
    • monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
    • monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
    • monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
    • CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
    • CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.

  2. Test Data for Galaxy Tutorial "Clustering 3k PBMCs with Seurat"

    • ordo.open.ac.uk
    bin
    Updated Nov 14, 2024
    Cite
    Marisa Loach (2024). Test Data for Galaxy Tutorial "Clustering 3k PBMCs with Seurat" [Dataset]. http://doi.org/10.5281/zenodo.14013475
    Available download formats: bin
    Dataset updated
    Nov 14, 2024
    Dataset provided by
    The Open University
    Authors
    Marisa Loach
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Test Data for Galaxy Tutorial "Clustering 3k PBMCs with Seurat"

  3. Data from: Large-scale integration of single-cell transcriptomic data...

    • data.niaid.nih.gov
    • dataone.org
    • +1 more
    zip
    Updated Dec 14, 2021
    Cite
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
    Available download formats: zip
    Dataset updated
    Dec 14, 2021
    Dataset provided by
    Cornell University
    Authors
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

    Methods

    Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

    Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

    Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

    Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

    Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

    Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
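The quality thresholds stated above (750 features, 1000 transcripts, 30% mitochondrial transcripts) can be sketched as a per-cell filter. This is only an illustration in plain Python, not the authors' Seurat code; the `Cell` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical per-cell summary; field names are illustrative only.
    barcode: str
    n_features: int      # number of detected genes
    n_transcripts: int   # number of unique transcripts (UMIs)
    pct_mito: float      # percent of transcripts from mitochondrial genes


# Thresholds as stated in the dataset description.
MIN_FEATURES = 750
MIN_TRANSCRIPTS = 1000
MAX_PCT_MITO = 30.0


def passes_qc(cell: Cell) -> bool:
    """Return True if the cell is kept by all three filters."""
    return (
        cell.n_features >= MIN_FEATURES
        and cell.n_transcripts >= MIN_TRANSCRIPTS
        and cell.pct_mito <= MAX_PCT_MITO
    )


def filter_cells(cells):
    """Keep only the cells that pass every quality threshold."""
    return [c for c in cells if passes_qc(c)]
```

A cell sitting exactly on a threshold is kept here, because the text removes only cells with "fewer than" or "more than" the stated values.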

    Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
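Selecting "dimensions accounting for 95% of the total variance" can be sketched as a cumulative-variance cutoff. The function below is a hypothetical helper, and it assumes that "total variance" means the variance captured by the computed components; the description does not say how the authors computed it.

```python
def n_dims_for_variance(stdevs, fraction=0.95):
    """Return the smallest number of leading components whose cumulative
    share of variance reaches `fraction`.

    `stdevs` holds per-component standard deviations, ordered from the
    first component onward (an assumed input format).
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    running = 0.0
    for i, v in enumerate(variances, start=1):
        running += v
        if running / total >= fraction:
            return i
    return len(variances)
```

The returned count would then be the number of embedding dimensions passed on to neighbor-graph construction.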

    Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

    Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
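The marker-gene filter described in this step (avg_logFC > 1, pct.expressed > 0.5, mitochondrial and ribosomal protein genes dropped) can be sketched as below. The row layout, the key names, and the gene-name prefixes `mt-`, `Rpl`, and `Rps` are assumptions made for illustration; the description does not say how those genes were identified.

```python
def filter_markers(markers, min_logfc=1.0, min_pct=0.5,
                   excluded_prefixes=("mt-", "Rpl", "Rps")):
    """Return the sorted, de-duplicated genes that pass both thresholds.

    `markers` is a list of dicts with the hypothetical keys
    'gene', 'avg_logFC' and 'pct_expressed' (one row per gene per cluster).
    `excluded_prefixes` is an assumed naming convention for mouse
    mitochondrial and ribosomal protein genes.
    """
    kept = set()
    for row in markers:
        gene = row["gene"]
        if gene.startswith(excluded_prefixes):
            continue  # drop mitochondrial / ribosomal protein genes
        if row["avg_logFC"] > min_logfc and row["pct_expressed"] > min_pct:
            kept.add(gene)
    return sorted(kept)
```

Both comparisons are strict, matching the ">" signs in the text; a gene that is a marker for several clusters is counted once.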

  4. Data from: Figure2

    • figshare.com
    zip
    Updated Sep 27, 2021
    Cite
    Raffaele Calogero (2021). Figure2 [Dataset]. http://doi.org/10.6084/m9.figshare.16651780.v1
    Available download formats: zip
    Dataset updated
    Sep 27, 2021
    Dataset provided by
    figshare
    Authors
    Raffaele Calogero
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data used to build figure 2: Assignment of cell line type to clusters generated with Seurat, implemented in rCASC.

    A) RNA-5c clustering, five clusters generated with Seurat (resolution=0.1), using 2500 genes selected as the most variant within the 5000 most expressed (rCASC topx function).
    B) RNA-3c clustering, four clusters generated with Seurat (resolution=0.1), using the 2500 genes selected for RNA-5c.
    C) RNA-5c hierarchical clustering (Euclidean distance, average linkage) of log2 CPM clusters’ pseudo-bulk expression (rCASC bulkClusters function), row-mean centered, and CCLE lung cell lines A449, NCIH838, NCIH2228, NCIH1975 and HCC827 log2 TPM row-mean centered.
    D) RNA-3c hierarchical clustering (Euclidean distance, average linkage) of log2 CPM clusters’ pseudo-bulk expression (rCASC bulkClusters function), row-mean centered, and CCLE lung cell lines A449, NCIH838, NCIH2228, NCIH1975 and HCC827 log2 TPM row-mean centered.

    Figure 2A: somewhere_in_your_computer/fig2/RNA2500-5c/VandE/Results_0.1/VandE/5/VandE_Stability_Plot.pdf
    Figure 2B: somewhere_in_your_computer/fig2/RNA2500-3c/VandE/Results/VandE/4/VandE_Stability_Plot.pdf
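The row-mean centering and Euclidean distance named in panels C and D can be sketched as follows. This is a plain-Python illustration, not the rCASC implementation; it assumes a matrix given as a list of rows (one row per gene, one column per cluster or cell line) and leaves out the average-linkage step.

```python
import math


def row_mean_center(matrix):
    """Subtract each row's mean from every value in that row."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def column_distances(matrix):
    """Pairwise Euclidean distances between the columns of `matrix`.

    Returns a dict mapping (i, j) column-index pairs with i < j to distances.
    """
    columns = list(zip(*matrix))
    distances = {}
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            distances[(i, j)] = math.dist(columns[i], columns[j])
    return distances
```

Centering each row removes a gene's overall expression level, so the distances between columns reflect relative expression patterns rather than absolute scale.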

  5. Data from: A single-cell atlas characterizes dysregulation of the bone...

    • zenodo.org
    Updated Jan 14, 2025
    Cite
    William Pilcher (2025). A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma [Dataset]. http://doi.org/10.5281/zenodo.14624955
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    William Pilcher
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    May 8, 2024
    Description

    This repository contains R Seurat objects associated with our study titled "A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma".

    Single cell data contained within this object comes from MMRF Immune Atlas Consortium work.

    Each .rds file contains a Seurat object saved with version 4.3 and can be loaded in R with the readRDS command.

    Two .RDS files are included in this version of the release.

    • Discovery object: MMRF_ImmuneAtlas_Full_With_Corrected_Censored_Metadata.rds contains all aliquots belonging to the 'discovery' cohort as used in the initial paper. This represents the dataset used for initial clustering, cell annotation, and analysis.

    • Discovery + Validation object: COMBINED_VALIDATION_MMRF_ImmuneAtlas_Full_Censored_Metadata.rds contains both aliquots belonging to the initial 'discovery' cohort, and aliquots belonging to the 'validation' cohort. The group each cell is derived from is listed under the 'cohort' variable. Labels related to cell annotation, including doublet status, are derived from a label transfer process as described in the paper. Labels for the original 'discovery' cohort are unchanged. UMAPs have been reconstructed with both the discovery and validation cohorts integrated.

    --

    The discovery object contains two assays:

    • "RNA" - The raw count matrix
    • "RNA_Batch_Corrected" - Counts adjusted for the combination of 'Study_Site' and 'Batch'.
      • Analysis should prefer the original RNA assay, unless using pipelines that do not support adjusting for technical covariates.

    Currently, the validation object only includes the uncorrected RNA assay.

    --

    The object contains two umaps in the reduction slot:

    • umap - will render the UMAP for the full object with all cells.
    • umap.sub - contains the UMAP embeddings for individual 'compartments', as indicated by 'subcluster_V03072023_compartment'

    --

    Each sample has three different identifiers:

    • public_id
      • Indicates a specific patient (n=263).
      • MMRF_####
      • This is a standard identifier which is used across all MMRF CoMMpass datasets
      • public_ids can map to multiple d_visit_specimen_ids and aliquot_ids
      • As of now, all public_ids have a single sample collected at Baseline.
        • This can be accessed by filtering for 'collection_event' %in% c("Baseline", "Screening") or VJ_INTERVAL == 'Baseline'
    • d_visit_specimen_id
      • Indicates a specific visit by a patient (n=358)
      • MMRF_####_Y
        • Y is a number indicating that this is the Y-th sample obtained from said patient. This does not correspond to a specific timepoint.
      • This is a standard identifier, which is used across all MMRF CoMMpass datasets
      • The purpose of the visit is indicated in 'collection_event' (Baseline, Relapse, Remission, etc.). The approximate interval the visit corresponds to is in "VJ_INTERVAL"
      • d_visit_specimen_id uniquely maps to one public_id
      • d_visit_specimen_id can map to multiple aliquot_ids
    • aliquot_id
      • Refers to the specific bone marrow aliquot sample processed (n=361)
      • MMRFA-######
      • This is a unique identifier for each processed scRNA-seq sample.
      • As of now, this uniquely maps to a combination of d_visit_specimen_id, Study_Site, and Batch
      • As of now, is an identifier specific to the MMRF ImmuneAtlas
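Because a d_visit_specimen_id (MMRF_####_Y) embeds its public_id (MMRF_####), the mapping from visit to patient can be sketched with a small parser. The function name is hypothetical, and the pattern assumes that `####` and `Y` are made up of digits only, which the description implies but does not state outright.

```python
import re

# Assumed shape: 'MMRF_' + digits + '_' + digits (e.g. 'MMRF_1234_2').
VISIT_ID_PATTERN = re.compile(r"^(MMRF_\d+)_(\d+)$")


def split_visit_specimen_id(visit_id):
    """Split a d_visit_specimen_id into (public_id, sample_number).

    Raises ValueError if the identifier does not match the assumed shape.
    """
    match = VISIT_ID_PATTERN.match(visit_id)
    if match is None:
        raise ValueError(f"unexpected d_visit_specimen_id: {visit_id!r}")
    return match.group(1), int(match.group(2))
```

For example, `split_visit_specimen_id("MMRF_1234_2")` yields the patient identifier `MMRF_1234` and the sample number `2`. As noted above, the sample number does not correspond to a specific timepoint.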

    Each cell has the following annotation information:

    • subcluster_V03072023
      • These refer to an individual cluster derived from 'Seurat'.
      • Format is 'Compartment'.'Compartment-cluster'.'Compartment-subcluster'
        • For example, 'NkT.2.2' indicates this cell is in the 'Natural Killer + T Cell' compartment, was originally part of 'Cluster 2', and was then further separated into the refined subcluster '2.2'
        • If a parent cluster did not need to be further separated, the 'Compartment-subcluster' part is omitted (e.g., 'NkT.6')
      • As of now, this uniquely maps to a specific cellID_short annotation.
      • Clustering was done on a per compartment basis
        • For most immune cell types, clustering was based on embeddings corrected for 'siteXbatch'. For Plasma, clustering was performed on embeddings corrected on a per-sample basis.
      • In the combined validation object, DISCOVERY.subcluster_V03072023 will contain values only for the discovery cohort, and have NA values for validation samples.
    • subcluster_V03072023_compartment
      • These refer to one of five major compartments as identified roughly on the original UMAP. Clustering was performed on a per-compartment basis following a first pass rough annotation.
      • The possible compartments are
        • NkT (T cell + Natural Killer Cells)
        • Myeloid (Monocytes, Macrophages, Dendritic cells, Neutrophil/Granulocyte populations)
        • BEry (B Cell, Erythroblasts, bone marrow progenitor populations, pDCs)
        • Ery (Erythrocyte population)
        • Plasma (Plasma cell populations)
      • Each compartment has its own UMAP generated, which can be accessed in the 'umap.sub' reduction
      • One cluster was isolated from all other populations, and was not assigned to a compartment. This cluster is labeled as 'Full.23'.
      • In the combined validation object, DISCOVERY.subcluster_V03072023_compartment will contain values only for the discovery cohort, and have NA values for validation samples.
    • cellID_short
      • This is the individual annotation for each cluster.
      • Please see the 'Cell Population Annotation Dictionary' for further details.
      • If different seurat clusters were assigned similar annotations, the celltype annotation will be appended with a distinct cluster gene, or with '_b', '_c'
    • lineage_group
      • This is an annotation driven grouping of clusters into major immune populations, as shown in Figure 2.
      • This includes "CD8", "CD4", "M" (Myeloid), "B" (B cell), "E" (Erythroid), "P" (Plasma), "Other" (HSC, Fibro, pDC_a), "LQ" (Doublet)
    • isDoublet
      • This is a binary 'True' or 'False' derived from manual review of clusters following doublet analysis, as described in the paper.
      • True indicates the cluster was determined to be a doublet population.
      • This is derived from 'doublet_pred', in which 'dblet_cluster' and 'poss_dblet_cluster' were flagged as doublet populations for subsequent analysis.
      • In the validation object, the doublet status of new samples was inferred from whether label transfer from the discovery cohort mapped the cell to one of the previously identified doublet populations. The raw doublet scores from DoubletFinder, Pegasus, or Scrublet are not included in this release.
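The subcluster_V03072023 label format ('Compartment'.'Compartment-cluster'.'Compartment-subcluster', with the last part optional) can be sketched as a small parser. The function name and the returned tuple layout are illustrative assumptions, not part of the released object.

```python
def parse_subcluster_label(label):
    """Split a label such as 'NkT.2.2' or 'NkT.6' into its parts.

    Returns (compartment, cluster, subcluster); subcluster is None when the
    parent cluster was not refined further. All parts are kept as strings.
    """
    parts = label.split(".")
    if len(parts) == 2:
        compartment, cluster = parts
        return compartment, cluster, None
    if len(parts) == 3:
        compartment, cluster, subcluster = parts
        return compartment, cluster, subcluster
    raise ValueError(f"unexpected subcluster label: {label!r}")
```

The unassigned cluster 'Full.23' parses the same way, with 'Full' in the compartment position.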

    --

    Each sample has the following information indicating shipment batches, for batch correction

    • Study_Site
      • The center which processed a specific aliquot_id
      • EMORY, MSSM, WashU, MAYO
    • Batch
      • The shipment batch the sample was associated with
      • Valued 1 to 3 for EMORY, MSSM, MAYO, and 1 to 4 for WashU
    • siteXbatch
      • A combination of the above two variables, to be used for batch correction
    • (Combined Validation Object only): cohort
      • Indicates if the sample was involved in the 'discovery' cohort, or 'validation' cohort. Samples in the 'validation' cohort will have labels inferred from label mapping

    --

    Each public_id has limited demographic information based on publicly available information in the MMRF CoMMpass study.

    • d_pt_sex
      • Patient sex (not self-identified). Male or Female
    • d_pt_race_1
      • Patient self-identified race
    • d_pt_ethnicity
      • Patient self-identified ethnicity
    • d_dx_amm_age
      • Patient age at diagnosis.
      • Not reported for patients above 90 at diagnosis
    • d_dx_amm_bmi
      • Patient BMI at diagnosis
    • d_pt_height_cm
      • Patient height at diagnosis, in centimeters.
    • d_dx_amm_weight_kg
      • Patient weight at diagnosis, in kilograms

    Each d_visit_specimen_id has two data points providing limited information about the visit

    • collection_event
      • Description of why the sample was collected
        • e.g., 'Baseline' and 'Screening' indicate the sample was obtained prior to therapy
        • 'Relapse/Progression' indicates the sample was collected due to disease progression based on clinical assessment
        • 'Remission/Response' indicates the sample was collected due to patient entering remission based on clinical assessment
        • Samples may be collected for reasons independent of the above, such as 'Pre' or 'Post' ASCT, or for other unspecified reasons
    • VJ_INTERVAL
      • Indicates the rough interval following start of therapy the sample is assigned to
        • "Baseline", "Month 3", "Year 2", etc.

    All the single-cell raw data, along with outcome and cytogenetic information, are available at MMRF’s VLAB shared resource. Requests to access these data will be reviewed by the data access committee at MMRF, and any data shared will be released under a data transfer agreement that will protect the identities of patients involved in the study. Other information from the CoMMpass trial can also generally be

  6. ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1

    • figshare.com
    application/gzip
    Updated Jun 29, 2023
    Cite
    Massimo Andreatta; Santiago Carmona (2023). ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1 [Dataset]. http://doi.org/10.6084/m9.figshare.12478571.v2
    Available download formats: application/gzip
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    figshare
    Authors
    Massimo Andreatta; Santiago Carmona
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space of the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection.

    To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from Array-Express. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively.

    Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3. For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of draws, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat3, providing the anchor set determined by STACAS, and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets.

    Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction=”umap”, k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat3 using respectively the prcomp function from basic R package “stats”, and the “umap” R package (https://github.com/tkonopka/umap).
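The rule that excludes genes "expressed in less than 0.1% or more than 90% of the cells of a given dataset" can be sketched as a fraction-based filter. This is an illustrative helper, not STACAS code; the input (a mapping from gene name to the number of cells with non-zero expression) is an assumed format.

```python
def genes_within_expression_range(expressing_cells, n_cells,
                                  min_fraction=0.001, max_fraction=0.90):
    """Return the sorted genes whose detection rate lies within the bounds.

    `expressing_cells` maps gene name -> number of cells in which the gene
    is detected (an assumed input format); `n_cells` is the dataset size.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    kept = []
    for gene, count in expressing_cells.items():
        fraction = count / n_cells
        if min_fraction <= fraction <= max_fraction:
            kept.append(gene)
    return sorted(kept)
```

The filter runs separately on each dataset, since the thresholds are defined relative to "the cells of a given dataset".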

  7. pbmc single cell RNA-seq matrix

    • zenodo.org
    csv
    Updated May 4, 2021
    Cite
    Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux (2021). pbmc single cell RNA-seq matrix [Dataset]. http://doi.org/10.5281/zenodo.4730807
    Available download formats: csv
    Dataset updated
    May 4, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Single cell RNA-sequencing dataset of peripheral blood mononuclear cells (pbmc: T, B, NK and monocytes) extracted from two healthy donors.

    Cells labeled as C26 come from a 30-year-old female and cells labeled as C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x Genomics. Cellranger v4.0.0 was used for the processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by cellranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell, or more than 20% mitochondrial genes were discarded.

    The processing steps was performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. SCTransform method was adopted for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.
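The QC thresholds quoted in this description can be sketched as a per-cell filter. This is a minimal plain-Python illustration, not the Seurat code used for the dataset. It assumes that the three criteria are independent discard conditions, that mitochondrial genes carry the human `MT-` symbol prefix, and that a cell is given as a gene-to-count dictionary; the name `passes_qc` and the toy cells are hypothetical.

```python
def passes_qc(counts, min_genes=3, min_reads=500, max_mito_fraction=0.20):
    """Return True if one cell, given as {gene: count}, passes all three thresholds."""
    total = sum(counts.values())
    if total == 0:
        return False  # an empty cell can never pass, and avoids dividing by zero
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Assumption: mitochondrial genes carry the human "MT-" symbol prefix.
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    return (n_genes >= min_genes
            and total >= min_reads
            and mito / total <= max_mito_fraction)

# Hypothetical toy cells.
cells = {
    "cell_good": {"CD3E": 400, "MS4A1": 150, "NKG7": 50, "MT-CO1": 60},
    "cell_low_depth": {"CD3E": 100, "MT-CO1": 10, "LYZ": 5},
    "cell_high_mito": {"CD3E": 300, "LYZ": 100, "MT-CO1": 200, "MT-ND1": 100},
}

kept = [name for name, counts in cells.items() if passes_qc(counts)]
print(kept)
```

Only the first toy cell survives: the second has too few counts and the third has a mitochondrial fraction of about 43%.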

    Files content:

    - raw_dataset.csv: raw gene counts

    - normalized_dataset.csv: normalized gene counts (single cell matrix)

    - cell_types.csv: cell types identified from annotated cell clusters

    - cell_types_macro.csv: cell macro types

    - UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat

  8. A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Sep 20, 2023
    Cite
    Franziska Hildebrandt; Miren Urrutia Iturritza; Christian Zwicker; Bavo Vanneste; Noémi Van Hul; Elisa Semle; Tales Pascini; Sami Saarenpää; Mengxiao He; Emma R. Andersson; Charlotte L. Scott; Joel Vega-Rodriguez; Joakim Lundeberg; Johan Ankarklev (2023). A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration [Dataset]. http://doi.org/10.5281/zenodo.8328679
    Dataset updated
    Sep 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Franziska Hildebrandt; Miren Urrutia Iturritza; Christian Zwicker; Bavo Vanneste; Noémi Van Hul; Elisa Semle; Tales Pascini; Sami Saarenpää; Mengxiao He; Emma R. Andersson; Charlotte L. Scott; Joel Vega-Rodriguez; Joakim Lundeberg; Johan Ankarklev
    Description

    Dataset created in the study "A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration"

    Structure

    ST_berghei_liver

    contains data generated during stpipeline analysis and imaging on the 2k-array Spatial Transcriptomics platform, as well as data required for and produced by hepaquery analysis. These samples include 38 sections in total: 8 sections from mice (n=4) infected with sporozoites for 12h, 5 sections from control mice (n=3) at 12h, 7 sections from mice (n=4) infected with sporozoites for 24h, 4 sections from control mice (n=3) at 24h, and 8 samples from mice (n=2) infected with sporozoites for 38h and control mice (n=2) at 38h.

    • count contains gene expression matrix output from stpipeline in .tsv format
    • spotfiles contains coordinate files for count matrices
    • images contains scaled H&E, Fluorescence (FL) and annotated H&E images (from FL annotations) scaled to 10% of the original image size.
    • masks contains image masks for hepaquery analysis
    • distances contains distance measurements from original section sorted by timepoint as well as combined across timepoints
    • cluster contains clustering information across spatial positions used in spatial enrichment analysis

    STUtiility_mus_pb_ST.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in ST_berghei_liver

    visium_berghei_liver

    contains data generated with the spaceranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include 8 sections in total: 1 section infected with sporozoites for 12h, 1 control section at 12h, 1 section infected with sporozoites for 24h, 1 control section at 24h, and 2 sporozoite-infected sections and 2 control sections at 38h.

    • V10S29-135_A1 contains spaceranger output for section 1 for infected and control sections at 38h post-infection
    • V10S29-135_B1 contains spaceranger output for section 1 for infected and control sections at 12h post-infection

    • V10S29-135_C1 contains spaceranger output for section 1 for infected and control sections at 24h post-infection

    • V10S29-135_D1 contains spaceranger output for section 2 for infected and control sections at 38h post-infection

    se_visium.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in visium_berghei_liver

    snSeq_berghei_liver

    contains data generated with the cellranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include single nuclei of 2 infected and control mice after 12h, 2 infected and control mice after 24h, 2 infected and control mice after 38h, and 2 uninfected mice prior to a challenge.

    • cellranger_cnt_out contains feature count matrix information from cell ranger output

    final_merged_curated_annotations_270623.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in snSeq_berghei_liver.tar.gz

    raw images.zip contains raw images for supplementary figures 20-22

    adjusted images.zip contains brightness and contrast adjusted images for supplementary figures 20-22

  9. Seurat cluster markers.xlsx

    • figshare.com
    xlsx
    Updated Jan 12, 2023
    Cite
    Meirigeng Qi (2023). Seurat cluster markers.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.21861684.v1
    Available download formats: xlsx
    Dataset updated
    Jan 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Meirigeng Qi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the Seurat cluster markers for the scRNA-seq data.

  10. Individual-donor scRNA-Seq datasets, as Seurat 4.0.5 objects

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jul 17, 2024
    Cite
    Alexandros Sountoulidis (2024). Individual-donor scRNA-Seq datasets, as Seurat 4.0.5 objects [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6386451
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Alexandros Sountoulidis
    Christos Samakovlis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The provided datasets correspond to the analyses of individual-donor single-cell RNA sequencing (scRNA-Seq) datasets, before their integration. The datasets have been saved as Seurat v4.0.5 objects. For clustering, we used default settings in Seurat 4.0.5 (resolution 0.8) and increased the resolution, if necessary, to separate the epithelium into proximal and distal.

    The *_clusters.pdf files show the suggested clusters in the individual datasets and the *_indiv_anno1.pdf files show the cell annotations according to the 84 cell states, described in the study with title "Developmental origins of cell heterogeneity in the human lung" (1st preprint version doi: https://doi.org/10.1101/2022.01.11.475631).

    The "*_cluster_annotations.csv" files provide information about the suggested annotations of the clusters.

    The "*_object_raw_and_log_counts.RData" objects contain the metadata and the UMI-counts [raw and log2(counts+1)] for each donor scRNA-Seq dataset.

  11. Single cell RNAseq clustering supporting data

    • figshare.com
    zip
    Updated Apr 30, 2022
    Cite
    Raffaele Calogero (2022). Single cell RNAseq clustering supporting data [Dataset]. http://doi.org/10.6084/m9.figshare.19390712.v1
    Available download formats: zip
    Dataset updated
    Apr 30, 2022
    Dataset provided by
    figshare
    Authors
    Raffaele Calogero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The example dataset 390c_wctype refers to sample 390c; in the counts table, cell names include the cell type assignment (e.g. TGACTAGGTTCCACAA.Treg). This dataset is part of the 41,650 cells isolated from the caecum, transverse colon and sigmoid colon of 5 individuals, which is one of the datasets constituting the gut atlas transcriptome.


  12. TSd5.RDS - CD40 inhibiton in AMI on d5, seq on d7 and d14

    • zenodo.org
    Updated Oct 17, 2023
    Cite
    Alexander Lang (2023). TSd5.RDS - CD40 inhibiton in AMI on d5, seq on d7 and d14 [Dataset]. http://doi.org/10.5281/zenodo.10015471
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Lang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ##### CD40 inhibiton in AMI on d5, seq on d7 and d14

    # Load necessary libraries for data manipulation, analysis, and visualization

    library(dplyr)

    library(Seurat)

    library(patchwork)

    library(plyr)

    # Set the working directory to the folder containing the data

    setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/01_TS_d5_paper/03_CD40 inhibition on day 5, seq on day 7 and 14/938-2_cellranger_count/outs")

    # Read the M0 dataset from the 10X Genomics format

    pbmc.data <- Read10X(data.dir = "filtered_feature_bc_matrix/")

    RNA <- pbmc.data$`Gene Expression`

    ADT <- pbmc.data$`Antibody Capture`

    HST <- pbmc.data$`Multiplexing Capture`

    # Load the Matrix package

    library(Matrix)

    # Hashtag 1, 2 and 3 are marking the mouse replicates per condition

    # Subset the rows based on row names

    subsetted_rows <- c("TotalSeq-B0301", "TotalSeq-B0302", "TotalSeq-B0303")

    animals_data <- HST[subsetted_rows, , drop = FALSE]

    # Hashtag 4, 5, 6, 7 are representing DMSO d7, TS d7, DMSO d14 and TS d14

    subsetted_rows <- c("TotalSeq-B0304", "TotalSeq-B0305", "TotalSeq-B0306", "TotalSeq-B0307")

    treatment_data <- HST[subsetted_rows, , drop = FALSE]

    #Create a Seurat object and additional assays to combine later

    RNA <- CreateSeuratObject(counts = RNA)

    ADT <- CreateAssayObject(counts = ADT)

    Mice <- CreateAssayObject(counts = animals_data)

    Treatment <- CreateAssayObject(counts = treatment_data)

    seurat <- RNA

    #Add the Assays

    seurat[["ADT"]] <- ADT

    seurat[["HST_Mice"]] <- Mice

    seurat[["HST_Treatment"]] <- Treatment

    #Check the antibody (ADT) names

    rownames(seurat[["ADT"]])

    #Cluster cells on the basis of their scRNA-seq profiles

    # perform visualization and clustering steps

    DefaultAssay(seurat) <- "RNA"

    seurat <- NormalizeData(seurat)

    seurat <- FindVariableFeatures(seurat)

    seurat <- ScaleData(seurat)

    seurat <- RunPCA(seurat, verbose = FALSE)

    seurat <- FindNeighbors(seurat, dims = 1:30)

    seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)

    seurat <- RunUMAP(seurat, dims = 1:30)

    DimPlot(seurat, label = TRUE)

    FeaturePlot(seurat, features = "Col1a1", order = T)

    # Normalize ADT data

    DefaultAssay(seurat) <- "ADT"

    seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)

    #Demultiplex cells based on Mouse_Hashtag Enrichment

    seurat <- NormalizeData(seurat, assay = "HST_Mice", normalization.method = "CLR")

    seurat <- HTODemux(seurat, assay = "HST_Mice", positive.quantile = 0.60)

    #Visualize demultiplexing results

    # Global classification results

    table(seurat$HST_Mice_classification.global)

    DimPlot(seurat, group.by = "HST_Mice_classification")

    #Demultiplex cells based on Treatment_Hashtag Enrichment

    seurat <- NormalizeData(seurat, assay = "HST_Treatment", normalization.method = "CLR")

    seurat <- HTODemux(seurat, assay = "HST_Treatment", positive.quantile = 0.60)

    #Visualize demultiplexing results

    # Global classification results

    table(seurat$HST_Treatment_classification.global)

    DimPlot(seurat, group.by = "HST_Treatment_classification")

    Idents(seurat) <- seurat$HST_Treatment_classification

    pbmc.singlet <- subset(seurat, idents = "Negative", invert = T)

    Idents(pbmc.singlet) <- pbmc.singlet$HST_Mice_classification

    pbmc.singlet <- subset(pbmc.singlet, idents = "Negative", invert = T)

    DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")

    #Redo the classification to remove the doublets

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.99)

    table(pbmc.singlet$HST_Treatment_classification.global)

    DimPlot(pbmc.singlet, group.by = "HST_Treatment_classification")

    pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.99)

    table(pbmc.singlet$HST_Mice_classification.global)

    pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.60)

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.60)

    DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")

    DimPlot(pbmc.singlet, group.by = "HST_Mice_maxID")

    seurat <- pbmc.singlet

    seurat$mice <- seurat$HST_Mice_maxID

    seurat$treatment <- seurat$HST_Treatment_maxID

    library(plyr)

    seurat$treatment <- revalue(seurat$treatment, c(

    "TotalSeq-B0304" = "DMSO_d7",

    "TotalSeq-B0305" = "TS_d7",

    "TotalSeq-B0306" = "DMSO_d14",

    "TotalSeq-B0307" = "TS_d14"

    ))

    library(plyr)

    seurat$mice <- revalue(seurat$mice, c(

    "TotalSeq-B0301" = "1",

    "TotalSeq-B0302" = "2",

    "TotalSeq-B0303" = "3"

    ))

    #Cluster cells on the basis of their scRNA-seq profiles without doublets

    # perform visualization and clustering steps

    DefaultAssay(seurat) <- "RNA"

    seurat <- NormalizeData(seurat)

    seurat <- FindVariableFeatures(seurat)

    seurat <- ScaleData(seurat)

    seurat <- RunPCA(seurat, verbose = FALSE)

    seurat <- FindNeighbors(seurat, dims = 1:30)

    seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)

    seurat <- RunUMAP(seurat, dims = 1:30)

    DimPlot(seurat, label = TRUE)

    DefaultAssay(seurat) <- "ADT"

    seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)

    setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/01_TS_d5_paper/03_CD40 inhibition on day 5, seq on day 7 and 14/Analyse")

    saveRDS(seurat, file= "TSd5.v0.1.RDS")

  13. Data from: Clustering Deviation Index (CDI): A robust and accurate internal...

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated Oct 3, 2022
    Cite
    Jiyuan Fang; Cliburn Chan; Kouros Owzar; Liuyang Wang; Diyuan Qin; Qi-Jing Li; Jichun Xie (2022). Clustering Deviation Index (CDI): A robust and accurate internal measure for evaluating scRNA-seq data clustering [Dataset]. http://doi.org/10.5061/dryad.08kprr55h
    Available download formats: zip
    Dataset updated
    Oct 3, 2022
    Dataset provided by
    Duke Medical Center
    Authors
    Jiyuan Fang; Cliburn Chan; Kouros Owzar; Liuyang Wang; Diyuan Qin; Qi-Jing Li; Jichun Xie
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The clustering of cells has been widely used to explore the heterogeneity of cell populations in single-cell RNA-sequencing (scRNA-seq). We proposed a parametric model for monoclonal and polyclonal scRNA-seq data to evaluate clustering results. Based on the parametric model, we proposed a metric (CDI) to quantify the goodness-of-fit of cell clustering to the data. Here we presented CT26.WT and T-CELL as two datasets to examine the performance of our model and metric. CT26.WT contains wild-type CT26 cells from the murine colorectal carcinoma cell line, and cells in CT26.WT are highly homogeneous. T-CELL contains T-cells from tumor tissue of mice three weeks after 4T1 tumor injection. From these datasets and public datasets, we validated our model and benchmarked our metric.

    Methods

    This dataset contains six files. Four of them (matrix.mtx, features.tsv, barcodes.tsv, CT26_bulk_30k.txt) are for CT26.WT, and the other two are for T-CELL.

    CT26.WT sample preparation: The murine colorectal carcinoma cell line CT26.WT was obtained from the cell culture facility of Duke University and cultured in DMEM media (Sigma Aldrich). All cells were cultured at 37 degrees. Single-cell clones were chosen and cultured for over 220 days. Bulk RNA-seq and single-cell RNA-seq samples were prepared on the same day.

    CT26.WT bulk RNA-seq: Total RNA from ~1,000,000 cells from each group was extracted using the miniprep kit (Zymo Research) according to the manufacturer's instructions. The libraries were then sequenced on the Illumina sequencing platform (HiSeq X Ten) by the Novogene Corporation Inc. (CA, USA) with a paired-end 150 bp (PE 150) sequencing strategy.

    CT26.WT scRNA-seq: A total of ~10,000 cells of each clone were selected for single-cell RNA-seq. Single-cell RNA sequencing libraries were prepared using Chromium Single Cell 3' Reagent kits v3 (10x Genomics). The libraries were then sequenced on the Illumina sequencing platform by the Novogene Corporation Inc. (CA, USA) with a PE 150 sequencing strategy in single-index mode.

    T-CELL scRNA-seq: Tumors were first collected from female mice 3 weeks after the mice were injected with 4T1 tumors. Tissues were then dissociated into single cells and homogenized. T cells were separated out by flow sorting with a stringent gating threshold and sequenced on the 10X platform.

    T-CELL filtering: We filtered out genes with less than 2% non-zero cells and removed cells with less than 2% non-zero genes. Eventually, 2,989 cells from five cell types with 7,893 genes were retained.

    T-CELL annotation: The benchmark clustering labels of the T-CELL population were generated as a combination of protein-marker-based flow sorting labels and bioinformatics labels from Seurat v2. For evaluation purposes, we selected 5 distinct cell types: Regulatory Trm cells, Classical CD4 Tem cells, CD8 Trm cells, CD8 Tcm cells, and Active EM-like Treg cells.
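The T-CELL filtering rule (drop genes detected in less than 2% of cells, then drop cells with less than 2% of genes detected) can be sketched as follows. This is a plain-Python illustration on a toy matrix, not the authors' code. The order of the two steps, the name `filter_sparse` and the toy counts are assumptions made for the example.

```python
def filter_sparse(matrix, min_fraction=0.02):
    """Filter a genes x cells count matrix given as a list of rows.

    Returns (kept_gene_indices, kept_cell_indices). Genes are filtered
    first and cells second, using only the retained genes (assumed order).
    """
    n_genes = len(matrix)
    n_cells = len(matrix[0]) if matrix else 0
    # Keep genes with a non-zero count in at least min_fraction of the cells.
    kept_genes = [
        g for g in range(n_genes)
        if sum(1 for v in matrix[g] if v > 0) >= min_fraction * n_cells
    ]
    # Keep cells with a non-zero count in at least min_fraction of the kept genes.
    kept_cells = [
        c for c in range(n_cells)
        if sum(1 for g in kept_genes if matrix[g][c] > 0)
        >= min_fraction * len(kept_genes)
    ]
    return kept_genes, kept_cells

# Hypothetical toy matrix: gene 1 is never detected and cell 1 is empty.
matrix = [
    [3, 0, 1, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 5],
]

kept_genes, kept_cells = filter_sparse(matrix)
print(kept_genes, kept_cells)
```

On a matrix this small the 2% cutoff only removes all-zero rows and columns; on a real dataset with thousands of cells it also removes rarely detected genes.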

  14. Data from: Robust clustering and interpretation of scRNA-seq data using...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 30, 2021
    Cite
    Florian Schmidt (2021). Robust clustering and interpretation of scRNA-seq data using reference component analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4021966
    Dataset updated
    May 30, 2021
    Dataset provided by
    Florian Schmidt
    Bobby Ranjan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets and Code accompanying the new release of RCA, RCA2. The R-package for RCA2 is available at GitHub: https://github.com/prabhakarlab/RCAv2/

    The datasets included here are:

    Datasets required for a characterization of batch effects:

    merged_rna_seurat.rds

    de_list.rds

    mergedRCAObj.rds

    merged_rna_integrated.rds

    10X_PBMCs.RDS: Processed 10X PBMC data RCA2 object (10X PBMC example data sets )

    NBM_RDS_Files.zip: Several RDS files containing RCA2 object of Normal Bone Marrow (NBM) data, umap coordinates, doublet finder results and metadata information (Normal Bone Marrow use case)

    Dataset used for the Covid19 example:

    blish_covid.seu.rds

    rownames_of_glocal_projection_immune_cells.txt

    Blish_RCA_no_QC_filtering_project_to_multiple_panels.rds

    Data sets used to outline the ability of supervised clustering to detect disease states:

    809653.seurat.rds

    blish_covid.seu.rds

    Performance benchmarking results:

    Memory_consumption.txt

    rca_time_list.rds

    ScanPY input files:

    input_data.zip

    The R script provides R code to regenerate the main paper Figures 2 to 7 modulo some visual modifications performed in Inkscape.

    Provided R scripts are:

    ComputePairWiseDE_v2.R (Required code for pairwise DE computation)

    RCA_Figure_Reproduction.R

    Provided python Code for Scanpy analysis:

    RA_Scanpy.ipynb

    CITESeq_Scanpy.ipynb

  15. CD40 activation and the effect on Neutrophils

    • zenodo.org
    Updated Oct 18, 2023
    Cite
    Alexander Lang (2023). CD40 activation and the effect on Neutrophils [Dataset]. http://doi.org/10.5281/zenodo.10019624
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Lang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ##### CD40 activation and the effect on Neutrophils

    # Load necessary libraries for data manipulation, analysis, and visualization

    library(dplyr)

    library(Seurat)

    library(patchwork)

    library(plyr)

    # Set the working directory to the folder containing the data

    setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/05_FGK45 Wirkung auf Neutros - scRNAseq/938-1_cellranger_count/outs")

    # Read the M0 dataset from the 10X Genomics format

    pbmc.data <- Read10X(data.dir = "filtered_feature_bc_matrix/")

    RNA <- pbmc.data$`Gene Expression`

    ADT <- pbmc.data$`Antibody Capture`

    HST <- pbmc.data$`Multiplexing Capture`

    # Load the Matrix package

    library(Matrix)

    # Hashtag 1, 2 and 3 are marking the organs (heart, blood, spleen)

    # Subset the rows based on row names

    subsetted_rows <- c("TotalSeq-B0301", "TotalSeq-B0302", "TotalSeq-B0303")

    animals_data <- HST[subsetted_rows, , drop = FALSE]

    # Hashtag 4, 5, 6, 7 are representing IgG_1, IgG_1, FGK45_1 and FGK45_1

    subsetted_rows <- c("TotalSeq-B0304", "TotalSeq-B0305", "TotalSeq-B0306", "TotalSeq-B0307")

    treatment_data <- HST[subsetted_rows, , drop = FALSE]

    #Create a Seurat object and additional assays to combine later

    RNA <- CreateSeuratObject(counts = RNA)

    ADT <- CreateAssayObject(counts = ADT)

    Organ <- CreateAssayObject(counts = animals_data)

    Treatment <- CreateAssayObject(counts = treatment_data)

    seurat <- RNA

    #Add the Assays

    seurat[["ADT"]] <- ADT

    seurat[["HST_Mice"]] <- Organ

    seurat[["HST_Treatment"]] <- Treatment

    #Check the antibody (ADT) names

    rownames(seurat[["ADT"]])

    #Cluster cells on the basis of their scRNA-seq profiles

    # perform visualization and clustering steps

    DefaultAssay(seurat) <- "RNA"

    seurat <- NormalizeData(seurat)

    seurat <- FindVariableFeatures(seurat)

    seurat <- ScaleData(seurat)

    seurat <- RunPCA(seurat, verbose = FALSE)

    seurat <- FindNeighbors(seurat, dims = 1:30)

    seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)

    seurat <- RunUMAP(seurat, dims = 1:30)

    DimPlot(seurat, label = TRUE)

    FeaturePlot(seurat, features = "S100a9", order = T)

    # Normalize ADT data

    DefaultAssay(seurat) <- "ADT"

    seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)

    #Demultiplex cells based on Mouse_Hashtag Enrichment

    seurat <- NormalizeData(seurat, assay = "HST_Mice", normalization.method = "CLR")

    seurat <- HTODemux(seurat, assay = "HST_Mice", positive.quantile = 0.99)

    #Visualize demultiplexing results

    # Global classification results

    table(seurat$HST_Mice_classification.global)

    DimPlot(seurat, group.by = "HST_Mice_classification")

    #Demultiplex cells based on Treatment_Hashtag Enrichment

    seurat <- NormalizeData(seurat, assay = "HST_Treatment", normalization.method = "CLR")

    seurat <- HTODemux(seurat, assay = "HST_Treatment", positive.quantile = 0.99)

    #Visualize demultiplexing results

    # Global classification results

    table(seurat$HST_Treatment_classification.global)

    DimPlot(seurat, group.by = "HST_Treatment_classification")

    Idents(seurat) <- seurat$HST_Treatment_classification

    pbmc.singlet <- subset(seurat, idents = "Negative", invert = T)

    Idents(pbmc.singlet) <- pbmc.singlet$HST_Mice_classification

    pbmc.singlet <- subset(pbmc.singlet, idents = "Negative", invert = T)

    DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")

    #Redo the classification to remove the doublets

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.99)

    table(pbmc.singlet$HST_Treatment_classification.global)

    DimPlot(pbmc.singlet, group.by = "HST_Treatment_classification")

    pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.99)

    table(pbmc.singlet$HST_Mice_classification.global)

    pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.60)

    pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.60)

    DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")

    DimPlot(pbmc.singlet, group.by = "HST_Mice_maxID")

    seurat <- pbmc.singlet

    seurat$organ <- seurat$HST_Mice_maxID

    seurat$mouse <- seurat$HST_Treatment_maxID

    seurat$treatment <- seurat$HST_Treatment_maxID

    library(plyr)

    seurat$treatment <- revalue(seurat$treatment, c(

    "TotalSeq-B0304" = "IgG",

    "TotalSeq-B0305" = "IgG",

    "TotalSeq-B0306" = "FGK45",

    "TotalSeq-B0307" = "FGK45"

    ))

    library(plyr)

    seurat$organ <- revalue(seurat$organ, c(

    "TotalSeq-B0301" = "heart",

    "TotalSeq-B0302" = "blood",

    "TotalSeq-B0303" = "spleen"

    ))

    seurat$mouse <- revalue(seurat$mouse, c(

    "TotalSeq-B0304" = "1",

    "TotalSeq-B0305" = "2",

    "TotalSeq-B0306" = "3",

    "TotalSeq-B0307" = "4"

    ))

    #Cluster cells on the basis of their scRNA-seq profiles without doublets

    # perform visualization and clustering steps

    DefaultAssay(seurat) <- "RNA"

    seurat <- NormalizeData(seurat)

    seurat <- FindVariableFeatures(seurat)

    seurat <- ScaleData(seurat)

    seurat <- RunPCA(seurat, verbose = FALSE)

    seurat <- FindNeighbors(seurat, dims = 1:30)

    seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)

    seurat <- RunUMAP(seurat, dims = 1:30)

    DimPlot(seurat, label = TRUE)

    DefaultAssay(seurat) <- "ADT"

    seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)

    setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/05_FGK45 Wirkung auf Neutros - scRNAseq/Analyse")

    saveRDS(seurat, file = "FGK45_heart_blood_spleen.v0.1.RDS")

  16. scRNA-seq_huang2019

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Huang, Kee Wui (2023). scRNA-seq_huang2019 [Dataset]. http://doi.org/10.7910/DVN/QB5CC8
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Huang, Kee Wui
    Description

    Serialized R data files (.rds) associated with the inDrop single-cell RNA-seq analysis in Huang et al., 2019. Each file has a single Seurat object containing a subset of clusters from the full processed dataset, which were separated into different objects due to file size limitations. Raw data (UMIFM counts) are included in the corresponding slot in each Seurat object. Seurat objects can be re-merged into a single object containing the full dataset using the MergeSeurat function.

  17. Processed Seurat objects for GeneTrajectory inference (Gene Trajectory...

    • figshare.com
    application/gzip
    Updated Feb 19, 2024
    Cite
    Rihao Qu; Peggy Myung (2024). Processed Seurat objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics) [Dataset]. http://doi.org/10.6084/m9.figshare.25243225.v1
    Available download formats: application/gzip
    Dataset updated
    Feb 19, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rihao Qu; Peggy Myung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are processed Seurat objects for the two biological datasets in GeneTrajectory inference (https://github.com/KlugerLab/GeneTrajectory/):

    Human myeloid dataset analysis

    Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc 10k v3). QC was performed using the same workflow as in (https://github.com/satijalab/Integration2019/blob/master/preprocessing scripts/pbmc 10k v3.R). After standard normalization, highly variable gene selection, and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified based on Louvain clustering with a resolution of 0.3. A Wilcoxon rank-sum test was used to find cluster-specific gene markers for cell type annotation.

    For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel in which each bandwidth is determined by the distance to the cell's k-th nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% to 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.

    Mouse embryo skin data analysis

    We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wild type and the Wls mutant were pooled for analyses. After standard normalization, highly variable gene selection, and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.

    For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% to 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the wild type and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
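    Two steps in the description above, retaining genes by the fraction of cells that express them and turning pairwise distances into affinities with a local-adaptive Gaussian kernel, can be sketched in a few lines. This is a minimal illustration and not the GeneTrajectory implementation: the function names are hypothetical, the input is a plain dictionary and a nested list rather than a Seurat object, and the kernel form exp(-d_ij^2 / (sigma_i * sigma_j)) is an assumed self-tuning variant. The description only states that each bandwidth is set by the distance to the k-th nearest neighbor.

```python
import math


def filter_genes_by_fraction(counts, low=0.005, high=0.75):
    """Keep genes expressed (count > 0) in a fraction of cells within [low, high].

    counts: dict mapping gene name -> list of per-cell counts (hypothetical layout).
    """
    kept = []
    for gene, values in counts.items():
        if not values:
            continue  # no cells, nothing to measure
        fraction = sum(1 for v in values if v > 0) / len(values)
        if low <= fraction <= high:
            kept.append(gene)
    return kept


def adaptive_gaussian_affinity(dist, k=5):
    """Convert a symmetric distance matrix into affinities.

    sigma_i is the distance from item i to its k-th nearest neighbor
    (excluding itself). The combination exp(-d_ij^2 / (sigma_i * sigma_j))
    is an ASSUMED kernel form, chosen for illustration only.
    """
    n = len(dist)
    sigma = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kth = others[min(k, len(others)) - 1]
        sigma.append(max(kth, 1e-12))  # guard against zero bandwidth
    return [
        [math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j])) for j in range(n)]
        for i in range(n)
    ]


# Toy usage: three genes over four cells, and three points on a line.
toy_counts = {"A": [0, 0, 0, 0], "B": [1, 0, 0, 0], "C": [1, 1, 1, 1]}
kept_genes = filter_genes_by_fraction(toy_counts, low=0.1, high=0.75)

toy_dist = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
toy_affinity = adaptive_gaussian_affinity(toy_dist, k=1)
```

    With the thresholds from the description, the same filter would be called with low=0.005, high=0.75 for the myeloid data and low=0.01, high=0.50 for the skin data. In the described workflow the distance matrix would hold gene-gene Wasserstein distances, which this sketch does not compute.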

  18. ipRGC_Integrated_Tran_scRNASeq

    • data.mendeley.com
    Updated Jun 11, 2024
    Cite
    Shane D'Souza (2024). ipRGC_Integrated_Tran_scRNASeq [Dataset]. http://doi.org/10.17632/36tpcd3ykb.1
    Dataset updated
    Jun 11, 2024
    Authors
    Shane D'Souza
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reanalysis of Tran et al., 2019 ipRGC and ipRGC-proximal clusters (original annotations C33_M1, C40_M1dup, C31_M2, C43_M4, C22_M5, C7, and C8) used in Dyer et al., 2014. Dataset includes integrated Seurat object of aforementioned clusters and csv files with all differentially expressed genes / top 10 differentially expressed genes.

  19. Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug...

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    Updated Sep 7, 2023
    Cite
    Vaidehi Krishnan; Florian Schmidt; Zahid Nawaz; Prasanna Nori Venkatesh; Lee Kian Leong; Chan Zhu En; Alice Man Sze Cheung; Sudipto Bari; Meera Makheja; Ahmad Lajam; Pavanish Kumar; John Ouyang; Owen Rackham; William Ying Khee Hwang; Salvatore Albani; Charles Chuah; Shyam Prabhakar; Sin Tiong Ong (2023). Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug Resistance in Chronic Myeloid Leukemia [Dataset]. http://doi.org/10.5281/zenodo.5118611
    Dataset updated
    Sep 7, 2023
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Vaidehi Krishnan; Florian Schmidt; Zahid Nawaz; Prasanna Nori Venkatesh; Lee Kian Leong; Chan Zhu En; Alice Man Sze Cheung; Sudipto Bari; Meera Makheja; Ahmad Lajam; Pavanish Kumar; John Ouyang; Owen Rackham; William Ying Khee Hwang; Salvatore Albani; Charles Chuah; Shyam Prabhakar; Sin Tiong Ong
    Description

    This archive contains scRNA-seq and CyTOF data in the form of Seurat objects, txt and csv files, as well as R scripts for data analysis and figure generation.

    A summary of the content is provided in the following.

    R scripts

    Script to run Machine learning models predicting group specific marker genes: CML_Find_Markers_Zenodo.R
    Script to reproduce the majority of Main and Supplementary Figures shown in the manuscript: CML_Paper_Figures_Zenodo.R
    Script to run inferCNV analysis: inferCNV_Zenodo.R
    Script to plot NATMI analysis results: NATMI_CvsA_FC0.32_Updown_Column_plot_Zenodo.R
    Script to conduct sub-clustering and filtering of NK cells: NK_Marker_Detection_Zenodo.R

    Helper scripts for plotting and DEG calculation: ComputePairWiseDE_v2.R, Seurat_DE_Heatmap_RCA_Style.R

    RDS files

    General scRNA-seq Seurat objects:

    • scRNA-seq Seurat object after QC and cell type annotation, used for most analyses in the manuscript: DUKE_DataSet_Doublets_Removed_Relabeled.RDS
    • scRNA-seq Seurat object including findings (e.g. from the NK analysis), used in the Shiny app: DUKE_final_for_Shiny_App.rds
    • Neighborhood enrichment score computed for group A across all HSPCs: Enrichment_score_global_groupA.RDS
    • UMAP coordinates used in the article: Layout_2D_nNeighbours_25_Metric_cosine_TCU_removed.RDS

    SCENIC files:

    • Regulon set used in SCENIC: 2.6_regulons_asGeneSet.Rds
    • AUC values computed for regulons: 3.4_regulonAUC.Rds
    • Metadata used in SCENIC: cellInfo.Rds
    • Group-specific regulons for LSC: groupSpecificRegulonsBCRAblP.RDS
    • Patient-specific regulons for LSC: patientSpecificRegulonsBCRAblP.RDS
    • Patient specificity score for LSC: PatientSpecificRegulonSpecificityScoreBCRAblP.RDS
    • Regulon specificity score for LSC: RegulonSpecificityScoreBCRAblP.RDS

    BCR-ABL1 inference:

    • HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label.RDS
    • UMAP for HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label_UMAP.RDS
    • HSPCs with BCR-ABL1 module scores: HSPC_metacluster_74K_with_modscore_27thmay.RDS

    NK sub-clustering and filtering:

    • NK object with module scores: NK_8617cells_with_modscore_1stjune.RDS
    • Feature genes for NK cells computed with DubStepR: NK_Cells_DubStepR
    • NK cells Seurat object excluding contaminating T and B cells: NK_cells_T_B_17_removed.RDS
    • NK Seurat object including neighbourhood enrichment score calculations: NK_seurat_object_with_enrichment_labels_V2.RDS

    txt and csv files:

    • Proportions per cluster calculated from CyTOF: CyTOF_Proportions.txt
    • Correlation between scRNAseq and CyTOF cell type abundance: scRNAseq_Cor_Cytof.txt
    • Correlation between manual gating and FlowSOM clustering: Manual_vs_FlowSOM.txt
    • GSEA results:
      • HSPC, HSC and LSC results: FINAL_GSEA_DATA_For_GGPLOT.txt
      • NK: NK_For_Plotting.txt
    • TFRC and HLA expression: TFRC_and_HLA_Values.txt
    • NATMI result files:
      • UP-regulated_mean.csv
      • DOWN-regulated_mean.csv
    • Gene position file used in inferCNV: inferCNV_gene_positions_hg38.txt
    • Module scores for NK subclusters per cell: NK_Supplementary_Module_Scores.csv

    Compressed folders:

    • All CyTOF raw data files: CyTOF_Data_raw.zip
    • Results of the patient-based classifier: PatientwiseClassifier.zip
    • Results of the single-cell based classifier: SingleCellClassifierResults.zip

    For new analyses, we recommend that readers use the Seurat object stored in DUKE_final_for_Shiny_App.rds, or use the Shiny app (http://scdbm.ddnetbio.com/) and perform further analysis from there.

    Raw data are available at EGA upon request using Study ID: EGAS00001005509

  20. Dawnn benchmarking dataset: Simulated discrete clusters processing and label...

    • rdr.ucl.ac.uk
    application/gzip
    Updated May 4, 2023
    + more versions
    Cite
    George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Simulated discrete clusters processing and label simulation [Dataset]. http://doi.org/10.5522/04/22616590.v1
    Explore at:
    application/gzip Available download formats
    Dataset updated
    May 4, 2023
    Dataset provided by
    University College London
    Authors
    George Hall; Sergi Castellano Hereza
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated discrete clusters.

    FILES: Data processing code

    adapted_discrete_clusters_sim_milo_paper.R: Lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form discrete clusters.
    generate_test_data_discrete_clusters_sim_milo_paper.R: R code to assign simulated labels to datasets generated from adapted_discrete_clusters_sim_milo_paper.R. Seurat objects saved as cells_sim_discerete_clusters_gex_seed_*.rds. Simulated labels saved as benchmark_dataset_sim_discrete_clusters.csv.

    Resulting datasets

    cells_sim_discerete_clusters_gex_seed_*.rds: Seurat objects generated by generate_test_data_discrete_clusters_sim_milo_paper.R.
    benchmark_dataset_sim_discrete_clusters.csv: Cell labels generated by generate_test_data_discrete_clusters_sim_milo_paper.R.

