This archive contains scRNA-seq and CyTOF data in the form of Seurat objects, txt and csv files, as well as R scripts for data analysis and figure generation.
A summary of the contents is provided below.
R scripts
Script to run machine learning models predicting group-specific marker genes: CML_Find_Markers_Zenodo.R
Script to reproduce the majority of main and supplementary figures shown in the manuscript: CML_Paper_Figures_Zenodo.R
Script to run inferCNV analysis: inferCNV_Zenodo.R
Script to plot NATMI analysis results: NATMI_CvsA_FC0.32_Updown_Column_plot_Zenodo.R
Script to conduct sub-clustering and filtering of NK cells: NK_Marker_Detection_Zenodo.R
Helper scripts for plotting and DEG calculation: ComputePairWiseDE_v2.R, Seurat_DE_Heatmap_RCA_Style.R
RDS files
General scRNA-seq Seurat objects:
scRNA-seq Seurat object after QC and cell type annotation, used for most analyses in the manuscript: DUKE_DataSet_Doublets_Removed_Relabeled.RDS
scRNA-seq Seurat object including additional findings, e.g. from the NK analysis, used in the shiny app: DUKE_final_for_Shiny_App.rds
Neighborhood enrichment score computed for group A across all HSPCs: Enrichment_score_global_groupA.RDS
UMAP coordinates used in the article: Layout_2D_nNeighbours_25_Metric_cosine_TCU_removed.RDS
SCENIC files:
Regulon set used in SCENIC: 2.6_regulons_asGeneSet.Rds
AUC values computed for regulons: 3.4_regulonAUC.Rds
Metadata used in SCENIC: cellInfo.Rds
Group-specific regulons for LSC: groupSpecificRegulonsBCRAblP.RDS
Patient-specific regulons for LSC: patientSpecificRegulonsBCRAblP.RDS
Patient specificity score for LSC: PatientSpecificRegulonSpecificityScoreBCRAblP.RDS
Regulon specificity score for LSC: RegulonSpecificityScoreBCRAblP.RDS
BCR-ABL1 inference:
HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label.RDS
UMAP for HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label_UMAP.RDS
HSPCs with BCR-ABL1 module scores: HSPC_metacluster_74K_with_modscore_27thmay.RDS
NK sub-clustering and filtering:
NK object with module scores: NK_8617cells_with_modscore_1stjune.RDS
Feature genes for NK cells computed with DubStepR: NK_Cells_DubStepR
NK cells Seurat object excluding contaminating T and B cells: NK_cells_T_B_17_removed.RDS
NK Seurat object including neighbourhood enrichment score calculations: NK_seurat_object_with_enrichment_labels_V2.RDS
txt and csv files:
Proportions per cluster calculated from CyTOF: CyTOF_Proportions.txt
Correlation between scRNAseq and CyTOF cell type abundance: scRNAseq_Cor_Cytof.txt
Correlation between manual gating and FlowSOM clustering: Manual_vs_FlowSOM.txt
GSEA results:
HSPC, HSC and LSC results: FINAL_GSEA_DATA_For_GGPLOT.txt
NK: NK_For_Plotting.txt
TFRC and HLA expression: TFRC_and_HLA_Values.txt
NATMI result files:
UP-regulated_mean.csv
DOWN-regulated_mean.csv
Gene position file used in inferCNV: inferCNV_gene_positions_hg38.txt
Module scores for NK subclusters per cell: NK_Supplementary_Module_Scores.csv
Compressed folders:
All CyTOF raw data files: CyTOF_Data_raw.zip
Results of the patient-based classifier: PatientwiseClassifier.zip
Results of the single-cell based classifier: SingleCellClassifierResults.zip
For new analyses of the data, we recommend that readers use the Seurat object stored in DUKE_final_for_Shiny_App.rds, or use the shiny app (http://scdbm.ddnetbio.com/) and perform further analysis from there.
Raw data are available at EGA upon request under Study ID: EGAS00001005509
Revision
The for_CML_manuscript_revision.tar.gz folder contains scripts and data for the paper revision including 1) Detection of the BCR-ABL fusion with long read sequencing; 2) Identification of BCR-ABL junction reads with scRNAseq; 3) Detection of expressed mutations using scRNAseq.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Purpose: We performed single-cell RNA sequencing (scRNA-seq), an unbiased and high-throughput single-cell technology, to determine the phenotype and function of peripheral immune cells in patients with diabetic macular edema (DME).
Methods: Peripheral blood mononuclear cells (PBMCs) were isolated from DME patients and healthy controls (HC). The single-cell samples were loaded on the Chromium platform (10x Genomics) for sequencing. The R package Seurat v3 was used for data normalization, clustering, dimensionality reduction, differential expression analysis, and visualization.
Results: We constructed a single-cell RNA atlas comprising 57,650 PBMCs (24,919 HC, 32,731 DME). We divided all immune cells into five major immune cell lineages: monocytes (MC), T cells (TC), NK cells (NK), B cells (BC), and dendritic cells (DC). Our differentially expressed gene (DEG) analysis showed that MC were enriched for genes participating in cytokine pathways and inflammatory activation. We further subdivided MC into five subsets: resting CD14++ MC, proinflammatory CD14++ MC, intermediate MC, resting CD16++ MC, and proinflammatory CD16++ MC. Notably, the proinflammatory CD14++ monocytes predominated in promoting inflammation, mainly through increased production of inflammatory cytokines (TNF, IL1B, and NFKBIA) and chemokines (CCL3, CCL3L1, CCL4L2, CXCL2, and CXCL8). Gene Ontology (GO) and pathway analysis of the DEGs showed that the proinflammatory CD14++ monocytes, especially in DME patients, upregulated inflammatory pathways including the tumor necrosis factor-mediated signaling pathway, I-kappaB kinase/NF-kappaB signaling, and the toll-like receptor signaling pathway.
Conclusion: In this study, we constructed the first immune landscape of DME patients with T2D and confirmed innate immune dysregulation in peripheral blood using an unbiased scRNA-seq approach. These results point to a potential target cell population for anti-inflammatory treatments.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset is part of the manuscript: "Dysregulation of mitochondrial and proteo-lysosomal genes in Parkinson's disease myeloid cells", by Navarro E, Udine E, et al.
Description of files:
MyND_monocyte.cis_eqtl_nominal.txt.gz - Full nominal eQTL summary statistics (gzip-compressed)
MyND_monocyte.cis_eqtl_permuted.txt.gz - Full permuted eQTL summary statistics (gzip-compressed)
MyND_monocyte.cis_sqtl_nominal.txt.gz - Full nominal sQTL summary statistics (gzip-compressed)
MyND_monocyte.cis_sqtl_permuted.txt.gz - Full permuted sQTL summary statistics (gzip-compressed)
gencode.v30.primary_assembly.annotation.txt.gz - Gencode (v30) gene annotations used in the analysis (gzip-compressed)
monocyte_counts_matrix.txt.gz - RSEM counts from monocyte samples (230 samples) (gzip-compressed)
monocyte_tpms_matrix.txt.gz - RSEM TPMs from monocyte samples (230 samples) (gzip-compressed)
microglia_counts_matrix.txt.gz - RSEM counts from microglia samples (128 samples - 55 donors) (gzip-compressed)
microglia_tpms_matrix.txt.gz - RSEM TPMs from microglia samples (128 samples - 55 donors) (gzip-compressed)
processed_seurat_obj.RDS - Seurat R data object file containing single-cell RNA-seq results (14,827 features, 19,144 cells, 10 donors)
Nominal eQTL results include all SNP-gene pairs tested (using a 1 Mb window on each side of the transcription start site (TSS) of each gene). Table columns are formatted as follows:
"pheno_id" - The phenotype ID
"pheno_chr" - The chromosome ID of the phenotype
"pheno_start" - The start position of the phenotype
"pheno_end" - The end position of the phenotype
"pheno_strand" - The strand orientation of the phenotype
"num_var" - The total number of variants tested in cis
"distance" - The distance between the phenotype and the tested variant (accounting for strand orientation)
"snp_id" - The ID of the tested variant
"snp_chr" - The chromosome ID of the variant
"snp_start" - The start position of the variant
"snp_end" - The end position of the variant
"nominal_pval" - The nominal P-value of association between the variant and the phenotype
"slope" - The corresponding regression slope
"lead_snp" - A binary flag equal to 1 is the variant is the top variant in cis
Permuted eQTL results include only the top SNP-gene association for each gene (1000 permutations). Table columns are formatted as follows:
"gene_id" - The phenotype ID
"gene_chr" - The chromosome ID of the phenotype
"gene_start" - The start position of the phenotype
"gene_end" - The end position of the phenotype
"gene_strand" - The strand orientation of the phenotype
"num_var" - The total number of variants tested in cis
"distance" - The distance between the phenotype and the tested variant (accounting for strand orientation)
"snp_id" - The ID of the top variant
"snp_chr" - The chromosome ID of the top variant
"snp_start" - The start position of the top variant
"snp_end" - The end position of the top variant
"degree_of_freedom" - The number of degrees of freedom used to compute the P-values
"dummy" - Dummy
"bval1" - The first parameter value of the fitted beta distribution
"bval2" - The second parameter value of the fitted beta distribution (it also gives the effective number of independent tests in the region)
"nominal_pval" - The nominal P-value of association between the phenotype and the top variant in cis
"slope" - The corresponding regression slope
"empirical_pval" - The P-value of association adjusted for the number of variants tested in cis given by the direct method (i.e. empirircal P-value)
"beta_dist_pval" - The P-value of association adjusted for the number of variants tested in cis given by the fitted beta distribution. We strongly recommend to use this adjusted P-value in any downstream analysis
Nominal sQTL results include all SNP-junction pairs tested (using a 100 kb window from the center of each intron cluster). Table columns are formatted as follows:
"pheno_id" - The phenotype ID
"pheno_chr" - The chromosome ID of the phenotype
"pheno_start" - The start position of the phenotype
"pheno_end" - The end position of the phenotype
"pheno_strand" - The strand orientation of the phenotype
"num_var" - The total number of variants tested in cis
"distance" - The distance between the phenotype and the tested variant (accounting for strand orientation)
"snp_id" - The ID of the tested variant
"snp_chr" - The chromosome ID of the variant
"snp_start" - The start position of the variant
"snp_end" - The end position of the variant
"nominal_pval" - The nominal P-value of association between the variant and the phenotype
"slope" - The corresponding regression slope
"lead_snp" - A binary flag equal to 1 is the variant is the top variant in cis
Permuted sQTL results include only the top SNP-junction association by gene (1000 permutations). Table columns are formatted as follows:
"pheno_id" - The phenotype group ID (here a gene ID)
"pheno_chr" - The chromosome ID of the phenotype group
"pheno_start" - The start position of the phenotype group
"pheno_end" - The end position of the phenotype group
"pheno_strand" - The strand orientation of the phenotype group
"pheno_id" - The top phenotype in the group (here an exon ID)
"num_pheno" - The total number of phenotypes in the group (i.e. #exons)
"num_var" - The total number of variants tested in cis
"distance" - The distance between the phenotype group and the tested variant (accounting for strand orientation)
"snp_id" - The ID of the top variant
"snp_chr" - The chromosome ID of the top variant
"snp_start" - The start position of the top variant
"snp_end" - The end position of the top variant
"degree_of_freedom” - The number of degrees of freedom used to compute the P-valuesm"
"dummy" - Dummy
"bval1" - The first parameter value of the fitted beta distribution
"bval2" - The second parameter value of the fitted beta distribution (it also gives the effective number of independent tests in the region)
"nominal_pval" - The nominal P-value of association between the top phenotype and the top variant in cis
"slope" - The corresponding regression slope
"empirical_pval" - The P-value of association adjusted for the number of variants and phenotypes tested in cis given by the direct method (i.e. empirircal P-value)
"beta_dist_pval" - The P-value of association adjusted for the number of variants and phenotypes tested in cis given by the fitted beta distribution. We strongly recommend to use this adjusted P-value in any downstream analysis
NOTE: The effect sizes of eQTLs and sQTLs are defined as the effect of the alternative (ALT) allele relative to the reference (REF) allele in the human genome reference (GRCh38).
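To make the sign convention concrete: a positive slope means the ALT allele is associated with a higher phenotype value than REF. If another dataset reports effects for the opposite allele, the slope must be negated before the two are compared. The following sketch shows that harmonization step. The function name and its arguments are hypothetical, the REF and ALT alleles have to be supplied by the user, and both datasets are assumed to report alleles on the same strand.

```python
def harmonize_slope(slope, ref, alt, other_effect_allele):
    """Express a slope from this dataset relative to another effect allele.

    In this dataset the slope is the effect of ALT relative to REF, so the
    sign is kept when the other study also uses ALT as its effect allele and
    flipped when it uses REF.
    """
    if other_effect_allele == alt:
        return slope
    if other_effect_allele == ref:
        return -slope
    raise ValueError("effect allele matches neither REF nor ALT")
```

For instance, with REF "A" and ALT "G", a slope of 0.5 stays 0.5 when the other study's effect allele is "G" and becomes -0.5 when it is "A".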