Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data repository for the scMappR manuscript:
Abstract from bioRxiv (https://www.biorxiv.org/content/10.1101/2020.08.24.265298v1.full).
RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit its widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell-types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package that complements traditional differential expression analysis available at CRAN.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.
Comprehensive introduction to the processing and analysis of bulk RNA-seq data including basic information about Illumina-based short read sequencing, common file formats (FASTQ, SAM/BAM, BED, ...) and quality controls. Contains ready-to-use UNIX and R code; covers the most common application of bulk RNA-seq to identify genes that are differentially expressed when comparing two conditions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This repository contains R and Python scripts used to analyze bulk RNA-Seq datasets from the Gene Expression Omnibus (GEO) related to breast, prostate, endometrial, lung, and colorectal cancer. The workflow includes differential expression analysis, survival analysis, functional enrichment, and visualization.
Included workflows:
R scripts:
  Data processing and normalization of GEO-derived bulk RNA-Seq datasets.
  Differential expression analysis (DEG) using limma.
  Survival analysis using Cox proportional hazards models.
  Functional enrichment analysis (GO, Reactome) of overlapping DEGs to identify significant pathways.
Python scripts:
  Forest plot generation using Matplotlib and Seaborn to visualize the hazard ratios and confidence intervals from the survival analysis.
Data Source:
  Publicly available bulk RNA-Seq datasets from the Gene Expression Omnibus (GEO) database.
https://spdx.org/licenses/CC0-1.0.html
Arsenic exposure via drinking water is a serious environmental health concern. Epidemiological studies suggest a strong association between prenatal arsenic exposure and subsequent childhood respiratory infections, as well as morbidity from respiratory diseases in adulthood, long after systemic clearance of arsenic. We investigated the impact of exclusive prenatal arsenic exposure on the inflammatory immune response and respiratory health after an adult influenza A (IAV) lung infection. C57BL/6J mice were exposed to 100 ppb sodium arsenite in utero, and subsequently infected with IAV (H1N1) after maturation to adulthood. Assessment of lung tissue and bronchoalveolar lavage fluid (BALF) at various time points post IAV infection reveals greater lung damage and inflammation in arsenic exposed mice versus control mice. Single-cell RNA sequencing analysis of immune cells harvested from IAV infected lungs suggests that the enhanced inflammatory response is mediated by dysregulation of innate immune function of monocyte derived macrophages, neutrophils, NK cells, and alveolar macrophages. Our results suggest that prenatal arsenic exposure results in lasting effects on the adult host innate immune response to IAV infection, long after exposure to arsenic, leading to greater immunopathology. This study provides the first direct evidence that exclusive prenatal exposure to arsenic in drinking water causes predisposition to a hyperinflammatory response to IAV infection in adult mice, which is associated with significant lung damage.
Methods
Whole lung homogenate preparation for single cell RNA sequencing (scRNA-seq)
Lungs were perfused with PBS via the right ventricle, harvested, and mechanically dissociated prior to straining through 70- and 30-µm filters to obtain a single-cell suspension. Dead cells were removed (annexin V EasySep kit, StemCell Technologies, Vancouver, Canada), and samples were enriched for cells of hematopoietic origin by magnetic separation using anti-CD45-conjugated microbeads (Miltenyi, Auburn, CA). Single-cell suspensions of 6 samples were loaded on a Chromium Single Cell system (10X Genomics) to generate barcoded single-cell gel beads in emulsion, and scRNA-seq libraries were prepared using Single Cell 3’ Version 2 chemistry. Libraries were multiplexed and sequenced on 4 lanes of a NextSeq 500 sequencer (Illumina) with 3 sequencing runs. Demultiplexing and barcode processing of raw sequencing data were conducted using Cell Ranger v. 3.0.1 (10X Genomics; Dartmouth Genomics Shared Resource Core). Reads were aligned to mouse (GRCm38) and influenza A virus (A/PR8/34, genome build GCF_000865725.1) genomes to generate unique molecular index (UMI) count matrices. Gene expression data have been deposited in the NCBI GEO database and are available at accession # GSE142047.
Preprocessing of single cell RNA sequencing (scRNA-seq) data
Count matrices produced using Cell Ranger were analyzed in the R statistical working environment (version 3.6.1). Preliminary visualization and quality analysis were conducted using scran (v. 1.14.3, Lun et al., 2016) and scater (v. 1.14.1, McCarthy et al., 2017) to identify thresholds for cell quality and feature filtering. Sample matrices were imported into Seurat (v. 3.1.1, Stuart et al., 2019) and the percentage of mitochondrial, hemoglobin, and influenza A viral transcripts was calculated per cell. Cells with < 1000 or > 20,000 unique molecular identifiers (UMIs: low quality and doublets), fewer than 300 features (low quality), greater than 10% of reads mapped to mitochondrial genes (dying) or greater than 1% of reads mapped to hemoglobin genes (red blood cells) were filtered from further analysis. Total cells per sample after filtering ranged from 1895 to 2482; no significant difference in the number of cells was observed between arsenic and control samples. Data were then normalized using SCTransform (Hafemeister et al., 2019) and variable features were identified for each sample. Integration anchors between samples were identified using canonical correlation analysis (CCA) and mutual nearest neighbors (MNNs), as implemented in Seurat V3 (Stuart et al., 2019), and used to integrate samples into a shared space for further comparison. This process enables identification of shared populations of cells between samples, even in the presence of technical or biological differences, while also allowing for non-overlapping populations that are unique to individual samples.
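The cell-level filter described above can be sketched in a few lines. This is only an illustration in plain Python, not the authors' Seurat code: the CellQC record, its field names, and the example cells are hypothetical, and only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical record; field names are assumptions)."""
    barcode: str
    n_umis: int            # total UMIs detected in the cell
    n_features: int        # number of genes detected in the cell
    pct_mito: float        # percent of reads mapped to mitochondrial genes
    pct_hemoglobin: float  # percent of reads mapped to hemoglobin genes


def passes_qc(cell: CellQC) -> bool:
    """Apply the thresholds quoted in the text to one cell."""
    if cell.n_umis < 1000 or cell.n_umis > 20000:  # low quality / doublets
        return False
    if cell.n_features < 300:                      # low quality
        return False
    if cell.pct_mito > 10.0:                       # dying cells
        return False
    if cell.pct_hemoglobin > 1.0:                  # red blood cells
        return False
    return True


# Made-up cells, one per failure mode plus one that passes.
cells = [
    CellQC("good", 5000, 1500, 3.0, 0.1),
    CellQC("low_umi", 800, 400, 2.0, 0.0),
    CellQC("doublet", 25000, 6000, 2.0, 0.0),
    CellQC("few_genes", 1200, 250, 2.0, 0.0),
    CellQC("dying", 3000, 900, 15.0, 0.0),
    CellQC("rbc", 3000, 900, 2.0, 4.0),
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

Keeping the thresholds in one predicate makes it easy to report how many cells each criterion removes before committing to a cutoff.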
Clustering and reference-based cell identity labeling of single immune cells from IAV-infected lung with scRNA-seq
Principal components were identified from the integrated dataset and were used for Uniform Manifold Approximation and Projection (UMAP) visualization of the data in two-dimensional space. A shared-nearest-neighbor (SNN) graph was constructed using default parameters, and clusters were identified using the SLM algorithm in Seurat at a range of resolutions (0.2-2). The first 30 principal components were used to identify 22 cell clusters ranging in size from 25 to 2310 cells. Gene markers for clusters were identified with the findMarkers function in scran. To label individual cells with cell type identities, we used the SingleR package (v. 3.1.1) to compare gene expression profiles of individual cells with expression data from curated, FACS-sorted leukocyte samples in the ImmGen compendium (Aran D. et al., 2019; Heng et al., 2008). We manually updated the ImmGen reference annotation with 263 sample group labels for fine-grain analysis and 25 CD45+ cell type identities based on markers used to sort ImmGen samples (Guilliams et al., 2014). The reference annotation is provided in Table S2; cells that were not labeled confidently after label pruning were assigned “Unknown”.
Differential gene expression by immune cells
Differential gene expression within individual cell types was performed by pooling raw count data from cells of each cell type on a per-sample basis to create a pseudo-bulk count table for each cell type. Differential expression analysis was only performed on cell types that were sufficiently represented (>10 cells) in each sample. In droplet-based scRNA-seq, ambient RNA from lysed cells is incorporated into droplets, and can result in spurious identification of these genes in cell types where they aren’t actually expressed. We therefore used a method developed by Young and Behjati (Young et al., 2018) to estimate the contribution of ambient RNA for each gene, and identified genes in each cell type that were estimated to be > 25% ambient-derived. These genes were excluded from analysis in a cell-type specific manner. Genes expressed in less than 5 percent of cells were also excluded from analysis. Differential expression analysis was then performed in Limma (limma-voom with quality weights) following a standard protocol for bulk RNA-seq (Law et al., 2014). Significant genes were identified using MA/QC criteria of P < .05, log2FC >1.
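The pooling step described above (raw counts summed per cell type and per sample, keeping only cell types with more than 10 cells in every sample) can be sketched as follows. This is a minimal plain-Python illustration, not the authors' R code; the function name, the input layout, and the demo data are assumptions, and the ambient-RNA filter and limma-voom fit are not reproduced.

```python
from collections import Counter, defaultdict


def pseudobulk(cells, min_cells=10):
    """Sum raw counts per (sample, cell type).

    cells: iterable of (sample, cell_type, {gene: raw_count}) tuples
           (hypothetical input layout).
    Only cell types with more than `min_cells` cells in every sample are kept.
    """
    sums = defaultdict(Counter)   # (sample, cell_type) -> summed gene counts
    n_cells = Counter()           # (sample, cell_type) -> number of cells
    samples = set()
    for sample, cell_type, counts in cells:
        samples.add(sample)
        n_cells[(sample, cell_type)] += 1
        sums[(sample, cell_type)].update(counts)

    cell_types = {ct for _, ct in n_cells}
    # A cell type missing from a sample has a count of 0 and is dropped.
    keep = {
        ct for ct in cell_types
        if all(n_cells[(s, ct)] > min_cells for s in samples)
    }
    return {key: dict(total) for key, total in sums.items() if key[1] in keep}


# Made-up data: 11 T cells in each of two samples, 3 B cells in sample A only.
demo_cells = (
    [("A", "T", {"g1": 1, "g2": 2})] * 11
    + [("B", "T", {"g1": 2, "g2": 0})] * 11
    + [("A", "B", {"g1": 5})] * 3
)
result = pseudobulk(demo_cells)
```

The resulting table has one column per (sample, cell type) pair, which is the shape a bulk RNA-seq differential expression workflow expects.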
Analysis of arsenic effect on immune cell gene expression by scRNA-seq.
Sample-wide effects of arsenic on gene expression were identified by pooling raw count data from all cells per sample to create a count table for pseudo-bulk gene expression analysis. Genes with fewer than 20 counts in any sample, or fewer than 60 total counts, were excluded from analysis. Differential expression analysis was performed using limma-voom as described above.
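The gene filter above can be written as a short predicate. The sketch below reads the rule literally (a gene is dropped if its count is below 20 in at least one sample, or if its total across samples is below 60); that reading, the function name, and the example counts are assumptions rather than the authors' code.

```python
def filter_genes(counts, min_per_sample=20, min_total=60):
    """Keep genes that meet both count thresholds.

    counts: {gene: [pseudo-bulk count in each sample]} (hypothetical layout).
    """
    return {
        gene: values
        for gene, values in counts.items()
        if min(values) >= min_per_sample and sum(values) >= min_total
    }


# Made-up pseudo-bulk counts for three samples.
demo_counts = {
    "geneA": [25, 30, 40],    # kept
    "geneB": [19, 100, 100],  # dropped: one sample below 20
    "geneC": [20, 20, 19],    # dropped: one sample below 20, total below 60
}
kept_genes = sorted(filter_genes(demo_counts))
```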
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
  matrix of gene counts* (matrix.mtx.gz)
  gene names (features.tsv.gz)
  cell IDs (barcodes.tsv.gz)
*The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
  nCount_RNA = the number of transcripts detected in a cell
  nFeature_RNA = the number of genes detected in a cell
  Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
  prcntMito = percent mitochondrial reads in a cell
  Scrublet = doublet probability score assigned to a cell
  seurat_clusters = cluster ID assigned to a cell
  PaperIDs = sample ID for a cell
  celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
In this Zenodo record, we share the data required to reproduce all the analyses from our publication "satuRn: Scalable Analysis of differential Transcript Usage for bulk and single-cell RNA-sequencing applications". This repository includes input transcript-level expression matrices and metadata for all datasets, as well as intermediate results and final outputs of the respective DTU analyses. For a more elaborate description of the data, we refer to the companion GitHub repository for our publication: https://github.com/statOmics/satuRnPaper. Note that this is version 1.0.3 of the data (uploaded on 2022-07-08). If any changes are made to the datasets in the future, they will also be communicated on our companion GitHub page.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This dataset includes information relevant to the following manuscript from the labs of Prof. Carlos Caldas (University of Cambridge), and Dr. Long V. Nguyen (Princess Margaret Cancer Centre, University Health Network):
Nguyen LV et al. Dynamics and plasticity of human breast cancer single cell-derived clones. Under consideration for publication.
Bulk RNA sequencing raw count matrices are provided (RawCounts.csv) along with the normalized count matrices (LogCPMNormCounts.csv).
Single cell RNA sequencing count matrix processed from R package metacell is provided (mat.pdx_LN_v2_filt.Rda), along with the mc and mc2d files with information on metacell partitions (mc.pdx_LN_v2_filt.Rda and mc2d.pdx_LN_v2_filt.Rda).
Code and information on data analysis are provided for reviewers in our unpublished manuscript and on GitHub (https://github.com/cclab-brca/clone-dynamics).
Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.
The SQLite databases contain the outputs from the large-scale analysis of pre-existing RNA-seq and microarray datasets performed in chapter 2. Both SQLite databases contain the outputs of limma, a package used to perform differential gene expression analysis, run on datasets from the Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo/. The schema for both databases is as follows: the data table contains the outputs and statistics from limma, and the meta table contains metadata about the number of treated and control samples, the type of experiment conducted and the tissue used. These datasets were used to derive the priors used in chapters 3 to 5, based on the proportion of datasets in which a given gene is identified as differentially expressed, i.e. p-value below 0.05. Due to the size of the file, this is only available on request; please use https://library.soton.ac.uk/datarequest
The machine_learning_input.csv file is a comma-delimited file containing the genomic and transcript-based features used to predict a gene's prior in the machine learning models. For more information see the readme file.
The .RNK files are tab-delimited files. The first column is the gene, whilst the second is the rank from 1 to 0. A value of 1 represents a gene with the lowest rank (highest priority), whilst 0 represents the lowest priority for a given gene. These files were used to assess the enrichment of desired DEGs across 22 perturbation studies in chapter 2 using GSEA: https://www.gsea-msigdb.org/gsea/index.jsp.
The .RDS images are the R images used for the novel GEOreflect approach for ranking DEGs in bulk transcriptomic data developed in chapter 3. They are also needed to run the RShiny application used to showcase the method, the code for which can be found on GitHub (https://github.com/brandoncoke/GEOreflect) as well as in the GEOreflect_bulk_DEG_analysis.tar.
The .RDS files require R and the readRDS() function to load into the environment and contain the percentile matrices used to calculate a platform p-value rank. Within the GEOreflect_bulk_DEG_analysis.tar file is an R script, GEOreflect_functions.R, which, when sourced after loading one of the .RDS images into the R environment, enables the user to perform the GEOreflect method on bulk RNA-seq transcriptomic datasets by loading the percentile_matrix_p_value_RNAseq.RDS image. Alternatively, when analysing GPL570 microarray datasets, the percentile_matrix.RDS file needs to be loaded into the R environment and the appropriate R function then needs to be applied to the DEG list. To run the RShiny application, ensure both .RDS files are in the directory with the app.R file, i.e. after using git clone https://github.com/brandoncoke/GEOreflect move both .RDS files into the GEOreflect directory with the cloned repository.
The .csv files with scRNA-seq appended contain the normalised mutual information, adjusted Rand index and Silhouette coefficient obtained when using 6 single-cell RNA-sequencing techniques: GEOreflect, Seurat's vst method, CellBRF, genebasis and CellBRF with the 3 sigma rule imposed. This analysis was carried out in chapter 3. These .csv files use their GEO identifier in the file name or, for Zheng et al.'s data from 10X Genomics, the name assigned to it via the DuoClustering2018 R package.
The inputs from the machine_learning_input.csv file were used to develop the machine learning models used in chapter 5. In the first row, the gene column is the HGNC identifier for the genes, whilst the min_to_be_sig column represents a gene's CDF value at 0.05 for its p-value distribution obtained from the RNA-seq datasets, i.e. the target y for the regressor model. The sd column is unused and was only relevant when calculating the priors using GPL570 microarray data, where there can be redundant probes resulting in multiple priors for the same gene. This column would represent the standard deviation.
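As a rough illustration of two quantities described above, the sketch below computes a gene's prior as the empirical CDF of its p-values at 0.05 and formats a prioritised gene list as tab-delimited .RNK lines scored from 1 (highest priority) to 0. It is not the GEOreflect code: the function names are hypothetical, counting p-values equal to 0.05 as significant is an assumption, and so is the evenly spaced 1-to-0 score.

```python
def prior_from_pvalues(pvalues, alpha=0.05):
    """Fraction of datasets in which the gene has p <= alpha.

    This is the empirical CDF of the gene's p-values evaluated at alpha
    (assumption: a p-value exactly equal to alpha counts as significant).
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    return sum(p <= alpha for p in pvalues) / len(pvalues)


def to_rnk_lines(genes_by_priority):
    """Format genes, highest priority first, as tab-delimited .RNK lines.

    Scores run linearly from 1 (first gene) to 0 (last gene); the linear
    spacing is an illustrative assumption.
    """
    n = len(genes_by_priority)
    lines = []
    for i, gene in enumerate(genes_by_priority):
        score = 1.0 if n == 1 else 1.0 - i / (n - 1)
        lines.append(f"{gene}\t{score}")
    return lines


# Made-up p-values for one gene across four datasets.
example_prior = prior_from_pvalues([0.01, 0.2, 0.04, 0.5])
example_rnk = to_rnk_lines(["GENE1", "GENE2", "GENE3"])
```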
RNA-seq FPKM values of mouse cell lines Ink4a.1, Met25, Met35, Met36, and Met38. Pipeline of data: Ink4a.1 and Met cell lines were profiled using standard RNA-seq completed by Azenta Life Sciences (formerly known as Genewiz, South Plainfield, NJ). After quality check of the reads using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), we used Salmon to quantify transcript-level expression and edgeR to identify genes with significantly differential expression between pairs of conditions based on replicated count data from bulk RNA-seq profiling. The normalized data were supplied to the R package GAGE for gene-set enrichment and pathway analysis. The p-values were corrected for multiple testing using FDR.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven-fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well as did simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
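Two of the simpler normalizations named above, counts per million (CPM) and upper quartile (UQ), can be sketched for a single sample as follows. This is a plain-Python illustration only, not the study's pipeline: the function names are hypothetical, and taking the 75th percentile over nonzero counts with linear interpolation is an assumption, since implementations differ on both points.

```python
import statistics


def cpm(counts):
    """Counts per million: scale one sample's counts so they sum to 1e6."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return [c * 1e6 / total for c in counts]


def upper_quartile_factor(counts):
    """75th percentile of the nonzero counts in one sample.

    Assumption: zeros are excluded and the quantile is linearly interpolated
    (statistics.quantiles with method="inclusive").
    """
    nonzero = sorted(c for c in counts if c > 0)
    if len(nonzero) < 2:
        raise ValueError("need at least two nonzero counts")
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile_normalize(counts):
    """Divide one sample's counts by its upper-quartile factor."""
    factor = upper_quartile_factor(counts)
    return [c / factor for c in counts]


# Made-up counts for one sample.
sample = [0, 10, 30, 60]
sample_cpm = cpm(sample)
sample_uq_factor = upper_quartile_factor(sample)
```

CPM rescales by the library total, so a few very highly expressed genes can dominate the factor; UQ uses a quantile instead, which is less sensitive to those genes.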
https://spdx.org/licenses/CC0-1.0.html
The mechanisms underlying ETS-driven prostate cancer initiation and progression remain poorly understood due to a lack of model systems that recapitulate this phenotype. We generated a genetically engineered mouse with prostate-specific expression of the ETS factor, ETV4, at lower and higher protein dosages through mutation of its degron. Lower-level expression of ETV4 caused mild luminal cell expansion without histologic abnormalities and higher-level expression of stabilized ETV4 caused prostatic intraepithelial neoplasia (mPIN) with 100% penetrance within 1 week. Tumor progression was limited by p53-mediated senescence and Trp53 deletion cooperated with stabilized ETV4. The neoplastic cells expressed differentiation markers such as Nkx3.1 recapitulating luminal gene expression features of untreated human prostate cancer. Single-cell and bulk RNA-sequencing showed stabilized ETV4 induced a novel luminal-derived expression cluster with signatures of the cell cycle, senescence, and epithelial to mesenchymal transition. These data suggest that ETS overexpression alone, at sufficient dosage, can initiate prostate neoplasia. Methods Mouse prostate digestion: Intraperitoneal injection of tamoxifen was administered in 8-week-old mice. 2 weeks after tamoxifen treatment, the mouse prostate was digested for 1 hour with Collagenase/Hyaluronidase (STEMCELL, #07912), and then for 30 minutes with TrypLE™ Express Enzyme (Thermo Fisher, # 12605028) at 37°C to isolate single prostate cells. The prostate cells were stained with PE/Cy7 conjugated anti-mouse CD326 (EpCAM) antibody (BioLegend, 118216), and then CD326 and EYFP double-positive cells, which are luminal cells mainly from the anterior prostate and dorsal prostate, were sorted out by flow cytometry. The mRNA or genomic DNA was extracted from these double-positive cells and then used for ATAC-sequencing and RNA-sequencing analysis. ATAC-seq and primary data processing: ATAC-seq was performed as previously described. 
Primary data processing and peak calling were performed using ENCODE ATAC-seq pipeline (https://github.com/kundajelab/atac_dnase_pipelines). Briefly, paired-end reads were trimmed, filtered, and aligned against mm9 using Bowtie2. PCR duplicates and reads mapped to mitochondrial chromosome or repeated regions were removed. Mapped reads were shifted +4/-5 to correct for the Tn5 transposase insertion. Peak calling was performed using MACS2, with p-value < 0.01 as the cutoff. Reproducible peaks from two biological replicates were defined as peaks that overlapped by more than 50%. On average 25 million uniquely mapped pairs of reads remained after filtering. The distribution of inserted fragment length shows a typical nucleosome banding pattern, and the TSS enrichment score (reads that are enriched around TSS against background) ranges between 28 and 33, suggesting the libraries have high quality and were able to capture the majority of regions of interest. Differential peak accessibility: Reads aligned to peak regions were counted using R package GenomicAlignments_v1.12.2. Read count normalization and differential accessible peaks were called with DESeq2_v1.16.1 in R 3.4.1. Differential peaks were defined as peaks with adjusted p-value < 0.01 and |log2(FC)| > 2. For visualization, coverage bigwig files were generated using bamCoverage command from deepTools2, normalizing using the size factor generated by DESeq2. The differential ATAC-seq peak density plot was generated with deepTools2, using regions that were significantly more or less accessible in ETV4AAA samples relative to EYFP samples. Motif analysis: Motif enrichment analysis was performed using MEME-ChIP 5.0.0 with differentially accessible regions in ETV4AAA relative to EYFP. ATAC-seq footprinting was performed using TOBIAS. First, ATACorrect was run to correct Tn5 bias, followed by ScoreBigwig to calculate footprint score, and finally BindDetect to generate differential footprint across regions. 
RNA-seq analysis: The extracted RNA was processed for RNA-sequencing by the Integrated Genomics Core Facility at MSKCC. The libraries were sequenced on an Illumina HiSeq-2500 platform with 51 bp paired-end reads to obtain a minimum yield of 40 million reads per sample. The sequenced data were aligned using STAR v2.3 with GRCm38.p6 as annotation. DESeq2_v1.16.1 was subsequently applied to read counts for normalization and identification of differentially expressed genes between the ETV4AAA and EYFP groups, with an adjusted p-value < 0.05 as the threshold. Genes were ranked by sign(log2(FC)) * (-log(p-value)) as input for GSEA analysis using ‘Run GSEA Pre-ranked’ with 1000 permutations (48). The custom gene sets used in GSEA analysis are shown in Table S2.
Unsupervised hierarchical clustering: To obtain an overall sample clustering as part of QC, hierarchical clustering was performed on normalized ATAC-seq or RNA-seq data using the pheatmap_v1.0.10 package in R. It was done using all peaks or all genes, with Spearman or Pearson correlation as the distance metric. To give an overview of differential gene expression in the RNA-seq data, unsupervised clustering was also performed on a matrix with all samples as columns and scaled normalized read counts of genes differentially expressed between ETV4AAA and EYFP as rows.
Integrative analysis of ATAC-seq, RNA-seq, and ChIP-seq data: ERG ChIP-seq peaks were called using MACS 2.1, with an FDR cutoff of q < 10^-3 and removal of peaks mapped to blacklist regions. Reproducible peaks between two biological replicates were identified as ETV4AAA ATAC-seq peaks. ERG ChIP-seq peaks and ETV4AAA ATAC-seq peaks were considered overlapping if their peak summits were within 250 bp.
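The pre-ranked GSEA input described in the RNA-seq analysis above, sign(log2(FC)) * (-log(p-value)), can be sketched in Python as follows. This is a minimal illustration rather than the authors' script. The base of the logarithm is not stated in the text, so base 10 is assumed here; the function names, the floor applied to p-values of zero, and the gene labels are hypothetical.

```python
import math


def rank_metric(log2_fc, p_value, min_p=1e-300):
    """Signed significance score: sign(log2FC) * -log10(p).

    Assumptions (not from the source): log base 10, and p-values of
    exactly zero are floored at min_p so the score stays finite.
    """
    sign = (log2_fc > 0) - (log2_fc < 0)   # +1, -1, or 0
    return sign * -math.log10(max(p_value, min_p))


def rank_genes(stats):
    """stats maps gene -> (log2_fc, p_value); returns (gene, score)
    pairs sorted from most up-regulated to most down-regulated."""
    scored = [(gene, rank_metric(fc, p)) for gene, (fc, p) in stats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented statistics for three placeholder genes.
example = {
    "GeneA": (2.0, 0.001),
    "GeneB": (-1.0, 0.0001),
    "GeneC": (0.5, 0.5),
}
ranked = rank_genes(example)
```

Strongly significant up-regulated genes land at the top of the list and strongly significant down-regulated genes at the bottom, which is the ordering a pre-ranked enrichment test expects.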
To determine whether the overlap was significant, enrichment analysis was done using regioneR_v1.8.1 in R, which counted the number of overlapping peaks between a set of randomly selected regions in the genome (excluding blacklist regions) and the ERG ChIP-seq peaks or ETV4AAA ATAC-seq peaks. A null distribution was formed from 1000 permutations to compute the p-value and z-score of the original evaluation. To assign ATAC-seq peaks to genes, ChIPseeker_v1.12.1 in R was used. Each peak was unambiguously assigned to the one gene whose TSS or 3’ end was closest to that peak. Differential gene expression between ETV4AAA and EYFP was evaluated using log2(FC) calculated by DESeq2. p-values were estimated with the Wilcoxon rank test and Student's t-test.
scRNA-sequencing: Tmprss2-CreERT2, EYFP; Tmprss2-CreERT2, ETV4WT; Tmprss2-CreERT2, ETV4AAA; and Tmprss2-CreERT2, ETV4AAA; Trp53L/L mice were euthanized 2 weeks or 4 months after tamoxifen treatment (n=3 mice for each genotype and time point). After euthanasia, the prostates were dissected, minced with a scalpel, and then digested for 1 h with collagenase/hyaluronidase (#07912, STEMCELL Technologies) and for 30 min with TrypLE (#12605010, Gibco). Live single prostate cells were sorted by flow cytometry as DAPI-negative. For each mouse, 5,000 cells were directly processed with the 10x Genomics Chromium Single Cell 3’ GEM, Library & Gel Bead Kit v3 according to the manufacturer’s specifications. For each sample, 200 million reads were acquired on a NovaSeq platform S4 flow cell. Reads obtained from the 10x Genomics scRNA-seq platform were mapped to the mouse genome (mm9) using the Cell Ranger package (10x Genomics). True cells were distinguished from empty droplets using the scCB2 package. The levels of mitochondrial reads and numbers of unique molecular identifiers (UMIs) were similar among the samples, which indicates that there were no systematic biases in the libraries from mice with different genotypes.
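The overlap enrichment test described earlier in this section (summits within 250 bp, a null distribution from 1000 random region sets, then a p-value and z-score) can be sketched in Python on a toy single-chromosome genome. This is not regioneR and not the authors' code: blacklist exclusion is omitted, the distance cutoff is treated as inclusive, and all function names, coordinates, and the genome length are hypothetical.

```python
import random
import statistics
from bisect import bisect_left


def count_near(summits, reference_sorted, max_dist=250):
    """Count summits with at least one reference summit within max_dist bp."""
    hits = 0
    for pos in summits:
        # First reference summit at or beyond the left edge of the window.
        i = bisect_left(reference_sorted, pos - max_dist)
        if i < len(reference_sorted) and reference_sorted[i] <= pos + max_dist:
            hits += 1
    return hits


def permutation_test(query, reference, genome_length, n_perm=1000, seed=0):
    """Compare the observed overlap with overlaps of random summit sets."""
    rng = random.Random(seed)
    reference_sorted = sorted(reference)
    observed = count_near(query, reference_sorted)
    null = []
    for _ in range(n_perm):
        # Draw as many random positions as there are query summits.
        random_summits = [rng.randrange(genome_length) for _ in query]
        null.append(count_near(random_summits, reference_sorted))
    mean = statistics.fmean(null)
    sd = statistics.stdev(null)
    z_score = (observed - mean) / sd if sd > 0 else float("nan")
    # Add-one correction keeps the empirical p-value above zero.
    p_value = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, p_value, z_score


# Invented coordinates on a 100 kb toy chromosome.
chip_summits = [1000, 5000, 9000, 20000, 42000]
atac_summits = [1100, 4900, 9200, 20010, 41900, 70000]
observed, p_value, z_score = permutation_test(atac_summits, chip_summits, 100_000)
```

A small p-value together with a large positive z-score indicates that the two peak sets coincide more often than randomly placed regions would.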
Cells were removed if they expressed fewer than 600 unique genes, had fewer than 1,500 or more than 50,000 total counts, or had more than 20% mitochondrial reads. Genes detected in fewer than 10 cells and all mitochondrial genes were removed from subsequent analyses. Putative doublets were removed using the Doublet Detection package. The average gene detection in each cell type was similar among the samples. Combining samples in the entire cohort yielded a filtered count matrix of 48,926 cells by 19,854 genes, with a median of 6,944 counts and 1,973 genes per cell, and a median of 2,039 cells per sample. The count matrix was then normalized to CPM (counts per million) and log2(X+1) transformed for analysis of the combined dataset. The top 1,000 highly variable genes were found using SCANPY (version 1.6.1) (77). Principal Component Analysis (PCA) was performed on the 1,000 most variable genes, retaining the top 50 principal components (PCs), which explained 29% of the variance. To visualize single cells of the global atlas, we used UMAP projections (https://arxiv.org/abs/1802.03426). We then performed Leiden clustering. Marker genes for each cluster were found with scanpy.tl.rank_genes_groups. Cell types were determined using the SCSA package, an automatic tool based on a score annotation model that combines differentially expressed genes (DEGs) and confidence levels of cell markers from both known and user-defined information. Heatmaps were generated for single cells based on log-normalized and scaled expression values of marker genes curated from the literature or identified as highly differentially expressed. Differentially expressed genes between clusters were found using the MAST package and are shown in the heatmaps. The logFC from the MAST output was used to rank the gene list for GSEA analysis (48). The custom gene sets used in GSEA analysis are shown in Table S2.
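The cell-level QC thresholds and the CPM plus log2(X+1) transformation described above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' pipeline (which used SCANPY): the function names are hypothetical, and cells sitting exactly on a threshold are assumed to be kept.

```python
import math


def passes_qc(n_genes, total_counts, mito_fraction):
    """Apply the cell filters quoted in the text.

    A cell is removed if it has fewer than 600 genes, fewer than 1,500
    or more than 50,000 total counts, or more than 20% mitochondrial reads.
    """
    return (
        n_genes >= 600
        and 1_500 <= total_counts <= 50_000
        and mito_fraction <= 0.20
    )


def log2_cpm(counts):
    """Scale one cell's counts to counts per million, then log2(X+1)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log2(c / total * 1_000_000 + 1) for c in counts]


# A cell matching the reported medians passes; a low-complexity cell does not.
median_like_cell = passes_qc(n_genes=1_973, total_counts=6_944, mito_fraction=0.05)
low_complexity_cell = passes_qc(n_genes=500, total_counts=6_944, mito_fraction=0.05)
normalized = log2_cpm([0, 1, 3])
```

Scaling to counts per million removes differences in sequencing depth between cells, and the +1 pseudocount keeps zero counts at zero after the log transform.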
Gene imputation was performed using the MAGIC (Markov affinity-based graph imputation of cells) package, and imputed gene expression values were used in the heatmap.
Analysis of public human gene expression datasets: To analyze TP53 RNA expression in human prostate cancer samples, we obtained normalized RNA-seq data from prostate cancer TCGA (www.firebrowse.org) (3). To assess the role of TP53 loss on
https://ega-archive.org/dacs/EGAC00001002989
Manuscript Title: Co-targeting of BTK and MALT1 overcomes resistance to BTK inhibitors in mantle cell lymphoma
Journal: Journal of Clinical Investigation
Authors: Vivian Changying Jiang1, Yang Liu1, Junwei Lian1, Shengjian Huang1, Alexa Jordan1, Qingsong Cai1, Fangfang Yan3, Joseph Mitchell McIntosh1, Yijing Li1, Yuxuan Che1, Zhihong Chen1, Jovanny Vargas1, Maria Badillo1, JohnNelson Bigcal1, Heng-Huan Lee1, Wei Wang1, Yixin Yao1, Lei Nie1, Christopher Flowers1, and Michael Wang1, 2*
Abstract: Bruton’s tyrosine kinase (BTK) is a proven target in mantle cell lymphoma (MCL), an aggressive subtype of non-Hodgkin lymphoma. However, resistance to BTK inhibitors is a major clinical challenge. Here we report that MALT1 is one of the top overexpressed genes in ibrutinib-resistant MCL cells, while expression of CARD11, which is upstream of MALT1, is decreased. MALT1 genetic knockout or inhibition produced dramatic defects in MCL cell growth regardless of ibrutinib sensitivity. Conversely, CARD11 knockout showed anti-tumor effects only in ibrutinib-sensitive cells, suggesting that MALT1 overexpression could drive ibrutinib resistance by bypassing BTK-CARD11 signaling. Additionally, BTK knockdown and MALT1 knockout markedly impaired MCL tumor migration and dissemination, and MALT1 pharmacological inhibition decreased MCL cell viability, adhesion, and migration by suppressing NF-κB, PI3K-AKT-mTOR, and integrin signaling. Importantly, co-targeting MALT1 with safimaltib and BTK with pirtobrutinib induced potent anti-MCL activity in ibrutinib-resistant MCL cell lines and patient-derived xenografts. Therefore, we conclude that MALT1 overexpression is associated with resistance to BTK inhibitors in MCL, that targeting abnormal MALT1 activity could be a promising therapeutic strategy to overcome BTK inhibitor resistance, and that co-targeting of MALT1 and BTK should improve MCL treatment efficacy and durability as well as patient outcomes.
Dataset description: The bulk RNA-seq dataset was generated for the cell lines below and used for two major purposes: 1. DEG analysis and GSEA analysis comparing IBN-R and IBN-S cells 2. DEG analysis and GSEA analysis comparing MCL cells with/without MI-2 treatment.
sample | Cell | MI-2 | Ibrutinib (IBN) | Venetoclax (VEN) | Used for IBN-R vs IBN-S comparison | Used for MI-2 vs untreated (DMSO)
H9 | Granta519 | - | R | S | yes |
H21 | Granta519 | - | R | S | yes |
H33 | Granta519 | - | R | S | yes |
H10 | Granta519-VEN-R | - | R | R | yes |
H22 | Granta519-VEN-R | - | R | R | yes |
H34 | Granta519-VEN-R | - | R | R | yes |
H3 | JeKo BTK KD_1 | - | R | R | yes | yes
H15 | JeKo BTK KD_1 | - | R | R | yes | yes
H27 | JeKo BTK KD_1 | - | R | R | yes | yes
H5 | JeKo BTK KD_2 | - | R | R | yes | yes
H17 | JeKo BTK KD_2 | - | R | R | yes | yes
H29 | JeKo BTK KD_2 | - | R | R | yes | yes
H1 | JeKo-1 | - | S | R | yes | yes
H13 | JeKo-1 | - | S | R | yes | yes
H25 | JeKo-1 | - | S | R | yes | yes
H7 | Mino | - | S | S | yes |
H19 | Mino | - | S | S | yes |
H31 | Mino | - | S | S | yes |
H8 | Mino-VEN-R | - | S | R | yes |
H20 | Mino-VEN-R | - | S | R | yes |
H32 | Mino-VEN-R | - | S | R | yes |
H11 | Rec-1 | - | S | S | yes |
H23 | Rec-1 | - | S | S | yes |
H12 | Rec-VEN-R | - | S | S | yes |
H24 | Rec-VEN-R | - | S | R | yes |
H36 | Rec-VEN-R | - | S | R | yes |
H35 | Rec-1 | -- | S | R | yes |
H4 | JeKo BTK KD_1 + MI-2 | + | | | | yes
H16 | JeKo BTK KD_1 + MI-2 | + | | | | yes
H28 | JeKo BTK KD_1 + MI-2 | + | | | | yes
H6 | JeKo BTK KD_2 + MI-2 | + | | | | yes
H18 | JeKo BTK KD_2 + MI-2 | + | | | | yes
H30 | JeKo BTK KD_2 + MI-2 | + | | | | yes
H2 | JeKo-1 + MI-2 | + | | | | yes
H14 | JeKo-1 + MI-2 | + | | | | yes
H26 | JeKo-1 + MI-2 | + | | | | yes
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Polycystic ovary syndrome (PCOS) is a common endocrine disorder characterized by hyperandrogenemia of ovarian theca cell origin, resulting in anovulation/oligo-ovulation and infertility. Our previous studies established that ovarian theca cells isolated and propagated from ovaries of normal ovulatory women and women with PCOS have distinctive molecular and cellular signatures that underlie the increased androgen biosynthesis in PCOS. To evaluate differences in gene expression between single cells from passaged cultures of theca cells from ovaries of normal ovulatory women and women with PCOS, we performed single-cell RNA sequencing (scRNA-seq). Results from these studies revealed differentially expressed pathways and genes involved in the acquisition of cholesterol, the precursor of steroid hormones, and in steroidogenesis. Bulk RNA-seq and microarray studies confirmed the theca cell differential gene expression profiles. The expression profiles appear to be directed largely by increased levels or activity of the transcription factors SREBF1, which regulates genes involved in cholesterol acquisition (LDLR, LIPA, NPC1, CYP11A1, FDX1, FDXR), and GATA6, which regulates expression of genes encoding steroidogenic enzymes (CYP17A1) in concert with other differentially expressed transcription factors (SP1, NR5A2). This study provides insights into the molecular mechanisms underlying the hyperandrogenemia associated with PCOS and highlights potential targets for molecular diagnosis and therapeutic intervention.
scRNA-seq data in the form of 10x Cell Ranger files are available for the following samples:
PCOS affected - Mc03, Mc10, Mc16, Mc26, Mc27
Normal cycling women - Mc02, Mc06, Mc31, Mc40, Mc50
An F in the sample name indicates forskolin treatment, and a C indicates an untreated control sample.
Cell lines and compounds: PCa cell lines (LNCaP, 22Rv1, VCaP, PC3, DU145, NCI-H660, C4-2), other cell lines (HEK293T, DLD1) and a benign prostate line (RWPE-1) were purchased from ATCC and maintained according to ATCC protocols. Patient-derived CRPC organoids (WCM and MSK) were established and maintained as organoids in Matrigel drops according to the previously described protocol (70). LNCaP-AR cells were a kind gift from Dr. Sawyers and Dr. Mu (Memorial Sloan Kettering Cancer Center) and were cultured as previously described (5). All cell lines used and their phenotypes are listed in Supplementary Table 1. Cell cultures were regularly tested for Mycoplasma contamination and confirmed to be negative. Genentech Inc. synthesized A947, its epimer (A858), FHD-286 and AU-15330. Cobimetinib, Trametinib, VL285 and CHIR99021 were purchased from SelleckChem. BRM014 was purchased from MedChemExpress. All drugs used in this study are listed in Supplementary Table 2.
Single-cell RNA-sequencing by SORT-seq library generation and analysis: SORT-seq was performed using the Single Cell Discoveries (SCD) service. Organoids were treated for 72h with a control epimer (A858) or active compound (A947) at 1 µM, and 1x10^6 cells were harvested in PBS. Harvested cells were stained with 100ng/ml DAPI to stain dead cells. Using a cell sorter (conducted by Flow Cytometry Core, DBMR, Bern) and the recommended settings (Single Cell Discoveries B.V.), DAPI-negative cells were sorted as single cells into 376 wells of each of four 384-well plates containing immersion oil per condition, resulting in a theoretical cell number of 1504 cells per condition. All post-harvesting steps were performed at 4°C. Plates were snap-frozen on dry ice for 15 minutes and sent out for sequencing at Single Cell Discoveries B.V. Data were analyzed using the Seurat package v.4.3.0 (80). Cell QC filtering was done using the following thresholds: nCount > 4000, nFeature > 1000, percent.mito 0.85.
Differential gene expression analysis between clusters was done with Seurat::FindAllMarkers. Module scores were generated with Seurat::AddModuleScore. Gene set enrichment analysis was done with the package fgsea v.1.24.0 (81) and the human gene sets from the Molecular Signatures Database (https://www.gsea-msigdb.org). Gene regulatory network analysis was done with pySCENIC v.0.12.1 (82). Overall analysis was done in R v.4.2.2.
RNA-seq library generation and processing: For bulk RNA-seq, organoids were treated with A858 or A947 (1µM) for 24h and 48h (3 biological replicates per condition). RNA was extracted using the RNeasy Kit (Qiagen); library generation and subsequent sequencing were performed by the clinical genomics lab (CGL) at the University of Bern. Sequencing reads were aligned against the human genome hg38 with STAR v.2.7.3a (83). Gene counts were generated with RSEM v.1.3.2 (84), whose index was generated using the GENCODE v33 primary assembly annotation. Differential gene expression analysis was done with DESeq2 v.1.34.0 (85). Gene set enrichment analysis was done with the package fgsea v.1.20.0 (81) and the human gene sets from the Molecular Signatures Database (https://www.gsea-msigdb.org). Analysis was done in R v.4.1.2.
TCF7L2 ChIP-seq library generation and processing: For the ChIP-seq assay, chromatin was prepared from 2 biological replicates of WCM1078 treated with A858 or A947 (1µM) for 4h, and ChIP-seq assays were then performed by Active Motif Inc. using an antibody against TCF7L2 (Santa Cruz, cat# sc-8631, Lot# D0914). ChIP-seq sequence data were processed using an ENCODE-DCC/chip-seq-pipeline2-based workflow (https://github.com/ENCODE-DCC/chip-seq-pipeline2). Briefly, fastq files were aligned to the hg38 human genome reference using Bowtie2 (v2.2.6), followed by sorting of the resulting bam files (samtools v1.7), filtering out unmapped reads and keeping reads with mapping quality higher than 30.
Duplicates were removed with Picard’s MarkDuplicates (v1.126) function, followed by indexing of the resulting bam files with samtools. For each bam file, genome coverage was computed with bedtools (v2.26.0), followed by the generation of bigwig files (wigToBigWig v377). Peaks were called with macs2 (v2.2.4) for each treatment sample using a pooled input alignment (.bam file) as control. Downstream analyses were performed with DiffBind v3.11.1 with default parameters, except for summits=250 in dba.count(). dba.contrast() and dba.analyze() were used to compute significant differential peaks with DESeq2.
ATAC-seq library generation and processing: ATAC-seq was performed on 50,000 cryo-preserved cells per condition (1µM A858 and 1µM A947, n = 3 biological replicates) treated for 4h and analyzed as described in a previous study (86). Briefly, 50,000 cryo-preserved cells per condition were lysed for 5 minutes on ice and tagmented for 30 minutes at 37°C, followed by DNA isolation. DNA was barcoded and amplified before sequencing.
PRO-cap library generation and processing: For PRO-cap, app...
Molecular and behavioral responses to opioids are thought to be primarily mediated by neurons, although there is accumulating evidence that other cell types play a prominent role in drug addiction. To investigate cell-type-specific opioid responses, we performed single-cell RNA sequencing of the nucleus accumbens of mice following acute morphine treatment. Differential expression analysis uncovered unique morphine-dependent transcriptional responses by oligodendrocytes and astrocytes. Further analysis using RNAseq of FACS-purified oligodendrocytes revealed a large cohort of morphine-regulated genes. Importantly, the affected genes are enriched for roles in cellular pathways intimately linked to oligodendrocyte maturation and myelination, including the unfolded protein response. Altogether, our data illuminate the morphine-dependent transcriptional response by oligodendrocytes and offer mechanistic insights into myelination defects associated with opioid abuse. For single-cell experiments, we performed Drop-seq on 4 biological replicates each of the nucleus accumbens from saline- or morphine-treated mice. For bulk RNAseq experiments, we performed FACS to purify oligodendrocytes from the entire brains of 6 mice (3 mock, 3 morphine), then isolated RNA and subjected it to RNA sequencing.
Wound healing is an intensely studied topic involved in many relevant pathophysiological processes, including fibrosis. Despite the great interest in fibrosis, the relationship between commensal microbiota and skin fibrosis remains unclear. Here, we focus on keloid, a classical yet intractable skin fibrotic disease, to establish the association between commensal microbiota and scarring tissue. Our histological data reveal the presence of microbiota in keloids. 16S rRNA sequencing characterizes the microbial composition and its divergence between pathological and normal skin tissue. Our research provides insights into the pathology of human fibrotic diseases, advocating commensal bacteria and IL-8 signaling as useful targets in future interventions for recurrent keloid disease.
16S rDNA sequencing data: The data files here are raw 16S rDNA sequencing reads accompanied by the R scripts that were used to analyze them. The data came from two sources: (1) swabs from the surface of normal and keloid skin; (2) tissue from deeper parts of the skin of keloid patients. Surface microbiota samples were collected from the pathological location or the normal lateral location of patients using a swab (Catch-all Sample Collection Swab, Epicenter) moistened in Yeast Cell Lysis Buffer (from MasterPure Yeast DNA Purification Kit; Epicenter). Samples were snap-frozen on dry ice, and DNA was isolated from specimens using the PureLink Genomic DNA Mini Kit (Invitrogen). Amplification of the 16S V3+V4 region was performed according to the manufacturer’s specifications. Sequencing of 16S rRNA amplicons was conducted by Apexbio Co., Shanghai, China using the Illumina NovaSeq platform. The data were analyzed with the attached R scripts.
Bulk RNA-Seq: For RNA sequencing, human dermal ...
# Commensal microbiome dysbiosis in keloid disease
https://doi.org/10.5061/dryad.d51c5b0bt
These datasets contain original fastq.gz readouts from 16S rDNA sequencing experiments and scripts used to analyze these data. The experiments were performed on patients with keloids, a fibrotic skin disease. We did two similar experiments with the same way of analysis:
Additionally, we provide the files from bulk RNA sequencing of human dermal fibroblasts treated with IL-8 and TGF-beta here.
All the data, additional files, and R scripts ar...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data and code used in Tanja Hyvarinen's project (tanja.hyvarinen@tuni.fi).
"analysis" contains the processing and analysis of the bulk RNA sequencing data; "integration" contains the analysis of the integration between our RNA-seq data and external datasets.
"fastq" contains the raw fastq sequences of the bulk RNA sequencing.
"counts" contains the results of processing the fastq sequences with the nf-core rnaseq workflow.
The "analysis" and "integration" folders can contain the starting raw data in "data", the R scripts in order of execution (op1, op2, ...), and an "output" folder that contains the final processed data of each operation.