Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The description of bulk RNA-seq datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains R and Python scripts used to analyze bulk RNA-Seq datasets from the Gene Expression Omnibus (GEO) related to breast, prostate, endometrial, lung, and colorectal cancer. The workflow includes differential expression analysis, survival analysis, functional enrichment, and visualization.
Included workflows:
R scripts:
1) Data processing and normalization of GEO-derived bulk RNA-Seq datasets.
2) Differential expression (DEG) analysis using limma.
3) Survival analysis using Cox proportional hazards models.
4) Functional enrichment analysis (GO, Reactome) of overlapping DEGs to identify significant pathways.
Python scripts:
1) Forest plot generation using Matplotlib and Seaborn to visualize the hazard ratios and confidence intervals from the survival analysis.
Data source:
Publicly available bulk RNA-Seq datasets from the Gene Expression Omnibus (GEO) database.
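A forest plot of a Cox model usually shows one hazard ratio per gene with its confidence interval. The sketch below shows how those values are commonly derived from a fitted coefficient and its standard error, using the standard Wald interval HR = exp(beta), CI = exp(beta ± z·SE). It is an illustration only, not the repository's code: the function name, the gene labels, and the coefficient values are hypothetical.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox log-hazard coefficient.

    beta: estimated log hazard ratio; se: its standard error.
    z=1.96 gives an approximate 95% Wald confidence interval.
    """
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


# Hypothetical (beta, se) pairs for illustration; not taken from any dataset.
example_fits = {"GENE_A": (0.40, 0.15), "GENE_B": (-0.25, 0.20)}

rows = {gene: hazard_ratio_ci(b, s) for gene, (b, s) in example_fits.items()}
for gene, (hr, lo, hi) in rows.items():
    print(f"{gene}: HR={hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Each resulting triple corresponds to one row of a forest plot: the point estimate is drawn as a marker and the interval as a horizontal line, typically on a log-scaled axis with a reference line at HR = 1.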
https://www.immport.org/agreement
Single-cell RNA sequencing (scRNA-Seq) studies have provided critical insight into the pathogenesis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19). scRNA-Seq library preparation methods and data processing workflows are generally designed for the detection and quantification of eukaryotic host mRNAs and not viral RNAs. Here, we compare different scRNA-Seq library preparation methods for their ability to quantify and detect SARS-CoV-2 RNAs with a focus on subgenomic mRNAs (sgmRNAs). We show that compared to 10X Genomics Chromium Next GEM Single Cell 3' (10X 3') libraries or 10X Genomics Chromium Next GEM Single Cell V(D)J (10X 5') libraries sequenced with standard read configurations, 10X 5' libraries sequenced with an extended length read 1 (R1) that covers both cell barcode and transcript sequence (termed "10X 5' with extended R1") increase the number of unambiguous reads spanning leader-sgmRNA junction sites. We further present a data processing workflow, single-cell coronavirus sequencing (scCoVseq), which quantifies reads unambiguously assigned to viral sgmRNAs or viral genomic RNA (gRNA). We find that combining 10X 5' with extended R1 library preparation/sequencing and scCoVseq data processing maximizes the number of viral UMIs per cell quantified by scRNA-Seq. Corresponding sgmRNA expression levels are highly correlated with expression in matched bulk RNA-Seq data sets quantified with established tools for SARS-CoV-2 analysis. Using this scRNA-Seq approach, we find that SARS-CoV-2 gene expression is highly correlated across individual infected cells, which suggests that the proportion of viral sgmRNAs remains generally consistent throughout infection.
Taken together, these results and corresponding data processing workflow enable robust quantification of coronavirus sgmRNA expression at single-cell resolution, thereby supporting high-resolution studies of viral RNA processes in individual cells. IMPORTANCE Single-cell RNA sequencing (scRNA-Seq) has emerged as a valuable tool to study host-virus interactions, especially for coronavirus disease 2019 (COVID-19). Here we compare the performance of different scRNA-Seq library preparation methods and sequencing strategies to detect SARS-CoV-2 RNAs and develop a data processing workflow to quantify unambiguous sequence reads derived from SARS-CoV-2 genomic RNA and subgenomic mRNAs. After establishing a workflow that maximizes the detection of SARS-CoV-2 subgenomic mRNAs, we explore patterns of SARS-CoV-2 gene expression across cells with variable levels of total viral RNA, assess host gene expression differences between infected and bystander cells, and identify non-canonical and lowly abundant SARS-CoV-2 RNAs. The sequencing and data processing strategies developed here can enhance studies of coronavirus RNA biology at single-cell resolution and thereby contribute to our understanding of viral pathogenesis.
Conventional (bulk) RNA-sequencing was performed on unfractionated cell suspension or snap-frozen whole tissue material. Total RNA was isolated with TRIzol reagent followed by purification over PureLink RNA Mini Kit columns (Invitrogen). RNA-seq was performed using a polyA-enriched strand-specific library construction protocol (doi: 10.1016/j.ccell.2016.02.009) and paired-end 75 bp sequencing on an Illumina HiSeq 2500 instrument. Raw reads were aligned to the reference human genome assembly GRCh37 (hg19) using STAR (v2.5.2.a). To improve spliced alignment, STAR was provided with exon junction coordinates from the reference annotations (Gencode v19). We applied a modified version of a bioinformatics workflow for normalization of raw read counts and differential gene expression analysis (doi: 10.12688/f1000research.9005.3). Gene-level read counts were quantified using HTSeq-count (v0.11.0; intersection-strict, reverse mode) (doi: 10.1093/bioinformatics/btu638). Genes showing low read counts (i.e., genes not showing counts per million (CPM) > 1.0 in at least 10% of samples) were removed from further analysis. Raw counts from expressed genes were then TMM-normalized and scaled to CPM using the edgeR (v3.22.2) package (doi: 10.1093/bioinformatics/btp616). Sample IDs correspond to those referenced in Wang X et al., Nature Communications (2022).
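The low-count filter described above (keep a gene only if its CPM exceeds 1.0 in at least 10% of samples) can be sketched in a few lines. This is a simplified illustration, not the authors' workflow: it computes CPM from raw library sizes rather than TMM-adjusted ones, the function names and the toy count table are hypothetical, and the original analysis used edgeR in R.

```python
import math


def cpm(counts):
    """Convert {gene: [raw count per sample]} to counts per million."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total raw counts in each sample (no TMM adjustment here).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] if lib_sizes[j] else 0.0
               for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0, min_fraction=0.10):
    """Keep genes with CPM > min_cpm in at least min_fraction of samples."""
    n_samples = len(next(iter(counts.values())))
    # Smallest whole number of samples covering the fraction (guarding float error).
    needed = max(1, math.ceil(min_fraction * n_samples - 1e-9))
    cpm_values = cpm(counts)
    return {
        gene: counts[gene]
        for gene, row in cpm_values.items()
        if sum(v > min_cpm for v in row) >= needed
    }


# Toy raw counts (hypothetical): three genes across four samples.
toy_counts = {
    "GENE_A": [900_000, 800_000, 950_000, 700_000],
    "GENE_B": [50, 40, 60, 30],
    "GENE_C": [0, 0, 0, 0],
}
expressed = filter_low_expression(toy_counts)
print(sorted(expressed))
```

The filter returns raw counts for the retained genes, so normalization can then be applied to the expressed genes only, in the same order as the description above.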
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Multipotent neural stem cells (NSCs) are found in several isolated niches of the adult mammalian brain where they have unique potential to assist in tissue repair. Modern transcriptomics offer high-throughput methods for identifying disease or injury associated gene expression signatures in endogenous adult NSCs, but they require adaptation to accommodate the rarity of NSCs. Bulk RNA sequencing (RNAseq) of NSCs requires pooling several mice, which impedes application to labor-intensive injury models. Alternatively, single cell RNAseq can profile hundreds to thousands of cells from a single mouse and is increasingly used to study NSCs. The consequences of the low RNA input from a single NSC on downstream identification of differentially expressed genes (DEGs) remain insufficiently explored. Here, to clarify the role that low RNA input plays in NSC DEG identification, we directly compared DEGs in an oxidative stress model of cultured NSCs by bulk and single cell sequencing. While both methods yielded DEGs that were replicable, single cell sequencing using the 10X Chromium platform yielded DEGs derived from genes with higher relative transcript counts compared to non-DEGs and exhibited smaller fold changes than DEGs identified by bulk RNAseq. The loss of high fold-change DEGs in the single cell platform presents an important limitation for identifying disease-relevant genes. To facilitate identification of such genes, we determined an RNA-input threshold that enables transcriptional profiling of NSCs comparable to standard bulk sequencing and used it to establish a workflow for in vivo profiling of endogenous NSCs. We then applied this workflow to identify DEGs after lateral fluid percussion injury, a labor-intensive animal model of traumatic brain injury. Our work joins an emerging body of evidence suggesting that single cell RNA sequencing may underestimate the diversity of pathologic DEGs.
However, our data also suggest that population level transcriptomic analysis can be adapted to capture more of these DEGs with similar efficacy and diversity as standard bulk sequencing. Together, our data and workflow will be useful for investigators interested in understanding and manipulating adult hippocampal NSC responses to various stimuli.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset consists of the reference data files, metadata and processed results files for the paper "Cardelino: Integrating whole exomes and single-cell transcriptomes to reveal phenotypic impact of somatic variants," which investigates clonality in normal human dermal fibroblast cell populations in 32 cell lines from distinct donors, using bulk whole-exome sequencing and single-cell RNA-sequencing data.
This dataset contains everything required to reproduce the results presented in the paper from processed data and results of our data processing workflows. Our analyses can be reproduced using the source code and instructions available at our project website.
The entire analysis workflow from raw data to final results is also reproducible but is substantially more complicated and computationally intensive. It also requires large datasets to be obtained from other repositories. Specifically, single-cell RNA-seq data have been deposited in the ArrayExpress database at EMBL-EBI under accession number E-MTAB-7167. Whole-exome sequencing data is available through the HipSci portal (www.hipsci.org). Combined with the dataset in this repository and following the instructions on the project website, it is possible to run our entire analysis pipeline.
Expression profiling by high throughput sequencing.
27 single-cell, 12 single-nuclei, and 15 bulk RNA sequencing experiments using healthy adult mouse kidneys; three biological replicates are included per condition.
Single-cell expression profiling is a rich resource of cellular heterogeneity. While profiling every sample under study is advantageous, such a workflow is time consuming and costly. We devised CPM, a deconvolution algorithm in which cellular heterogeneity is inferred from bulk expression data based on a pre-existing collection of single-cell RNA-seq profiles. We applied CPM to investigate individual variation in heterogeneity of murine lung cells during in vivo influenza virus infection, revealing that the relations between cell quantities and clinical outcomes vary in a gradual manner along the cellular activation process. Validation experiments confirmed these gradual changes along the cellular activation trajectory. Additional analysis suggests that clinical outcomes relate to the rate of cell activation at the early stages of this process. These findings demonstrate the utility of CPM as a mapping deconvolution tool at single-cell resolution, and highlight the importance of such a fine cell landscape for understanding diversity of clinical outcomes. Lung gene expression of Collaborative Cross mice taken 48 h after infection with either the influenza virus or PBS.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Postmortem-derived Brain Sequencing Collection is a harmonized repository of scRNAseq data contributed by two ASAP CRN teams (Team Lee and Team Hafler). These samples were derived from the middle frontal gyrus, hippocampus, substantia nigra, and pre-frontal cortex regions of healthy control, Parkinson's Disease, and Alzheimer's Disease brains. The samples have been harmonized across 10x-sequenced data aligned into count tables with Cell Ranger.
The current collection represents the minimum viable product and will be expanded and improved as additional data is uploaded into the ASAP CRN Cloud. When complete, the collection will provide data generated from 1,800+ samples using proteomics, transcriptomics, and sequencing (single-nucleus RNAseq, single-cell RNAseq, bulk RNA-seq, ATAC-seq, long read WGS, and single-nucleus multiome sequencing (paired snRNAseq, snATACseq)) techniques.
The analysis workflow for this MVP dataset is available at https://github.com/ASAP-CRN/harmonized-wf-dev
This repository contains the data and code used in Tanja Hyvarinen's project (tanja.hyvarinen@tuni.fi).
"analysis" contains the processing and analysis of the bulk RNA sequencing data.
"integration" contains the analysis of the integration between our RNA-seq data and external datasets.
"fastq" contains the raw FASTQ sequences from the bulk RNA sequencing.
"counts" contains the results of processing the FASTQ sequences with the nf-core/rnaseq workflow.
The "analysis" and "integration" folders can contain the starting raw data in "data", the R scripts in order of execution (op1, op2, ...), and an "output" folder that contains the final processed data of each operation.
According to our latest research, the global spatial transcriptomics slide kit market size reached USD 355 million in 2024. The market is expected to grow at a remarkable CAGR of 13.7% from 2025 to 2033, driven by rapid technological advancements and increasing adoption in biomedical research. By 2033, the market is forecasted to reach approximately USD 1,090 million. This robust growth is primarily attributed to the rising demand for high-resolution spatial gene expression analysis in fields such as oncology, neuroscience, and immunology, as well as expanding applications in drug discovery and precision medicine.
One of the primary growth factors fueling the spatial transcriptomics slide kit market is the escalating demand for spatially resolved transcriptomic data in cancer research. Traditional bulk RNA sequencing techniques lack the spatial context necessary for understanding tumor heterogeneity and the tumor microenvironment. Spatial transcriptomics technologies, by contrast, enable researchers to localize gene expression patterns within tissue sections, offering unprecedented insights into cellular interactions and disease mechanisms. This capability is increasingly critical in oncology, where spatial gene expression profiling is being leveraged to identify novel biomarkers, elucidate mechanisms of drug resistance, and inform the development of targeted therapies. As cancer research remains a central focus of global biomedical innovation, the adoption of advanced spatial transcriptomics slide kits is expected to accelerate further.
Another significant driver is the rapid evolution of sequencing and imaging technologies, which has dramatically improved the sensitivity, throughput, and cost-effectiveness of spatial transcriptomics workflows. Next-generation sequencing (NGS) platforms, combined with innovative imaging-based approaches, have enabled high-resolution mapping of transcriptomes at the single-cell and subcellular levels. These technological advancements have not only expanded the range of possible applications but also democratized access to spatial transcriptomics by reducing operational complexity and lowering costs. As a result, a broader spectrum of research institutions, pharmaceutical companies, and clinical laboratories are incorporating spatial transcriptomics slide kits into their workflows, further propelling market growth.
The increasing emphasis on precision medicine and personalized healthcare is also catalyzing the expansion of the spatial transcriptomics slide kit market. Spatial transcriptomics technologies offer critical insights into the cellular architecture of tissues and the molecular underpinnings of disease, enabling more precise patient stratification and therapy selection. This is particularly relevant in complex diseases such as neurodegenerative disorders and autoimmune conditions, where spatial heterogeneity plays a pivotal role in disease progression and treatment response. The integration of spatial transcriptomics data into clinical research pipelines is expected to enhance diagnostic accuracy, accelerate biomarker discovery, and facilitate the development of next-generation therapeutics, thereby driving sustained market growth.
From a regional perspective, North America currently dominates the spatial transcriptomics slide kit market, accounting for the largest revenue share in 2024. This leadership position is underpinned by the presence of leading biotechnology firms, robust research funding, and a well-established healthcare infrastructure. Europe follows closely, driven by strong academic research networks and increasing government investments in genomics. The Asia Pacific region, meanwhile, is emerging as a high-growth market, buoyed by expanding biomedical research activities, rising healthcare expenditure, and growing collaborations between academic and industry stakeholders. As these regions continue to invest in cutting-edge genomic technologies, the global spatial transcriptomics slide kit market is poised for sustained expansion.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Normalized counts from bulk RNA sequencing analysis of macrophages from old and young mice in co-culture with osteoblasts during mineralization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Performance comparison of cancer prognosis using benchmark methods and the proposed methods (scP.V and scP.W).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
WikiPathways mapped pathways for the 10 breast cancer signatures.
Neutrophils have emerged as diverse regulators of tissue states, displaying functions in both the resolution and promotion of tissue inflammation. While neutrophils have widely been associated with tumor promotion, immune suppression, and poor patient outcome, we provide evidence to support direct tumor cytotoxic properties of neutrophils. Using various models of murine breast cancer, we establish that TLR-mediated engagement, combined with complex I inhibition, within the breast tumor microenvironment, acting either directly or indirectly on neutrophils, primes these innate immune cells to acquire direct tumor killing properties, both in vitro and in vivo, and independently of CD8+ T cell immunity. TLR engagement stimulates emergency granulopoiesis, increasing levels of neutrophils in the circulation and infiltrating into tumors without inducing the formation of a pro-metastatic niche. Mechanistically, we show that systemic administration of various TLR agonists, while increasing systemic inflammation, elevates NF-κB signalling in neutrophils, to contribute to their tumoricidal functions. Moreover, using bulk and single-cell RNA sequencing, along with proteomics approaches, we show that neutrophils which are trained to acquire these anti-tumorigenic functions both enhance secretory granule production and increase expression of NADPH-oxidase machinery. Concomitantly, these tumoricidal neutrophils increase production of toxic levels of reactive oxygen species, which can be overcome with myeloperoxidase inhibitors or overexpression of ROS scavengers. Taken together, we describe a new class of neutrophils that possess direct and intrinsic tumoricidal functions, which can be exploited to eradicate immune cold breast tumors which otherwise are refractory to standard immunotherapies, including immune checkpoint blockade.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Single-cell RNA sequencing (scRNA-seq) is a high-throughput experimental technique for studying gene expression at the single-cell level. As a key component of single-cell data analysis, differential expression analysis (DEA) serves as the foundation for all subsequent secondary studies. Although biological replicates are vitally important in the DEA process, small numbers of biological replicates are still common in sequencing experiments, which may pose problems for current DEA methods. Therefore, it is necessary to conduct a thorough comparison of various DEA approaches under small biological replications. Here, we compare 6 performance metrics on both simulated and real scRNA-seq datasets to assess the adaptability of 8 DEA approaches, with a particular emphasis on how well they function under small biological replications. Our findings suggest that DEA algorithms extended from bulk RNA-seq are still competitive under small biological replicate conditions, whereas the newly developed method DEF-scRNA-seq, which is based on information entropy, offers significant advantages. Our research not only provides appropriate suggestions for selecting DEA methods under different conditions, but also emphasizes the application value of machine learning algorithms in this field.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Additional file 15: Table S13. Genes differentially expressed between bulk RNA-seq profiles of intact kidneys and cold-dissociated single-cell suspensions; includes results of functional analysis with ToppGene [28] for genes with higher expression in intact kidneys.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is a compressed dataset containing 4 files:
1) "raw_count_data_filtered.txt" (1.3GB): raw read counts of genes in 8,796 human bulk RNA-seq samples.
2) "logCombat_UQ_log10.txt" (5.1GB): the RNA-seq data after upper-quartile (UQ) normalization, and batch effect correction using ComBat. According to our study this is in general the best processing workflow for this data. Values are log10 values after addition of a small pseudo count.
3) "annotation_data.txt" (297KB): a table with the sample ID, study ID, and assigned cell type or tissue of each RNA-seq sample in this dataset.
4) "cell_types_vs_index.txt" (1.2KB): a list of each cell type and tissue in this dataset, along with their sample counts.
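The processing behind the second file combines upper-quartile (UQ) normalization, ComBat batch correction, and a log10 transform with a pseudocount. The sketch below illustrates only the UQ and log10 steps on a toy table. It rests on several assumptions that the description does not confirm: the upper quartile is taken over non-zero counts, samples are rescaled to the mean upper quartile, and the pseudocount is 1.0. ComBat is not reproduced here, and all names are hypothetical.

```python
import math
import statistics


def upper_quartile(values):
    """75th percentile of the non-zero counts in one sample (assumed convention)."""
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        raise ValueError("sample has no non-zero counts")
    if len(nonzero) == 1:
        return float(nonzero[0])
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


def uq_log10(sample_counts, pseudo=1.0):
    """UQ-normalize {sample: [count per gene]} and return log10(value + pseudo)."""
    uq = {s: upper_quartile(v) for s, v in sample_counts.items()}
    # Rescale to the mean upper quartile so values stay on a count-like scale.
    target = statistics.mean(list(uq.values()))
    return {
        s: [math.log10(c / uq[s] * target + pseudo) for c in v]
        for s, v in sample_counts.items()
    }


# Toy counts (hypothetical): S2 was sequenced twice as deeply as S1.
toy = {"S1": [10, 20, 30, 40, 0], "S2": [20, 40, 60, 80, 0]}
normalized = uq_log10(toy)
for sample, values in normalized.items():
    print(sample, [round(v, 3) for v in values])
```

In this toy example the two samples differ only by sequencing depth, so they become equal after normalization, and zero counts map to log10 of the pseudocount.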
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Although an essential step, the functional annotation of cells often proves particularly challenging in the analysis of single-cell transcriptional data. Several methods have been developed to accomplish this task. However, in most cases, these rely on techniques initially developed for bulk RNA sequencing or simply make use of marker genes identified from cell clustering followed by supervised annotation. To overcome these limitations and automatise the process, we have developed two novel methods, the single-cell gene set enrichment analysis (scGSEA) and the single cell mapper (scMAP). scGSEA combines latent data representations and gene set enrichment scores to detect coordinated gene activity at single-cell resolution. scMAP uses transfer learning techniques to repurpose and contextualise new cells into a reference cell atlas. Using both simulated and real datasets, we show that scGSEA effectively recapitulates recurrent patterns of pathways’ activity shared by cells from different experimental conditions. At the same time, we show that scMAP can reliably map and contextualise new single cell profiles on a breast cancer atlas we recently released. Both tools are provided in an effective and straightforward workflow providing a framework to determine cell function and significantly improve annotation and interpretation of scRNA-seq data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The simulation data generation scheme for different genes.