Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.
https://ega-archive.org/dacs/EGAC00001001974
Single-cell RNA sequencing of 26 primary breast cancers from the Wu et al. (2021) study. Data were generated using the Chromium controller (10x Genomics) and sequenced on the NextSeq 500 platform.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
"*.csv" files contain the single cell gene expression values (log2(tpm+1)) for all genes in each cell from melanoma and squamous cell carcinoma of head and neck (HNSCC) tumors. The cell type and origin of tumor for each cell is also included in "*.csv" files.The "MalignantCellSubtypes.xlsx" defines the tumor subtype."CCLE_RNAseq_rsem_genes_tpm_20180929.zip" is downloaded from CCLE database.
https://www.scilifelab.se/data/restricted-access/
Data Set Description
Single-cell RNA sequencing (Smart-Seq3) and whole exome sequencing from multiple regions of individual tumors from breast cancer patients, and single-cell RNA-seq for two ovarian cancer cell lines.
The dataset contains raw sequencing data for various high-throughput molecular tests performed on two sample types: tumor samples from two breast cancer patients and cell lines derived from high-grade serous carcinoma patients. The breast cancer data come from two patients: patient 1 (BCSA1) has two tumor regions (A-B) and patient 2 (BCSA2) has five regions (A-E). For a normal sample and each region from each patient, whole exome sequencing was performed using the Twist Biosciences Human Exome Kit by the SNP&SEQ Technology platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. In addition, for each patient, EPCAM+ CD45- cells from all the regions were sorted to a 384-well plate, and Smart-Seq3 libraries were prepared at Karolinska Institutet and sequenced at National Genomics Infrastructure Uppsala, Sweden.
The HGSOC cell-line data come from the OV2295R2 and TOV2295R cell lines described in Laks et al., Cell 2019 Nov 14; 179(5): 1207–1221.e22, doi: 10.1016/j.cell.2019.10.026. The cell line Smart-Seq3 libraries were prepared from two 384-well plates at Karolinska Institutet and sequenced at National Genomics Infrastructure Uppsala, Sweden.
Terms for access
This dataset is to be used for research on intratumor heterogeneity and subclonal evolution of tumors. To apply for conditional access to the dataset in this publication, please contact datacentre@scilifelab.se.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We developed a single-cell transcriptomics pipeline for high-throughput pharmacotranscriptomic screening. We explored the transcriptional landscape of three HGSOC models (JHOS2, a representative cell line; PDC2 and PDC3, two patient-derived samples) after treating their cells for 24 hours with 45 drugs representing 13 distinct classes of mechanism of action. Our work establishes a new precision oncology framework for the study of molecular mechanisms activated by a broad array of drug responses in cancer.
.
├── 3D UMAPs/ → Interactive 3D UMAPs of cells treated with the 45 drugs used for multiplexed scRNA-seq. Related to Figure 4. Coordinates: x = UMAP 1; y = UMAP 2; z = UMAP 3. Legend: green = PDC1; blue = PDC2; red = JHOS2.
│   ├── DMSO_3D_UMAP_Dini.et.al.html → 3D UMAP of untreated cells.
│   └── drug_3D_UMAP_Dini.et.al.html → 3D UMAP of cells treated with (drug).
├── QC_plots/ → Diagnostic plots. Related to Figures 2–4.
│   ├── model_QC_violin_plot_2023.pdf → Violin plots of the QC metrics used to filter the data.
│   ├── model_col_HTO or model_row_HTO before and after filt → Heatmaps of the row or column HTO expression in each cell.
│   └── model_counts_histogram_2023.pdf → Histogram of the distribution of the total counts per cell after filtering for high-quality cells.
├── scRNAseq/ → scRNA-seq data. Related to Figures 2–4.
│   ├── AllData_subsampled_DGE_edgeR.csv.gz → Differential gene expression analysis results between treated and untreated cells via pseudobulk of aggregate subsamples, for each of the three models. Related to Figure 3.
│   └── All_vs_all_RNAclusters_DEG_signif.txt → Differential gene expression analysis results (p.adj < 0.05) of FindAllMarkers for the Leiden/RNA clusters.
├── PDCs.transcript.counts.tsv → Bulk RNA-seq count data for PDCs 1–3 processed by Kallisto. Related to Figure S6.
└── PDCs.transcript.TPM.tsv → Bulk RNA-seq TPM data for PDCs 1–3 processed by Kallisto. Related to Figure S6.
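As a rough illustration of working with a gzip-compressed results table such as the differential expression file listed above, the sketch below reads a gzipped CSV and keeps rows under an adjusted p-value cutoff, mirroring the 0.05 threshold mentioned for the cluster marker file. The column names (`gene`, `logFC`, `FDR`) and the toy rows are assumptions made for the example; the real file's header may differ and should be checked first.

```python
import csv
import gzip
import io


def significant_rows(gz_bytes: bytes, padj_column: str, threshold: float = 0.05):
    """Return the rows whose adjusted p-value is below the threshold."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        reader = csv.DictReader(handle)
        return [row for row in reader if float(row[padj_column]) < threshold]


# Synthetic stand-in for a gzipped results table (hypothetical columns and values).
toy_csv = (
    "gene,logFC,FDR\n"
    "GENE_A,1.8,0.001\n"
    "GENE_B,-0.2,0.40\n"
    "GENE_C,-2.1,0.03\n"
)
toy_gz = gzip.compress(toy_csv.encode("utf-8"))

hits = significant_rows(toy_gz, padj_column="FDR")
print([row["gene"] for row in hits])
```

To use it on a file from disk, the bytes could be read with `open(path, "rb").read()`; the in-memory toy table just keeps the example self-contained.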
There is a growing need for integration of “Big Data” into undergraduate biology curricula. Transcriptomics is one venue to examine biology from an informatics perspective. RNA sequencing has largely replaced the use of microarrays for whole genome gene expression studies. Recently, single cell RNA sequencing (scRNAseq) has unmasked population heterogeneity, offering unprecedented views into the inner workings of individual cells. scRNAseq is transforming our understanding of development, cellular identity, cell function, and disease. As a ‘Big Data,’ scRNAseq can be intimidating for students to conceptualize and analyze, yet it plays an increasingly important role in modern biology. To address these challenges, we created an engaging case study that guides students through an exploration of scRNAseq technologies. Students work in groups to explore external resources, manipulate authentic data and experience how single cell RNA transcriptomics can be used for personalized cancer treatment. This five-part case study is intended for upper-level life science majors and graduate students in genetics, bioinformatics, molecular biology, cell biology, biochemistry, biology, and medical genomics courses. The case modules can be completed sequentially, or individual parts can be separately adapted. The first module can also be used as a stand-alone exercise in an introductory biology course. Students need an intermediate mastery of Microsoft Excel but do not need programming skills. Assessment includes both students’ self-assessment of their learning as answers to previous questions are used to progress through the case study and instructor assessment of final answers. This case provides a practical exercise in the use of high-throughput data analysis to explore the molecular basis of cancer at the level of single cells.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We collected 40 tumor and adjacent normal tissue samples from 19 pathologically diagnosed NSCLC patients (10 LUAD and 9 LUSC) during surgical resections, rapidly digested the tissues to obtain single-cell suspensions, and constructed cDNA libraries from these samples within 24 hours using the 10x Genomics protocol. The libraries were sequenced on the Illumina NovaSeq 6000 platform. Raw gene expression matrices were generated using CellRanger (version 3.0.1) and processed in R (version 3.6.0) using the Seurat R package (version 2.3.4).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of tumor microenvironment scRNA-seq datasets included in TMExplorer.
https://ega-archive.org/dacs/EGAC00001001380
This dataset contains single-cell RNA sequencing data of PBMC samples from 10 bladder cancer patients. cDNAs and single-cell RNA libraries were prepared following the manufacturer’s user guide (10x Genomics). Each library was sequenced on a HiSeq 4000 (Illumina) to achieve ~300 million reads, following the manufacturer’s sequencing specification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Beyondcell is a methodology for the identification of drug vulnerabilities in single cell RNA-seq data. To this end, Beyondcell focuses on the analysis of drug-related commonalities between cells by classifying them into distinct therapeutic clusters. We have validated the tool in a population of MCF7-AA cells exposed to 500nM of bortezomib and collected at different time points: t0 (before treatment), t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) obtained from Ben-David U, et al., Nature, 2018. Here, you can find the integrated Seurat object obtained from this analysis. This object is meant to help users follow Beyondcell's analysis workflow.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data accompanying the manuscript describing MIX-Seq, a method for transcriptional profiling of mixtures of cancer cell lines treated with small molecule and genetic perturbations (McFarland and Paolella et al., Nat Commun, 2020). Data consists of single-cell RNA-sequencing (UMI count matrices), and associated drug sensitivity and genomic features of the cancer cell lines. See README file for more information on dataset contents.
Table of Contents
1. Main Description
---------------------------
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity". The code included in the file titled `marengo_code_for_paper_jan_2023.R` was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
File Descriptions
---------------------------
Linked Files
---------------------
This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:
Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
> Ensure you have R version 4.1.2 or higher for compatibility.
> Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.
1. Download the ".Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
You can use the following code to set the working directory in R:
> setwd(directory)
4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies in order to set up an environment where the code in "marengo_code_for_paper_jan_2023.R" can be executed.
5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.
6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
7. Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
https://ega-archive.org/dacs/EGAC00001001974
Single-cell RNA sequencing of five TNBC primary breast cancers from the Wu et al. (2020) EMBO J study. Data were generated using the Chromium controller (10x Genomics) and sequenced on the NextSeq 500 platform.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metadata and counts matrix (barcode and genes files also provided) for the colorectal cancer (CRC) spatial transcriptomics and scRNA-seq dataset used in the Crescendo manuscript published by Millard et al. (2025). The Batch column indicates whether a cell is from the scRNA-seq data or, if not, which spatial transcriptomics slice it comes from. The sample_id column indicates the sample the cell is from. The center_x and center_y columns indicate the center of the cell in space (scRNA-seq cells have 0 in these columns). The orig_publication_type column indicates fine-grained cell type labels from the original publication of the CRC scRNA-seq dataset, while the cresc_publication_type column indicates the more coarse-grained cell type labels from the Crescendo publication.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
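The column conventions described above suggest a simple way to separate spatial cells from dissociated scRNA-seq cells. The sketch below does this on a toy table, treating a cell as spatial when either coordinate is non-zero. The `cell_id` column, the batch labels (`scRNA`, `slice_1`, `slice_2`), the lowercase `batch` header, and all values are hypothetical; the real metadata should be inspected before relying on any of them.

```python
import csv
import io

# Toy metadata table (hypothetical identifiers, labels, and coordinates).
TOY_METADATA = """cell_id,batch,sample_id,center_x,center_y
c1,scRNA,S1,0,0
c2,slice_1,S2,120.5,88.0
c3,slice_2,S3,40.0,312.25
"""


def split_by_modality(rows):
    """Split cell IDs into (spatial, dissociated) using the coordinate columns."""
    spatial, dissociated = [], []
    for row in rows:
        # scRNA-seq cells are described as having 0 in both coordinate columns.
        has_position = float(row["center_x"]) != 0.0 or float(row["center_y"]) != 0.0
        if has_position:
            spatial.append(row["cell_id"])
        else:
            dissociated.append(row["cell_id"])
    return spatial, dissociated


rows = list(csv.DictReader(io.StringIO(TOY_METADATA)))
spatial, dissociated = split_by_modality(rows)
print(spatial, dissociated)
```

A spatial cell centered exactly at (0, 0) would be misclassified by this rule, so filtering on the batch column is likely the more robust choice once its actual labels are known.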
License information was derived automatically
Liquid biopsy is a promising non-invasive technology that is capable of diagnosing cancer. However, current ctDNA-based approaches detect only a minority of early-stage disease. We set out to improve the sensitivity of liquid biopsy by harnessing tumor recognition by T cells through the sequencing of the circulating T-cell receptor repertoire. We studied a cohort of 463 patients with lung cancer (86% stage I) and 587 subjects without cancer using gDNA extracted from blood buffy coats. We performed TCR β chain sequencing to yield a median of 113,571 TCR clonotypes per sample and built a TCR sequence similarity graph to cluster clonotypes into TCR repertoire functional units (RFUs). The TCR frequencies of RFUs were tested for association with cancer status and RFUs with a statistically significant association were combined into a cancer score using a support vector machine model. The model was evaluated by 10-fold cross-validation and compared with a ctDNA panel of 237 mutation hotspots in 154 lung cancer driver genes and 17 cancer related protein biomarkers in 85 subjects. We identified 327 cancer-associated TCR RFUs with a false discovery rate (FDR) ≤ 0.1, including 157 enriched in cancer samples and 170 enriched in controls. Levels of 247/327 (76%) RFUs were correlated with the presence of an HLA allele at FDR ≤ 0.1 and tumor-infiltrating lymphocyte TCRs from multiple RFUs bound HLA presented tumor antigen peptides, suggesting antigen recognition as a driver of the cancer-RFU associations found. The RFU cancer score detected nearly 50% of stage I lung cancers at a specificity of 80% and boosted the sensitivity by up to 20 percentage points when added to ctDNA and circulating proteins in a multi-analyte cancer screening test. Overall, we show that circulating TCR repertoire functional unit analysis can complement established analytes to improve liquid biopsy sensitivity for early-stage cancer.
This dataset contains the CellRanger output for 20 cancer patients. Please refer to https://www.10xgenomics.com/support/software/cell-ranger/latest for documentation. For details on how the data was generated, please see Li Y. et al. 2025: Circulating T-cell Receptor Repertoire for Cancer Early Detection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data were generated from a study conducted according to guidelines approved by the Review Board at the University of Texas Southwestern Medical Center. We procured patient biopsy samples from two distinct studies. The first is titled "Tissue Collection and Results Gathering for Radiotherapy Patients & Healthy Individuals" (STU 072010-098), and the second is a Phase I Clinical Study on Stereotactic Ablative Radiotherapy (SABR) for Pelvic and Prostate Areas in High-Risk Prostate Cancer Patients (STU062014-027). The single-cell RNA sequencing (scRNA-seq) took place in Dr. Douglas Strand's laboratory, following the method outlined in Henry et al. (ref. 1). We used a 1-hour treatment with 5mg/ml of collagenase type I, 10mM of ROCK inhibitor, and 1mg of DNase. Barcode labeling for 3' GEX was done using a 10X machine, and sequencing was performed on an Illumina NextSeq 500 device.
Purpose: Investigate cellular heterogeneity in a fresh human ovarian cancer tissue sample.
Methods: Enzymatic digestion of a fresh tissue sample collected from the operating room to produce a single-cell suspension. Cells were labelled with fluorescent antibodies to CD3, CD14, CD19, CD20, and CD56 and FACS sorted to remove immune cells. The negative population was used for sequencing. Single cells were processed using the Fluidigm C1 Chip to generate barcoded cDNA for each cell. Amplified cDNA was sequenced using an Illumina HiSeq 2500 machine.
Results: Single-cell RNA sequence data was obtained for 92 cells and a "bulk" sample of 1000 cells. 26 cells were removed from analysis due to quality control standards. The remaining 66 cells and the bulk sample were analyzed.
Conclusion: Single-cell RNA sequence analysis reveals heterogeneity in gene expression in cells harvested from a high-grade serous ovarian cancer.
Overall design: A single-cell suspension generated from a fresh high-grade serous ovarian cancer sample was run through two Fluidigm C1 chips to isolate single cells and produce barcoded cDNA. Sequencing was performed in a single lane of an Illumina HiSeq 2500 machine. 92 single cells were sequenced and 1 bulk sample was sequenced, for a total of 93 samples.
https://www.scilifelab.se/data/restricted-access/
10x Genomics single-cell RNA sequencing data (FASTQ files) from three samples representing one AML and two xenografted samples.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes two parts:
(1) The spatial molecular imaging (SMI) data for 13 PDAC tumors that either received neoadjuvant chemo/radiation therapy or were treatment naive. Raw data files (images, cell segmentation, expression data, metadata) are included in the SMI_raw folder. The count matrix, metadata, and cell type annotations are included in the PDAC_raw_meta_data.h5ad file.
(2) The processed single nucleus RNA-seq data for CAF–malignant co-culture tumoroids, stored in a Seurat object file.
Lung cancer, the leading cause of cancer mortality, exhibits heterogeneity that enables adaptability, limits therapeutic success, and remains incompletely understood. Single-cell RNA sequencing (scRNA-seq) of metastatic lung cancer was performed using 49 clinical biopsies obtained from 30 patients before and during targeted therapy. Over 20,000 cancer and tumor microenvironment (TME) single-cell profiles exposed a rich and dynamic tumor ecosystem. scRNA-seq of cancer cells illuminated targetable oncogenes beyond those detected clinically. Cancer cells surviving therapy as residual disease (RD) expressed an alveolar-regenerative cell signature suggesting a therapy-induced primitive cell-state transition, whereas those present at on-therapy progressive disease (PD) upregulated kynurenine, plasminogen, and gap-junction pathways. Active T-lymphocytes and decreased macrophages were present at RD and immunosuppressive cell states characterized PD. Biological features revealed by scRNA-seq were biomarkers of clinical outcomes in independent cohorts. This study highlights how therapy-induced adaptation of the multi-cellular ecosystem of metastatic cancer shapes clinical outcomes.