Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset folders from "TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses". If using the processed data or TISSUE algorithm, please cite: https://doi.org/10.1101/2023.04.25.538326.
The dataset directory is compressed in tar gzip format. The top level contains one folder per dataset, and each folder holds the relevant data files, which include:
Spatial_count.txt --- a tab-delimited file containing spatial transcriptomics counts matrix
scRNA_count.txt --- a tab-delimited file containing RNAseq counts matrix
Locations.txt --- a tab-delimited file containing the (x,y) spatial coordinates of cells in the spatial transcriptomics data
Metadata.txt --- for some datasets, this is a comma-separated file containing the metadata table for the spatial transcriptomics data
These files are formatted and organized to be read into AnnData objects using the native loading functions in the TISSUE package (https://github.com/sunericd/TISSUE). Some folders will also have additional accessory files such as gene lists corresponding to some experiments present in our manuscript and/or adjacency matrix objects.
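For readers who want to inspect the files without the TISSUE loaders, the layout described above can be sketched with the Python standard library. This is only an illustration: the in-memory strings stand in for Spatial_count.txt and Locations.txt, and the assumptions that both files carry a header row, hold one row per cell, and list cells in the same order are not confirmed by the description. The helper name read_table is hypothetical.

```python
import csv
import io


def read_table(handle, delimiter="\t"):
    """Read a delimited text table and return (header, rows)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


# Tiny in-memory stand-ins for Spatial_count.txt and Locations.txt.
# Assumption: header row present, one row per cell, same cell order in both.
spatial_txt = "GeneA\tGeneB\tGeneC\n3\t0\t1\n0\t2\t5\n"
locations_txt = "x\ty\n10.5\t20.0\n11.0\t22.5\n"

genes, count_rows = read_table(io.StringIO(spatial_txt))
_, loc_rows = read_table(io.StringIO(locations_txt))

counts = [[float(v) for v in row] for row in count_rows]
coords = [(float(x), float(y)) for x, y in loc_rows]

# Every cell needs exactly one coordinate pair.
if len(counts) != len(coords):
    raise ValueError("counts and locations disagree on the number of cells")

# One record per cell: its (x, y) position and a gene -> count mapping.
cells = [
    {"xy": xy, "counts": dict(zip(genes, row))}
    for xy, row in zip(coords, counts)
]
```

With real files, the io.StringIO objects would be replaced by open(path, newline="") handles; the native TISSUE loading functions remain the intended route into AnnData.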
Also included are the two simulated spatial transcriptomics datasets that we generated using SRTsim.
The SVZ folders contain our processed MERFISH spatial transcriptomics dataset on the adult mouse subventricular zone. Refer to the SVZFullFinal folder for the full dataset with TISSUE-informed cell labels. All other folders are processed data accessed from publicly available sources. The identity of numbered folders can be found in the Data Availability statement of the benchmarking paper from which they were retrieved: https://doi.org/10.1038/s41592-022-01480-9
"svz_merfish_data.zip" includes the raw MERFISH dataset on the adult mouse subventricular zone.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
https://www.nature.com/articles/s41586-020-2496-1#Sec2
Code to download and process this dataset is available at: https://github.com/seanome/2025-longevity-x-ai-hackathon
The dataset structure is originally from AnnData. Descriptions of each data file are below.
Data Files
This dataset contains multiple parquet files, one for each sheet in the original Excel file:… See the full description on the dataset page: https://huggingface.co/datasets/longevity-db/aging-gene-expression-single-cell-mouse.
Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev
Data: results of single-cell RNA sequencing, i.e. rows correspond to cells and columns to genes (the csv file is the other way around). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
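Because the csv file is stored with genes as rows, it has to be transposed to reach the cells-by-genes orientation described above. A minimal sketch, assuming a hypothetical layout in which the first column holds gene names and the header row holds cell identifiers (the actual file may differ):

```python
import csv
import io

# Hypothetical genes-by-cells CSV: first column = gene name,
# header row = cell identifiers.
csv_text = "gene,cell1,cell2,cell3\nActb,5,0,2\nGapdh,1,7,0\n"

rows = list(csv.reader(io.StringIO(csv_text)))
cell_ids = rows[0][1:]
gene_ids = [row[0] for row in rows[1:]]
genes_by_cells = [[int(v) for v in row[1:]] for row in rows[1:]]

# Transpose so that each row is a cell and each column is a gene.
cells_by_genes = [list(column) for column in zip(*genes_by_cells)]
```

After the transpose, cells_by_genes[i][j] is the count of gene_ids[j] in cell_ids[i].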
Particular data from: https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_hgmm_v3_nextgem?
5k 1:1 mixture of fresh frozen human (HEK293T) and mouse (NIH3T3) cells (Next GEM). Single Cell Gene Expression Dataset by Cell Ranger 3.0.2. 1:1 mixture of fresh frozen human (HEK293T) and mouse (NIH3T3) cells.
HEK293T: https://en.wikipedia.org/wiki/HEK_293_cells NIH3T3: https://en.wikipedia.org/wiki/3T3_cells
This is a classic human-mouse mixture experiment to demonstrate single cell behavior (the same cells were used to generate 1k_hgmm_v3, 1k_hgmm_v3_nextgem, 10k_hgmm_v3_nextgem).
Libraries were prepared following the Chromium Next GEM Single Cell 3ʹ Reagent Kits v3.1 User Guide (CG000204 RevA).
6,163 cells detected
Sequenced on Illumina NovaSeq with approximately 72,279 reads per cell
28bp read1 (16bp Chromium barcode and 12bp UMI), 91bp read2 (transcript), and 8bp I7 sample barcode
Run with --expect-cells=5000
Published on May 29, 2019
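In a human-mouse mixture experiment, each barcode is usually assigned to a species by comparing how many of its UMIs map to each genome; barcodes with substantial counts from both are treated as mixed (likely doublets). The sketch below illustrates that idea only. The gene-name prefixes "hg19_" and "mm10_" and the 0.9 purity cutoff are assumptions for illustration, not values taken from this dataset description, so check the actual feature names before reusing it.

```python
def classify_cell(umi_counts, human_prefix="hg19_", mouse_prefix="mm10_", purity=0.9):
    """Label one cell as 'human', 'mouse', 'mixed', or 'empty'.

    umi_counts maps gene names to UMI counts. The prefixes and the purity
    cutoff are illustrative assumptions, not values from the source.
    """
    human = sum(n for gene, n in umi_counts.items() if gene.startswith(human_prefix))
    mouse = sum(n for gene, n in umi_counts.items() if gene.startswith(mouse_prefix))
    total = human + mouse
    if total == 0:
        return "empty"
    if human / total >= purity:
        return "human"
    if mouse / total >= purity:
        return "mouse"
    return "mixed"


# Hypothetical barcodes, for illustration only.
labels = [
    classify_cell({"hg19_ACTB": 95, "mm10_Actb": 5}),
    classify_cell({"hg19_ACTB": 2, "mm10_Actb": 98}),
    classify_cell({"hg19_ACTB": 40, "mm10_Actb": 60}),
]
```

A fixed fraction is the simplest possible rule; the cutoff trades off how many barcodes are called mixed against how much cross-species background is tolerated.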
Inspiration: Single-cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see the review: P. Kharchenko, "The triumphs and limitations of computational methods for scRNA-seq", Nature Methods, https://www.nature.com/articles/s41592-021-01171-x
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplemental Data 1 is single-cell response to rapamycin count data first sequenced in this work and deposited in GEO with accession GSE242556. It is a 173348 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Gene', 'Replicate', 'Pool', and 'Experiment') are cell-specific metadata.
Supplemental Data 2 is bulk response to rapamycin count data first sequenced in this work. It is a 33 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Oligo', 'Time', 'Replicate', and 'Sample_barcode') are sample-specific metadata.
Supplemental Data 3 is single-cell count data published as GSE125162 and re-analyzed with the pipeline used for single-cell quantification in this work. It is a 65068 rows × 5850 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 7 columns ('Condition', 'Sample', 'Genotype_Group', 'Genotype_Individual', 'Genotype', 'Replicate', 'Cell_Barcode') are cell-specific metadata.
Supplemental Data 4 contains the four deep learning models trained in this work. It is a TAR.GZ file containing the final biophysical transcription/decay model, the pre-trained decay model, the velocity prediction model, and the count prediction model. Each model file is an h5 file containing a PyTorch model that can be loaded with supirfactor_dynamical.read().
Supplemental Data 5 is the prior knowledge network used to constrain the models for TF interpretability. It is a 1574 rows × 204 columns [Genes x TFs] TSV.GZ file where the first row is a header with TF names, the first column is an index of gene names, and TF-gene interactions are indicated by non-zero values in the matrix. There are 2799 TF-gene interactions.
Supplemental Table 6 lists the oligonucleotide sequences used in this work. It is a TSV file with a header row.
Supplemental Table 7 lists the yeast strains used in this work. It is a TSV file with a header row.
Supplemental Table 8 contains gene metadata used in this work (e.g. Ribosomal Protein gene labels). It is a TSV file with a header row.
Supplemental Table 9 is FY4/5 growth curve data generated in this work. It is a 20 rows × 7 columns TSV file where the first row is a header with replicate IDs, the first column is an index of times in minutes, and values are cell densities in YPD culture, in units of 10$^6$ cells / mL.
Supplemental Data 10 is a TAR.GZ file containing the yeast SacCer3 genome, modified to add UTR sequences, that was used to generate transcripts for kallisto pseudoalignment in this work.
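Supplemental Data 1 to 3 share one layout: a header row, integer gene-count columns first, and a fixed number of metadata columns at the end. A minimal stdlib sketch of splitting such a TSV.GZ file is shown here. The helper name split_counts_and_metadata, the two gene names, and the metadata values are hypothetical; only the four trailing column names come from the description of Supplemental Data 1.

```python
import csv
import gzip
import io


def split_counts_and_metadata(handle, n_meta):
    """Split each TSV row into integer counts and trailing metadata fields."""
    if n_meta <= 0:
        raise ValueError("n_meta must be a positive number of trailing columns")
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    gene_names, meta_names = header[:-n_meta], header[-n_meta:]
    counts, metadata = [], []
    for row in reader:
        counts.append([int(v) for v in row[:-n_meta]])
        metadata.append(dict(zip(meta_names, row[-n_meta:])))
    return gene_names, counts, metadata


# Miniature stand-in for the real file: 2 gene columns (hypothetical names)
# followed by the 4 metadata columns named for Supplemental Data 1.
text = (
    "YAL001C\tYAL002W\tGene\tReplicate\tPool\tExperiment\n"
    "3\t0\tWT\t1\t1\t1\n"
    "1\t4\tWT\t2\t1\t1\n"
)
payload = gzip.compress(text.encode("utf-8"))

with gzip.open(io.BytesIO(payload), mode="rt", newline="") as handle:
    gene_names, counts, metadata = split_counts_and_metadata(handle, n_meta=4)
```

For the real files, gzip.open would take the file path, with n_meta=4 for Supplemental Data 1 and 2 and n_meta=7 for Supplemental Data 3.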
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genes with enriched expression per cell population in sample HH31. Genes enriched in the different cell clusters, calculated to be differentially expressed between each cell cluster and the rest of the cells in the sample. p_val: originally calculated p value; avg_logFC: average log fold-change relative to the rest of the cells; pct.x: percentage of cells in the focus cluster expressing the gene; pct.rest: percentage of cells in the rest of the clusters expressing the gene; p_val_adj: p value adjusted for multiple testing; cluster: cluster number in the main text and figures; gene: ENSEMBL gene identifier; name: gene symbol, or name when available; enrichment: ratio of pct.x to pct.rest. (XLSX 395 kb)
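The enrichment column defined above is a simple ratio and can be recomputed from pct.x and pct.rest. A small sketch with made-up marker rows follows; returning infinity when pct.rest is zero is an assumption of this example, since the table description does not say how that case is handled.

```python
def enrichment(pct_x, pct_rest):
    """Return pct.x / pct.rest.

    Returning infinity when pct_rest is zero is an assumption of this sketch.
    """
    if pct_rest == 0:
        return float("inf")
    return pct_x / pct_rest


# Hypothetical marker rows, for illustration only.
markers = [
    {"name": "geneB", "pct.x": 0.50, "pct.rest": 0.25},
    {"name": "geneA", "pct.x": 0.80, "pct.rest": 0.10},
]
for marker in markers:
    marker["enrichment"] = enrichment(marker["pct.x"], marker["pct.rest"])

# Most cluster-specific genes first.
ranked = sorted(markers, key=lambda m: m["enrichment"], reverse=True)
```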
https://spdx.org/licenses/CC0-1.0.html
By leveraging single-cell transcriptome and T cell receptor (TCR) sequencing, we aimed to track the transcriptional signatures of CAR T cell clonotypes throughout the course of treatment and furthermore identify molecular patterns leading to potent CAR T cell cytotoxicity. The data presented in this study encompass blood and bone marrow samples from patients ≤ 21 years of age with relapsed or refractory B-cell acute lymphoblastic leukemia (B-ALL) participating in the SJCAR19 phase I/II clinical trial (NCT03573700). In brief, patients enrolled in the clinical trial received either 1 x 10^6 (dose level 1) or 3 x 10^6 (dose level 2) CAR T cells per kilogram of body weight following successful generation of autologous CAR T cell products and lymphodepleting chemotherapy. Peripheral blood was drawn from each participant every week until week 4 post-infusion, at week 6 or 8, and at month 3 or 6 if feasible. At week 4 post-infusion, bone marrow was also collected from participants. Total T cells (CD3+) were sorted from each post-infusion sample, as well as the pre-infusion CAR T cell products, and processed through 10x Genomics’ single-cell gene expression and V(D)J sequencing platform using the standard protocol. We identified a unique and unexpected transcriptional signature in a subset of pre-infusion CAR T cells that shared TCRs with post-infusion cytotoxic effector CAR T cells. Functional validation of cells with even a subset of these pre-effector markers demonstrated their immediate cytotoxic potential and resistance to exhaustion.
Methods
Cells were processed using the Chromium Single Cell V(D)J 5' reagents (10X Genomics). T cell receptor V(D)J cDNA was enriched using the Chromium Single Cell V(D)J Enrichment kit for Human T cells. Corresponding libraries were sequenced on the Illumina NovaSeq platform.
Sequencing data were processed using Cell Ranger v3.1.0 (10X Genomics) with the GRCh38 reference (v3.0.0) modified to include the first 825 nucleotide bases of the CD19-CAR transcript. The resulting gene expression matrices were aggregated, with read depth normalization based on the number of mapped reads. TCR sequences were processed with version 3.1.0 of the GRCh38 V(D)J reference. Aggregated gene expression matrices were analyzed using Seurat (Hao et al, Cell 2021). Cells with fewer than 300 detected genes, more than 4,999 detected genes, with at least 10% of their expression owed to mitochondrial genes, or with no detected CD19-CAR UMIs (unique molecular identifiers) were excluded from downstream analyses. TCR lineages were integrated with gene expression data using shared cellular barcodes. Additional analyses are described in the corresponding manuscript.
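The cell exclusion rules stated above translate directly into a per-cell predicate. The sketch below is not the authors' Seurat code; it only restates the four thresholds in plain Python, and the function name, argument names, and example cells are hypothetical.

```python
def passes_qc(n_genes, pct_mito, car_umis):
    """Return True if a cell survives the four exclusion rules.

    n_genes:  number of detected genes in the cell
    pct_mito: percentage (0-100) of expression owed to mitochondrial genes
    car_umis: number of detected CD19-CAR UMIs
    """
    if n_genes < 300:       # fewer than 300 detected genes
        return False
    if n_genes > 4999:      # more than 4,999 detected genes
        return False
    if pct_mito >= 10.0:    # at least 10% mitochondrial expression
        return False
    if car_umis == 0:       # no detected CD19-CAR UMIs
        return False
    return True


# Hypothetical cells: (detected genes, percent mitochondrial, CAR UMIs).
cells = [(1200, 4.2, 3), (250, 2.0, 1), (5200, 3.0, 2), (900, 12.5, 4), (900, 3.0, 0)]
kept = [cell for cell in cells if passes_qc(*cell)]
```

Note that the boundaries are asymmetric as written in the text: cells with exactly 300 or exactly 4,999 detected genes are kept, while a cell at exactly 10% mitochondrial expression is excluded.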
ARC Institute Virtual Cell Challenge
Please check out the official website for the challenge rules and deadlines.
About
For this challenge, single-cell functional genomics was used to generate approximately 300,000 single-cell RNA-seq profiles by silencing 300 carefully selected genes using CRISPR interference (CRISPRi). 10x Genomics GEM-X Flex and Illumina sequencing were used to obtain single-cell gene expression profiles. The data are split into three groups for the… See the full description on the dataset page: https://huggingface.co/datasets/cyrilzakka/arc-institute-virtual-cell-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
+: cells found; ˗: No cells found; P: primary cell line; M: metastatic cell line; pRCC: papillary RCC; ccRCC: clear cell RCC.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The scoring method used for selection of canonical pathways was Fisher’s Exact Test. The ratio (r) is calculated by dividing the number of genes involved by the total number of genes in that canonical pathway in IPA.
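Both quantities in this legend can be illustrated with a few lines of standard-library Python. The sketch below computes the ratio r and a right-tailed Fisher's exact p-value from the hypergeometric distribution. Using the right tail and a user-supplied background gene count are assumptions of this example; the legend does not state which tail or which reference set IPA used, and the numbers are made up.

```python
from math import comb


def pathway_ratio(n_hits, pathway_size):
    """Ratio r: genes from the input list in the pathway / all pathway genes."""
    return n_hits / pathway_size


def fisher_right_tail(n_hits, pathway_size, n_selected, n_background):
    """Right-tailed Fisher's exact p-value, P(X >= n_hits).

    X is the number of pathway genes among n_selected genes drawn without
    replacement from n_background genes, pathway_size of which are in the
    pathway (hypergeometric distribution).
    """
    max_hits = min(pathway_size, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(pathway_size, k) * comb(n_background - pathway_size, n_selected - k)
        for k in range(n_hits, max_hits + 1)
    )
    return tail / total


# Made-up numbers: 20 background genes, a pathway of 5 genes,
# 4 selected genes of which 2 fall in the pathway.
r = pathway_ratio(2, 5)
p_value = fisher_right_tail(2, 5, 4, 20)
```

The ratio describes how much of the pathway is covered, while the p-value asks whether that overlap is larger than expected by chance; the two are reported together because a high ratio in a very small pathway can still be unremarkable.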
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Table S3. Gene-centric tagging determined by iSNPs and a read-based method for the Fibroblast, Lymphoblast and Pool100 datasets on ChrX, Chr17 and all autosomal chromosomes. (XLSX 2939 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
+: positive for marker; −: negative for marker.
Remark 1: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev
Remark 2: The same data is also available at https://www.kaggle.com/datasets/alexandervc/scrnaseq-exposed-to-multiple-compounds as pieces extracted from the huge file here, which are easier to load and work with.
Data: results of single-cell RNA sequencing, i.e. rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
Data: scRNA expression for several cell lines affected by drugs with different doses/durations.
The data are from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE139944
Status: Public on Dec 05, 2019
Title: Massively multiplex chemical transcriptomics at single cell resolution
Organisms: Homo sapiens; Mus musculus
Experiment type: Expression profiling by high throughput sequencing
Summary: Single-cell RNA-seq libraries were generated using two and three level single-cell combinatorial indexing RNA sequencing (sci-RNA-seq) of untreated or small molecule inhibitor exposed HEK293T, NIH3T3, A549, MCF7 and K562 cells. Different cells and different treatment were hashed and pooled prior to sci-RNA-seq using a nuclear barcoding strategy. This nuclear barcoding strategy relies on fixation of barcode containing well-specific oligos that are specific to a given cell type, replicate or treatment condition.
The corresponding paper is here: https://pubmed.ncbi.nlm.nih.gov/31806696/ Science. 2020 Jan 3;367(6473):45-51 "Massively multiplex chemical transcriptomics at single-cell resolution" Sanjay R Srivatsan, ... , Cole Trapnell
The authors split the data into 4 subdatasets: see sciPlex1, sciPlex2, sciPlex3, sciPlex4 in the filenames. The main dataset is sciPlex3, which contains about 600K cells.
The data split into small parts, each of which can easily be loaded into memory, can be found at https://www.kaggle.com/alexandervc/scrnaseq-exposed-to-multiple-compounds
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
A collection of some bioinformatics related resources on kaggle: https://www.kaggle.com/general/203136
Single-cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see the review: P. Kharchenko, "The triumphs and limitations of computational methods for scRNA-seq", Nature Methods, https://www.nature.com/articles/s41592-021-01171-x
A Google Scholar search for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) gives many interesting and highly cited articles:
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages 273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12
https://ega-archive.org/dacs/EGAC50000000619
This dataset contains FASTQ files from single-cell 5' RNA sequencing of the AML cell line HNT34 and normal T cells following co-culture with and without an antibody blocking SLAMF6 (TNC-1). The libraries were prepared using the 10X GEM-X Universal 5' Gene Expression v3 Reagent Kit. In total, the dataset contains sequenced gene expression libraries from four samples (HNT34 co-cultured with T cells from two different donors; for each donor there is one sample with and one sample without the blocking antibody).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is associated with the Inferelator package. It has an expression data set (103118_SS_Data.tsv.gz), which is a [Cells x Genes] TSV file which has 5 included metadata columns [Genotype, Genotype_Group, Replicate, Condition, tenXBarcode]. It also contains a prior data matrix generated from the YEASTRACT database (YEASTRACT_Both_20181118.tsv), a gold standard derived from the YEASTRACT database (gold_standard.tsv), a list of transcription factors (tf_names_restrict.tsv), and a list of protein-coding genes (orfs.tsv). It was initially used in Jackson, C.A., Castro, D.M., Saldi, G.-A., Bonneau, R., and Gresham, D. (2019). Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. BioRxiv 581678.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The E-MTAB-3929 dataset is a human embryo dataset that includes 1,096 samples from three developmental stages: day 5 (E5), day 6 (E6), and day 7 (E7) of embryonic development. These samples belong to three cell lineages: PE, TE, and EPI, with 11,662 gene features.
The GSE36552 dataset is a human embryo dataset that includes 66 samples from the 8-cell stage, morula stage, and late blastocyst, with 15,143 gene features.
The GSE109071 dataset is a mouse embryo dataset that includes 1,724 samples from embryonic developmental stages E5.25, E5.5, E6.25, and E6.5, with 7,565 gene features.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Klinefelter syndrome (KS) is the most prevalent aneuploidy in males and is characterized by a 47,XXY karyotype. Less frequently, higher grade sex chromosome aneuploidies (HGAs) can also occur. Here, using a paradigmatic cohort of KS and HGA induced pluripotent stem cells (iPSCs) carrying 49,XXXXY, 48,XXXY, and 47,XXY karyotypes, we identified the genes within the pseudoautosomal region 1 (PAR1) as the most susceptible to dosage-dependent transcriptional dysregulation and therefore potentially responsible for the progressively worsening phenotype in higher grade X aneuploidies. By contrast, the biallelically expressed non-PAR escape genes displayed high interclonal and interpatient variability in iPSCs and differentiated derivatives, suggesting that these genes could be associated with variable KS traits. By interrogating KS and HGA iPSCs at the single-cell resolution we showed that PAR1 and non-PAR escape genes are not only resilient to the X-inactive specific transcript (XIST)-mediated inactivation but also that their transcriptional regulation is disjointed from the absolute XIST expression level. Finally, we explored the transcriptional effects of X chromosome overdosage on autosomes and identified the nuclear respiratory factor 1 (NRF1) as a key regulator of the zinc finger protein X-linked (ZFX). Our study provides the first evidence of an X-dosage-sensitive autosomal transcription factor regulating an X-linked gene in low- and high-grade X aneuploidies.
https://www.archivemarketresearch.com/privacy-policy
The Single-cell Analysis Market size was valued at USD 4.89 billion in 2023 and is projected to reach USD 16.24 billion by 2032, exhibiting a CAGR of 18.7% during the forecast period. Single-cell analysis is a cutting-edge technique in biology, used to study individual cells instead of groups. It allows scientists to delve into the complexities of cell behavior, uncovering insights that might be missed when looking at cell populations. This method is especially crucial in fields like genomics, immunology and cancer research. Features of single-cell analysis include the ability to identify rare cell types, detect subtle genetic variations between cells, and track cell development over time. By isolating and analyzing single cells, researchers can uncover crucial details about cellular diversity and function within tissues and organisms. Advantages of single-cell analysis include its ability to reveal heterogeneity within cell populations, providing a clearer understanding of complex biological processes.
Recent developments include:
In February 2024, 10X Genomics announced the launch of GEM-X, comprising two single-cell gene assays, Chromium Single Cell Gene Expression 3'v4 and Chromium Single Cell Immune Profiling 5'v3, helping 10X Genomics to expand its single-cell technology products portfolio.
In February 2024, Takara Bio USA, Inc announced the launch of two single-cell solutions, Shasta Total RNA-Seq Kit and Shasta Whole-Genome Amplification Kit.
In February 2024, Singleron Biotechnologies announced the opening of its labs in Ann Arbor, Michigan, U.S. The company planned to offer single-cell analysis service and comprehensive solutions, from tissue dissociation, single-cell multi-omic analysis, single cell reagent kits, and automation instruments to bioinformatics support.
In January 2024, BD announced a collaboration with Hamilton Ink, a robotics developer organization, to support the creation of automated solutions for single-cell multiomics research purposes.
In January 2024, Singleron Biotechnologies launched AccuraSCOPE Single Cell Transcriptome and Genome Library Kit at the Festival of Genomics meeting in the UK. This innovative kit can simultaneously profile full genome and full-length transcriptome, offering researchers a valuable tool for their studies.
In September 2023, Illumina, Inc. collaborated with Singleron Biotechnologies for an optimized workflow that automatically initiates DRAGEN single cell RNA sequencing analysis following the sequencing of a Singleron GEXSCOPE single-cell library using an Illumina NextSeq 2000 system.
In April 2023, Fluidigm Corporation (Standard BioTools Inc.) launched the Hyperion XTi Imaging System. The XTi delivers high-precision imaging and quantification of complex biological information at the single-cell level.
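For readers who want to check a growth figure of this kind, the compound annual growth rate (CAGR) relates a start value, an end value, and a number of years. The sketch below shows the standard formula and its inverse. The function names are hypothetical, and the number of years is left as an input because the description does not state which year the forecast period starts from.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value reached after compounding `rate` once per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Round-number illustration: growing from 100 to 121 in 2 years is 10% per year.
example_rate = cagr(100.0, 121.0, 2)
example_value = project(100.0, 0.10, 2)
```

Calling cagr(4.89, 16.24, years) with the market values quoted above gives the rate implied by whatever period length is assumed.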
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell transcriptomic profiling of rheumatoid synovial fibroblasts (RASFs) cultures from LM vs FPI pathotypes in isolation and in co-culture with RA-B-cells.
Gene counts matrix = 36601 genes x 11140 cells
Metadata of cells = Patient_ID, Sample_ID, Cell_ID, Cell_Type and Pathotype
Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev (Scanpy is not always reliable for cell cycle analysis ).
https://scanpy.readthedocs.io/en/stable/
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
Single-cell RNA sequencing data - count matrices: rows correspond to cells and columns to genes; each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
SCANPY is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells (https://github.com/theislab/Scanpy). Along with SCANPY, we present ANNDATA, a generic class for handling annotated data matrices (https://github.com/theislab/anndata).
Paper:
Wolf, F., Angerer, P. & Theis, F. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018). https://doi.org/10.1186/s13059-017-1382-0 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1382-0
Single-cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see the review: P. Kharchenko, "The triumphs and limitations of computational methods for scRNA-seq", Nature Methods, https://www.nature.com/articles/s41592-021-01171-x
Data: results of single-cell RNA sequencing, i.e. rows correspond to cells and columns to genes (the csv file is the other way around). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
Paper: "Droplet barcoding for single cell transcriptomics applied to embryonic stem cells" Cell. 2015 May 21;161(5):1187-1201. doi: 10.1016/j.cell.2015.04.044. Allon M Klein, Linas Mazutis, Ilke Akartuna, Naren Tallapragada, Adrian Veres, Victor Li, Leonid Peshkin, David A Weitz, Marc W Kirschner https://pubmed.ncbi.nlm.nih.gov/26000487/
Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65525
Or: https://hemberg-lab.github.io/scRNA.seq.datasets/mouse/esc/
Single-cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see the review: P. Kharchenko, "The triumphs and limitations of computational methods for scRNA-seq", Nature Methods, https://www.nature.com/articles/s41592-021-01171-x