82 datasets found

Additional file 3 of Pooling across cells to normalize single-cell RNA...
springernature.figshare.com
txt
Updated Jun 9, 2023
Aaron L. Lun; Karsten Bach; John Marioni (2023). Additional file 3 of Pooling across cells to normalize single-cell RNA sequencing data with many zero counts [Dataset]. http://doi.org/10.6084/m9.figshare.c.3629252_D2.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3629252_D2.v1
Dataset updated
Jun 9, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Aaron L. Lun; Karsten Bach; John Marioni
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Enriched GO terms for library size normalization. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to library size normalization. The fields are the same as described for Additional file 2. (13 KB PDF)
Additional file 2 of Pooling across cells to normalize single-cell RNA...
springernature.figshare.com
txt
Updated Jun 1, 2023
Aaron L. Lun; Karsten Bach; John Marioni (2023). Additional file 2 of Pooling across cells to normalize single-cell RNA sequencing data with many zero counts [Dataset]. http://doi.org/10.6084/m9.figshare.c.3629252_D1.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3629252_D1.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Aaron L. Lun; Karsten Bach; John Marioni
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Enriched GO terms for deconvolution. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to deconvolution. The identifier and name of each term are shown along with the total number of genes associated with the term, the number of associated genes that are also DE, the expected number under the null hypothesis, and the Fisher p value. (13 KB PDF)
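As a rough illustration of how the last two fields relate to the first two, the sketch below computes an expected DE count and a one-sided Fisher exact p value for a single GO term. It is a minimal sketch, not the authors' pipeline: the function name `go_enrichment` and its parameters are hypothetical, and it assumes the test is the standard over-representation test on a 2x2 table (equivalent to a hypergeometric upper tail).

```python
from math import comb


def go_enrichment(n_universe, n_de, n_term, n_term_de):
    """Return (expected DE count, one-sided Fisher p value) for one GO term.

    n_universe : genes tested in total (assumed background set)
    n_de       : genes called DE
    n_term     : genes associated with the GO term
    n_term_de  : genes associated with the term that are also DE
    """
    # Under the null, DE genes are spread evenly over the universe.
    expected = n_term * n_de / n_universe

    # Upper tail of the hypergeometric distribution:
    # probability of seeing n_term_de or more DE genes in the term.
    upper = min(n_term, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_de - k)
        for k in range(n_term_de, upper + 1)
    )
    return expected, tail / total


expected, p_value = go_enrichment(n_universe=10, n_de=3, n_term=4, n_term_de=2)
```

A term whose observed DE count is well above `expected` yields a small p value; the toy numbers here are chosen only so the arithmetic is easy to check by hand.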
Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust...
data.niaid.nih.gov
Updated May 15, 2019
Bacher R; Chu L; Kendziorski C; Swanson S (2019). Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust normalization of single-cell rna-seq data [Dataset]. https://data.niaid.nih.gov/resources?id=gse85917
Dataset updated
May 15, 2019
Dataset provided by
University of Florida
Authors
Bacher R; Chu L; Kendziorski C; Swanson S
Description
Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA was pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.
Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...
frontiersin.figshare.com
docx
Updated Jun 3, 2023
Nicholas Lytal; Di Ran; Lingling An (2023). Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s004
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fgene.2020.00041.s004
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Nicholas Lytal; Di Ran; Lingling An
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.
scRNA-seq human embryonic stem H1, H9 cell lines
kaggle.com
zip
Updated Jul 28, 2021
Alexander Chervov (2021). scRNA-seq human embryonic stem H1, H9 cell lines [Dataset]. https://www.kaggle.com/alexandervc/scrnaseq-human-embryonic-stem-h1-h9-cell-lines
Available download formats: zip (65064956 bytes)
Dataset updated
Jul 28, 2021
Authors
Alexander Chervov
Description
Remark: for cell cycle analysis, see the paper "Computational challenges of cell cycle analysis using single cell transcriptomics" by Alexander Chervov and Andrei Zinovyev: https://arxiv.org/abs/2208.05229

Data and Context

Data: results of single-cell RNA sequencing. Rows correspond to cells and columns to genes (the csv file is transposed). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

Particular data from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76381 The original TXT files are included, along with a conversion to *.h5ad format, which is easier to work with. There are several subdatasets: human, mouse, and different cell types.

Paper: SCnorm: robust normalization of single-cell RNA-seq data https://pubmed.ncbi.nlm.nih.gov/28418000/ Bacher R, Chu LF, Leng N, Gasch AP et al. SCnorm: robust normalization of single-cell RNA-seq data. Nat Methods 2017 Jun;14(6):584-586

Abstract: The normalization of RNA-seq data is essential for accurate downstream inference, but the assumptions upon which most normalization methods are based are not applicable in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of single-cell RNA-seq data.

A total of 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA was pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.

Inspiration

Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science": https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review by P. Kharchenko in Nature Methods, "The triumphs and limitations of computational methods for scRNA-seq": https://www.nature.com/articles/s41592-021-01171-x
DataSheet_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...
frontiersin.figshare.com
docx
Updated Jun 1, 2023
Nicholas Lytal; Di Ran; Lingling An (2023). DataSheet_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fgene.2020.00041.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Nicholas Lytal; Di Ran; Lingling An
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.
Collection of single cell sequencing metadata normalized values.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Nov 20, 2023
Melchers, Fritz; Petkau, Georg; Guerra, Gabriela Maria; Klemm, Uwe; Mashreghi, Mir-Farzin; Heinz, Gitta Anne; Durek, Pawel; Heinrich, Frederik; Jani, Peter K.; Kawano, Yohei (2023). Collection of single cell sequencing metadata normalized values. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001067483
Dataset updated
Nov 20, 2023
Authors
Melchers, Fritz; Petkau, Georg; Guerra, Gabriela Maria; Klemm, Uwe; Mashreghi, Mir-Farzin; Heinz, Gitta Anne; Durek, Pawel; Heinrich, Frederik; Jani, Peter K.; Kawano, Yohei
Description
Collection of single cell sequencing metadata normalized values.
Data from: Reference transcriptomics of porcine peripheral immune cells...
agdatacommons.nal.usda.gov
datasets.ai
zip
Updated Nov 21, 2025
Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411
Available download formats: zip
Unique identifier
https://doi.org/10.15482/USDA.ADC/1522411
Dataset updated
Nov 21, 2025
Dataset provided by
Ag Data Commons
Authors
Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:

matrix of gene counts* (matrix.mtx.gz)
gene names (features.tsv.gz)
cell IDs (barcodes.tsv.gz)

*The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:

nCount_RNA = the number of transcripts detected in a cell
nFeature_RNA = the number of genes detected in a cell
Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
prcntMito = percent mitochondrial reads in a cell
Scrublet = doublet probability score assigned to a cell
seurat_clusters = cluster ID assigned to a cell
PaperIDs = sample ID for a cell
celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
Investigating Highly Variable Genes in Single-cell RNA-seq Data across...
data.mendeley.com
Updated May 16, 2023
Jantarika Kumar Arora (2023). Investigating Highly Variable Genes in Single-cell RNA-seq Data across Multiple Cell Types and Conditions [Dataset]. http://doi.org/10.17632/6ry3x7r8hf.3
Unique identifier
https://doi.org/10.17632/6ry3x7r8hf.3
Dataset updated
May 16, 2023
Authors
Jantarika Kumar Arora
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Peripheral blood mononuclear cell (PBMC) samples were collected from patients infected with dengue virus (DENV) at four time points: two days and one day before defervescence (febrile phase), at defervescence (critical phase), and at two-week convalescence. The raw and filtered matrix files were generated using CellRanger version 3.0.2 (10x Genomics, USA) with the reference human genome GRCh38 1.2.0. Potential contamination by ambient RNA was corrected using SoupX. Low-quality cells, including cells with more than 10% mitochondrial gene expression and doublets/multiplets, were excluded using Seurat and DoubletFinder, respectively. The individual samples were then integrated using the SCTransform method with 3,000 gene features. Principal component analysis (PCA) and clustering were then performed, the latter using the Louvain algorithm with multi-level refinement. The gene expression level of each cell was normalized using the LogNormalize method in Seurat. Cell types were annotated using the canonical marker genes described in the original paper; see the related link below.
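For readers unfamiliar with the LogNormalize step mentioned above, the following is a minimal pure-Python sketch of the arithmetic it performs: each count is divided by the cell's total count, multiplied by a scale factor, and passed through log(1 + x). The function name `log_normalize`, the cells-in-rows layout, and the default scale factor of 10,000 are assumptions made for this illustration; the description does not state which settings were used for this dataset.

```python
from math import log1p


def log_normalize(counts, scale_factor=10_000.0):
    """Log-normalize a count matrix given as a list of per-cell rows.

    counts       : list of lists, one inner list of gene counts per cell
    scale_factor : assumed default; the dataset's actual setting is not stated
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell has nothing to scale; keep it at zero.
            normalized.append([0.0 for _ in cell])
            continue
        normalized.append([log1p(c / total * scale_factor) for c in cell])
    return normalized


# Two toy cells with two genes each.
toy = log_normalize([[1, 1], [0, 4]])
```

Because each cell is scaled by its own total, two cells with the same relative composition but different sequencing depth end up with the same normalized values.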
Additional file 4 of Data normalization for addressing the challenges in the...
datasetcatalog.nlm.nih.gov
springernature.figshare.com
Updated Sep 11, 2024
Wu, Jiaqian; Duran, Raquel Cuevas-Diaz; Wei, Haichao (2024). Additional file 4 of Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001347686
Dataset updated
Sep 11, 2024
Authors
Wu, Jiaqian; Duran, Raquel Cuevas-Diaz; Wei, Haichao
Description
Supplementary Material 4
Data from: Subsets of tissue CD4 T cells display different susceptibilities...
datadryad.org
datasetcatalog.nlm.nih.gov
zip
Updated Feb 17, 2023
Xiaoyu Luo (2023). Subsets of tissue CD4 T cells display different susceptibilities to HIV infection and death: Analysis by CyTOF and single cell RNA-seq [Dataset]. http://doi.org/10.7272/Q6SX6BFR
Available download formats: zip
Unique identifier
https://doi.org/10.7272/Q6SX6BFR
Dataset updated
Feb 17, 2023
Dataset provided by
Dryad
Authors
Xiaoyu Luo
Time period covered
Feb 15, 2023
Description
mass cytometry; single-cell RNA-seq
Mass cytometry data have been pre-gated on live singlets and normalized by CD8 cell number. Single-cell RNA-seq data are raw data.
Raw differential gene expression data, data S1, from: Molecular cascades and...
datadryad.org
search.dataone.org
zip
Updated Jan 15, 2024
Brie Wamsley (2024). Raw differential gene expression data, data S1, from: Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics [Dataset]. http://doi.org/10.5061/dryad.4b8gthtkr
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.4b8gthtkr
Dataset updated
Jan 15, 2024
Dataset provided by
Dryad
Authors
Brie Wamsley
Time period covered
Dec 22, 2023
Description
Raw differential gene expression data, data S1, from: Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics

https://doi.org/10.5061/dryad.4b8gthtkr

Raw Differential Gene Expression data, Data S1, from "Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics"

Description of the data and file structure

Excel document with raw differential gene expression, ASD vs. CTL, per cell-type cluster. The first column is a drop-down selection for choosing which cell type's differential gene expression results to view. The second (ASDvCTL) and third (CTLvASD) columns are the logFC values for each differential gene for a given cell type, the fourth (ASDvCTL) and fifth (CTLvASD) columns are the p-values for each gene for a given cell type, the sixth (ASDvCTL) and seventh (CTLvASD) columns are the FDR values for each gene for a given cell type, and the last column is the gene name.

Sharing/Access information

Links to...
Cell type labels for all clustering and normalization combinations compared...
data.niaid.nih.gov
search.dataone.org
zip
Updated Nov 17, 2022
John Hickey (2022). Cell type labels for all clustering and normalization combinations compared for CODEX multiplexed imaging [Dataset]. http://doi.org/10.5061/dryad.dfn2z352c
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.dfn2z352c
Dataset updated
Nov 17, 2022
Dataset provided by
Stanford University
Authors
John Hickey
License
https://spdx.org/licenses/CC0-1.0.html
Description
We performed CODEX (co-detection by indexing) multiplexed imaging on four sections of the human colon (ascending, transverse, descending, and sigmoid) using a panel of 47 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane) and single-cell segmentation. The output of this process was a dataframe of nearly 130,000 cells with fluorescence values quantified for each marker. We used this dataframe as input to 5 normalization conditions: the z, double-log(z), min/max, and arcsinh normalizations, and the original unmodified dataset against which they were compared. We used these normalized dataframes as inputs for 4 unsupervised clustering algorithms: k-means, Leiden, X-shift Euclidean, and X-shift angular.

From the clustering outputs, we then labeled the resulting clusters according to the cells observed in the data, producing 20 unique cell type labels. We also labeled cell types by hierarchical hand-gating of the data within CellEngine (cellengine.com). We also created another gold standard for comparison by overclustering unnormalized data with X-shift angular clustering. Finally, we created one last label as the major cell type call for each cell from all 21 cell type labels in the dataset.

Consequently, the dataset has one segmented cell per row. There are columns for the X, Y position in pixels in the overall montage image of the dataset, and columns indicating which region the data came from (4 total). The remaining columns are labels generated by all the clustering and normalization techniques used in the manuscript and compared with each other. These data were also used for the neighborhood analysis in the last figure of the manuscript. They are provided at all four levels of cell type granularity (from 7 cell types to 35 cell types).
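Three of the normalizations named in this description (z, min/max, and arcsinh) have widely used textbook forms, sketched below for a single marker's fluorescence values. This is an illustrative sketch only: the function names are hypothetical, the use of the population standard deviation and the `cofactor` parameter are assumptions, and the double-log(z) variant is omitted because its exact definition is not given here.

```python
from math import asinh
from statistics import mean, pstdev


def z_normalize(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def min_max_normalize(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def arcsinh_normalize(values, cofactor=1.0):
    """Inverse hyperbolic sine transform; cofactor is an assumed tuning knob."""
    return [asinh(v / cofactor) for v in values]


marker = [2.0, 4.0, 6.0]
z_scores = z_normalize(marker)
scaled = min_max_normalize(marker)
compressed = arcsinh_normalize(marker)
```

In practice each function would be applied per marker column of the cell-by-marker dataframe before clustering.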
Processed single cell data from CODEX multiplexed imaging of the human...
datadryad.org
search.dataone.org
zip
Updated Nov 10, 2022
John Hickey (2022). Processed single cell data from CODEX multiplexed imaging of the human intestine [Dataset]. http://doi.org/10.5061/dryad.pk0p2ngrf
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.pk0p2ngrf
Dataset updated
Nov 10, 2022
Dataset provided by
Dryad
Authors
John Hickey
Time period covered
Oct 27, 2022
Description
For a detailed description of each of the steps of protocols and processes to obtain this data see the detailed materials and methods in the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were taken and frozen in OCT. These were assembled into an array of 4 tissues, cut into 7 um slices, and stained with a panel of 54 CODEX DNA-oligonucleotide barcoded antibodies. Tissues were imaged with a Keyence microscope at 20x objective and then processed using image stitching, drift compensation, deconvolution, and cycle concatenation. Processed data were then segmented using CellVisionSegmenter, a neural network R-CNN-based single-cell segmentation algorithm. Cell type analysis was completed on B004, 5, and 6 by z normalization of protein markers used for clustering and then overclustered using leiden-based clustering. The cell type labels were verified by looking back at the original image. Cell type labels were transferred to other ...
Data from: Discrete regulatory modules instruct hematopoietic lineage...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Aug 28, 2021
Georgolopoulos, Grigorios; Psatha, Nikoletta; Iwata, Mineo; Nishida, Andrew; Som, Tannishtha; Yiangou, Minas; Stamatoyannopoulos, John A; Vierstra, Jeff (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5291736
Dataset updated
Aug 28, 2021
Dataset provided by
Department of Genetics, Development & Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
Altius Institute for Biomedical Sciences, Seattle, WA, USA
Authors
Georgolopoulos, Grigorios; Psatha, Nikoletta; Iwata, Mineo; Nishida, Andrew; Som, Tannishtha; Yiangou, Minas; Stamatoyannopoulos, John A; Vierstra, Jeff
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2020.04.02.022566v4

Contact: Grigorios Georgolopoulos (ggeorgol@altius.org); Jeff Vierstra (jvierstra@altius.org)

Lineage commitment and differentiation are driven by the concerted action of master transcriptional regulators at their target chromatin sites. Multiple efforts have characterized the key transcription factors (TFs) that determine the various hematopoietic lineages. However, the temporal interactions between individual TFs and their chromatin targets during differentiation, and how these interactions dictate lineage commitment, remain poorly understood. Here we delineate the temporal interplay between the cis- and the trans-regulatory landscape in establishing lineage commitment and differentiation in human hematopoiesis by performing a dense timecourse of chromatin accessibility (DNase I-seq) and gene expression (total and single cell RNA-seq).

All data uploaded correspond to human genome build version GRCh38.

Contents

DNase I Hotspot (DHS) metadata: Supplementary_Data_1.txt

DNase I Hotspot quantile-normalized counts: A tab-separated matrix with quantile-normalized DNase I density counts from 79,085 FDR 5% hotspots, across 12 erythroid differentiation timepoints from 3 donors, present in at least n=2 samples. Rows correspond to DHS information in Supplementary_Data_1.txt (hotspots.fdr.0.05.qnorm.counts.tsv.gz)

Column information for DNase I Hotspot quantile-normalized counts: hotspots.fdr.0.05.qnorm.counts.info.tsv

Developmentally regulated gene metadata (erythroid): Supplementary_Data_2.csv

Gene matrix of quantile-normalized FPKM values (erythroid): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 erythroid differentiation timepoints from 3 donors. (fpkm_erythroid_qnorm.tsv.gz)

Column information for the quantile-normalized FPKM gene matrix (erythroid): A tab-separated table (fpkm_erythroid_qnorm.info.tsv)

CD34+ HSPC TADs at 10kb resolution: Supplementary_Data_3.bed

Day 11 ex vivo erythroid progenitor TADs at 10kb resolution: Supplementary_Data_4.bed

Transcription factor motif enrichment per DHS cluster: Supplementary_Data_5.csv

Correlation information (links) between developmentally regulated DHS and target genes: Supplementary_Data_6.csv

Chromatin anchor loops called from 10kb resolution Hi-C data: Supplementary_Data_7.bedgraph

Developmentally regulated gene metadata (megakaryocytic): Supplementary_Data_8.csv

Gene matrix of quantile-normalized FPKM values (megakaryocytic): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 megakaryocytic differentiation timepoints from 3 donors. (fpkm_megakaryocyte_qnorm.tsv.gz)

Column information for the quantile-normalized FPKM gene matrix (megakaryocytic): A tab-separated table (fpkm_megakaryocyte_qnorm.info.tsv)

Marker (differentially expressed) genes per single cell population: Supplementary_Data_9.csv

A SCANPY h5ad Annotated DataFrame object: Annotated Data frame anndata in h5ad format including the gene-by-cell count matrix, Velocyto splicing kinetics (RNA velocity) information layer, along with obs, obsm, var, varm, and uns layers. (SCANPY_anndata_object.h5ad)
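Several of the matrices listed above are described as quantile-normalized. As a reminder of what that operation does, here is a minimal pure-Python sketch of the textbook procedure: every sample (column) is forced onto the same reference distribution, built by averaging the k-th smallest values across samples. The function name `quantile_normalize` is hypothetical, ties are broken by position rather than averaged, and nothing here is taken from the authors' actual processing code.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as a list of rows (features x samples)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # Work column by column: each column is one sample.
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(col[k] for col in sorted_cols) / n_cols for k in range(n_rows)
    ]

    # Replace each value by the reference value at its within-sample rank.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


normalized = quantile_normalize([[5, 4], [2, 1], [3, 6]])
```

After this step every column contains exactly the same set of values, so remaining differences between samples reflect rank order only.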
Data for the training and testing of ccAFv2
zenodo.org
application/gzip
Updated Sep 18, 2024
Christopher Plaisier; Samantha O'Connor (2024). Data for the training and testing of ccAFv2 [Dataset]. http://doi.org/10.5281/zenodo.13786724
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.13786724
Dataset updated
Sep 18, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Christopher Plaisier; Samantha O'Connor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single-cell transcriptomics has unveiled a vast landscape of cellular heterogeneity in which the cell cycle is a significant component. We trained a high-resolution cell cycle classifier (ccAFv2) using single cell RNA-seq (scRNA-seq) characterized human neural stem cells. The features of this classifier are that it classifies six cell cycle states (G1, Late G1, S, S/G2, G2/M, and M/Early G1) and a quiescent-like G0 state, and it incorporates a tunable parameter to filter out less certain classifications. The ccAFv2 classifier performed better than or equivalent to other state-of-the-art methods even while classifying more cell cycle states, including G0. We showcased the versatility of ccAFv2 by successfully applying it to classify cells, nuclei, and spatial transcriptomics data in humans and mice, using various normalization methods and gene identifiers. We provide methods to regress the cell cycle expression patterns out of single cell or nuclei data to uncover underlying biological signals. The classifier can be used either as an R package integrated with Seurat (https://github.com/plaisier-lab/ccafv2_R) or a PyPI package integrated with scanpy (https://pypi.org/project/ccAF/). We proved that ccAFv2 has enhanced accuracy, flexibility, and adaptability across various experimental conditions, establishing ccAFv2 as a powerful tool for dissecting complex biological systems, unraveling cellular heterogeneity, and deciphering the molecular mechanisms by which proliferation and quiescence affect cellular processes.
Single-Cell Level Estrogen Receptor Activity Data
dataverse.harvard.edu
datasetcatalog.nlm.nih.gov
Updated Oct 20, 2023
Cite
Zahir Aghayev; Adam T. Szafran; Anh Tran; Hari S. Ganesh; Fabio Stossi; Lan Zhou; Michael A. Mancini; Efstratios N. Pistikopoulos; Burcu Beykal (2023). Single-Cell Level Estrogen Receptor Activity Data [Dataset]. http://doi.org/10.7910/DVN/NIBGOQ
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/NIBGOQ
Dataset updated
Oct 20, 2023
Dataset provided by
Harvard Dataverse
Authors
Zahir Aghayev; Adam T. Szafran; Anh Tran; Hari S. Ganesh; Fabio Stossi; Lan Zhou; Michael A. Mancini; Efstratios N. Pistikopoulos; Burcu Beykal
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains normalized single-cell level data for 60 reference compounds analyzed with high-throughput microscopy and high-content analysis-based experiments that were performed using the GFP-ERα:PRL-HeLa cell line. The multidimensional imaging data were used to train a classification model that predicts the impact of unknown compounds on the estrogen receptor, either as agonists or antagonists, as outlined in the RMarkdown file. All chemical screening data were normalized to E2-treated control samples using the robust z-score calculation based on single-cell data. In experiments that spanned multiple sample plates, the median and MAD values were calculated for each plate, the median MAD value across all plates was determined, and the robust z-score was calculated using the per-plate median value and the overall median MAD value. Datasets with names "LMH10" and "M50" are engineered datasets in which the number of features was expanded using percentiles. Percentiles were determined by identifying the single-cell features falling into the 0th–10th percentile range (L10), the 45th–55th percentile range (M10), the 90th–100th percentile range (H10), and the 25th–75th percentile range (M50) from each sample. The experimental data sets containing all single-cell data and the engineered data sets containing only the L10 + M10 + H10 or only the M50 single-cell features were also divided into 2 subsets depending on the detection of visible GFP-ER nuclear spots (all cells and array positive cells) used for subsequent model development.
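The multi-plate robust z-score described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `controls` mask (marking E2-treated control cells used to estimate the median and MAD), and the use of an unscaled MAD (no 1.4826 consistency factor) are choices made for this example, and the RMarkdown file shipped with the dataset is the authoritative reference.

```python
import numpy as np

def mad(x):
    """Unscaled median absolute deviation of a 1-D array."""
    x = np.asarray(x, dtype=float)
    return float(np.median(np.abs(x - np.median(x))))

def robust_z_multi_plate(values, plates, controls=None):
    """Robust z-score using the per-plate median and the median of the
    per-plate MADs. `controls` is an assumed boolean mask selecting the
    reference cells; by default every cell is treated as reference."""
    values = np.asarray(values, dtype=float)
    plates = np.asarray(plates)
    if controls is None:
        controls = np.ones(values.shape, dtype=bool)
    controls = np.asarray(controls, dtype=bool)

    plate_ids = np.unique(plates)
    # Per-plate median and MAD, estimated from the reference cells only.
    medians = {p: float(np.median(values[(plates == p) & controls]))
               for p in plate_ids}
    mads = [mad(values[(plates == p) & controls]) for p in plate_ids]
    overall_mad = float(np.median(mads))  # median MAD across all plates

    per_cell_median = np.array([medians[p] for p in plates])
    return (values - per_cell_median) / overall_mad

# Toy data: two plates with the same spread but different baselines.
values = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
plates = ["A"] * 5 + ["B"] * 5
z = robust_z_multi_plate(values, plates)
```

Centering on each plate's own median removes plate-to-plate baseline shifts, while the shared MAD keeps the scale of the scores comparable across plates.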
Transcriptional landscape of repetitive elements in single cells
data.niaid.nih.gov
zenodo.org
Updated Mar 6, 2021
Cite
Mallona, Izaskun (2021). Transcriptional landscape of repetitive elements in single cells [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4584955
Dataset updated
Mar 6, 2021
Dataset provided by
University of Zurich
Authors
Mallona, Izaskun
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose

We quantified conventional features (GENCODE) as well as repetitive elements (RepeatMasker) from multiple single-cell RNA-seq datasets, starting from raw data and using several analytical approaches, and looked for associations with known cell metadata (e.g. cell types).

This repository includes the count tables and cell embeddings as generated by export_run_for_upload.R from repeats_sc (bitbucket).

Files

Please check the README.md file for a full description of all files. Each tarball contains a set of gz compressed files, including cell embeddings, raw and normalized count matrices, and metadata at the feature and cell levels. Data are stored as either comma-separated values (CSV) or MatrixMarket files, UTF8-encoded and with Unix EOLs.

Datasets

We reanalyzed sequencing (fastq) files generated by other researchers; this repository contains results from our reanalysis, not the original raw data. We are very grateful to the original data producers and encourage readers to check their data release licenses and to read and cite their publication(s). The README.md has a full description of these datasets and their sources.

Source code

https://bitbucket.org/imallona/repeats_sc

Contact

izaskun dot mallona at gmail.com

Funding

Swiss National Science Foundation SNF Spark 190824 (CRSK-3_190824)
Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data...
frontiersin.figshare.com
application/cdfv2
Updated Jun 1, 2023
Cite
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001
Explore at:
Available download formats: application/cdfv2
Unique identifier
https://doi.org/10.3389/fgene.2019.00400.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another metric, or a method evaluated as the best using one dataset is evaluated as the poorest using another dataset. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly data-driven methods or their own methods) under the principle of the consistency of metrics and the consistency of datasets.
GERDA datasets including NGS and SGA data
data.mendeley.com
Updated Apr 26, 2023
Cite
Fabian Otte (2023). GERDA datasets including NGS and SGA data [Dataset]. http://doi.org/10.17632/8c4zbxfvwk.3
Unique identifier
https://doi.org/10.17632/8c4zbxfvwk.3
Dataset updated
Apr 26, 2023
Authors
Fabian Otte
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets linked to the publication "Revealing viral and cellular dynamics of HIV-1 at the single-cell level during early treatment periods", Otte et al. 2023, published in Cell Reports Methods. Pre-ART (antiretroviral therapy) cryo-conserved and whole blood specimens were sampled for HIV-1 virus reservoir determination in HIV-1 positive individuals from the Swiss HIV Study Cohort. Patients were monitored for proviral DNA, poly-A transcripts (RNA), late protein translation (Gag and Envelope reactivation co-detection assay, GERDA) and intact viruses (gold standard: viral outgrowth assay, VOA). In this dataset we deposited the pipeline for the multidimensional data analysis of our newly established GERDA method, using DBScan and tSNE. For further comprehension, NGS and Sanger sequencing data are attached as processed and raw data (GenBank).

Resubmitted to Cell Reports Methods (Jan-2023), accepted in principle (Mar-2023)

GERDA is a new detection method to decipher the HIV-1 cellular reservoir in blood (tissue or any other specimen). It integrates HIV-1 Gag and Env co-detection along with cellular surface markers to reveal 1) which cells still contain translation-competent HIV-1 virus and 2) which markers the respective infected cells express. The phenotypic marker repertoire of the cells allows predictions on potential homing and an assessment of the HIV-1 (tissue) reservoir. All FACS data were acquired on an LSRFortessa BD FACS machine (markers: CCR7, CD45RA, CD28, CD4, CD25, PD1, IntegrinB7, CLA, HIV-1 Env, HIV-1 Gag). Raw FACS data (pre-gated CD4CD3+ T-cells) were arcsin transformed and dimensionally reduced using optsne. Data were further clustered using DBSCAN, and either individual clusters were analyzed for individual marker expression or the expression profiles of all relevant clusters were analyzed by heatmaps. Sequences before/after therapy initiation and during viral outgrowth cultures were monitored for individuals P01-46 and P04-56 by next-generation sequencing (NGS of the HIV-1 Envelope V3 loop only) and by Sanger sequencing (single genome amplification, SGA).

data normalization code (by Julian Spagnuolo)
FACS normalized data as CSV (XXX_arcsin.csv)
OMIQ conText file (_OMIQ-context_XXX)
arcsin normalized FACS data after optsne dimension reduction with OMIQ.ai as CSV file (XXXarcsin.csv.csv)
R pipeline with codes (XXX_commented.R)
P01_46-NGS and Sanger sequences
P04_56-NGS and Sanger sequences

