82 datasets found
  1. Additional file 3 of Pooling across cells to normalize single-cell RNA...

    • springernature.figshare.com
    txt
    Updated Jun 9, 2023
    Cite
    Aaron L. Lun; Karsten Bach; John Marioni (2023). Additional file 3 of Pooling across cells to normalize single-cell RNA sequencing data with many zero counts [Dataset]. http://doi.org/10.6084/m9.figshare.c.3629252_D2.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aaron L. Lun; Karsten Bach; John Marioni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Enriched GO terms for library size normalization. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to library size normalization. The fields are the same as described for Additional file 2. (13 KB PDF)
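
    For orientation, library size normalization can be sketched as follows. This is a minimal illustration, assuming each cell's size factor is its total count divided by the mean total across cells; the function names and the toy counts are hypothetical and are not taken from the paper.

```python
def library_size_factors(counts):
    """Return one size factor per cell, scaled so the factors average to 1.

    counts: list of cells, each a list of per-gene counts.
    Assumes every cell has a non-zero total count.
    """
    totals = [sum(cell) for cell in counts]
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]


def normalize_by_library_size(counts):
    """Divide every count in a cell by that cell's size factor."""
    factors = library_size_factors(counts)
    return [[c / f for c in cell] for cell, f in zip(counts, factors)]


# Toy data: three cells, three genes (illustrative values only).
counts = [[10, 0, 30], [5, 0, 15], [0, 20, 40]]
factors = library_size_factors(counts)
normalized = normalize_by_library_size(counts)
print(factors)
print(normalized)
```

    In this toy example the first two cells have the same composition at different depths, so they become identical after scaling.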

  2. Additional file 2 of Pooling across cells to normalize single-cell RNA...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Aaron L. Lun; Karsten Bach; John Marioni (2023). Additional file 2 of Pooling across cells to normalize single-cell RNA sequencing data with many zero counts [Dataset]. http://doi.org/10.6084/m9.figshare.c.3629252_D1.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aaron L. Lun; Karsten Bach; John Marioni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Enriched GO terms for deconvolution. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to deconvolution. The identifier and name of each term is shown along with the total number of genes associated with the term, the number of associated genes that are also DE, the expected number under the null hypothesis, and the Fisher p value. (13 KB PDF)
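
    The fields listed above (genes per term, DE genes per term, expected number under the null, Fisher p value) can be related to each other with a short sketch. It assumes a one-sided Fisher exact test, which is equivalent to a hypergeometric upper tail; the description does not say which software or sidedness was used, and the function name and numbers below are hypothetical.

```python
from math import comb


def go_term_enrichment(n_universe, n_de, n_term, n_term_de):
    """Return (expected DE genes in the term, one-sided Fisher p value).

    n_universe: all genes tested
    n_de:       genes called differentially expressed (DE)
    n_term:     genes annotated with the GO term
    n_term_de:  genes that are both annotated with the term and DE
    """
    # Expected overlap if DE status and term membership were independent.
    expected = n_term * n_de / n_universe
    # Upper tail of the hypergeometric distribution: P(overlap >= observed).
    upper = min(n_term, n_de)
    favourable = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_de - k)
        for k in range(n_term_de, upper + 1)
    )
    p_value = favourable / comb(n_universe, n_de)
    return expected, p_value


# Illustrative numbers only: 20 genes, 5 DE, a term with 4 genes, 3 of them DE.
expected, p_value = go_term_enrichment(20, 5, 4, 3)
print(expected, p_value)
```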

  3. Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust...

    • data.niaid.nih.gov
    Updated May 15, 2019
    + more versions
    Cite
    Bacher R; Chu L; Kendziorski C; Swanson S (2019). Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust normalization of single-cell rna-seq data [Dataset]. https://data.niaid.nih.gov/resources?id=gse85917
    Explore at:
    Dataset updated
    May 15, 2019
    Dataset provided by
    University of Florida
    Authors
    Bacher R; Chu L; Kendziorski C; Swanson S
    Description

    Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.

  4. Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 3, 2023
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s004
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.

  5. scRNA-seq human embryonic stem H1, H9 cell lines

    • kaggle.com
    zip
    Updated Jul 28, 2021
    Cite
    Alexander Chervov (2021). scRNA-seq human embryonic stem H1, H9 cell lines [Dataset]. https://www.kaggle.com/alexandervc/scrnaseq-human-embryonic-stem-h1-h9-cell-lines
    Explore at:
    Available download formats: zip (65064956 bytes)
    Dataset updated
    Jul 28, 2021
    Authors
    Alexander Chervov
    Description

    Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

    Data and Context

    Data: results of single-cell RNA sequencing. Rows correspond to cells and columns to genes (the CSV file is the other way around). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
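
    Because the CSV orientation is the reverse of the cells-by-genes convention, a transpose is usually the first step. The sketch below assumes a hypothetical layout (a header row of cell IDs and a first column of gene names); the actual files in this dataset may differ, and all names and values here are illustrative.

```python
import csv
import io


def read_genes_by_cells(text):
    """Parse a genes-by-cells CSV and return it in cells-by-genes orientation.

    Assumed layout: first row holds cell IDs (after an empty corner cell),
    first column holds gene names, remaining values are expression levels.
    """
    rows = list(csv.reader(io.StringIO(text)))
    cell_ids = rows[0][1:]
    gene_ids = [row[0] for row in rows[1:]]
    values = [[float(v) for v in row[1:]] for row in rows[1:]]
    # zip(*values) turns the list of gene rows into a list of cell rows.
    cells_by_genes = [list(column) for column in zip(*values)]
    return cell_ids, gene_ids, cells_by_genes


# Toy input: three genes measured in two cells (illustrative values only).
example = ",cellA,cellB\nGENE1,1,0\nGENE2,3,4\nGENE3,0,2\n"
cell_ids, gene_ids, matrix = read_genes_by_cells(example)
print(cell_ids, gene_ids, matrix)
```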

    Particular data from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76381 There are original TXT files and a reconversion to *.h5ad format, which is easier to work with. There are several subdatasets: human/mouse/different cell types.

    Paper: SCnorm: robust normalization of single-cell RNA-seq data https://pubmed.ncbi.nlm.nih.gov/28418000/ Bacher R, Chu LF, Leng N, Gasch AP et al. SCnorm: robust normalization of single-cell RNA-seq data. Nat Methods 2017 Jun;14(6):584-586

    Abstract: The normalization of RNA-seq data is essential for accurate downstream inference, but the assumptions upon which most normalization methods are based are not applicable in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of single-cell RNA-seq data.

    A total of 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.

    Inspiration

    Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

    Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

  6. DataSheet_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    + more versions
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). DataSheet_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.

  7. Collection of single cell sequencing metadata normalized values.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 20, 2023
    Cite
    Melchers, Fritz; Petkau, Georg; Guerra, Gabriela Maria; Klemm, Uwe; Mashreghi, Mir-Farzin; Heinz, Gitta Anne; Durek, Pawel; Heinrich, Frederik; Jani, Peter K.; Kawano, Yohei (2023). Collection of single cell sequencing metadata normalized values. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001067483
    Explore at:
    Dataset updated
    Nov 20, 2023
    Authors
    Melchers, Fritz; Petkau, Georg; Guerra, Gabriela Maria; Klemm, Uwe; Mashreghi, Mir-Farzin; Heinz, Gitta Anne; Durek, Pawel; Heinrich, Frederik; Jani, Peter K.; Kawano, Yohei
    Description

    Collection of single cell sequencing metadata normalized values.

  8. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +2more
    zip
    Updated Nov 21, 2025
    Cite
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
    File Name: PBMC7_AllCells.zip
    Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
    • matrix of gene counts* (matrix.mtx.gz)
    • gene names (features.tsv.gz)
    • cell IDs (barcodes.tsv.gz)
    *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
    File Name: PBMC7_AllCells_meta.csv
    Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
    • nCount_RNA = the number of transcripts detected in a cell
    • nFeature_RNA = the number of genes detected in a cell
    • Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
    • prcntMito = percent mitochondrial reads in a cell
    • Scrublet = doublet probability score assigned to a cell
    • seurat_clusters = cluster ID assigned to a cell
    • PaperIDs = sample ID for a cell
    • celltypes = cell type ID assigned to a cell

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
    File Name: PBMC7_AllCells_PCAcoord.csv
    Resource Description: .csv file containing first 100 PCA coordinates for cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
    File Name: PBMC7_AllCells_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
    File Name: PBMC7_AllCells_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
    File Name: PBMC7_CD4only_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
    File Name: PBMC7_CD4only_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
    File Name: PBMC7_GDonly_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
    File Name: PBMC7_GDonly_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
    File Name: UnfilteredGeneInfo.txt
    Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
    File Name: PBMC7.tar
    Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
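
    The 10X-format count matrix is stored in Matrix Market coordinate form, with features as rows and cell barcodes as columns. The sketch below is a minimal standard-library parser for that layout, shown only to illustrate how non-integer counts appear in such a file; in practice a dedicated reader such as Read10X() in R would normally be used. The function name and the toy file content are hypothetical.

```python
def parse_matrix_market(text):
    """Parse a Matrix Market coordinate file into (shape, sparse entries).

    Returns ((n_rows, n_cols), {(row, col): value}) with 0-based indices.
    In the 10X layout, rows are features (genes) and columns are barcodes.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines[0].startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    # Drop the banner and any comment lines that start with '%'.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(x) for x in body[0].split())
    entries = {}
    for ln in body[1:]:
        i, j, value = ln.split()
        # File indices are 1-based; values may be non-integer.
        entries[(int(i) - 1, int(j) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return (n_rows, n_cols), entries


# Toy file: 3 genes x 2 cells with non-integer counts (illustrative only).
example = """%%MatrixMarket matrix coordinate real general
% toy example
3 2 3
1 1 2.5
3 1 0.75
2 2 4
"""
shape, entries = parse_matrix_market(example)
# Per-cell totals (column sums), comparable in spirit to nCount_RNA.
totals = [
    sum(v for (_, col), v in entries.items() if col == c)
    for c in range(shape[1])
]
print(shape, totals)
```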

  9. Investigating Highly Variable Genes in Single-cell RNA-seq Data across...

    • data.mendeley.com
    Updated May 16, 2023
    Cite
    Jantarika Kumar Arora (2023). Investigating Highly Variable Genes in Single-cell RNA-seq Data across Multiple Cell Types and Conditions [Dataset]. http://doi.org/10.17632/6ry3x7r8hf.3
    Explore at:
    Dataset updated
    May 16, 2023
    Authors
    Jantarika Kumar Arora
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Peripheral blood mononuclear cell (PBMC) samples were collected from patients infected with dengue virus (DENV) at four time points: two days and one day before defervescence (febrile phase), at defervescence (critical phase), and at two-week convalescence. The raw and filtered matrix files were generated using CellRanger version 3.0.2 (10x Genomics, USA) with the reference human genome GRCh38 1.2.0. Potential contamination by ambient RNA was corrected using SoupX. Low-quality cells, including cells in which mitochondrial genes accounted for more than 10% of expression and doublets/multiplets, were excluded using Seurat and doubletFinder, respectively. The individual samples were then integrated using the SCTransform method with 3,000 gene features. Principal component analysis (PCA) and clustering were performed, the latter with the Louvain algorithm with multi-level refinement. The gene expression level of each cell was normalized using the LogNormalize method in Seurat. Cell types were annotated using the canonical marker genes described in the original paper; see the related link below.
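
    Two of the steps described above, the 10% mitochondrial filter and LogNormalize, can be sketched in a few lines. This is an illustration rather than the authors' pipeline: it assumes human mitochondrial genes are identified by the "MT-" name prefix and uses a scale factor of 10,000, which is Seurat's documented default but is not stated in the description. Function names and counts are hypothetical.

```python
from math import log1p


def passes_mito_filter(cell_counts, gene_names, max_percent=10.0):
    """Keep a cell only if mitochondrial counts are at most max_percent of its total."""
    total = sum(cell_counts)
    mito = sum(c for c, g in zip(cell_counts, gene_names) if g.startswith("MT-"))
    return 100.0 * mito / total <= max_percent


def log_normalize(cell_counts, scale_factor=10_000):
    """LogNormalize-style transform: log1p(count / cell total * scale_factor)."""
    total = sum(cell_counts)
    return [log1p(c / total * scale_factor) for c in cell_counts]


# Toy cells over three genes (illustrative values only).
genes = ["MT-CO1", "CD3E", "LYZ"]
good_cell = [5, 45, 50]   # 5% mitochondrial
bad_cell = [20, 40, 40]   # 20% mitochondrial
kept = [cell for cell in (good_cell, bad_cell) if passes_mito_filter(cell, genes)]
normalized = [log_normalize(cell) for cell in kept]
print(kept, normalized)
```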

  10. Additional file 4 of Data normalization for addressing the challenges in the...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Sep 11, 2024
    + more versions
    Cite
    Wu, Jiaqian; Duran, Raquel Cuevas-Diaz; Wei, Haichao (2024). Additional file 4 of Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001347686
    Explore at:
    Dataset updated
    Sep 11, 2024
    Authors
    Wu, Jiaqian; Duran, Raquel Cuevas-Diaz; Wei, Haichao
    Description

    Supplementary Material 4

  11. Data from: Subsets of tissue CD4 T cells display different susceptibilities...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +3more
    zip
    Updated Feb 17, 2023
    Cite
    Xiaoyu Luo (2023). Subsets of tissue CD4 T cells display different susceptibilities to HIV infection and death: Analysis by CyTOF and single cell RNA-seq [Dataset]. http://doi.org/10.7272/Q6SX6BFR
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 17, 2023
    Dataset provided by
    Dryad
    Authors
    Xiaoyu Luo
    Time period covered
    Feb 15, 2023
    Description

    Mass cytometry; single-cell RNA-seq. Mass cytometry data have been pre-gated on live singlets and normalized by CD8 cell number. Single-cell RNA-seq data are raw data.

  12. Raw differential gene expression data, data S1, from: Molecular cascades and...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Jan 15, 2024
    Cite
    Brie Wamsley (2024). Raw differential gene expression data, data S1, from: Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics [Dataset]. http://doi.org/10.5061/dryad.4b8gthtkr
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 15, 2024
    Dataset provided by
    Dryad
    Authors
    Brie Wamsley
    Time period covered
    Dec 22, 2023
    Description

    Raw differential gene expression data, data S1, from: Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics

    https://doi.org/10.5061/dryad.4b8gthtkr

    Description of the data and file structure

    Excel document with raw differential gene expression, ASD vs. CTL, per cell-type cluster. The first column is a drop-down selection for choosing which cell type's differential gene expression results to view. The second (ASDvCTL) and third (CTLvASD) columns are the LOGFC values for each differential gene for a given cell type, the fourth (ASDvCTL) and fifth (CTLvASD) columns are the p-values for each gene for a given cell type, the sixth (ASDvCTL) and seventh (CTLvASD) columns are the FDR values for each gene for a given cell type, and the last column is the gene name.
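
    To show how an FDR column relates to a p-value column, here is a sketch of the Benjamini-Hochberg adjustment. The description does not say which FDR procedure produced the values in the spreadsheet, so Benjamini-Hochberg is an assumption made for illustration; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values for four genes in one cell type.
p_values = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(p_values)
print(fdr)
```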

    Sharing/Access information

    Links to...

  13. Cell type labels for all clustering and normalization combinations compared...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Nov 17, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Hickey (2022). Cell type labels for all clustering and normalization combinations compared for CODEX multiplexed imaging [Dataset]. http://doi.org/10.5061/dryad.dfn2z352c
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 17, 2022
    Dataset provided by
    Stanford University
    Authors
    John Hickey
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    We performed CODEX (co-detection by indexing) multiplexed imaging on four sections of the human colon (ascending, transverse, descending, and sigmoid) using a panel of 47 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane) and single-cell segmentation. The output of this process was a dataframe of nearly 130,000 cells with fluorescence values quantified for each marker. We used this dataframe as input to each of 5 normalization conditions: z, double-log(z), min/max, and arcsinh normalizations, which we compared with the original unmodified dataset. We used these normalized dataframes as inputs for 4 unsupervised clustering algorithms: k-means, Leiden, X-shift Euclidean, and X-shift angular.
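
    Three of the normalizations named above have standard textbook forms, sketched below for a single marker across cells. This is not the authors' code: the population standard deviation and the arcsinh cofactor of 5 are assumptions, double-log(z) is omitted because the description does not define it, and the function names and intensities are hypothetical.

```python
from math import asinh, sqrt


def z_normalize(values):
    """Centre on the mean and scale by the population standard deviation.

    Assumes the marker is not constant across cells (non-zero deviation).
    """
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def min_max_normalize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def arcsinh_normalize(values, cofactor=5.0):
    """Inverse hyperbolic sine transform; the cofactor is an assumed setting."""
    return [asinh(v / cofactor) for v in values]


# Illustrative fluorescence values for one marker in three cells.
marker = [0.0, 5.0, 10.0]
z_scores = z_normalize(marker)
scaled = min_max_normalize(marker)
transformed = arcsinh_normalize(marker)
print(z_scores, scaled, transformed)
```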

    From the clustering outputs, we then labeled the resulting clusters according to the cells observed in the data, producing 20 unique cell type labels. We also labeled cell types by hierarchical hand-gating of the data within cellengine (cellengine.com). We also created another gold standard for comparison by overclustering unnormalized data with X-shift angular clustering. Finally, we created one last label as the major cell type call for each cell from all 21 cell type labels in the dataset.

    Consequently the dataset has individual cells segmented out in each row. Then there are columns for the X, Y position in pixels in the overall montage image of the dataset. There are also columns to indicate which region the data came from (4 total). The rest are labels generated by all the clustering and normalization techniques used in the manuscript and what were compared to each other. These also were the data that were used for neighborhood analysis for the last figure of the manuscript. These are provided at all four levels of cell type level granularity (from 7 cell types to 35 cell types).

  14. Processed single cell data from CODEX multiplexed imaging of the human...

    • datadryad.org
    • search.dataone.org
    • +1more
    zip
    Updated Nov 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Hickey (2022). Processed single cell data from CODEX multiplexed imaging of the human intestine [Dataset]. http://doi.org/10.5061/dryad.pk0p2ngrf
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 10, 2022
    Dataset provided by
    Dryad
    Authors
    John Hickey
    Time period covered
    Oct 27, 2022
    Description

    For a detailed description of each of the steps of the protocols and processes used to obtain these data, see the detailed materials and methods in the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were taken and frozen in OCT. These were assembled into an array of 4 tissues, cut into 7 µm slices, and stained with a panel of 54 CODEX DNA-oligonucleotide barcoded antibodies. Tissues were imaged with a Keyence microscope at 20x objective and then processed using image stitching, drift compensation, deconvolution, and cycle concatenation. Processed data were then segmented using CellVisionSegmenter, a neural network R-CNN-based single-cell segmentation algorithm. Cell type analysis was completed on B004, 5, and 6 by z normalization of protein markers used for clustering and then overclustered using Leiden-based clustering. The cell type labels were verified by looking back at the original image. Cell type labels were transferred to other ...

  15. Data from: Discrete regulatory modules instruct hematopoietic lineage...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Aug 28, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Georgolopoulos, Grigorios; Psatha, Nikoletta; Iwata, Mineo; Nishida, Andrew; Som, Tannishtha; Yiangou, Minas; Stamatoyannopoulos, John A; Vierstra, Jeff (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5291736
    Explore at:
    Dataset updated
    Aug 28, 2021
    Dataset provided by
    Department of Genetics, Development & Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
    Altius Institute for Biomedical Sciences, Seattle, WA, USA
    Authors
    Georgolopoulos, Grigorios; Psatha, Nikoletta; Iwata, Mineo; Nishida, Andrew; Som, Tannishtha; Yiangou, Minas; Stamatoyannopoulos, John A; Vierstra, Jeff
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2020.04.02.022566v4

    Contact: Grigorios Georgolopoulos (ggeorgol@altius.org); Jeff Vierstra (jvierstra@altius.org)

    Lineage commitment and differentiation is driven by the concerted action of master transcriptional regulators at their target chromatin sites. Multiple efforts have characterized the key transcription factors (TFs) that determine the various hematopoietic lineages. However, the temporal interactions between individual TFs and their chromatin targets during differentiation and how these interactions dictate lineage commitment remains poorly understood. Here we delineate the temporal interplay between the cis- and the trans-regulatory landscape in establishing lineage commitment and differentiation in human hematopoiesis by performing a dense timecourse of chromatin accessibility (DNase I-seq), and gene expression (total and single cell RNA-seq).

    All data uploaded correspond to human genome build version GRCh38.

    Contents

    DNase I Hotspot (DHS) metadata: Supplementary_Data_1.txt

    DNase I Hotspot quantile-normalized counts: A tab-separated matrix with quantile-normalized DNase I density counts from 79,085 FDR 5% hotspots, across 12 erythroid differentiation timepoints from 3 donors, present in at least n=2 samples. Rows correspond to DHS information in Supplementary_Data_1.txt (hotspots.fdr.0.05.qnorm.counts.tsv.gz)

    Column information for DNase I Hotspot quantile-normalized counts: hotspots.fdr.0.05.qnorm.counts.info.tsv

    Developmentally regulated gene metadata (erythroid): Supplementary_Data_2.csv

    Gene matrix of quantile-normalized FPKM values (erythroid): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 erythroid differentiation timepoints from 3 donors. (fpkm_erythroid_qnorm.tsv.gz)

    Column information for the quantile-normalized FPKM gene matrix (erythroid): A tab-separated table (fpkm_erythroid_qnorm.info.tsv)

    CD34+ HSPC TADs at 10kb resolution: Supplementary_Data_3.bed

    Day 11 ex vivo erythroid progenitor TADs at 10kb resolution: Supplementary_Data_4.bed

    Transcription factor motif enrichment per DHS cluster: Supplementary_Data_5.csv

    Correlation information (links) between developmentally regulated DHS and target genes: Supplementary_Data_6.csv

    Chromatin anchor loops called from 10kb resolution Hi-C data: Supplementary_Data_7.bedgraph

    Developmentally regulated gene metadata (megakaryocytic): Supplementary_Data_8.csv

    Gene matrix of quantile-normalized FPKM values (megakaryocytic): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 megakaryocytic differentiation timepoints from 3 donors. (fpkm_megakaryocyte_qnorm.tsv.gz)

    Column information for the quantile-normalized FPKM gene matrix (megakaryocytic): A tab-separated table (fpkm_megakaryocyte_qnorm.info.tsv)

    Marker (differentially expressed) genes per single cell population: Supplementary_Data_9.csv

    SCANPY AnnData object: An annotated data object (AnnData) in h5ad format including the gene-by-cell count matrix and a Velocyto splicing kinetics (RNA velocity) layer, along with the obs, obsm, var, varm, and uns attributes. (SCANPY_anndata_object.h5ad)

  16. Data for the training and testing of ccAFv2

    • zenodo.org
    application/gzip
    Updated Sep 18, 2024
    Christopher Plaisier; Samantha O'Connor (2024). Data for the training and testing of ccAFv2 [Dataset]. http://doi.org/10.5281/zenodo.13786724
    Available download formats: application/gzip
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Plaisier; Samantha O'Connor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-cell transcriptomics has unveiled a vast landscape of cellular heterogeneity in which the cell cycle is a significant component. We trained a high-resolution cell cycle classifier (ccAFv2) using human neural stem cells characterized by single-cell RNA-seq (scRNA-seq). The classifier distinguishes six cell cycle states (G1, Late G1, S, S/G2, G2/M, and M/Early G1) and a quiescent-like G0 state, and it incorporates a tunable parameter to filter out less certain classifications. The ccAFv2 classifier performed better than or equivalent to other state-of-the-art methods even while classifying more cell cycle states, including G0. We showcased the versatility of ccAFv2 by successfully applying it to classify cells, nuclei, and spatial transcriptomics data in humans and mice, using various normalization methods and gene identifiers. We provide methods to regress the cell cycle expression patterns out of single-cell or nuclei data to uncover underlying biological signals. The classifier can be used either as an R package integrated with Seurat (https://github.com/plaisier-lab/ccafv2_R) or a PyPI package integrated with scanpy (https://pypi.org/project/ccAF/). We showed that ccAFv2 has enhanced accuracy, flexibility, and adaptability across various experimental conditions, establishing ccAFv2 as a powerful tool for dissecting complex biological systems, unraveling cellular heterogeneity, and deciphering the molecular mechanisms by which proliferation and quiescence affect cellular processes.

  17. Single-Cell Level Estrogen Receptor Activity Data

    • dataverse.harvard.edu
    • datasetcatalog.nlm.nih.gov
    Updated Oct 20, 2023
    Zahir Aghayev; Adam T. Szafran; Anh Tran; Hari S. Ganesh; Fabio Stossi; Lan Zhou; Michael A. Mancini; Efstratios N. Pistikopoulos; Burcu Beykal (2023). Single-Cell Level Estrogen Receptor Activity Data [Dataset]. http://doi.org/10.7910/DVN/NIBGOQ
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Zahir Aghayev; Adam T. Szafran; Anh Tran; Hari S. Ganesh; Fabio Stossi; Lan Zhou; Michael A. Mancini; Efstratios N. Pistikopoulos; Burcu Beykal
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains normalized single-cell level data for 60 reference compounds analyzed with high-throughput microscopy and high-content analysis-based experiments that were performed using the GFP-ERα:PRL-HeLa cell line. The multidimensional imaging data were used to train a classification model to ultimately predict the impact of unknown compounds on the estrogen receptor, either as agonists or antagonists, as outlined in the RMarkdown file. All chemical screening data were normalized to E2-treated control samples using the robust z-score calculation based on single-cell data. In experiments that spanned multiple sample plates, the median and MAD values were calculated for each plate, the median MAD value across all plates was determined, and the robust z-score was calculated using the per-plate median value and the overall median MAD value. Datasets with names "LMH10" and "M50" are engineered datasets in which the number of features was expanded using percentiles. Percentiles were determined by identifying the single-cell features falling into the 0–10th percentile range (L10), the 45th–55th percentile range (M10), the 90th–100th percentile range (H10), and the 25th–75th percentile range (M50) from each sample. Experimental datasets containing all single-cell data and engineered datasets containing only the L10 + M10 + H10 or only the M50 single-cell features were also divided into 2 subsets depending on the detection of visible GFP-ER nuclear spots (all cells and array-positive cells) used for subsequent model development.
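The multi-plate robust z-score described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the plate labels, and the example values are hypothetical, and no MAD scaling constant (such as 1.4826) is applied because the description does not mention one.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of the values around their own median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def robust_z_scores(plates):
    """Robust z-scores for measurements grouped by plate.

    `plates` maps a plate label to a list of single-cell measurements.
    Each value is centered on its own plate's median and divided by the
    median of the per-plate MAD values (one shared scale for all plates).
    """
    plate_medians = {name: median(vals) for name, vals in plates.items()}
    overall_mad = median([mad(vals) for vals in plates.values()])
    return {
        name: [(v - plate_medians[name]) / overall_mad for v in vals]
        for name, vals in plates.items()
    }


# Hypothetical example values, not taken from the dataset.
example_plates = {
    "plate_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "plate_B": [10.0, 12.0, 14.0, 16.0, 18.0],
}
scores = robust_z_scores(example_plates)
```

Using a single shared MAD keeps the scale comparable across plates, while the per-plate median removes plate-level shifts in the baseline.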

  18. Transcriptional landscape of repetitive elements in single cells

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 6, 2021
    Mallona, Izaskun (2021). Transcriptional landscape of repetitive elements in single cells [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4584955
    Dataset updated
    Mar 6, 2021
    Dataset provided by
    University of Zurich
    Authors
    Mallona, Izaskun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose

    We quantified conventional features (GENCODE) as well as repetitive elements (RepeatMasker) from multiple single-cell RNA-seq datasets, starting from raw data and using several analytical approaches, and looked for associations with known cell metadata (e.g. cell types).

    This repository includes the count tables and cell embeddings as generated by export_run_for_upload.R from repeats_sc (bitbucket).

    Files

    Please check the README.md file for a full description of all files. Each tarball contains a set of gz compressed files, including cell embeddings, raw and normalized count matrices, and metadata at the feature and cell levels. Data are stored as either comma-separated values (CSV) or MatrixMarket files, UTF8-encoded and with Unix EOLs.

    Datasets

    We reanalyzed sequencing (fastq) files generated by other researchers; this repository contains results from our reanalysis, and not the original raw data. We are very grateful to the original data producers and encourage checking their data release licenses and checking and citing their publication(s). The README.md has a full description of these datasets and their sources.

    Source code

    https://bitbucket.org/imallona/repeats_sc

    Contact

    izaskun dot mallona at gmail.com

    Funding

    Swiss National Science Foundation SNF Spark 190824 (CRSK-3_190824)

  19. Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    application/cdfv2
    Updated Jun 1, 2023
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001
    Available download formats: application/cdfv2
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or different datasets evaluated by the same metric, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best using one dataset is evaluated as the poorest using another. This raises an open question: what principles should guide the evaluation of normalization methods? In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way to guide future studies in the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.

  20. GERDA datasets including NGS and SGA data

    • data.mendeley.com
    Updated Apr 26, 2023
    Fabian Otte (2023). GERDA datasets including NGS and SGA data [Dataset]. http://doi.org/10.17632/8c4zbxfvwk.3
    Dataset updated
    Apr 26, 2023
    Authors
    Fabian Otte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets linked to the publication "Revealing viral and cellular dynamics of HIV-1 at the single-cell level during early treatment periods", Otte et al. 2023, published in Cell Reports Methods. Pre-ART (antiretroviral therapy) cryo-conserved and whole blood specimens were sampled for HIV-1 virus reservoir determination in HIV-1 positive individuals from the Swiss HIV Study Cohort. Patients were monitored for proviral (DNA), poly-A transcripts (RNA), late protein translation (Gag and Envelope reactivation co-detection assay, GERDA) and intact viruses (gold standard: viral outgrowth assay, VOA). In this dataset we deposited the pipeline for the multidimensional data analysis of our newly established GERDA method, using DBSCAN and tSNE. For further comprehension, NGS and Sanger sequencing data are attached as processed and raw data (GenBank).

    Resubmitted to Cell Reports Methods (Jan-2023), accepted in principle (Mar-2023)

    GERDA is a new detection method to decipher the HIV-1 cellular reservoir in blood (tissue or any other specimen). It integrates HIV-1 Gag and Env co-detection along with cellular surface markers to reveal 1) which cells still contain HIV-1 translation-competent virus and 2) which markers the respective infected cells express. The phenotypic marker repertoire of the cells allows predictions on potential homing and an assessment of the HIV-1 (tissue) reservoir. All FACS data were acquired on an LSRFortessa BD FACS machine (markers: CCR7, CD45RA, CD28, CD4, CD25, PD1, IntegrinB7, CLA, HIV-1 Env, HIV-1 Gag). Raw FACS data (pre-gated CD4CD3+ T-cells) were arcsin transformed and dimensionally reduced using optsne. Data were further clustered using DBSCAN, and either individual clusters were further analyzed for individual marker expression or expression profiles of all relevant clusters were analyzed by heatmaps. Sequences before/after therapy initiation and during viral outgrowth cultures were monitored for individuals P01-46 and P04-56 by next-generation sequencing (NGS of HIV-1 Envelope V3 loop only) and by Sanger sequencing (single genome amplification, SGA).

    Data normalization code (by Julian Spagnuolo)

    FACS normalized data as CSV (XXX_arcsin.csv)

    OMIQ conText file (_OMIQ-context_XXX)

    arcsin normalized FACS data after optsne dimension reduction with OMIQ.ai as CSV file (XXXarcsin.csv.csv)

    R pipeline with codes (XXX_commented.R)

    P01_46-NGS and Sanger sequences

    P04_56-NGS and Sanger sequences
