63 datasets found
  1. Additional file 3 of Pooling across cells to normalize single-cell RNA...

    • springernature.figshare.com
    txt
    Updated Jun 9, 2023
    Aaron L. Lun; Karsten Bach; John Marioni (2023). Additional file 3 of Pooling across cells to normalize single-cell RNA sequencing data with many zero counts [Dataset]. http://doi.org/10.6084/m9.figshare.c.3629252_D2.v1
    Available download formats: txt
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aaron L. Lun; Karsten Bach; John Marioni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Enriched GO terms for library size normalization. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to library size normalization. The fields are the same as described for Additional file 2. (13 KB PDF)

  2. Additional file 2 of Pooling across cells to normalize single-cell RNA...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Aaron L. Lun; Karsten Bach; John Marioni (2023). Additional file 2 of Pooling across cells to normalize single-cell RNA sequencing data with many zero counts [Dataset]. http://doi.org/10.6084/m9.figshare.c.3629252_D1.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aaron L. Lun; Karsten Bach; John Marioni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Enriched GO terms for deconvolution. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to deconvolution. The identifier and name of each term are shown, along with the total number of genes associated with the term, the number of associated genes that are also DE, the expected number under the null hypothesis, and the Fisher p value. (13 KB PDF)
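    The "expected" and "Fisher p value" columns come from a one-sided test for over-representation of DE genes among the genes annotated to a term. As a minimal sketch (standard library only), the one-sided Fisher exact test can be computed as the hypergeometric upper tail P(X >= observed). The function name and the example counts are hypothetical, and the gene universe and exact test variant used to produce these files are not stated here, so recomputed values may differ from the files.

```python
from math import comb


def go_enrichment(n_universe, n_annotated, n_de, n_overlap):
    """Return (expected, p) for over-representation of DE genes in one GO term.

    n_universe:  genes tested in total
    n_annotated: genes annotated to the term (the "total number of genes")
    n_de:        DE genes in the universe
    n_overlap:   annotated genes that are also DE
    """
    # Expected overlap under the null hypothesis of random DE assignment.
    expected = n_de * n_annotated / n_universe
    # One-sided Fisher exact test = hypergeometric upper tail P(X >= n_overlap).
    upper = min(n_annotated, n_de)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    p = tail / comb(n_universe, n_de)
    return expected, p


# Hypothetical term: 150 of 20,000 genes annotated, 1,000 DE genes, 20 overlapping.
expected, p = go_enrichment(20000, 150, 1000, 20)
```

    Python integers are arbitrary-precision, so the binomial coefficients stay exact even for genome-sized universes; only the final division produces a float.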

  3. Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust...

    • data.niaid.nih.gov
    Updated May 15, 2019
    Bacher R; Chu L; Kendziorski C; Swanson S (2019). Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust normalization of single-cell RNA-seq data [Dataset]. https://data.niaid.nih.gov/resources?id=gse85917
    Dataset updated
    May 15, 2019
    Dataset provided by
    University of Florida
    Authors
    Bacher R; Chu L; Kendziorski C; Swanson S
    Description

    Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells and 91 H9 cells), sequenced twice, were used to evaluate SCnorm. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. To test the effects of sequencing depth, the same indexed and fragmented single-cell cDNA was pooled at either 96 or 24 cells per lane, yielding approximately 1 million and 4 million mapped reads per cell, respectively.
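    The two pooling levels fit a simple inverse relation between cells per lane and reads per cell. Assuming (this is not stated in the description) that each lane yields roughly the same number of mapped reads, R, split evenly across its c pooled cells:

```latex
\text{reads per cell} \approx \frac{R}{c}
\quad\Rightarrow\quad
\frac{R/24}{R/96} = \frac{96}{24} = 4,
\qquad
R \approx 96 \times (1 \times 10^{6}) = 24 \times (4 \times 10^{6}) = 9.6 \times 10^{7}
```

    A fourfold reduction in cells per lane therefore gives roughly fourfold deeper sequencing per cell, and both groups imply about 96 million mapped reads per lane. This is back-of-the-envelope arithmetic, not a figure reported by the authors.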

  4. Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 3, 2023
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s004
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on the available information and the type of data, some methods may have advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test the methods that do not depend on spike-in genes using a real data set with three distinct cell-cycle states and a real data set generated on the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness of the featured methods using visualization and classification assessment, and conclude which methods are preferable for normalizing a given type of data for downstream analysis, such as classification or differential analysis. We also compare the computational time of all methods.
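    To make the spike-in idea concrete: if the same amount of spike-in RNA is added to every cell, differences in per-cell spike-in totals can be attributed to technical effects (capture efficiency, sequencing depth) and used as size factors. The sketch below is a generic illustration of that principle with hypothetical function names and toy numbers; it is not any of the seven methods compared in the survey.

```python
def spike_in_size_factors(spike_totals):
    """Per-cell size factors from total spike-in counts, scaled to mean 1.

    spike_totals: {cell: total spike-in count}. Assumes an equal amount of
    spike-in RNA was added to each cell.
    """
    mean_total = sum(spike_totals.values()) / len(spike_totals)
    return {cell: total / mean_total for cell, total in spike_totals.items()}


def normalize_cell(gene_counts, size_factor):
    """Divide a cell's endogenous gene counts by its size factor."""
    return {gene: count / size_factor for gene, count in gene_counts.items()}


# Toy example: cell c2 captured three times as many spike-in reads as c1.
factors = spike_in_size_factors({"c1": 100, "c2": 300})
normalized_c1 = normalize_cell({"GAPDH": 10, "ACTB": 4}, factors["c1"])
```

    Scaling the factors to a mean of 1 keeps normalized values on roughly the same scale as the raw counts; other centring choices are equally valid.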

  5. Table_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s002
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

  6. pbmc single cell RNA-seq matrix

    • zenodo.org
    csv
    Updated May 4, 2021
    Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux (2021). pbmc single cell RNA-seq matrix [Dataset]. http://doi.org/10.5281/zenodo.4730807
    Available download formats: csv
    Dataset updated
    May 4, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single cell RNA-sequencing dataset of peripheral blood mononuclear cells (pbmc: T, B, NK and monocytes) extracted from two healthy donors.

    Cells labeled C26 come from a 30-year-old female and cells labeled C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using the standard 10x Genomics 3' v3 chemistry protocol. Cell Ranger v4.0.0 was used for processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by Cell Ranger (filtered_feature_bc_matrix). Cells with fewer than 3 detected genes, fewer than 500 reads, or a mitochondrial gene fraction above 20% were discarded.
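    The three QC thresholds can be expressed as a single per-cell filter. This is a minimal sketch, assuming each cell is given as a {gene symbol: count} dict and that mitochondrial genes are identified by the "MT-" symbol prefix (an assumption; the description does not say how they were identified). The function name and toy cells are hypothetical, and the original filtering was done in R, not with this code.

```python
def passes_qc(counts, mito_prefix="MT-", min_genes=3, min_reads=500, max_mito_pct=20.0):
    """Return True if a cell survives the QC thresholds described above.

    counts: {gene_symbol: count} for one cell.
    mito_prefix: assumed naming convention for mitochondrial genes.
    """
    total = sum(counts.values())
    detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    mito_pct = 100.0 * mito / total if total > 0 else 0.0
    # Discard cells with too few genes, too few reads, or too much mitochondrial signal.
    return detected >= min_genes and total >= min_reads and mito_pct <= max_mito_pct


cells = {
    "cell_A": {"CD3E": 400, "LYZ": 300, "MS4A1": 200, "MT-CO1": 100},  # 10% mito
    "cell_B": {"CD3E": 300, "LYZ": 200, "MT-CO1": 500},                # 50% mito
    "cell_C": {"CD3E": 200, "LYZ": 100, "NKG7": 50},                   # 350 reads
}
kept = [name for name, c in cells.items() if passes_qc(c)]  # ["cell_A"]
```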

    The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensionality reduction, and clustering. The SCTransform method was used for the normalisation and scaling steps. The clustered cells were manually annotated using known cell-type markers.

    Files content:

    - raw_dataset.csv: raw gene counts

    - normalized_dataset.csv: normalized gene counts (single cell matrix)

    - cell_types.csv: cell types identified from annotated cell clusters

    - cell_types_macro.csv: cell macro types

    - UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat

  7. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • s.cnmilf.com
    • agdatacommons.nal.usda.gov
    Updated Jun 5, 2025
    Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset contains files reconstructing the single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) to provide a reference annotation of porcine immune cell transcriptomics at single-cell resolution. Analysis of the single-cell data identified 36 cell clusters, which were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. The files may be used to reconstruct the data as presented in the manuscript, allowing individual queries by other users. Scripts for the original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    - Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format (file name: PBMC7_AllCells.zip): zipped folder containing the PBMC counts matrix, gene names, and cell IDs, namely the matrix of gene counts* (matrix.mtx.gz), gene names (features.tsv.gz), and cell IDs (barcodes.tsv.gz). *The 'raw' count matrix actually contains gene counts obtained after ambient RNA removal. Because non-integer count estimates were requested during ambient RNA removal, most gene counts in this matrix are non-integer values, but they should still be treated as raw/unnormalized data that require further normalization/transformation. Data can be read into R using the function Read10X().

    - Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata (file name: PBMC7_AllCells_meta.csv): .csv file containing metadata for cells included in the final dataset. Metadata columns include: nCount_RNA = the number of transcripts detected in a cell; nFeature_RNA = the number of genes detected in a cell; Loupe = cell barcodes, corresponding to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells; prcntMito = percent mitochondrial reads in a cell; Scrublet = doublet probability score assigned to a cell; seurat_clusters = cluster ID assigned to a cell; PaperIDs = sample ID for a cell; celltypes = cell type ID assigned to a cell.

    - Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates (file name: PBMC7_AllCells_PCAcoord.csv): .csv file containing the first 100 PCA coordinates for cells.

    - Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates (file name: PBMC7_AllCells_tSNEcoord.csv): .csv file containing t-SNE coordinates for all cells.

    - Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates (file name: PBMC7_AllCells_UMAPcoord.csv): .csv file containing UMAP coordinates for all cells.

    - Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates (file name: PBMC7_CD4only_tSNEcoord.csv): .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from PBMC7_AllCells.h5Seurat, and the t-SNE coordinates used in the publication can be re-assigned using this .csv file.

    - Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates (file name: PBMC7_CD4only_UMAPcoord.csv): .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from PBMC7_AllCells.h5Seurat, and the UMAP coordinates used in the publication can be re-assigned using this .csv file.

    - Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates (file name: PBMC7_GDonly_UMAPcoord.csv): .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from PBMC7_AllCells.h5Seurat, and the UMAP coordinates used in the publication can be re-assigned using this .csv file.

    - Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates (file name: PBMC7_GDonly_tSNEcoord.csv): .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from PBMC7_AllCells.h5Seurat, and the t-SNE coordinates used in the publication can be re-assigned using this .csv file.

    - Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information (file name: UnfilteredGeneInfo.txt): .txt file containing gene nomenclature information used to assign gene names in the dataset. The 'Name' column corresponds to the name assigned to a feature in the dataset.

    - Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat (file name: PBMC7.tar): .h5Seurat object of all cells in the PBMC dataset. The file needs to be untarred, then read into R using the function LoadH5Seurat().
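    Read10X() is the R route named above. For readers working in Python, the sketch below parses the same three files with the standard library only. It assumes the standard 10X file names (matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz) and a MatrixMarket coordinate layout with genes as rows and barcodes as columns; the function name is hypothetical. Values are parsed as floats because, as noted above, most counts in this matrix are non-integer after ambient RNA removal.

```python
import gzip
from pathlib import Path


def read_10x_dir(path, gene_column=1):
    """Read a 10X-style directory into a {(gene, barcode): value} dict.

    gene_column=1 picks the gene-name column of features.tsv.gz and 0 picks
    the gene ID; single-column feature files fall back to column 0.
    """
    path = Path(path)
    with gzip.open(path / "barcodes.tsv.gz", "rt") as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    genes = []
    with gzip.open(path / "features.tsv.gz", "rt") as fh:
        for line in fh:
            if not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            genes.append(cols[gene_column] if len(cols) > gene_column else cols[0])

    counts = {}
    dims_read = False
    with gzip.open(path / "matrix.mtx.gz", "rt") as fh:
        for line in fh:
            if line.startswith("%"):  # MatrixMarket banner and comment lines
                continue
            fields = line.split()
            if not fields:
                continue
            if not dims_read:  # first data line: n_rows n_cols n_nonzero
                n_rows, n_cols, _ = (int(x) for x in fields)
                if n_rows != len(genes) or n_cols != len(barcodes):
                    raise ValueError("matrix dimensions do not match features/barcodes")
                dims_read = True
                continue
            row, col, value = int(fields[0]), int(fields[1]), float(fields[2])
            counts[(genes[row - 1], barcodes[col - 1])] = value  # indices are 1-based
    return counts
```

    A dict keyed by (gene, barcode) keeps the sketch dependency-free; for a full dataset a sparse matrix would be far more memory-efficient. Duplicated gene names would overwrite each other, so gene_column=0 (IDs) is safer when names are not unique.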

  8. Raw and normalized count data for "Probabilistic cell-type assignment of...

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Kieran Campbell (2020). Raw and normalized count data for "Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling" [Dataset]. http://doi.org/10.5281/zenodo.3372746
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kieran Campbell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SingleCellExperiment objects containing raw and normalized counts, as well as reduced dimension representations and cell type annotations for both the follicular lymphoma samples (sce_follicular_annotated_final.rds) and high grade serous ovarian cancer samples (sce_hgsc_annotated_final.rds) as detailed in the paper.

  9. Table_5_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_5_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s006
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

  10. Data from: Subsets of tissue CD4 T cells display different susceptibilities...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 17, 2023
    Xiaoyu Luo (2023). Subsets of tissue CD4 T cells display different susceptibilities to HIV infection and death: Analysis by CyTOF and single cell RNA-seq [Dataset]. http://doi.org/10.7272/Q6SX6BFR
    Available download formats: zip
    Dataset updated
    Feb 17, 2023
    Dataset provided by
    HIV Vaccine Trials Network (http://www.hvtn.org/)
    National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
    Division of Acquired Immunodeficiency Syndrome
    Authors
    Xiaoyu Luo
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    CD4 T lymphocytes belong to diverse cellular subsets whose sensitivity or resistance to HIV-associated killing remains to be defined. Working with lymphoid cells from human tonsils, we characterized the HIV-associated depletion of various CD4 T cell subsets using mass cytometry and single-cell RNA-seq. CD4 T cell subsets preferentially killed by HIV are phenotypically distinct from those resistant to HIV-associated cell death, in a manner not fully accounted for by their susceptibility to productive infection. Preferentially killed subsets express CXCR5 and CXCR4, while preferentially infected subsets exhibit an activated and exhausted effector memory cell phenotype. Single-cell RNA-seq analysis reveals that the preferentially killed subsets express genes favoring abortive infection and pyroptosis. These studies emphasize a complex interplay between HIV and distinct tissue-based CD4 T cell subsets, and the important contribution of abortive infection and inflammatory programmed cell death to the overall depletion of CD4 T cells that accompanies untreated HIV infection.

    Methods: mass cytometry; single-cell RNA-seq. The mass cytometry data have been pre-gated on live singlets and normalized by CD8 cell number; the single-cell RNA-seq data are raw data.

  11. Human breast cancer PDX models bulk and single cell RNA sequencing

    • zenodo.org
    • explore.openaire.eu
    bin, csv
    Updated Aug 7, 2024
    Long V. Nguyen; Yaniv Eyal-Lubling; Daniel Guerrero-Romero; Raquel Manzano Garcia; Oscar M. Rueda; Carlos Caldas (2024). Human breast cancer PDX models bulk and single cell RNA sequencing [Dataset]. http://doi.org/10.5281/zenodo.10978990
    Available download formats: bin, csv
    Dataset updated
    Aug 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Long V. Nguyen; Yaniv Eyal-Lubling; Daniel Guerrero-Romero; Raquel Manzano Garcia; Oscar M. Rueda; Carlos Caldas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes information relevant to the following manuscript from the labs of Prof. Carlos Caldas (University of Cambridge), and Dr. Long V. Nguyen (Princess Margaret Cancer Centre, University Health Network):

    Nguyen LV et al. Dynamics and plasticity of human breast cancer single cell-derived clones. Under consideration for publication.

    Bulk RNA sequencing raw count matrices are provided (RawCounts.csv) along with the normalized count matrices (LogCPMNormCounts.csv).
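    As an illustration of how a log-CPM matrix relates to the raw counts, here is a minimal sketch computing log2(CPM + pseudocount) for one sample, where CPM = count / library size × 10^6. The pseudocount of 1 and the function name are assumptions; the description does not say how LogCPMNormCounts.csv was produced (edgeR's cpm(log = TRUE), for example, uses a different prior count and adjustment), so values will not necessarily match that file.

```python
import math


def log_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount) for one sample's {gene: raw count} dict.

    The pseudocount (assumed to be 1 here) keeps zero counts finite after the log.
    """
    lib_size = sum(counts.values())
    return {gene: math.log2(c / lib_size * 1e6 + pseudocount) for gene, c in counts.items()}


sample = {"ESR1": 1200, "ERBB2": 300, "KRT8": 0}  # hypothetical raw counts
normalized = log_cpm(sample)
```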

    A single-cell RNA sequencing count matrix processed with the R package metacell is provided (mat.pdx_LN_v2_filt.Rda), along with the mc and mc2d files containing information on metacell partitions (mc.pdx_LN_v2_filt.Rda and mc2d.pdx_LN_v2_filt.Rda).

    Code and information on data analysis is provided for reviewers in our unpublished manuscript and on Github (https://github.com/cclab-brca/clone-dynamics).

  12. Data from: Discrete regulatory modules instruct hematopoietic lineage...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 28, 2021
    Iwata, Mineo (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5291736
    Dataset updated
    Aug 28, 2021
    Dataset provided by
    Georgolopoulos, Grigorios
    Stamatoyannopoulos, John A
    Som, Tannishtha
    Yiangou, Minas
    Psatha, Nikoletta
    Nishida, Andrew
    Iwata, Mineo
    Vierstra, Jeff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2020.04.02.022566v4

    Contact: Grigorios Georgolopoulos (ggeorgol@altius.org); Jeff Vierstra (jvierstra@altius.org)

    Lineage commitment and differentiation are driven by the concerted action of master transcriptional regulators at their target chromatin sites. Multiple efforts have characterized the key transcription factors (TFs) that determine the various hematopoietic lineages. However, the temporal interactions between individual TFs and their chromatin targets during differentiation, and how these interactions dictate lineage commitment, remain poorly understood. Here we delineate the temporal interplay between the cis- and trans-regulatory landscapes in establishing lineage commitment and differentiation in human hematopoiesis by performing a dense time course of chromatin accessibility (DNase I-seq) and gene expression (total and single-cell RNA-seq) measurements.

    All data uploaded correspond to human genome build version GRCh38.

    Contents

    DNase I Hotspot (DHS) metadata: Supplementary_Data_1.txt

    DNase I Hotspot quantile-normalized counts: A tab-separated matrix with quantile-normalized DNase I density counts from 79,085 FDR 5% hotspots, across 12 erythroid differentiation timepoints from 3 donors, present in at least n=2 samples. Rows correspond to DHS information in Supplementary_Data_1.txt (hotspots.fdr.0.05.qnorm.counts.tsv.gz)
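    Quantile normalization forces every sample to share the same value distribution: each value is replaced by the mean, across samples, of the values holding the same rank. The sketch below illustrates the textbook procedure on small lists (one list per sample, one entry per hotspot). The function name is hypothetical, ties are broken by position rather than averaged, and the description does not say which implementation or tie handling was used for this matrix.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so every sample ends up with the same distribution.
    Ties are broken by position rather than averaged (a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=s.__getitem__)  # feature indices in rank order
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples with three hotspots each.
result = quantile_normalize([[5, 2, 3], [4, 1, 6]])  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```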

    Column information for DNase I Hotspot quantile-normalized counts: hotspots.fdr.0.05.qnorm.counts.info.tsv

    Developmentally regulated gene metadata (erythroid): Supplementary_Data_2.csv

    Gene matrix of quantile-normalized FPKM values (erythroid): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 erythroid differentiation timepoints from 3 donors. (fpkm_erythroid_qnorm.tsv.gz)
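    FPKM (fragments per kilobase of transcript per million mapped fragments) scales each gene's fragment count by its length and by the sample's sequencing depth, before the quantile normalization described here. This is a minimal sketch of the standard formula; the function name is hypothetical, and which gene length was used (for example, the union of exons) is an assumption not stated in the description.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    gene_length_bp: the gene's (assumed effective) length in base pairs.
    """
    per_kilobase = fragments / (gene_length_bp / 1e3)
    return per_kilobase / (total_mapped_fragments / 1e6)


# 100 fragments on a 2 kb gene in a library of 10 million mapped fragments.
value = fpkm(100, 2000, 10_000_000)  # 5.0
```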

    Column information for the quantile-normalized FPKM gene matrix (erythroid): A tab-separated table (fpkm_erythroid_qnorm.info.tsv)

    CD34+ HSPC TADs at 10kb resolution: Supplementary_Data_3.bed

    Day 11 ex vivo erythroid progenitor TADs at 10kb resolution: Supplementary_Data_4.bed

    Transcription factor motif enrichment per DHS cluster: Supplementary_Data_5.csv

    Correlation information (links) between developmentally regulated DHS and target genes: Supplementary_Data_6.csv

    Chromatin anchor loops called from 10kb resolution Hi-C data: Supplementary_Data_7.bedgraph

    Developmentally regulated gene metadata (megakaryocytic): Supplementary_Data_8.csv

    Gene matrix of quantile-normalized FPKM values (megakaryocytic): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 megakaryocytic differentiation timepoints from 3 donors. (fpkm_megakaryocyte_qnorm.tsv.gz)

    Column information for the quantile-normalized FPKM gene matrix (megakaryocytic): A tab-separated table (fpkm_megakaryocyte_qnorm.info.tsv)

    Marker (differentially expressed) genes per single cell population: Supplementary_Data_9.csv

    A Scanpy AnnData (annotated data) object in h5ad format (SCANPY_anndata_object.h5ad), including the gene-by-cell count matrix and a Velocyto splicing kinetics (RNA velocity) layer, along with the obs, obsm, var, varm, and uns attributes.

  13. Single Cell RNAseq BAM files and metadata "Tailoring vascular phenotype...

    • data.mendeley.com
    Updated Apr 17, 2023
    Mohanraj Ramachandran (2023). Single Cell RNAseq BAM files and metadata "Tailoring vascular phenotype through AAV T therapy promotes anti-tumor immunity in glioma, Ramachandran et al, 2023" [Dataset]. http://doi.org/10.17632/fwczkb6xw3.2
    Dataset updated
    Apr 17, 2023
    Authors
    Mohanraj Ramachandran
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study was designed to capture changes in CD8 T cell phenotypes in the murine glioma (CT-2A) model after immunotherapy with AAV-LIGHT (TNFSF14). Tumour-infiltrating CD45+ cells were isolated by flow sorting and subjected to targeted single-cell transcriptome sequencing and downstream analysis. The dataset contains:

    1. Three .bam files containing aligned sequence data for scRNA-seq of all the CD8+ T cells in the 3 libraries analysed in the study.
    2. An md5 checksum file to verify the integrity of the data.
    3. Metadata, including cell IDs, to annotate the cells by treatment and outcome.
    4. Normalized counts.
    5. Normalized gene expression data of CD8 T cells.
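    Verifying the downloaded BAM files against the md5 checksum file amounts to recomputing each file's MD5 digest and comparing. This is a minimal sketch, assuming the checksum file uses the common md5sum layout of "<hex digest>  <file name>" per line (the actual layout of the provided file is not described). The function names are hypothetical.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large BAM files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file, base_dir="."):
    """Check each '<md5>  <file name>' line; returns {file name: matches?}."""
    results = {}
    with open(checksum_file) as fh:
        for line in fh:
            if not line.strip():
                continue
            expected, name = line.split(maxsplit=1)
            name = name.strip().lstrip("*")  # md5sum prefixes binary-mode names with '*'
            results[name] = md5sum(os.path.join(base_dir, name)) == expected.lower()
    return results
```

    Usage: `verify_checksums("checksums.md5", "downloads/")` returns a dict with one True/False entry per listed file, where both paths are placeholders for wherever the files were saved.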

  14. Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best on one dataset is evaluated as the poorest on another. This raises an open question: what principles should guide the evaluation of normalization methods? In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it, together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly data-driven methods or their own methods) under the principles of consistency of metrics and consistency of datasets.

  15. Targeted scRNA-seq and AbSeq of human CAR-T cell infusion product from 24...

    • figshare.scilifelab.se
    • researchdata.se
    Updated Jan 15, 2025
    Claudio Mirabello; Magnus Essand; Mohanraj Ramachandran; Tina Sarén (2025). Targeted scRNA-seq and AbSeq of human CAR-T cell infusion product from 24 cancer patients [Dataset]. http://doi.org/10.17044/scilifelab.20208764.v1
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Uppsala University
    Authors
    Claudio Mirabello; Magnus Essand; Mohanraj Ramachandran; Tina Sarén
    License

    https://www.scilifelab.se/data/restricted-access/

    Description

    Background information

    The dataset contains single-cell targeted RNA sequencing (RNA-seq) and targeted antibody-oligonucleotide conjugate sequencing (AbSeq) data from the chimeric antigen receptor (CAR)-engineered T cells used to treat individual cancer patients in a clinical study. In all cases, the starting material was autologous T cells harvested from the patient's peripheral blood. The data were collected from 24 participants, of whom 23 were adult patients with relapsed or refractory B cell lymphoma and one was a pediatric patient with relapsed B cell acute lymphoblastic leukemia. The data were generated as part of a study by Sarén et al., Clinical Cancer Research (2023).

    Targeted RNA and protein single-cell libraries were generated using the BD Rhapsody™ platform (BD Biosciences). Cells were labeled with sample tags from the BD Human Immune Single-Cell Multiplexing Kit and with BD Ab-seq Ab-Oligos, and live cells were collected by flow cytometry. CAR-T cells were loaded onto a BD Rhapsody cartridge, and mRNA was captured with cell capture beads and used as the template for cDNA synthesis. Four separate targeted libraries were produced and pooled for paired-end sequencing on a NovaSeq 6000 S1 sequencer (Illumina) at the SNP&SEQ Technology Platform (Uppsala, Sweden).

    Terms of access

    The sequencing data generated during the current study are not publicly available because of the European General Data Protection Regulation (GDPR), which protects patients' privacy, but are available from the corresponding author on reasonable request (see contact info). The dataset may only be used for research that seeks to advance the understanding of CAR-T cell treatment of cancer.

    Ancillary datasets and code

    Processed RNA-seq and AbSeq data, in the form of raw and normalized count matrices, are available on BioStudies (accession: E-MTAB-12407). The R code used to process the data is available in the study's GitHub repository: https://github.com/magnessa/EudraCT_2016-004043-36

  16. Raw differential gene expression data, data S1, from: Molecular cascades and...

    • search.dataone.org
    • zenodo.org
    • +1 more
    Updated Jan 17, 2024
    Brie Wamsley (2024). Raw differential gene expression data, data S1, from: Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics [Dataset]. http://doi.org/10.5061/dryad.4b8gthtkr
    Dataset updated
    Jan 17, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Brie Wamsley
    Time period covered
    Jan 1, 2023
    Description

    Genomic profiling in post-mortem brain from autistic individuals has consistently revealed convergent molecular changes. What drives these changes and how they relate to genetic susceptibility in this complex condition is not understood. We performed deep single-nucleus RNA sequencing (snRNA-seq) to examine cell composition and transcriptomics, identifying dysregulation of cell type-specific gene regulatory networks (GRNs) in autism, which we corroborated using snATAC-seq and spatial transcriptomics. Transcriptomic changes were primarily cell type-specific, involving multiple cell types, most prominently interhemispheric and callosal-projecting neurons, interneurons within superficial laminae, and distinct glial reactive states involving oligodendrocytes, microglia, and astrocytes. Autism-associated GRN drivers and their targets were enriched in rare and common genetic risk variants, connecting autism genetic susceptibility and cellular and circuit alterations in the human brain. This da...

    Please see the manuscript for detailed information. In brief, we performed a pseudobulk expression analysis of ASD vs. CTL by cell type. We generated pseudobulk counts for each sample by summing the counts of cells of the same cell type. The pseudobulk counts were then normalized using the variance stabilizing transformation. To identify genes differentially expressed in ASD compared with controls in each cell type, we examined which covariates correlated with the top 5 PCs of the normalized pseudobulk expression matrix. The following covariates were consistently correlated with the top 5 PCs for each cell type: age, PMI, BrainRegion, SeqBatch, Mito_perc, and ngenes. We then randomly selected subjects 500 times and calculated the average beta to regress out the effects of these covariates. Finally, we used limma-voom to identify differentially expressed genes for each cluster.
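
    The pseudobulk step described above (summing the counts of all cells of the same cell type within each sample) can be sketched with NumPy. This is a minimal illustration under our own assumptions, not the authors' code: we assume a dense genes x cells count array plus per-cell sample and cell-type labels, and the function name `pseudobulk` is hypothetical. The later normalization, covariate regression, and limma-voom testing (an R/Bioconductor method) are not reproduced here.

```python
import numpy as np


def pseudobulk(counts, samples, cell_types):
    """Sum single-cell counts into one profile per (sample, cell type).

    counts: genes x cells array of raw counts.
    samples, cell_types: one label per cell (column).
    Returns the sorted (sample, cell type) keys and a genes x keys array.
    """
    counts = np.asarray(counts)
    samples = np.asarray(samples)
    cell_types = np.asarray(cell_types)
    keys = sorted(set(zip(samples.tolist(), cell_types.tolist())))
    profiles = np.zeros((counts.shape[0], len(keys)), dtype=counts.dtype)
    for j, (sample, cell_type) in enumerate(keys):
        # Select the columns (cells) belonging to this sample and cell type.
        mask = (samples == sample) & (cell_types == cell_type)
        profiles[:, j] = counts[:, mask].sum(axis=1)
    return keys, profiles


# Usage: 2 genes, 4 cells from two hypothetical samples "A" and "B".
keys, profiles = pseudobulk([[1, 2, 3, 4], [0, 1, 0, 1]],
                            ["A", "A", "B", "B"],
                            ["N", "N", "N", "G"])
print(keys)
print(profiles)
```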

    Raw Differential Gene Expression data, Data S1, from "Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics"

    Description of the data and file structure

    Excel workbook with raw differential gene expression results for ASD vs. CTL in each cell-type cluster. The columns are:

    - Column 1: a drop-down menu for selecting the cell type whose differential expression results are shown.
    - Columns 2 (ASDvCTL) and 3 (CTLvASD): the logFC value of each gene for the selected cell type.
    - Columns 4 (ASDvCTL) and 5 (CTLvASD): the p-value of each gene for the selected cell type.
    - Columns 6 (ASDvCTL) and 7 (CTLvASD): the FDR value of each gene for the selected cell type.
    - Last column: the gene name.

    Sharing/Access information

    Links to other publicly accessible locations of t...

  17. Ngs-Based Rna-Seq Market Analysis North America, Europe, Asia, Rest of World...

    • technavio.com
    Technavio, Ngs-Based Rna-Seq Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, Singapore, China - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ngs-based-rna-seq-market-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global
    Description


    NGS-Based Rna-Seq Market Size 2024-2028

    The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
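
    The headline figures can be related with simple compound-growth arithmetic. The report does not state the base-year market size, so the sketch below assumes that the USD 6.66 billion increase is the 2028 value minus the 2023 value, compounded over five annual periods at 20.52%. Under that assumption the implied market is roughly USD 4.3 billion in 2023 and roughly USD 11.0 billion in 2028; these are derived illustrations, not figures from the report. The helper `implied_base` is hypothetical.

```python
def implied_base(increase, cagr, years):
    """Starting value B such that B * (1 + cagr) ** years - B == increase."""
    return increase / ((1.0 + cagr) ** years - 1.0)


# Assumption: the increase spans 2023 -> 2028 (five compounding years).
base_2023 = implied_base(6.66, 0.2052, 5)      # USD billion
value_2028 = base_2023 * (1.0 + 0.2052) ** 5   # USD billion
print(f"implied 2023 size ~ {base_2023:.2f}, implied 2028 size ~ {value_2028:.2f}")
```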

    The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-seq analysis. The advanced capabilities of NGS techniques, such as high throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in fields including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily the lack of clinical validation of direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
    The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
    

    What will be the Size of the NGS-based RNA-Seq market during the forecast period?


    The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.

    Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.

    The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.

    How is this NGS-based RNA-Seq industry segmented?

    The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
    
      Academic and research centers
      Clinical research
      Pharma companies
      Hospitals
    
    
    Technology
    
      Sequencing by synthesis
      Ion semiconductor sequencing
      Single-molecule real-time sequencing
      Others
    
    
    Geography
    
      North America
    
        US
    
    
      Europe
    
        Germany
        UK
    
    
      APAC
    
        China
        Singapore
    
    
      Rest of World (ROW)
    


    By End-user Insights

    The academic and research centers segment is estimated to witness significant growth during the forecast period.

    The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification

  18. Data_Sheet_1_Non-linear Normalization for Non-UMI Single Cell RNA-Seq.PDF

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    Zhijin Wu; Kenong Su; Hao Wu (2023). Data_Sheet_1_Non-linear Normalization for Non-UMI Single Cell RNA-Seq.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.612670.s001
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhijin Wu; Kenong Su; Hao Wu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-cell RNA-seq data, like data from other sequencing technologies, contain systematic technical noise. Such noise results from the combined effect of unequal efficiencies in capturing and counting mRNA molecules, such as extraction/amplification efficiency and sequencing depth. We show that these technical effects are not only cell-specific but also affect genes differently, so a simple cell-wise size factor adjustment may not be sufficient. We present a non-linear normalization approach that provides a cell- and gene-specific normalization factor for each gene in each cell. We show that the proposed normalization method (implemented in the "SC2P" package) removes more technical variation than competing methods without reducing biological variation. When technical effects such as sequencing depth are not balanced between cell populations, SC2P normalization also removes the bias due to uneven technical noise. This method is applicable to scRNA-seq experiments that do not use unique molecular identifiers (UMIs) and thus retain amplification biases.

  19. Bulk RNA-seq Dataset for "Molecular and developmental deficits in...

    • zenodo.org
    Updated Jun 2, 2025
    YuJu Lee; Yoobin Cho; Qihuang Zhang; Wei-Hsiang Huang (2025). Bulk RNA-seq Dataset for "Molecular and developmental deficits in Smith-Magenis syndrome human stem cell-derived cortical neural models" [Dataset]. http://doi.org/10.5281/zenodo.15391244
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    YuJu Lee; Yoobin Cho; Qihuang Zhang; Wei-Hsiang Huang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports the manuscript titled "Molecular and developmental deficits in Smith-Magenis syndrome patient hiPSC-derived cortical neural models". It includes processed single-cell RNA sequencing (scRNA-seq) and bulk RNA sequencing (bulk RNA-seq) data derived from human induced pluripotent stem cell (hiPSC)-derived cortical neural progenitor cells and neurons obtained from Smith-Magenis syndrome (SMS) patients and matched healthy controls.

    The dataset comprises:

    • bulkRNA-seq.zip: Normalized gene expression count tables and differential gene expression results from bulk RNA-seq analysis, along with patient-level metadata including demographic information (e.g., age, sex, diagnosis group).

    The data capture transcriptomic changes across developmental stages and enable the study of disease-associated molecular and cellular alterations in SMS. These files are intended for secondary analysis and reproducibility; raw FASTQ files are not included in this deposit. A companion GitHub repository with code for data preprocessing and analysis will be provided.

  20. Analysis Products: Transcription factor stoichiometry, motif affinity and...

    • zenodo.org
    tsv, zip
    Updated Nov 11, 2023
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje (2023). Analysis Products: Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency [Dataset]. http://doi.org/10.5281/zenodo.8313962
    Available download formats: zip, tsv
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in the directories, which are summarized below.

    The record contains the following files:

    `clusters.tsv`: contains the cluster id, name and colour of clusters in the paper

    scATAC.zip

    Analysis products for the single-cell ATAC-seq data. Contains:

    - `cells.tsv`: list of barcodes that pass QC. Columns include:
        - `barcode`
        - `sample`: time point
        - `umap1`
        - `umap2`
        - `cluster`
        - `dpt_pseudotime_fibr_root`: pseudotime values treating a fibroblast cell as the root
        - `dpt_pseudotime_xOSK_root`: pseudotime values treating an xOSK cell as the root
    - `peaks.bed`: list of 500 bp peaks across all cell states. The 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `features.tsv`: 50-dimensional representation of each cell
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in Python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
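
    Following the loading instructions above, a minimal Python sketch that reads the matrix with `scipy.io.mmread` and attaches the row and column annotations might look like the following. It assumes that `cells.tsv` has a header row with `barcode` and `sample` columns and that `peaks.bed` has no header; the "sample#barcode" cell ID format and the helper name `load_cell_x_peak` are hypothetical, since the README only says to combine sample and barcode. The shape check guards against a transposed matrix, given that the file name (cell x peak) and the stated layout (peaks as rows) differ.

```python
import pandas as pd
import scipy.io
import scipy.sparse


def load_cell_x_peak(mtx_path, cells_path, peaks_path):
    """Load the fragment-count matrix with peak (row) and cell (column) labels.

    Hypothetical helper: rows follow peaks.bed and columns follow cells.tsv,
    as the README states; the "sample#barcode" ID format is our own choice.
    """
    # mmread handles plain and gzip-compressed Matrix Market files; CSC
    # storage makes per-cell (column) slicing cheap.
    counts = scipy.sparse.csc_matrix(scipy.io.mmread(mtx_path))
    cells = pd.read_csv(cells_path, sep="\t")
    # peaks.bed: chrom, start, end, peak-set label ("NA" is read as NaN).
    peaks = pd.read_csv(peaks_path, sep="\t", header=None,
                        names=["chrom", "start", "end", "peak_set"])
    cell_ids = cells["sample"].astype(str) + "#" + cells["barcode"].astype(str)
    if counts.shape != (len(peaks), len(cells)):
        raise ValueError(f"matrix shape {counts.shape} does not match "
                         f"{len(peaks)} peaks x {len(cells)} cells")
    return counts, cell_ids.tolist(), peaks


# Usage (paths relative to the unzipped scATAC directory):
# counts, cell_ids, peaks = load_cell_x_peak(
#     "cell_x_peak.mtx.gz", "cells.tsv", "peaks.bed")
```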

    scATAC_clusters.zip

    Analysis products corresponding to cluster pseudo-bulks of the single-cell ATAC-seq data.

    - `clusters.tsv`: contains the cluster id, name and colour used in the paper
    - `peaks`: contains `overlap_reproducibilty/overlap.optimal_peak` peaks called using the ENCODE bulk ATAC-seq pipeline, in narrowPeak format.
    - `fragments`: contains per cluster fragment files

    scATAC_scRNA_integration.zip

    Analysis products from the integration of scATAC with scRNA. Contains:

    - `peak_gene_links_fdr1e-4.tsv`: file with peak gene links passing FDR 1e-4. For analyses in the paper, we filter to peaks with absolute correlation >0.45.
    - `harmony.cca.30.feat.tsv`: 30 dimensional co-embedding for scATAC and scRNA cells obtained by CCA followed by applying Harmony over assay type.
    - `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. First column contains barcode.
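
    The filtering step mentioned for `peak_gene_links_fdr1e-4.tsv` (keeping links with absolute correlation > 0.45) can be sketched with pandas. The README does not list the file's column names, so the column name `correlation` and the helper `strong_links` are hypothetical placeholders to adapt to the actual header.

```python
import pandas as pd


def strong_links(links, corr_col="correlation", min_abs_corr=0.45):
    """Keep peak-gene links whose absolute correlation exceeds the threshold.

    corr_col is a hypothetical column name; check the file's real header.
    """
    keep = links[corr_col].abs() > min_abs_corr
    return links[keep].reset_index(drop=True)


# Usage:
# links = pd.read_csv("peak_gene_links_fdr1e-4.tsv", sep="\t")
# filtered = strong_links(links)
```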

    scRNA.zip

    Analysis products for the single-cell RNA-seq data. Contains:

    - `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (UMAP, PCA), k-nearest-neighbor (kNN) graphs, and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples D0, D2, ..., D14, iPSC.
    - `genes.txt`: list of all genes
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
        - `barcode_sample`: barcode with the index of the sample (1-9, corresponding to D0, D2, ..., D14, iPSC)
        - `sample`: sample name (D0, D2, ..., D14, iPSC)
        - `umap1`
        - `umap2`
        - `nCount_RNA`
        - `nFeature_RNA`
        - `cluster`
        - `percent.mt`: percentage of mitochondrial transcripts in the cell
        - `percent.oskm`: percentage of OSKM transcripts in the cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in Python or readMM in R. Columns correspond to cells from `cells.tsv` (the barcode suffix contains sample information). Rows correspond to genes in `genes.txt`.
    - `pca.tsv`: first 50 PCs of each cell
    - `oskm_endo_sendai.tsv`: estimated raw counts (cts; may not be integers) and log(1 + tp10k) normalized expression (norm) for the endogenous and exogenous (Sendai-derived) counts of the POU5F1 (OCT4), SOX2, KLF4, and MYC genes. Rows are consistent with `seurat.rds` and `cells.tsv`.
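
    The log(1 + tp10k) transform named above scales each cell's counts to transcripts per 10,000 and then takes log(1 + x). A minimal NumPy sketch follows; it assumes the natural logarithm and that each cell's library size is its total RNA count (for example `nCount_RNA` from `cells.tsv`). The README specifies neither, and the helper name `log1p_tp10k` is hypothetical.

```python
import numpy as np


def log1p_tp10k(counts, library_sizes, scale=1e4):
    """log(1 + counts per 10,000) for a cells x genes count matrix.

    library_sizes: total counts of each cell (assumed, e.g. nCount_RNA).
    Uses the natural logarithm (an assumption).
    """
    counts = np.asarray(counts, dtype=float)
    lib = np.asarray(library_sizes, dtype=float)
    # Divide each row (cell) by its library size, rescale, then log1p.
    return np.log1p(counts / lib[:, None] * scale)


# Usage: two cells with hypothetical library sizes of 1,000 and 10,000.
print(log1p_tp10k([[10, 0], [5, 5]], [1000, 10000]))
```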

    multiome.zip

    multiome/snATAC:

    These files are derived from the integration of nuclei from multiome (D1M and D2M), with cells from day 2 of scATAC-seq (labeled D2).

    - `cells.tsv`: list of the nuclei barcodes from multiome that pass QC, together with the cell barcodes from D2 of scATAC-seq. Includes:
        - `barcode`
        - `umap1`: the coordinates used for the figures involving multiome in the paper
        - `umap2`: as for `umap1`
        - `sample`: D1M and D2M correspond to multiome; D2 corresponds to day 2 of scATAC-seq
        - `cluster`: for multiome barcodes, labels transferred from scATAC-seq; for D2 scATAC-seq, the original cluster labels
    - `peaks.bed`: the same file as scATAC/peaks.bed. List of 500 bp peaks. The 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in Python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
    - `features.no.harmony.50d.tsv`: 50-dimensional representation of each cell prior to running Harmony (to correct for the batch effect between D2 scATAC and D1M, D2M snMultiome). Rows correspond to cells from `cells.tsv`.
    - `features.harmony.10d.tsv`: 10-dimensional representation of each cell after running Harmony. Rows correspond to cells from `cells.tsv`.

    multiome/snRNA:

    - `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (UMAP, PCA), and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples D1M, D2M. Please use the UMAP/features from snATAC/ for consistency.
    - `genes.txt`: list of all genes (different from the list in the scRNA analysis)
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
        - `barcode_sample`: barcode with the index of the sample (1, 2 corresponding to D1M, D2M, respectively)
        - `sample`: sample name (D1M, D2M)
        - `nCount_RNA`
        - `nFeature_RNA`
        - `percent.oskm`: percentage of OSKM genes in the cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in Python or readMM in R. Columns correspond to cells from `cells.tsv` (the barcode suffix contains sample information). Rows correspond to genes in `genes.txt`.
