5 datasets found
  1. scverse tutorial data: Getting started with AnnData

    • figshare.com
    hdf
    Updated Apr 7, 2023
    Cite
    Jan Lause (2023). scverse tutorial data: Getting started with AnnData [Dataset]. http://doi.org/10.6084/m9.figshare.22577536.v2
    Available download formats: hdf
    Dataset updated: Apr 7, 2023
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Jan Lause
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    The data is derived from the 3k PBMC data used in the scanpy and Seurat tutorials. It comes in the AnnData h5ad format.

    The processed 3k PBMCs from a healthy donor (10x Genomics) are available at https://scanpy.readthedocs.io/en/stable/generated/scanpy.datasets.pbmc3k_processed.html. The original 10x data are available at http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz, linked from this website: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/pbmc3k

    The changes made to the original scanpy.datasets.pbmc3k_processed() data are described in this GitHub issue: https://github.com/scverse/scverse-tutorials/issues/51

    See the Jupyter notebook for details.
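
Because the file is in the AnnData h5ad format, it can presumably be opened with the `anndata` package. The sketch below is a minimal illustration, not part of the dataset record: the helper name `load_h5ad_checked` and the file name in the usage comment are hypothetical, and `anndata` is assumed to be installed.

```python
from pathlib import Path


def load_h5ad_checked(path):
    """Read an .h5ad file into an AnnData object.

    Assumes the third-party `anndata` package is installed. It is imported
    lazily so that the extension check works even without it.
    """
    path = Path(path)
    if path.suffix != ".h5ad":
        raise ValueError(f"expected an .h5ad file, got: {path.name}")
    import anndata as ad  # third-party dependency: pip install anndata

    return ad.read_h5ad(path)


# Usage (hypothetical file name; substitute the file you downloaded):
# adata = load_h5ad_checked("pbmc3k_processed.h5ad")
# print(adata)                      # summary of the stored fields
# print(adata.n_obs, adata.n_vars)  # number of cells and number of genes
```

The extension check is only a convenience that fails early on a wrong file type; `anndata.read_h5ad` does the actual reading.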

  2. Processed Seurat objects for GeneTrajectory inference (Gene Trajectory...

    • figshare.com
    application/gzip
    Updated Feb 19, 2024
    Cite
    Rihao Qu; Peggy Myung (2024). Processed Seurat objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics) [Dataset]. http://doi.org/10.6084/m9.figshare.25243225.v1
    Available download formats: application/gzip
    Dataset updated: Feb 19, 2024
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Rihao Qu; Peggy Myung
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    These are processed Seurat objects for the two biological datasets in GeneTrajectory inference (https://github.com/KlugerLab/GeneTrajectory/):

    Human myeloid dataset analysis

    Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R. After standard normalization, highly variable gene selection, and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified by Louvain clustering with a resolution of 0.3. A Wilcoxon rank-sum test was used to find cluster-specific gene markers for cell type annotation.

    For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel in which each bandwidth is determined by the distance to the k-th nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% to 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.

    Mouse embryo skin data analysis

    We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wild type and the Wls mutant were pooled for analysis. After standard normalization, highly variable gene selection, and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.

    For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% to 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the wild type and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
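
The expression-fraction filter mentioned in the description (genes expressed by 0.5% to 75% of cells for the myeloid data, 1% to 50% for the mouse skin data) can be sketched as follows. This is a minimal illustration and not the GeneTrajectory implementation: the function name, the toy matrix, the rule that a count above zero means "expressed", and the inclusive bounds are all assumptions.

```python
def filter_genes_by_expression_fraction(counts, gene_names, min_frac, max_frac):
    """Keep genes expressed in a fraction of cells within [min_frac, max_frac].

    counts: list of rows, one row per cell and one column per gene.
    A gene counts as expressed in a cell when its value is > 0 (assumption).
    """
    n_cells = len(counts)
    kept = []
    for j, gene in enumerate(gene_names):
        n_expressing = sum(1 for row in counts if row[j] > 0)
        frac = n_expressing / n_cells
        if min_frac <= frac <= max_frac:
            kept.append(gene)
    return kept


# Toy matrix (hypothetical values): 4 cells x 3 genes.
toy_counts = [
    [1, 0, 3],
    [2, 0, 1],
    [0, 0, 4],
    [5, 1, 2],
]
toy_genes = ["geneA", "geneB", "geneC"]

# geneA is expressed in 3/4 cells, geneB in 1/4, geneC in 4/4.
kept = filter_genes_by_expression_fraction(toy_counts, toy_genes, 0.5, 0.8)
print(kept)
```

With the thresholds from the description, the call would use `min_frac=0.005, max_frac=0.75` for the myeloid data and `min_frac=0.01, max_frac=0.50` for the mouse skin data.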

  3. Pre-processed CD4 T cell (sc) train and test sets (5-fold CV)

    • figshare.com
    zip
    Updated Aug 7, 2024
    Cite
    Orsolya Lapohos (2024). Pre-processed CD4 T cell (sc) train and test sets (5-fold CV) [Dataset]. http://doi.org/10.6084/m9.figshare.26426404.v1
    Available download formats: zip
    Dataset updated: Aug 7, 2024
    Dataset provided by: figshare
    Authors: Orsolya Lapohos
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    Pooled gene expression probabilities and ATAC-seq tracks for human CD4 T cells, for each autosomal protein-coding gene. These can be used directly to train the accessibility-augmented sequence to expression model described at https://github.com/lapohosorsolya/accessible_seq2exp. In this dataset, ATAC-seq tracks were obtained from a human multiome PBMC dataset by 10x Genomics, and gene expression data were obtained from a human PBMC dataset with antibody-derived tags by 10x Genomics. For a detailed description of data processing, please refer to the corresponding manuscript.

  4. Symphony pre-built single-cell reference atlases

    • explore.openaire.eu
    • zenodo.org
    Updated Jul 9, 2021
    Cite
    Joyce Kang (2021). Symphony pre-built single-cell reference atlases [Dataset]. http://doi.org/10.5281/zenodo.4602301
    Dataset updated: Jul 9, 2021
    Authors: Joyce Kang
    Description

    Pre-built Symphony reference objects that can be downloaded and used to map new query datasets. The Symphony algorithm is used to perform reference mapping to these atlases.

    Preprint: https://www.biorxiv.org/content/10.1101/2020.11.18.389189v2
    Usage: https://github.com/immunogenomics/symphony

    References available for download:
    • 10x PBMCs Atlas (pbmcs_10x_reference.rds)
    • Pancreatic Islet Cells Atlas (pancreas_plate-based_reference.rds)
    • Fetal Liver Hematopoiesis Atlas (fetal_liver_reference_3p.rds)
    • Healthy Fetal Kidney Atlas (kidney_healthy_fetal_reference.rds)
    • T cell CITE-seq atlas (tbru_ref.rds)
    • Cross-tissue Fibroblast Atlas (see here)
    • Cross-tissue Inflammatory Immune Atlas (here)
    • Tabula Muris Senis (FACS) Atlas (TMS_facs_reference.rds)

    To read a reference into R, one may simply execute: reference = readRDS('path/to/reference_name.rds')

    Note: To map query datasets into the reference UMAP coordinates, you must also download the corresponding 'uwot_model' file and set reference$save_uwot_path.

  5. Processed AnnData objects for GeneTrajectory inference (Gene Trajectory...

    • figshare.com
    hdf
    Updated Apr 4, 2024
    Cite
    Rihao Qu; Francesco Strino (2024). Processed AnnData objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics) [Dataset]. http://doi.org/10.6084/m9.figshare.25539547.v1
    Available download formats: hdf
    Dataset updated: Apr 4, 2024
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Rihao Qu; Francesco Strino
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    These are processed AnnData objects (converted from Seurat objects) for GeneTrajectory tutorials (https://github.com/KlugerLab/GeneTrajectory-python/):

    Human myeloid dataset analysis

    Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R. After standard normalization, highly variable gene selection, and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified by Louvain clustering with a resolution of 0.3. A Wilcoxon rank-sum test was used to find cluster-specific gene markers for cell type annotation.

    For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel in which each bandwidth is determined by the distance to the k-th nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% to 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.

    Mouse embryo skin data analysis

    We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wild type and the Wls mutant were pooled for analysis. After standard normalization, highly variable gene selection, and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.

    For gene trajectory inference, we first applied Diffusion Map to the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates in the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% to 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph in which the affinity between genes is obtained from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was used to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the wild type and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
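
The step that turns pairwise distances into affinities with a local-adaptive Gaussian kernel might look roughly like the sketch below. The description does not give the formula, so the kernel form used here is an assumption: a self-tuning kernel, exp(-d_ij^2 / (sigma_i * sigma_j)), with sigma_i set to the distance from item i to its k-th nearest neighbor. The function name and the toy distance matrix are hypothetical, and the actual GeneTrajectory code may differ.

```python
import math


def local_adaptive_gaussian_affinity(dist, k):
    """Convert a symmetric distance matrix into a Gaussian affinity matrix.

    Assumed kernel: exp(-d_ij**2 / (sigma_i * sigma_j)), where sigma_i is
    the distance from item i to its k-th nearest neighbor (self excluded).
    Requires all off-diagonal distances to be positive.
    """
    n = len(dist)
    sigma = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        sigma.append(others[k - 1])  # k-th nearest neighbor distance
    return [
        [math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j])) for j in range(n)]
        for i in range(n)
    ]


# Toy distances between three items placed at positions 0, 1 and 3 on a line.
toy_dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
affinity = local_adaptive_gaussian_affinity(toy_dist, k=1)
```

Scaling each pair by both local bandwidths keeps the matrix symmetric and lets dense and sparse regions of the graph use different effective kernel widths.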
