5 datasets found

scverse tutorial data: Getting started with AnnData
figshare.com
hdf
Updated Apr 7, 2023
Cite
Jan Lause (2023). scverse tutorial data: Getting started with AnnData [Dataset]. http://doi.org/10.6084/m9.figshare.22577536.v2
Available download formats
hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.22577536.v2
Dataset updated
Apr 7, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jan Lause
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data is derived from the 3k PBMC data used in the scanpy and Seurat tutorials. It comes in the AnnData h5ad format.

Processed 3k PBMCs from a healthy donor from 10x Genomics, available at https://scanpy.readthedocs.io/en/stable/generated/scanpy.datasets.pbmc3k_processed.html. The original 10x data are available at http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz, from this website: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/pbmc3k

The changes made to the original scanpy.datasets.pbmc3k_processed() data are described in this GitHub issue: https://github.com/scverse/scverse-tutorials/issues/51

See the Jupyter notebook for details.
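
A minimal sketch of opening an h5ad file like this one in Python, assuming the third-party anndata package is installed. The helper name load_h5ad and the file name in the usage comment are hypothetical; the name of the file on figshare is not given in this listing.

```python
from pathlib import Path


def load_h5ad(path):
    """Read an AnnData .h5ad file from disk.

    Assumes the third-party `anndata` package is installed.
    """
    path = Path(path)
    if path.suffix != ".h5ad":
        raise ValueError(f"expected an .h5ad file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(path)
    # Imported lazily so the path checks above run even without anndata.
    import anndata as ad
    return ad.read_h5ad(path)


# Usage (hypothetical file name):
# adata = load_h5ad("pbmc3k_processed.h5ad")
# print(adata)  # summary of the observations, variables, and annotations
```

The path checks come first so that a wrong file name fails with a clear error before any HDF5 parsing is attempted.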
Processed Seurat objects for GeneTrajectory inference (Gene Trajectory...
figshare.com
application/gzip
Updated Feb 19, 2024
Cite
Rihao Qu; Peggy Myung (2024). Processed Seurat objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics) [Dataset]. http://doi.org/10.6084/m9.figshare.25243225.v1
Available download formats
application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.25243225.v1
Dataset updated
Feb 19, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rihao Qu; Peggy Myung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are processed Seurat objects for the two biological datasets in GeneTrajectory inference (https://github.com/KlugerLab/GeneTrajectory/):

Human myeloid dataset analysis
Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R. After standard normalization, highly-variable gene selection and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified based on Louvain clustering with a resolution of 0.3. A Wilcoxon rank-sum test was employed to find cluster-specific gene markers for cell type annotation.

For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel, where each bandwidth is determined by the distance to the cell's k-nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% − 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.

Mouse embryo skin data analysis
We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wildtype and the Wls mutant were pooled for analyses. After standard normalization, highly-variable gene selection and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.

For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% − 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the differences between the wildtype and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
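
The gene-retention step described above (keeping genes expressed by a bounded fraction of cells, such as 0.5% to 75% or 1% to 50%) can be sketched in plain Python as follows. This is an illustration only and not the GeneTrajectory implementation: the function name, the toy data, the definition of "expressed" as a count greater than zero, and the inclusive bounds are all assumptions.

```python
def filter_genes_by_expression_fraction(counts, gene_names, min_frac, max_frac):
    """Return genes detected in a fraction of cells within [min_frac, max_frac].

    counts: list of rows, one row per cell, each row a list of per-gene counts.
    A gene counts as "expressed" in a cell when its count is > 0 (assumption).
    """
    n_cells = len(counts)
    kept = []
    for j, gene in enumerate(gene_names):
        n_expressing = sum(1 for row in counts if row[j] > 0)
        frac = n_expressing / n_cells
        if min_frac <= frac <= max_frac:
            kept.append(gene)
    return kept


# Toy data (hypothetical): 4 cells x 3 genes.
toy_genes = ["geneA", "geneB", "geneC"]
toy_counts = [
    [5, 0, 0],
    [3, 2, 0],
    [1, 0, 0],
    [7, 4, 0],
]

# geneA is in 100% of cells (too broad), geneC in 0% (too rare), geneB in 50%.
kept = filter_genes_by_expression_fraction(toy_counts, toy_genes, 0.01, 0.75)
print(kept)  # ['geneB']
```

Filtering on both ends removes genes that are too sparse to give a stable distribution over the cell graph and genes so ubiquitous that their distribution carries little structure.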
Pre-processed CD4 T cell (sc) train and test sets (5-fold CV)
figshare.com
zip
Updated Aug 7, 2024
Cite
Orsolya Lapohos (2024). Pre-processed CD4 T cell (sc) train and test sets (5-fold CV) [Dataset]. http://doi.org/10.6084/m9.figshare.26426404.v1
Available download formats
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.26426404.v1
Dataset updated
Aug 7, 2024
Dataset provided by
figshare
Authors
Orsolya Lapohos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pooled gene expression probabilities and ATAC-seq tracks for human CD4 T cells, for each autosomal protein-coding gene. These can be used directly to train the accessibility-augmented sequence to expression model described at https://github.com/lapohosorsolya/accessible_seq2exp. In this dataset, ATAC-seq tracks were obtained from a human multiome PBMC dataset by 10x Genomics, and gene expression data were obtained from a human PBMC dataset with antibody-derived tags by 10x Genomics. For a detailed description of data processing, please refer to the corresponding manuscript.
Symphony pre-built single-cell reference atlases
explore.openaire.eu
zenodo.org
Updated Jul 9, 2021
Cite
Joyce Kang (2021). Symphony pre-built single-cell reference atlases [Dataset]. http://doi.org/10.5281/zenodo.4602301
Unique identifier
https://doi.org/10.5281/zenodo.4602301
Dataset updated
Jul 9, 2021
Authors
Joyce Kang
Description
Pre-built Symphony reference objects that can be downloaded and used to map new query datasets. The Symphony algorithm is used to perform reference mapping to these atlases.

Preprint: https://www.biorxiv.org/content/10.1101/2020.11.18.389189v2
Usage: https://github.com/immunogenomics/symphony

References available for download:
10x PBMCs Atlas (pbmcs_10x_reference.rds)
Pancreatic Islet Cells Atlas (pancreas_plate-based_reference.rds)
Fetal Liver Hematopoiesis Atlas (fetal_liver_reference_3p.rds)
Healthy Fetal Kidney Atlas (kidney_healthy_fetal_reference.rds)
T cell CITE-seq atlas (tbru_ref.rds)
Cross-tissue Fibroblast Atlas (see here)
Cross-tissue Inflammatory Immune Atlas (here)
Tabula Muris Senis (FACS) Atlas (TMS_facs_reference.rds)

To read a reference into R, one may simply execute: reference = readRDS('path/to/reference_name.rds')

Note: To be able to map query datasets into the reference UMAP coordinates, you must also download the corresponding 'uwot_model' file and set the reference$save_uwot_path.
Processed AnnData objects for GeneTrajectory inference (Gene Trajectory...
figshare.com
hdf
Updated Apr 4, 2024
Cite
Rihao Qu; Francesco Strino (2024). Processed AnnData objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics) [Dataset]. http://doi.org/10.6084/m9.figshare.25539547.v1
Available download formats
hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.25539547.v1
Dataset updated
Apr 4, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rihao Qu; Francesco Strino
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are processed AnnData objects (converted from Seurat objects) for GeneTrajectory tutorials (https://github.com/KlugerLab/GeneTrajectory-python/):

Human myeloid dataset analysis
Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R. After standard normalization, highly-variable gene selection and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified based on Louvain clustering with a resolution of 0.3. A Wilcoxon rank-sum test was employed to find cluster-specific gene markers for cell type annotation.

For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel, where each bandwidth is determined by the distance to the cell's k-nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% − 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.

Mouse embryo skin data analysis
We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wildtype and the Wls mutant were pooled for analyses. After standard normalization, highly-variable gene selection and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.

For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% − 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the differences between the wildtype and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
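
The description above turns pairwise distances into affinities with a local-adaptive Gaussian kernel. One common form of such a kernel is sketched below, under stated assumptions: each point's bandwidth is its distance to its k-th nearest neighbor, and the affinity is exp(-d_ij^2 / (sigma_i * sigma_j)). The exact formula used by GeneTrajectory is not given in this listing, so the function name and the kernel form are illustrative only.

```python
import math


def local_adaptive_gaussian_affinity(dist, k):
    """Turn a symmetric distance matrix into affinities with per-point bandwidths.

    dist: square list-of-lists of pairwise distances (zeros on the diagonal).
    k: which nearest neighbor (excluding the point itself) sets the bandwidth.
    Assumes the k-th neighbor distance is > 0 for every point.
    """
    n = len(dist)
    sigma = []
    for i in range(n):
        neighbor_dists = sorted(dist[i][j] for j in range(n) if j != i)
        sigma.append(neighbor_dists[k - 1])
    # Assumed kernel form: exp(-d_ij^2 / (sigma_i * sigma_j)).
    return [
        [math.exp(-(dist[i][j] ** 2) / (sigma[i] * sigma[j])) for j in range(n)]
        for i in range(n)
    ]


# Toy example (hypothetical): three points on a line at positions 0, 1, and 3.
toy_dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
affinity = local_adaptive_gaussian_affinity(toy_dist, k=1)
```

With k = 1 the bandwidths here are 1, 1, and 2, so the isolated third point gets a wider kernel than the two close points. Scaling by the product of both bandwidths keeps the affinity matrix symmetric.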