MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
• This dataset provides a comprehensive single-cell RNA sequencing (scRNA-seq) analysis of 3,000 Peripheral Blood Mononuclear Cells (PBMC 3k) using the Scanpy framework.
• It includes a fully processed and annotated Jupyter Notebook workflow designed for beginners, intermediate users, and advanced researchers working in single-cell bioinformatics.
• The dataset demonstrates key preprocessing steps including quality control, filtering, normalization, log transformation, and detection of highly variable genes.
• It covers dimensionality reduction and embedding steps such as PCA, neighborhood graph construction, UMAP, and t-SNE for intuitive visualization of cell populations.
• The workflow includes clustering analysis using the Leiden algorithm to identify distinct immune cell types present in PBMC samples.
• Detailed marker-gene identification and differential gene expression analysis are performed to classify major immune cell subsets.
• The notebook integrates multiple visualization tools including Scanpy plots, violin plots, dot plots, rank-gene visualizations, and interactive embeddings.
• It provides step-by-step code explanations to help users understand each stage of scRNA-seq data processing using Scanpy.
• The dataset is suitable for researchers studying immunology, transcriptomics, and single-cell data exploration.
• This dataset enables reproducible analysis and serves as a reference template for future single-cell workflows using Scanpy.
• It is ideal for teaching, training, and hands-on learning in scRNA-seq analysis.
• The included notebook demonstrates best practices for analyzing publicly available PBMC 3k data from the 10x Genomics platform.
• Users can explore interactive visualizations to better interpret cellular heterogeneity and lineage relationships within PBMCs.
• This resource aims to simplify single-cell analysis and make Scanpy workflows more accessible to the bioinformatics community.
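The normalization and log-transformation steps named in the description can be illustrated with plain arithmetic. The sketch below is not the notebook's code; it mimics, on a made-up three-cell count matrix, what Scanpy's `sc.pp.normalize_total` followed by `sc.pp.log1p` do conceptually. The toy counts, the helper names, the target sum of 10,000, and the handling of empty cells are all assumptions made for illustration.

```python
import math

def normalize_total(counts, target_sum=10_000):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Assumption: cells with no counts are left as zeros to avoid division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in cell])
    return normalized

def log1p_transform(matrix):
    """Apply the natural log of (1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]

# Hypothetical counts: rows are cells, columns are genes.
toy_counts = [
    [10, 0, 90],
    [1, 1, 2],
    [0, 0, 0],
]

normalized = normalize_total(toy_counts)
logged = log1p_transform(normalized)
```

After normalization, every non-empty cell has the same total count, so differences in sequencing depth no longer dominate comparisons between cells; the log transform then compresses the range of highly expressed genes.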
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains code and processed scRNA-seq data for the lamprey, used to construct a comprehensive cell atlas comprising 604,460 cells/nuclei and 70 cell types from 14 tissues.
lamprey_atlas.raw.h5ad:
Python data set (.h5ad) containing raw counts matrix from all tissues and libraries.
lamprey_atlas.scanpy_merge.h5ad:
Python data set (.h5ad) containing scanpy processed matrix, used in projection of cells from all tissues into shared UMAP space. Only highly-variable genes calculated by scanpy are included.
immune.h5ad:
Python data set (.h5ad) containing scanpy processed matrix, used in re-clustering of immune cells. Only highly-variable genes calculated by scanpy are included.
pancreas.evo.rds:
R data set (.rds) containing integrated data of intestine, liver, pancreas from human and mouse, as well as intestine and liver from lamprey.
lamprey-single-cell-atlas-1.0.0.zip:
Code used in processing of scRNA-seq data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Embeddings of single-cell RNA-Seq data from three adult vertebrate brain datasets into Orthogroup feature space or Structural cluster feature space. Orthogroups were generated using OrthoFinder v5.5.0; Structural clusters were assigned by using FoldSeek to cluster AlphaFold-v4 structural predictions.
The three datasets used as the basis for these embeddings were:
sample "Brain8" from the Jiang et al. 2021 zebrafish cell atlas (files beginning with GSM3768152)
sample "Brain1" from the Han et al. 2018 mouse cell atlas (files beginning with GSM2906405)
sample "Xenopus_brain_COL65" from the Liao et al. 2022 Xenopus laevis adult cell atlas (files beginning with GSM6214268)
For each dataset, we also generated a standardized cell type annotation file based on the cell type annotations originally provided by the authors. The first column is the cell barcode for that species and the second column is the original study's cell type annotation for that cell.
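A minimal sketch of reading such a two-column annotation file, using only the Python standard library. The tab delimiter, the absence of a header row, and the example barcodes and cell type names are assumptions; the actual files may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for one annotation file; the real contents are not reproduced here.
example_file = io.StringIO(
    "AAACCTGA\tNeuron\n"
    "AAACGGGT\tAstrocyte\n"
    "AAAGATGC\tNeuron\n"
)

def read_annotations(handle, delimiter="\t"):
    """Return {barcode: cell_type} from a two-column annotation file."""
    annotations = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        barcode, cell_type = row[0], row[1]
        annotations[barcode] = cell_type
    return annotations

annotations = read_annotations(example_file)

# Count how many cells carry each annotation.
cells_per_type = Counter(annotations.values())
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory example.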
For the Xenopus brain data, we removed approximately 18k cells that were not annotated in the original data to simplify data analyses. These are reflected in the files with the "subsampled" suffix. Subsampled versions of the data are also available for the joint embedding space (prefixed with "DrerMmusXlae").
For the final datasets used in our analyses, we also provide features x cell matrices as .h5ad files for smaller file sizes and faster loading using Scanpy.
For visualizing our UMAP plots of our top200 embedding space, we provide ".tsv" files with a variety of metrics and the x and y positions of each cell in the UMAP. See "DrerMmusXlae_adultbrain_FoldSeek_plotlydata.tsv" and "DrerMmusXlae_adultbrain_OrthoFinder_plotlydata.tsv"
These data are part of the Arcadia Science Pub titled "Comparing gene expression across species based on protein structure instead of sequence".
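The per-cell UMAP positions in the ".tsv" files can be loaded without any plotting library. The sketch below groups coordinates by species using the standard library; the column names `species`, `umap_x`, and `umap_y` and the example rows are hypothetical, so check the header of the actual file before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; the real column names and values may differ.
example_tsv = io.StringIO(
    "cell\tspecies\tumap_x\tumap_y\n"
    "c1\tDrer\t1.0\t2.0\n"
    "c2\tMmus\t-0.5\t0.5\n"
    "c3\tDrer\t3.0\t4.0\n"
)

def load_umap_points(handle, x_col="umap_x", y_col="umap_y", group_col="species"):
    """Return {group: [(x, y), ...]} from a tab-separated file with a header row."""
    points = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        points[row[group_col]].append((float(row[x_col]), float(row[y_col])))
    return dict(points)

points_by_species = load_umap_points(example_tsv)
```

Each group's list of `(x, y)` pairs can then be passed to a plotting library of choice as one scatter trace per species.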
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Preprint version of the Single-Cell Tumor Immune Atlas
This upload contains:
For the h5ad files, the .X slot contains the normalized data, while the .raw.X slot contains the raw counts as they were in the original datasets.
All the files contain the following patient/sample metadata variables:
If you have any issues with the metadata you can use the TICAtlas_metadata.csv file.
For more information, read our preprint and check our GitHub.
h5ad files can be read in Python using Scanpy; rds files can be read in R using Seurat. For format conversion between AnnData and Seurat we recommend SeuratDisk. For other single-cell data formats you can use sceasy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets and code accompanying RCA2, the new release of RCA. The R package for RCA2 is available on GitHub: https://github.com/prabhakarlab/RCAv2/
The datasets included here are:
Datasets required for a characterization of batch effects:
merged_rna_seurat.rds
de_list.rds
mergedRCAObj.rds
merged_rna_integrated.rds
10X_PBMCs.RDS: Processed 10X PBMC data as an RCA2 object (10X PBMC example data sets)
NBM_RDS_Files.zip: Several RDS files containing the RCA2 object of Normal Bone Marrow (NBM) data, UMAP coordinates, doublet finder results, and metadata information (Normal Bone Marrow use case)
Dataset used for the Covid19 example:
blish_covid.seu.rds
rownames_of_glocal_projection_immune_cells.txt
Blish_RCA_no_QC_filtering_project_to_multiple_panels.rds
Data sets used to outline the ability of supervised clustering to detect disease states:
809653.seurat.rds
blish_covid.seu.rds
Performance benchmarking results:
Memory_consumption.txt
rca_time_list.rds
Scanpy input files:
input_data.zip
The R script provides code to regenerate Figures 2 to 7 of the main paper, apart from some visual modifications made in Inkscape.
Provided R scripts are:
ComputePairWiseDE_v2.R (Required code for pairwise DE computation)
RCA_Figure_Reproduction.R
Provided Python code for Scanpy analysis:
RA_Scanpy.ipynb
CITESeq_Scanpy.ipynb
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Helper-like ILCs comprise several functional subsets, including ILC1, ILC2, and ILC3, which mediate immune responses against viruses, parasites, and extracellular bacteria, respectively, as well as LTi cells. LTi cells are crucial for the formation of peripheral lymphoid tissues, such as lymph nodes. Our research, along with others’, indicates a high proportion of LTi cells in the fetal ILC pool, which decreases significantly after birth. Conversely, the proportion of non-LTi ILCs increases postnatally. This matches the need for LTi cells to mediate lymphoid tissue formation during fetal stages and for other ILC subsets to combat diverse pathogen infections after birth. However, the regulatory mechanism for this transition remains unclear. In this study, we observed that fetal ILC progenitors preferentially differentiate into LTi cells, while postnatal bone marrow ILC progenitors preferentially differentiate into non-LTi ILCs. Notably, this differentiation shift occurs within the first week after birth in mice. Further analysis revealed that adult ILC progenitors exhibit stronger activation of the Notch signaling pathway than their fetal counterparts, accompanied by elevated Gata3 expression and decreased Rorc expression, leading to a transition from fetal LTi cell-dominant states to adult non-LTi ILC-dominant states. This study suggests that the body can regulate ILC development by modulating the activation level of the Notch signaling pathway, thereby acquiring different ILC subsets to meet the varying demands of different developmental stages.

Data usage

    import scanpy as sc

    # Read the data using Scanpy.
    adata = sc.read_h5ad('./220516-ABM.velo.h5ad')

    # Draw a UMAP for visualization; `ann0608` is the cell type label.
    sc.pl.umap(adata, color='ann0608')

    # Get the gene expression matrix.
    adata.X