Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The below data are associated with our paper entitled "Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance."
1) SNP-gene link predictions generated by pgBoost and existing methods SCENT (Sakaue et al. 2024 Nat Genet), Signac (Stuart et al. 2021 Nat Methods), ArchR (Granja et al. 2021 Nat Genet), and Cicero (Pliner et al. 2018 Mol Cell).
pgBoost_scores.tsv.gz contains linking predictions made by pgBoost.
constituent_method_scores.tsv.gz contains linking predictions made by constituent methods.
**NOTE: promoters (+/- 1kb from TSS) and candidate links >500kb are excluded from linking predictions (see manuscript)**
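As a rough illustration of the exclusion rule in the note above, the following minimal sketch checks whether a SNP-gene pair would remain a candidate link. The function name, the use of absolute SNP-to-TSS distance, and the treatment of the exact boundaries (1 kb excluded, 500 kb included) are assumptions made for illustration; the manuscript defines the actual criteria.

```python
def is_candidate_link(snp_pos, tss_pos, promoter_halfwidth=1_000, max_distance=500_000):
    """Return True if a SNP-gene pair passes the distance filters (illustrative only).

    Assumes both positions are on the same chromosome and that distance is
    measured from the SNP to the gene's TSS.
    """
    distance = abs(snp_pos - tss_pos)
    # Exclude promoter-proximal SNPs (within +/- 1 kb of the TSS)
    # and links longer than 500 kb.
    return promoter_halfwidth < distance <= max_distance


if __name__ == "__main__":
    tss = 1_000_000
    for snp in (1_000_500, 1_002_000, 1_500_000, 1_600_000):
        print(snp, is_candidate_link(snp, tss))
```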
Linking scores and percentiles are reported for each method (pgBoost score, SCENT FDR, Signac correlation, ArchR correlation, Cicero co-accessibility). Rank percentiles are computed as: 1 - (rank / n). When multiple links receive the same score, they are assigned the percentile of the top rank. Links unscored by each method (denoted by zeros* in the linking score column) are assigned a percentile equivalent to the percent of links unscored by the focal method. See the Methods section of the paper for further details on computing linking scores and summarizing scores across cell types and data sets.
*Candidate links tested and assigned a co-accessibility of zero by the Cicero method are given a score of 1e-100 in the "Cicero" column to distinguish between unscored candidate links and candidate links assigned a partial correlation of zero (see Pliner et al. 2018 Mol Cell).
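The percentile rule described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that higher scores are better (a method such as SCENT, which reports an FDR, would need its ordering reversed), that ranks are 1-based, and that n is the total number of candidate links including unscored ones. The function name is hypothetical.

```python
def rank_percentiles(scores):
    """Compute 1 - (rank / n) percentiles with ties sharing the top rank.

    A score of exactly 0 marks an unscored link. The 1e-100 sentinel used
    for Cicero zero co-accessibility is nonzero, so it is treated as scored.
    """
    n = len(scores)
    scored = sorted((s for s in scores if s != 0), reverse=True)

    # Map each distinct score to the best (smallest) 1-based rank it occupies.
    top_rank = {}
    for rank, score in enumerate(scored, start=1):
        top_rank.setdefault(score, rank)

    # Unscored links receive the fraction of links left unscored.
    unscored_fraction = (n - len(scored)) / n
    return [unscored_fraction if s == 0 else 1 - top_rank[s] / n for s in scores]


if __name__ == "__main__":
    example = [0.9, 0.5, 0.5, 0.0, 1e-100]
    print(rank_percentiles(example))
```

Under these assumptions the lowest-ranked scored link ends up at the same percentile as the unscored links, so the paper's Methods section should be consulted for the exact rank and n conventions before reproducing the released percentiles.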
NOTE: The predictions associated with this release (version 2) were generated using an expanded set of data sets, an expanded training set, and corrected TSS coordinates.
2) GWAS-derived SNP-gene link evaluation set.
gwas_evaluation.tsv: GWAS-derived SNP-gene link evaluation set. Column 1 provides SNP coordinates in the format
https://spdx.org/licenses/CC0-1.0.html
Transcribed enhancer maps can reveal nuclear interactions underpinning each cell type and connect specific cell types to diseases. Using a 5′ single-cell RNA sequencing approach, we defined transcription start sites of enhancer RNAs and other classes of coding and non-coding RNAs in human CD4+ T cells, revealing cellular heterogeneity and differentiation trajectories. Integration of these datasets with single-cell chromatin profiles showed that active enhancers with bidirectional RNA transcription are highly cell type–specific, and disease heritability is strongly enriched in these enhancers. The resulting cell type–resolved multimodal atlas of bidirectionally transcribed enhancers, which we linked with promoters using fine-scale chromatin contact maps, enabled us to systematically interpret genetic variants associated with a range of immune-mediated diseases.
Methods
All experiments using human samples were approved by the ethical review committee of RIKEN [approval no. H30-9(13)]. Written informed consent was obtained from all donors. CD4+ T cells were isolated by the immunomagnetic negative selection method. Stained CD4+ T cells were sorted using a FACSAria IIu Cell Sorter (BD Biosciences). Human CD4+ T cells and FACS-sorted heterogeneous populations were processed with a Chromium Next GEM Single Cell 5′ kit (10x Genomics). Libraries were sequenced on an Illumina NovaSeq 6000 sequencing platform using 2 × 150 bp paired-end sequencing. Multiome assay (10x Genomics) was performed according to the manufacturer’s instructions. Multiome libraries were pooled and sequenced as above with 10 cycles for i7 index and 24 cycles for i5 index. Micro-C libraries were generated using a Dovetail Micro-C Kit (Cantata Bio, Cat#21006) and were sequenced on an Illumina NovaSeq 6000 platform using 2 × 150 bp paired-end sequencing.
Chromium scRNA-seq, snRNA-seq, and CITE-seq data were processed using Cell Ranger Software version 5.0.1 (10x Genomics) and R package Seurat version 5 (4.9.9.9067). Multiome data were processed by Cell Ranger ARC version 2.0.0 (10x Genomics), Seurat version 5 (4.9.9.9067), and Signac version 1.10.0. scRNA-seq, snRNA-seq, and Multiome 3′ snRNA-seq data were integrated using canonical correlation analysis. snATAC peaks were identified from fragment files of each cluster using MACS2 version 2.2.6 with default settings as implemented in Signac version 1.10.0. For ReapTEC, paired-end reads were mapped again using STAR (STARsolo) to obtain reads with unencoded G, which was tagged as a soft-clipped G by STARsolo. Reads were deduplicated, and those with the barcodes of each cell type were extracted. A count file was generated for each transcription start site (TSS) using the “bamToBed” function in BEDTools version 2.30.0. TSS peaks were generated by merging TSSs located within 10 bp of each other. To identify btcEnhs, TSS peak pairs were detected using scripts provided at https://github.com/anderssonrobin/enhancers/blob/master/scripts/bidir_enhancers with minor modifications. Micro-C data were processed with the dovetail_tools pipeline (Cantata Bio). Chromatin loop contacts were identified by the HiCCUPS algorithm using the Juicer Tools package version 2.20.0 and the scale-space representation algorithm using the Mustache package. Loops were called at a 1-kb resolution with SCALE-normalized contact matrices for HiCCUPS and with ICE-normalized contact matrices for Mustache, and were filtered for an FDR < 0.05.
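The TSS peak step described above (merging TSSs located within 10 bp of each other) can be sketched as below. This is an illustrative stand-in, not the ReapTEC pipeline itself. It assumes the positions come from a single chromosome and strand, that "within 10 bp" means a gap of at most 10 bp between neighbouring TSSs, and that merging chains through neighbours (as interval-merging tools typically do). The function name is hypothetical.

```python
def merge_tss(positions, max_gap=10):
    """Merge TSS positions into peaks when neighbours lie within max_gap bp.

    Returns a list of (start, end) tuples, one per merged peak.
    """
    peaks = []
    for pos in sorted(positions):
        if peaks and pos - peaks[-1][1] <= max_gap:
            # Close enough to the current peak: extend its end coordinate.
            peaks[-1][1] = pos
        else:
            # Too far from the previous TSS: start a new peak.
            peaks.append([pos, pos])
    return [tuple(peak) for peak in peaks]


if __name__ == "__main__":
    print(merge_tss([100, 105, 114, 130, 200, 141]))
```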
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Preprint: https://doi.org/10.1101/2022.03.21.485045
Abstract:
Salamanders are important tetrapod models to study brain organization and regeneration; however, the identity and evolutionary conservation of brain cell types are largely unknown. Here, we delineate cell populations in the axolotl telencephalon during homeostasis and regeneration, representing the first single-cell genomic and spatial profiling of an anamniote tetrapod brain. We identify glutamatergic neurons with similarities to amniote neurons of hippocampus, dorsal and lateral cortex, and conserved GABAergic neuron classes. We infer transcriptional dynamics and gene regulatory relationships of postembryonic, region-specific direct and indirect neurogenesis, and unravel conserved signatures. Following brain injury, ependymoglia activate an injury-specific state before reestablishing lost neuron populations and axonal connections. Together, our analyses yield key insights into the organization, evolution, and regeneration of a tetrapod nervous system.
File description:
all_nuclei_clustered_highlevel_anno.rds - Seurat object including all snRNA-seq data from uninjured pallium, both from microdissections and whole pallium multiome.
pallium_metadata_simp.csv - csv file containing a simplified version of the metadata for the uninjured pallium
Edu_1_2_4_6_8_12_fil_highvarfeat.rds - Seurat object containing all Div-seq data for the pallium injury time course
divseq_predicted_metadata.csv - csv file containing a simplified version of the metadata for the pallium injury time course
ep_wpi_srat.rds - Seurat object containing an integrated version of ependymoglia cells from uninjured and injured pallium (see Fig 6 in the preprint).
D1_113_sub_b.rds - Seurat object containing Visium data for the axolotl pallium
multiome_integATAC_SCT.rds - Signac object containing the data used for multiome analysis of the uninjured whole pallium
predictions_cell2loc.csv - csv file containing cell2location scores for the uninjured pallium cell types in the Visium dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets to go along with the publication listed:
full_object.rds: Brain Organoid Phospho-Seq dataset with ATAC, Protein and imputed RNA data
rna_object.rds: Reference whole cell scRNA-Seq object on Brain organoids
multiome_object.rds: Bridge dataset containing RNA and ATAC modalities for Brain organoids
metacell_allnorm.rds: Metacell object for finding gene-peak-protein linkages in Brain organoid dataset
fullobject_fragments.tsv.gz: fragment file to go with the full object
fullobject_fragments.tsv.gz.tbi: index file for the full object fragment file
multiome_fragments.tsv.gz: fragment file to go with the multiome object
multiome_fragments.tsv.gz.tbi: index file for the multiome object fragment file
K562_Stem.rds: object corresponding to the pilot experiment including K562 cells and iPS cells
K562_stem_fragments.tsv.gz: fragment file to go with the K562_stem object
K562_stem_fragments.tsv.gz.tbi: index file for the K562_stem object fragment file
retina.rds: object corresponding to the retinal organoid phospho-seq experiment
retina_fragments.tsv.gz: fragment file to go with the retina object
retina_fragments.tsv.gz.tbi: index file for the retina object fragment file
retina_multi.rds: object corresponding to the retinal organoid phospho-seq-multiome experiment
retina_multi_fragments.tsv.gz: fragment file to go with the retina_multi object
retina_multi_fragments.tsv.gz.tbi: index file for the retina_multi object fragment file
To use the K562, multiome, retina and retina_multiome datasets provided, please use these lines of code to import the object into Signac/Seurat and change the fragment file path to the corresponding downloaded fragment file:
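# load Seurat and Signac (Fragments() and CreateFragmentObject() come from Signac)
library(Seurat)
library(Signac)
# load the object
obj <- readRDS("obj.rds")
# remove fragment file information
Fragments(obj) <- NULL
# update the path of the fragment file
Fragments(obj) <- CreateFragmentObject(path = "download/obj_fragments.tsv.gz", cells = Cells(obj))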
To use the "fullobject" dataset provided, please use these lines of code to import the object into Signac/Seurat and change the fragment file path to the corresponding downloaded fragment file:
# load Seurat, Signac, and the stringr package
library(Seurat)
library(Signac)
library(stringr)
# load the object
obj <- readRDS("obj.rds")
# remove fragment file information
Fragments(obj) <- NULL
# remove unwanted residual information and rename cells
obj@reductions$norm.adt.pca <- NULL
obj@reductions$norm.pca <- NULL
obj <- RenameCells(obj, new.names = str_remove(Cells(obj), "atac_"))
# update the path of the fragment file
Fragments(obj) <- CreateFragmentObject(path = "download/obj_fragments.tsv.gz", cells = Cells(obj))