11 datasets found

Privacy levels and information exchanged by Federated Harmony.
plos.figshare.com
xls
Updated Oct 10, 2025
Cite
Ruizhi Yuan; Ziqi Rong; Haoran Hu; Tianhao Liu; Shiyue Tao; Wei Chen; Lu Tang (2025). Privacy levels and information exchanged by Federated Harmony. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013526.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1013526.t001
Dataset updated
Oct 10, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Ruizhi Yuan; Ziqi Rong; Haoran Hu; Tianhao Liu; Shiyue Tao; Wei Chen; Lu Tang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Privacy levels and information exchanged by Federated Harmony.
Data from: Large-scale integration of single-cell transcriptomic data...
data-staging.niaid.nih.gov
data.niaid.nih.gov
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
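The per-cell quality thresholds in this step can be written as a simple predicate. The following is a minimal Python sketch, not the authors' Seurat code: the `Cell` record and its field names are hypothetical, and only the three cutoffs (750 features, 1000 transcripts, 30% mitochondrial) come from the description.

```python
from dataclasses import dataclass

# Cutoffs quoted in the description.
MIN_FEATURES = 750        # genes detected per cell
MIN_TRANSCRIPTS = 1000    # transcripts (UMIs) per cell
MAX_MITO_FRACTION = 0.30  # share of transcripts from mitochondrial genes


@dataclass
class Cell:
    """Hypothetical per-cell QC summary; field names are illustrative."""
    barcode: str
    n_features: int
    n_transcripts: int
    mito_fraction: float


def passes_qc(cell: Cell) -> bool:
    """Keep a cell only if it clears all three cutoffs."""
    return (
        cell.n_features >= MIN_FEATURES
        and cell.n_transcripts >= MIN_TRANSCRIPTS
        and cell.mito_fraction <= MAX_MITO_FRACTION
    )


def filter_cells(cells):
    """Return the cells that pass QC, preserving input order."""
    return [c for c in cells if passes_qc(c)]
```

A cell failing any single cutoff is dropped, which matches the "or" wording of the description.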

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
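Choosing "dimensions accounting for 95% of the total variance" amounts to a cumulative-sum cutoff over the per-component variances. Below is a minimal Python sketch under one assumption that the description does not settle: "total variance" is taken as the sum over the components that were actually computed. The function name is illustrative.

```python
def n_dims_for_variance(stdevs, target=0.95):
    """Smallest number of leading components whose cumulative share of
    variance reaches `target`.

    `stdevs` holds per-component standard deviations, ordered from the
    first component onward.
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    if total <= 0:
        raise ValueError("stdevs must contain at least one positive value")
    cumulative = 0.0
    for k, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return k
    return len(variances)
```

The returned count would then be used as the number of embedding dimensions passed to the neighbor-graph step.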

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Data from: From Chaos to Harmony: Addressing Data De-Noising, Complexity and...
curate.nd.edu
pdf
Updated Apr 28, 2025
Cite
Qianlong Wen (2025). From Chaos to Harmony: Addressing Data De-Noising, Complexity and Adaptability in Graph Machine Learning [Dataset]. http://doi.org/10.7274/28786127.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.7274/28786127.v1
Dataset updated
Apr 28, 2025
Dataset provided by
University of Notre Dame
Authors
Qianlong Wen
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Graph representation learning—especially via graph neural networks (GNNs)—has demonstrated considerable promise in modeling intricate interaction systems, such as social networks and molecular structures. However, the deployment of GNN-based frameworks in industrial settings remains challenging due to the inherent complexity and noise in real-world graph data. This dissertation systematically addresses these challenges by advancing novel methodologies to improve the comprehensiveness and robustness of graph representation learning, with a dual focus on resolving data complexity and denoising across diverse graph-learning scenarios. In addressing graph data denoising, we design auxiliary self-supervised optimization objectives that disentangle noisy topological structures and misinformation while preserving the representational sufficiency of critical graph features. These tasks operate synergistically with primary learning objectives to enhance robustness against data corruption. The efficacy of these techniques is demonstrated through their application to real-world opioid prescription time series data for predicting potential opioid over-prescription. To mitigate data complexity, the study investigates two complementary approaches: (1) multimodal fusion, which employs attentive integration of graph data with features from other modalities, and (2) hierarchical substructure mining, which extracts semantic patterns at multiple granularities to enhance model generalization in demanding contexts. Finally, the dissertation explores the adaptability of graph data in a range of practical applications, including E-commerce demand forecasting and recommendations, to further enhance prediction and reasoning capabilities.
EPI-Clone supplementary dataset: Single cell RNA-seq of clonally barcoded...
figshare.com
application/gzip
Updated Nov 26, 2024
Cite
Lars Velten; Michael Scherer; Alejo Rodriguez-Fraticelli; Indranil Singh (2024). EPI-Clone supplementary dataset: Single cell RNA-seq of clonally barcoded hematopoietic progenitors [Dataset]. http://doi.org/10.6084/m9.figshare.24260743.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.24260743.v1
Dataset updated
Nov 26, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Lars Velten; Michael Scherer; Alejo Rodriguez-Fraticelli; Indranil Singh
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset supporting the EPI-Clone manuscript: scRNA-seq profiling of hematopoietic stem and progenitor cells (HSPCs) was performed with 3' 10x Genomics profiling. Three experiments are included: two where HSCs were clonally labeled with the LARRY system, transplanted into a recipient mouse and profiled 4-5 months later (post-transplant hematopoiesis), and one where HSPCs were profiled straight from an unperturbed mouse.
The dataset is a Seurat (v4) object with the following assays, reductions and metadata:
ASSAYS
AB: Antibody expression data
RNA: RNA expression profiles
integrated: Integration of DNA methylation data performed across experimental batches with two batch correction methods: CCA (https://satijalab.org/seurat/reference/runcca) and harmony (https://portals.broadinstitute.org/harmony/articles/quickstart.html).
DIMENSIONALITY REDUCTION
pca_cca: PCA performed on the integrated data (CCA integration)
umap_cca: UMAP computed on the integrated data (CCA integration)
umap_harmony: UMAP computed on the integrated data (Harmony integration)
METADATA
Experiment: The experiment that the cell is from; values are "LARRY main experiment", "LARRY replicate" and "Native hematopoiesis"
ProcessingBatch: Experiments were processed in several batches.
CellType: Cell type annotation
LARRY: Error corrected LARRY barcode
percent.mt: Percentage of mitochondrial DNA
nCount_RNA: Read count for the RNA modality
nFeature_RNA: Number of RNAs with at least one read
nCount_AB: Read count for the surface protein modality
nFeature_AB: Number of ABs with at least one read
Mozart Piano Sonatas with Form, Harmony, and Texture Annotations
entrepot.recherche.data.gouv.fr
zip
Updated Dec 18, 2024
Cite
Louis Couturier; Louis Bigo; Johannes Hentschel; Florence Levé; Markus Neuwirth; Martin Rohrmeier (2024). Mozart Piano Sonatas with Form, Harmony, and Texture Annotations [Dataset]. http://doi.org/10.57745/OHRWPC
Available download formats: zip (1630948)
Unique identifier
https://doi.org/10.57745/OHRWPC
Dataset updated
Dec 18, 2024
Dataset provided by
Recherche Data Gouv
Authors
Louis Couturier; Louis Bigo; Johannes Hentschel; Florence Levé; Markus Neuwirth; Martin Rohrmeier
License
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.57745/OHRWPC
Description
The 18 Mozart piano sonatas with some form, harmony, and texture annotations.
This dataset is an archive of the “Mozart Piano Sonatas” corpus (scores, measure maps, analyses, recordings, synchronizations, metadata). It provides both raw data and data for integration with the Dezrann music web platform: https://www.dezrann.net/explore/mozart-piano-sonatas.
Wolfgang Amadeus Mozart (1756–1791) was a composer of the 18th-century Classical period, recognized as one of the three principal figures of the First Viennese School, alongside Joseph Haydn and Ludwig van Beethoven. He expressed his versatile musical ideas in a large palette of genres. His piano sonatas, published over a 15-year period, were composed for various purposes, including educational material and private commissions from aristocrats. The classical sonata (typically for a solo keyboard instrument) usually has three movements, of which the first generally follows sonata form. Mozart’s sonatas are well known for their remarkable structural and textural composition.
The corpus consists of complete scores of all 18 sonatas with form, harmony, and cadence annotations (Hentschel et al., 2021). Sonatas 1 (K279), 2 (K280) and 5 (K283) also have texture annotations (Couturier et al., 2022). Some movements also have synchronized audio. The corpus uses measure maps (Gotham et al., 2023) to improve annotation interoperability.
License: CC-BY-NC-SA-4.0 (scores), ODbL (annotations), CC0-1.0, CC-BY-NC-SA-3.0 (specific recordings)
Maintainers: Louis Couturier (louis.couturier@algomus.fr), Mathieu Giraud (mathieu@algomus.fr)
References: (Hentschel et al., 2021), (Couturier et al., 2022)
https://dx.doi.org/10.57745/OHRWPC
https://www.algomus.fr/data
Integration Platform as a Service Market Report
promarketreports.com
doc, pdf, ppt
Updated Jun 21, 2025
Cite
Pro Market Reports (2025). Integration Platform as a Service Market Report [Dataset]. https://www.promarketreports.com/reports/integration-platform-as-a-service-market-8805
Available download formats: ppt, doc, pdf
Dataset updated
Jun 21, 2025
Dataset authored and provided by
Pro Market Reports
License
https://www.promarketreports.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Integration Platform as a Service market was valued at USD 12.98 billion in 2023 and is projected to reach USD 76.62 billion by 2032, with an expected CAGR of 28.87% during the forecast period. Recent developments include: May 2021: Jitterbit acquired eBridge Connections, an iPaaS provider whose offering lets data flow seamlessly between on-premises or cloud e-commerce, EDI, ERP, and CRM systems. A strong complement to Jitterbit’s Harmony API integration platform, the combined offerings will provide one of the most comprehensive sets of integration solutions for e-commerce and EDI integration, helping customers increase their digital capabilities and save time. August 2021: SnapLogic and Schneider Electric introduced a new citizen developer approach to application and data integration. Using SnapLogic's self-service, low-code platform as the foundation for Schneider Electric's new operating model, the multinational utility will enable nearly 150 citizen developers to integrate over 100 cloud and on-premises systems across the enterprise, increasing employee productivity, speeding innovation, and enlarging business impact.
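The growth figures in this description can be sanity-checked with the standard compound-growth formula, CAGR = (end / start)^(1 / periods) - 1. The Python sketch below is illustrative only: the function names are made up for this example, and the number of compounding periods is an assumption, because the report text does not say which base year its CAGR uses.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over `periods` yearly compounding steps."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


def project(start_value, rate, periods):
    """Value after compounding `rate` for `periods` steps."""
    return start_value * (1.0 + rate) ** periods


# Market sizes in USD billion, taken from the description.
START, END = 12.98, 76.62

# Assumed windows: 7 compounding periods versus the 9 calendar years
# between 2023 and 2032.
rate_7 = cagr(START, END, 7)
rate_9 = cagr(START, END, 9)
```

Under these assumptions, seven periods give a rate of about 28.87%, while the nine years from 2023 to 2032 would imply roughly 21.8%. The stated CAGR therefore appears to refer to a shorter window than 2023 to 2032, although the description does not confirm this.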
DataSheet_1_Molecular mechanisms regulating natural menopause in the female...
frontiersin.figshare.com
xlsx
Updated Jul 24, 2023
Cite
Quan Liu; Fangqin Wei; Jiannan Wang; Haiyan Liu; Hua Zhang; Min Liu; Kaili Liu; Zheng Ye (2023). DataSheet_1_Molecular mechanisms regulating natural menopause in the female ovary: a study based on transcriptomic data.xlsx [Dataset]. http://doi.org/10.3389/fendo.2023.1004245.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fendo.2023.1004245.s001
Dataset updated
Jul 24, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Quan Liu; Fangqin Wei; Jiannan Wang; Haiyan Liu; Hua Zhang; Min Liu; Kaili Liu; Zheng Ye
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Natural menopause is an inevitable biological process with significant implications for women's health. However, the molecular mechanisms underlying menopause are not well understood. This study aimed to investigate the molecular and cellular changes occurring in the ovary before and after perimenopause.
Methods: Single-cell sequencing data from the GTEx V8 cohort (30-39: 14 individuals; 40-49: 37 individuals; 50-59: 61 individuals) and transcriptome sequencing data from ovarian tissue were analyzed. Seurat was used for single-cell sequencing data analysis, while harmony was employed for data integration. Cell differentiation trajectories were inferred using CytoTrace. CIBERSORTX assessed cell infiltration scores in ovarian tissue. WGCNA evaluated co-expression network characteristics in pre- and post-perimenopausal ovarian tissue. Functional enrichment analysis of co-expression modules was conducted using clusterProfiler and Metascape. DESeq2 performed differential expression analysis. Master regulator analysis and signaling pathway activity analysis were carried out using MsViper and Progeny, respectively. Machine learning models were constructed using Orange3.
Results: We identified the differentiation trajectory of follicular cells in the ovary as ARID5B+ Granulosa -> JUN+ Granulosa -> KRT18+ Granulosa -> MT-CO2+ Granulosa -> GSTA1+ Granulosa -> HMGB1+ Granulosa. Genes driving Granulosa differentiation, including RBP1, TMSB10, SERPINE2, and TMSB4X, were enriched in ATP-dependent activity regulation pathways. Genes involved in maintaining the Granulosa state, such as DCN, ARID5B, EIF1, and HSP90AB1, were enriched in the response to unfolded protein and chaperone-mediated protein complex assembly pathways. Increased contents of terminally differentiated HMGB1+ Granulosa and GSTA1+ Granulosa were observed in the ovaries of individuals aged 50-69. Signaling pathway activity analysis indicated a gradual decrease in TGFb and MAPK pathway activity with menopause progression, while p53 pathway activity increased. Master regulator analysis revealed significant activation of transcription factors FOXR1, OTX2, MYBL2, HNF1A, and FOXN4 in the 30-39 age group, and GLI1, SMAD1, SMAD7, APP, and EGR1 in the 40-49 age group. Additionally, a diagnostic model based on 16 transcription factors (Logistic Regression L2) achieved reliable performance in determining ovarian status before and after perimenopause.
Conclusion: This study provides insights into the molecular and cellular mechanisms underlying natural menopause in the ovary. The findings contribute to our understanding of perimenopausal changes and offer a foundation for health management strategies for women during this transition.
Visium Spatial and snRNA data of Brain section from Parkinson Mouse Model...
zenodo.org
bin, csv, zip
Updated Jun 5, 2025
Cite
Jaehyun Lee (2025). Visium Spatial and snRNA data of Brain section from Parkinson Mouse Model based on inducible expression of human a-syn constructs: 20-months + snRNA 23 months dataset [Dataset]. http://doi.org/10.5281/zenodo.14988055
Available download formats: csv, bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.14988055
Dataset updated
Jun 5, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jaehyun Lee
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Using 23-month-old mice from a Parkinson mouse model based on inducible expression of human a-syn constructs, we produced a single-nucleus RNA dataset by cutting from 0 mm Bregma to -5 mm Bregma. The Chromium 3’ Single Cell Library Kit (10x Genomics) was used and sequencing was performed on a NovaSeq 6000. From the same model we also used 20-month-old mice with the Visium Spatial V1 platform (10x Genomics). Sequencing was performed on a NovaSeq 6000. Both were PE150.

snRNA pipeline: For the alignment of reads, a custom reference was created by adding the sequences of the V1S/SV2 transgene and the Camk2a promoter to the mm10 mouse reference genome. Count matrices generated by cellranger count 7.1 were loaded into an AnnData object and processed using the Python-based framework Scanpy 1.10.2. Integration with R, where needed, was facilitated through the rpy2 package. Raw count matrices were corrected for ambient RNA contamination using SoupX 1.6.2. To remove potential doublets, scDblFinder 1.18.0 was employed with a fixed seed (123). Nuclei with nUMI and nGenes values exceeding three median absolute deviations (MADs) from the median were excluded. Genes detected in fewer than five nuclei across the dataset were excluded. The resulting dataset was normalized via scanpy.pp.normalize_total and scanpy.pp.log1p. Highly variable genes were identified using the function scanpy.pp.highly_variable_genes with the Seurat v3 flavor, selecting the top 4,000 genes. Dimensionality reduction was performed using principal component analysis (PCA) and batch effects were corrected using the Python implementation of Harmony via the function scanpy.external.pp.harmony_integrate. Harmony embeddings were then used to construct a k-nearest neighbor (kNN) graph with scanpy.pp.neighbors. Clustering was performed using Leiden clustering with standard parameters via the function scanpy.tl.leiden. Clusters were annotated using literature, mousebrain.org, and markers identified via the FindConservedMarkers function in Seurat. First, neurons and non-neuronal cells were distinguished using mainly canonical markers, such as but not limited to Rbfox3 (neurons), Mbp (oligodendrocytes), Acsbg1 (astrocytes), Pdgfra (oligodendrocyte precursor cells), Inpp5d (microglia), Colec12 (vascular cells), and Ttr (choroid plexus cells).
Neurons were further classified into Vglut1 (Slc17a7), Vglut2 (Slc17a6), GABA (Gad2), cholinergic (Scube1), and dopaminergic (Th) neurons. Vglut1 and GABA neurons were further annotated into subtypes based on subclustering and FindConservedMarkers markers.
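The MAD-based exclusion of nuclei described in the snRNA pipeline can be sketched in plain Python. This is an illustration rather than the authors' code: it assumes the raw (unscaled) MAD on untransformed values and reads the rule as "a nucleus is kept only if both nUMI and nGenes lie within three MADs of the median". The description does not say whether values were log-transformed or whether the MAD was scaled, and the function names here are illustrative.

```python
from statistics import median


def mad_bounds(values, n_mads=3.0):
    """Return (lower, upper) bounds at median +/- n_mads * MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med - n_mads * mad, med + n_mads * mad


def keep_mask(values, n_mads=3.0):
    """True for each value that lies inside the MAD bounds."""
    lower, upper = mad_bounds(values, n_mads)
    return [lower <= v <= upper for v in values]


def filter_nuclei(n_umi, n_genes, n_mads=3.0):
    """Indices of nuclei that are within bounds on both metrics."""
    umi_ok = keep_mask(n_umi, n_mads)
    genes_ok = keep_mask(n_genes, n_mads)
    return [i for i, (u, g) in enumerate(zip(umi_ok, genes_ok)) if u and g]
```

For example, with nUMI values `[1000, 1100, 900, 1050, 950, 5000]` the median is 1025 and the MAD is 75, so the nucleus with 5000 UMIs falls outside the bounds of 800 to 1250.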

Visium spatial pipeline: Sequences were fiducially aligned to spots using Loupe Browser ver. 8. All aligned sequences were mapped to the mouse genome mm39 using spaceranger count 3.0.1 with a custom reference that included sequences for the promoter and transgene (Camk2aTTA, V1S/SV2). We filtered each sample of the Visium Spatial dataset based on MAD filtering of the number of reads (nUMI), number of genes (nGene), and percentage of mitochondrial genes (percent.mt). A spot was filtered out if it was outside of 3x the MAD value in at least two metrics. Filtered samples were merged into one Seurat 5.1.0 object and we obtained normalized counts with the SCTransform function of Seurat. Integration was performed using Harmony 1.2.0 on 50 PCA embeddings and clustering was done using Leiden clustering based on 30 harmony embeddings. Integrated clusters were visualized using the UMAP method. Samples that were not successfully integrated (based on similarity measures of the harmony embeddings) and showed high percent.mt or low nUMI levels compared to other samples were removed from subsequent analysis. A final integration and clustering were performed after filtering. Regions were first annotated based on a 0.1 resolution clustering to get high-level region annotation (Cortex, Hippocampus, Subcortex). Each high-level region was further annotated based on either more granular resolutions or subclustering. Marker genes from mousebrain.org and literature were used in combination with the Allen mouse brain atlas to obtain anatomically relevant annotations.
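The spot filter in the Visium pipeline is a voting rule: a spot is removed only when it is an outlier in at least two of the three metrics. A minimal Python sketch follows, assuming the raw (unscaled) MAD computed per metric within one sample; the function names and the dictionary layout are illustrative and not taken from the dataset.

```python
from statistics import median


def outlier_flags(values, n_mads=3.0):
    """True where a value lies outside median +/- n_mads * MAD.

    Note: if the MAD is zero, every value different from the median is flagged.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) > n_mads * mad for v in values]


def spots_to_keep(metrics, min_flags=2, n_mads=3.0):
    """Indices of spots flagged in fewer than `min_flags` metrics.

    `metrics` maps a metric name (for example "nUMI") to one value per spot.
    """
    per_metric = [outlier_flags(vals, n_mads) for vals in metrics.values()]
    n_spots = len(per_metric[0])
    counts = [sum(flags[i] for flags in per_metric) for i in range(n_spots)]
    return [i for i, count in enumerate(counts) if count < min_flags]
```

A spot that is extreme in only one metric (for instance high percent.mt alone) survives this rule, which is the practical difference from filtering each metric independently.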
Pan-Cancer T cell atlas from "The combined use of scRNA-seq and network...
zenodo.org
bin
Updated Jan 2, 2025
Cite
Adèle Mangelinck (2025). Pan-Cancer T cell atlas from "The combined use of scRNA-seq and network propagation highlights key features of pan-cancer Tumor-Infiltrating T cells" (https://doi.org/10.1371/journal.pone.0315980) [Dataset]. http://doi.org/10.5281/zenodo.13879752
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13879752
Dataset updated
Jan 2, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Adèle Mangelinck
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The scRNA-seq data were collected from previously published datasets (GSE140228, GSE139555, GSE155698, GSE121636, and GSE139324), adhering to the following selection criteria: 1) presence of T cells, 2) treatment-naïve patients, 3) solid tumors, and 4) inclusion of at least tumor and blood samples.
Each scRNA-seq dataset underwent separate preprocessing in R (v4.0.2). We filtered out cells from the original count matrices that had fewer than 200 genes detected or more than 10% mitochondrial UMI counts and we only kept genes detected in at least 3 cells. Then, we applied Seurat (v4.0.5) with default parameters for count data normalization and scaling. Each cell was assigned a cell cycle score using the CellCycleScoring function and we computed the difference between the G2M and S phase scores. This approach allows for the separation of non-cycling from cycling cells while minimizing the differences in cell cycle phase among proliferating cells. The SelectIntegrationFeatures function was run with the nfeatures parameter set to 3,000 before merging all samples from each dataset. These integration features were then used for Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). Clustering was performed using the Louvain algorithm with the resolution parameter set to 2.0 for all datasets. Finally, T cells were isolated based on CD3D and CD3G gene expression (CD3D or CD3G expression level > 0).
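Two of the rules in this preprocessing step are simple enough to sketch directly: keeping genes detected in at least 3 cells, and gating T cells on CD3D or CD3G expression above zero. The Python below is a minimal illustration, not the authors' R/Seurat code. It assumes each cell is represented as a hypothetical gene-to-value mapping and that the T cell gate is applied per cell, which the description does not state explicitly.

```python
def genes_detected_in_min_cells(cells, min_cells=3):
    """Gene symbols with a nonzero value in at least `min_cells` cells.

    `cells` is a list of dicts mapping gene symbol -> count (hypothetical layout).
    """
    detected = {}
    for expression in cells:
        for gene, value in expression.items():
            if value > 0:
                detected[gene] = detected.get(gene, 0) + 1
    return sorted(g for g, n in detected.items() if n >= min_cells)


def is_t_cell(expression):
    """Gate from the description: CD3D or CD3G expression level > 0."""
    return expression.get("CD3D", 0) > 0 or expression.get("CD3G", 0) > 0


def select_t_cells(cells):
    """Cells passing the CD3D/CD3G gate, in input order."""
    return [c for c in cells if is_t_cell(c)]
```

A cell with zero counts for both genes, or with neither gene recorded, is treated as a non-T cell under this gate.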

To integrate heterogeneous data from different sources, a two-step procedure was applied. We first concatenated all datasets and ran the scaling and PCA steps on the top 3,000 highly variable genes identified by the FindVariableFeatures function with the “vst” method. Harmony was applied for batch effect correction; UMAP and clustering with the Louvain algorithm (resolution parameter set to 2.0) were then performed on the harmony reduction. Examining the result of this first clustering run, we identified contamination clusters and clusters that arose from unwanted factors. We removed the contamination clusters, which included low-quality cells highly expressing marker genes associated with apoptosis and tissue dissociation, pancreatic acinar cells (expressing PRSS1, CLPS, PNLIP and CTRB1, among others), myeloid cells (expressing CD68) and B cells (expressing CD79A). Then, we performed the second run of integration and clustering, excluding immunoglobulin, ribosomal-protein-coding, and T cell receptor (TCR) genes (gene symbols matching the string patterns "^IGK|^IGH|^IGL|^IGJ|^IGS|^IGD|IGFN1", "^RP([0-9]+-|[LS])", and "^TRA|^TRB|^TRG", respectively) from the top 3,000 highly variable genes and regressing out the cell cycle difference effect as well as the percentage of mitochondrial UMI counts. Harmony (v0.1.0) was applied again for batch effect correction, and UMAP was performed on the harmony reduction.
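The gene-exclusion step can be sketched with Python's standard `re` module. The three patterns are taken from the description, with the digit class written as `[0-9]` using an ASCII hyphen; the helper name `drop_excluded` and the example gene list are hypothetical, and the original analysis was done in R, so this is only an illustration of how the patterns behave.

```python
import re

# Patterns quoted in the description (one per gene family).
EXCLUDE_PATTERNS = [
    r"^IGK|^IGH|^IGL|^IGJ|^IGS|^IGD|IGFN1",  # immunoglobulin genes
    r"^RP([0-9]+-|[LS])",                     # ribosomal-protein-coding genes (per the description)
    r"^TRA|^TRB|^TRG",                        # T cell receptor genes
]
# Wrap each pattern in a non-capturing group so the alternations do not interfere.
EXCLUDE_RE = re.compile("|".join(f"(?:{p})" for p in EXCLUDE_PATTERNS))


def drop_excluded(genes):
    """Return the gene symbols that match none of the exclusion patterns."""
    return [g for g in genes if not EXCLUDE_RE.search(g)]


# Hypothetical list of highly variable genes.
hvgs = ["CD3D", "IGHG1", "IGKC", "RPL13", "RPS6", "RP11-34P13.7",
        "TRAC", "TRBC1", "TRGC2", "IGFN1", "RPA1", "CD8A"]
kept = drop_excluded(hvgs)
```

Because these are prefix matches, a symbol such as TRAF1 is also removed by `^TRA`, while RPA1 is kept because `RP` is followed by neither digits and a hyphen nor `L`/`S`.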
T cell subtype identification and annotation were performed by clustering cells using the Louvain algorithm with the resolution parameter set to 4.1, chosen after iterative testing from 3.5 to 5.0 in steps of 0.1 (more granular than default), computing cluster signatures based on differential gene expression using the FindAllMarkers function with the “MAST” method, and examining the expression of known marker genes. A resolution of 4.1 was the lowest value that correctly separated proliferating CD4+ T cells from proliferating CD8+ T cells.
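The resolution sweep can be pictured as a simple grid search that returns the lowest value meeting a criterion. In the sketch below, `resolution_grid` and `lowest_passing` are hypothetical helpers, and the `separates` callback is a stand-in: in the real analysis it would involve re-clustering at each resolution and inspecting the clusters, which is not reproduced here.

```python
def resolution_grid(start=3.5, stop=5.0, step=0.1):
    """Build the grid from an integer index to avoid floating-point drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 1) for i in range(n + 1)]


def lowest_passing(resolutions, separates):
    """Return the first (lowest) resolution for which separates(r) is true."""
    for r in resolutions:
        if separates(r):
            return r
    return None


grid = resolution_grid()

# Stand-in criterion (assumption): pretend the CD4+/CD8+ proliferating
# clusters only separate at resolution 4.1 or above.
chosen = lowest_passing(grid, lambda r: r >= 4.1)
```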
Tubuloid kidney organoid - single cell RNA-seq
figshare.com
tar
Updated May 16, 2022
Cite
Javier Perales Patón; Rafael Kramann (2022). Tubuloid kidney organoid - single cell RNA-seq [Dataset]. http://doi.org/10.6084/m9.figshare.11786238.v1
Available download formats: tar
Unique identifier
https://doi.org/10.6084/m9.figshare.11786238.v1
Dataset updated
May 16, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Javier Perales Patón; Rafael Kramann
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record contains data derived from the processing of single-cell and single-nucleus RNA-seq of several samples (see below). The data correspond to the input and intermediate output files from https://github.com/saezlab/Xu_tubuloid .

Data

The data include:

Binary sparse matrices for the UMI gene expression quantification from cellranger (filtered feature-barcode matrices). These are TAR archive files named after the sample.
Seurat Objects with normalized data, embeddings of dimensionality reduction, clustering and cell cluster annotation. These are TAR archive files including final objects, grouped by sample type: SeuratObjects_[SortedCells | Organoids | Human Kidney Tissue]. The HumanKidneyTissue archive also includes the SeuratObject after Harmony integration.
Exported barcode idents from unsupervised clustering and manual annotation ("barcodeIdents*.csv" files).
Label transfer via Symphony mapping of tubuloid cells from each organoid to an integrated reference atlas of human kidney tissue (SymphonyMapped*.csv).

Samples

The data correspond to the following samples, which were profiled at single-cell resolution:

CK5 early organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at an early stage.
CK119 late organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at a late stage.
JX1 late organoid (Healthy). Organoid generated following Hans Clevers' protocol for kidney organoids.
JX2 PKD1-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD1 was gene-edited to reproduce the PKD phenotype, developed at a late stage.
JX3 PKD2-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD2 was gene-edited to reproduce the PKD phenotype, developed at a late stage.
CK120 CD13. CD13+ sorted cells from human adult kidney tissue.
CK121 CD24. CD24+ sorted cells from human adult kidney tissue.

In addition, human adult kidney tissue was profiled in the context of ADPKD:

CK224: human specimen with ADPKD (PKD2- genotype).
CK225: human specimen with ADPKD (PKD1- genotype).
ADPKD3: human specimen with ADPKD (ND genotype).
Control1: human specimen with healthy tissue.
Control2: human specimen with healthy tissue.
Table 2_Single-cell/spatial integration reveals an MES2-like glioblastoma...
frontiersin.figshare.com
xlsx
Updated Oct 29, 2025
Cite
Chonghui Zhang; Lu Tan; Kaijian Zheng; Yifan Xu; Junshan Wan; Jinpeng Wu; Chao Wang; Pin Guo; Yugong Feng (2025). Table 2_Single-cell/spatial integration reveals an MES2-like glioblastoma program orchestrated by immune communication and regulatory networks.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2025.1699134.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fimmu.2025.1699134.s002
Dataset updated
Oct 29, 2025
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Chonghui Zhang; Lu Tan; Kaijian Zheng; Yifan Xu; Junshan Wan; Jinpeng Wu; Chao Wang; Pin Guo; Yugong Feng
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Glioblastoma (GBM) exhibits marked plasticity and intense microenvironmental crosstalk. We aimed to delineate mesenchymal programs with spatial resolution, clinical relevance, and mechanistic anchors.

Methods: We integrated single-cell RNA-seq, bulk transcriptomes, and Visium spatial data. After rigorous QC and Harmony integration, we annotated 12 cell states using canonical markers, decoupler-based ORA, and AUCell. Tumor boundaries were defined by inferCNV/CopyKAT; developmental potential by CytoTRACE2 and PHATE. Post-translational modification (PTM) axes were scored from curated gene sets. A cell type-aware GNN linked bulk expression to a patient-similarity graph for survival modeling and gene-level hazard attribution. Network convergence combined bulk WGCNA (TCGA/CGGA), single-cell hdWGCNA, BayesPrism deconvolution, and external GEO validation. Ligand–receptor (LR) signaling was inferred with LIANA+, embedded in a signed causal network, and mapped spatially. ARRDC3 expression was assessed in GBM tissues; U251 gain- and loss-of-function assays evaluated proliferation and migration.

Results: We resolved major GBM states, including two mesenchymal programs (MES1-like, MES2-like). CNV-high regions marked malignant cores, and CytoTRACE2 identified high-potency niches within MES2-like and Proliferation states along non-linear trajectories. PTM landscapes segregated by state; S-nitrosylation, glycosylation, and lactylation were enriched in mesenchymal programs. A GNN risk score stratified overall survival in TCGA (n=157) and generalized to CGGA-325 (n=85) and CGGA-693 (n=140). MES2-like abundance remained an independent adverse predictor (HR = 2.31; 95% CI, 1.04–5.10). MES2-high tumors upregulated EMT, TNFα/NF-κB, JAK/STAT, hypoxia, angiogenesis, and glycolysis; S-nitrosylation associated with increased hazard. Cross-modal convergence defined a conservative MES2 core enriched for ECM remodeling, collagen modification, focal adhesion, and TGF-β regulation. LR analysis prioritized a TAM-to-MES2 axis (e.g., GRN–TNFRSF1A, ADAM9/10/17–ITGB1, TGFB1–ITGB1/EGFR) converging on a CEBPD-centered module. Spatial mapping localized MES2 hotspots within CNV-defined territories and revealed a TNFRSF1A–CEBPD–ARRDC3 focus at an infiltrative rim. ARRDC3 was upregulated in GBM tissues; in U251 cells, knockdown promoted and overexpression suppressed proliferation and migration, indicating context-dependent roles.

Conclusions: MES2-like GBM is an ECM-driven, stress-adapted state with strong prognostic impact. We nominate CEBPD and TNFRSF1A/ITGB1 as actionable nodes and identify ARRDC3 as a spatially restricted effector with context-dependent tumor-modulatory functions warranting therapeutic exploration.