15 datasets found

Fast model-free integration and transfer learning via MASI for single-cell...
figshare.com
hdf
Updated Jan 23, 2022
Cite
Yang Xu (2022). Fast model-free integration and transfer learning via MASI for single-cell expression data [Dataset]. http://doi.org/10.6084/m9.figshare.18866264.v1
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.18866264.v1
Dataset updated
Jan 23, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yang Xu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed data for the MASI manuscript
Table 1_Integrative single-cell and cell-free plasma RNA transcriptomics...
frontiersin.figshare.com
xlsx
Updated May 30, 2025
Cite
Li Wu; Renxin Zhang; Yichao Wang; Shaoxing Dai; Naixue Yang (2025). Table 1_Integrative single-cell and cell-free plasma RNA transcriptomics identifies biomarkers for early non-invasive AD screening.xlsx [Dataset]. http://doi.org/10.3389/fnagi.2025.1571783.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fnagi.2025.1571783.s001
Dataset updated
May 30, 2025
Dataset provided by
Frontiers
Authors
Li Wu; Renxin Zhang; Yichao Wang; Shaoxing Dai; Naixue Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Data-driven omics approaches have rapidly advanced our understanding of the molecular heterogeneity of Alzheimer’s disease (AD). However, limited by the unavailability of brain tissue, there is an urgent need for a non-invasive tool to detect alterations in the AD brain. Cell-free RNA (cfRNA), which crosses the blood-brain barrier, could reflect AD brain pathology and serve as a diagnostic biomarker.
Methods: Here, we integrated plasma-derived cfRNA-seq data from 337 samples (172 AD patients and 165 age-matched controls) with brain-derived single cell RNA-seq (scRNA-seq) data from 88 samples (46 AD patients and 42 controls) to explore the potential of cfRNA profiling for AD diagnosis. A systematic comparative analysis of cfRNA and brain scRNA-seq datasets was conducted to identify dysregulated genes linked to AD pathology. Machine learning models, including support vector machine, random forest, and logistic regression, were trained using cfRNA expression patterns of the identified gene set to predict AD diagnosis and classify disease progression stages. Model performance was rigorously evaluated using area under the receiver operating characteristic curve (AUC), with robustness assessed through cross-validation and independent validation cohorts.
Results: Notably, we identified 34 dysregulated genes with consistent expression changes in both cfRNA and scRNA-seq. Machine learning models based on the cfRNA expression patterns of these 34 genes can accurately predict AD patients (the highest AUC = 89%) and effectively distinguish patients at early stage of AD. Furthermore, classifiers developed based on the expression of 34 genes in brain transcriptome data demonstrated robust predictive performance for assessing the risk of AD in the population (the highest AUC = 94%).
Discussion: This multi-omics approach overcomes limitations of invasive brain biomarkers and noisy blood-based signatures. The 34-gene panel provides non-invasive molecular insights into AD pathogenesis and early screening. While cfRNA stability challenges clinical translation, our framework highlights the potential for precision diagnostics and personalized therapeutic monitoring in AD.
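The description above reports model performance as AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the fraction of (positive, negative) pairs in which the positive sample gets the higher score. The function name `auc` and the scores are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (1 = AD, 0 = control); not data from the study.
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # 0.75: 3 of 4 positive/negative pairs are ranked correctly
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based implementations are the usual choice for larger cohorts.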
scRNA-seq data of lung cancer
scidb.cn
Updated Jul 21, 2022
Cite
Weimin Li (2022). scRNA-seq data of lung cancer [Dataset]. http://doi.org/10.57760/sciencedb.02028
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.02028
Dataset updated
Jul 21, 2022
Dataset provided by
Science Data Bank
Authors
Weimin Li
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
We collected 40 tumor and adjacent normal tissue samples from 19 pathologically diagnosed NSCLC patients (10 LUAD and 9 LUSC) during surgical resection. The tissues were rapidly digested to obtain single-cell suspensions, and cDNA libraries were constructed within 24 hours using the 10x Genomics protocol. The libraries were sequenced on the Illumina NovaSeq 6000 platform. Raw gene expression matrices were generated using CellRanger (version 3.0.1) and processed in R (version 3.6.0) using the Seurat R package (version 2.3.4).
scRNAseq of patients with chronic graft-versus-host-disease
ega-archive.org
Cite
scRNAseq of patients with chronic graft-versus-host-disease [Dataset]. https://ega-archive.org/datasets/EGAD00001012121
License
https://ega-archive.org/dacs/EGAC00001003458
Description
This dataset contains 10 samples from 9 patients with chronic graft-versus-host disease (GVHD). Each sample is analysed with Chromium V(D)J and 5' Gene Expression Platform v1.1 (10X Genomics). The raw data includes fastq files for Gene expression and fastq files for V(D)J Expression. The processed data have been deposited in the ArrayExpress database at EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-13419.
Data from: Single-cell RNA-seq provides insight into the underdeveloped...
scidb.cn
Updated Apr 21, 2025
Cite
Yifei SHENG; Xiaodong FANG (2025). Single-cell RNA-seq provides insight into the underdeveloped immune system of germ-free mice [Dataset]. http://doi.org/10.57760/sciencedb.j00139.00203
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.j00139.00203
Dataset updated
Apr 21, 2025
Dataset provided by
Science Data Bank
Authors
Yifei SHENG; Xiaodong FANG
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This RDS file contains processed single-cell RNA sequencing (scRNA-seq) data comparing immune cell populations from germ-free (GF) and specific-pathogen-free (SPF) mice. The dataset includes:
Samples: Peripheral blood (PB) and bone marrow (BM) from GF and SPF mice
Cell Counts:
Raw: 21,827 cells (PB) and 19,940 cells (BM)
Quality-filtered: 18,344 high-quality cells (PB) and 16,537 high-quality cells (BM)
Gene Coverage: Median 1,426 genes per cell (PB) and 1,391 genes per cell (BM)
Cell Classifications: 18 major cell identities further divided into 25 subpopulations
Annotation: Cells identified using established marker genes for blood cells
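The raw and quality-filtered cell counts quoted in the description imply a retention rate per tissue. The short sketch below works that arithmetic out. The counts are copied from the description, while the helper name `retention` is introduced here only for illustration.

```python
# Cell counts quoted in the description: (raw, quality-filtered).
counts = {
    "PB": (21_827, 18_344),
    "BM": (19_940, 16_537),
}


def retention(raw: int, kept: int) -> float:
    """Fraction of cells that passed quality filtering."""
    return kept / raw


for tissue, (raw, kept) in counts.items():
    print(f"{tissue}: {kept}/{raw} cells kept ({retention(raw, kept):.1%})")
```

Roughly 84% of PB cells and 83% of BM cells pass filtering, so the two tissues were filtered to a similar degree.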
OCID – Object Clutter Indoor Dataset
researchdata.tuwien.at
application/gzip
Updated Jul 3, 2025
Cite
Jean-Baptiste Nicolas Weibel; Markus Suchi (2025). OCID – Object Clutter Indoor Dataset [Dataset]. http://doi.org/10.48436/pcbjd-4wa12
Available download formats: application/gzip
Unique identifier
https://doi.org/10.48436/pcbjd-4wa12
Dataset updated
Jul 3, 2025
Dataset provided by
TU Wien
Authors
Jean-Baptiste Nicolas Weibel; Markus Suchi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
May 20, 2019
Description
OCID – Object Clutter Indoor Dataset
Developing robot perception systems for handling objects in the real world requires computer vision algorithms to be carefully scrutinized with respect to the expected operating domain. This demands large quantities of ground truth data to rigorously evaluate the performance of algorithms.
The Object Cluttered Indoor Dataset is an RGBD-dataset containing point-wise labeled point-clouds for each object. The data was captured using two ASUS-PRO Xtion cameras that are positioned at different heights. It captures diverse settings of objects, background, context, sensor to scene distance, viewpoint angle and lighting conditions. The main purpose of OCID is to allow systematic comparison of existing object segmentation methods in scenes with an increasing amount of clutter. In addition, OCID also provides ground-truth data for other vision tasks like object-classification and recognition.
OCID comprises 96 fully built up cluttered scenes. Each scene is a sequence of labeled pointclouds, created by building an increasingly cluttered scene incrementally, adding one object after the other. The first item in a sequence contains no objects, the second one object, and so on up to the final count of added objects.
Dataset
The dataset uses 89 different objects that are chosen representatives from the Autonomous Robot Indoor Dataset (ARID) [1] classes and YCB Object and Model Set (YCB) [2] dataset objects.
The ARID20 subset contains scenes including up to 20 objects from ARID. The ARID10 and YCB10 subsets include cluttered scenes with up to 10 objects from ARID and the YCB objects respectively. The scenes in each subset are composed of objects from only one set at a time to maintain separation between datasets. Scene variation includes different floor (plastic, wood, carpet) and table textures (wood, orange striped sheet, green patterned sheet). The complete set of data provides 2346 labeled point-clouds.
OCID subsets are structured so that specific real-world factors can be individually assessed.
ARID20-structure
location: floor, table
view: bottom, top
scene: sequence-id
free: clearly separated (objects 1-9 in corresponding sequence)
touching: physically touching (objects 10-16 in corresponding sequence)
stacked: on top of each other (objects 17-20 in corresponding sequence)
ARID10-structure
location: floor, table
view: bottom, top
box: objects with sharp edges (e.g. cereal-boxes)
curved: objects with smooth curved surfaces (e.g. ball)
mixed: objects from both the box and curved
fruits: fruit and vegetables
non-fruits: mixed objects without fruits
scene: sequence-id
YCB10-structure
location: floor, table
view: bottom, top
box: objects with sharp edges (e.g. cereal-boxes)
curved: objects with smooth curved surfaces (e.g. ball)
mixed: objects from both the box and curved
scene: sequence-id
Structure:
You can find all labeled pointclouds of the ARID20 dataset for the first sequence on a table recorded with the lower mounted camera in this directory:
./ARID20/table/bottom/seq01/pcd/
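The example path suggests that the structure fields map onto nested directories. A minimal sketch of building such paths follows, assuming the listed order (subset, location, view, optional object category, sequence) is the directory nesting and that sequence ids are zero-padded to two digits as in `seq01`. The helper `ocid_dir` and its parameters are hypothetical, not part of the dataset's tooling.

```python
from pathlib import PurePosixPath
from typing import Optional

LOCATIONS = {"floor", "table"}
VIEWS = {"bottom", "top"}
KINDS = {"pcd", "rgb", "depth", "label"}


def ocid_dir(subset: str, location: str, view: str, seq: int,
             kind: str = "pcd", category: Optional[str] = None) -> PurePosixPath:
    """Build a relative directory path for one OCID sequence.

    `category` (e.g. "box", "curved", "mixed") is only listed for the ARID10
    and YCB10 subsets; where it sits in the real tree is an assumption here.
    """
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    if kind not in KINDS:
        raise ValueError(f"unknown data kind: {kind}")
    parts = [subset, location, view]
    if category is not None:
        parts.append(category)
    parts += [f"seq{seq:02d}", kind]
    return PurePosixPath(*parts)


# Reproduces the example directory from the description (without the leading "./").
print(ocid_dir("ARID20", "table", "bottom", 1))
```

Check the layout of the extracted archive before relying on the `category` level.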
In addition to labeled organized point-cloud files, corresponding depth, RGB and 2d-label-masks are available:
pcd: 640×480 organized XYZRGBL-pointcloud file with ground truth
rgb: 640×480 RGB png-image
depth: 640×480 16-bit png-image with depth in mm
label: 640×480 16-bit png-image with unique integer-label for each object at each pixel
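The depth and label images are plain integer grids, so basic handling needs little more than unit conversion and counting. The sketch below uses tiny hand-made grids in place of decoded 640×480 PNGs; decoding itself would need an image library and is left out. Treating a depth value of 0 as "no measurement" is an assumption, and both helper names are hypothetical.

```python
from collections import Counter


def depth_mm_to_m(depth_mm):
    """Convert a 2-D grid of depth values in millimetres to metres.

    A value of 0 is mapped to None (assumed to mean "no measurement").
    """
    return [[d / 1000.0 if d > 0 else None for d in row] for row in depth_mm]


def pixels_per_label(label):
    """Count how many pixels carry each integer label in a 2-D label mask."""
    return Counter(v for row in label for v in row)


# Tiny 2x3 stand-ins for the 640x480 images; values are made up.
depth = [[0, 812, 815], [1200, 1198, 0]]
label = [[0, 1, 1], [2, 2, 0]]

print(depth_mm_to_m(depth))
print(pixels_per_label(label))
```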
Dataset creation using EasyLabel:
OCID was created using EasyLabel – a semi-automatic annotation tool for RGBD-data. EasyLabel processes recorded sequences of organized point-cloud files and exploits incrementally built up scenes, where in each take one additional object is placed. The recorded point-cloud data is then accumulated, and the depth difference between two consecutive recordings is used to label new objects. The code is available here.
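The idea of labeling a newly placed object from the depth difference between two consecutive takes can be illustrated with a toy example. This is a simplified sketch of the general idea only, not EasyLabel's actual implementation: the function name, the per-pixel comparison, and the 10 mm threshold are all assumptions made for illustration.

```python
def label_new_object(prev_depth, curr_depth, prev_labels, new_id, threshold_mm=10):
    """Give `new_id` to pixels whose depth changed by more than `threshold_mm`.

    All other pixels keep the label they had in the previous take.
    Inputs are 2-D grids (lists of rows) of equal shape.
    """
    out = []
    for prev_row, curr_row, lab_row in zip(prev_depth, curr_depth, prev_labels):
        out.append([
            new_id if abs(c - p) > threshold_mm else lab
            for p, c, lab in zip(prev_row, curr_row, lab_row)
        ])
    return out


# Take 0: empty scene, flat surface 1000 mm from the sensor (made-up values).
empty = [[1000, 1000, 1000], [1000, 1000, 1000]]
labels0 = [[0, 0, 0], [0, 0, 0]]

# Take 1: one object placed, covering two pixels and sitting 100 mm closer.
take1 = [[1000, 900, 900], [1000, 1002, 1000]]
labels1 = label_new_object(empty, take1, labels0, new_id=1)
print(labels1)  # [[0, 1, 1], [0, 0, 0]]
```

The small 2 mm change in the second row stays below the threshold, which shows why some tolerance for sensor noise is needed.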
OCID data for instance recognition/classification
For ARID10 and ARID20 there is additional data available usable for object recognition and classification tasks. It contains semantically annotated RGB and depth image crops extracted from the OCID dataset.
The structure is as follows:
type: depth, RGB
class name: e.g. banana, kleenex, …
class instance: e.g. banana_1, banana_2, kleenex_1, kleenex_2, …
The data is provided by Mohammad Reza Loghmani.

Research paper
If you found our dataset useful, please cite the following paper:
@inproceedings{DBLP:conf/icra/SuchiPFV19,
author = {Markus Suchi and
Timothy Patten and
David Fischinger and
Markus Vincze},
title = {EasyLabel: {A} Semi-Automatic Pixel-wise Object Annotation Tool for
Creating Robotic {RGB-D} Datasets},
booktitle = {International Conference on Robotics and Automation, {ICRA} 2019,
Montreal, QC, Canada, May 20-24, 2019},
pages = {6678--6684},
year = {2019},
crossref = {DBLP:conf/icra/2019},
url = {https://doi.org/10.1109/ICRA.2019.8793917},
doi = {10.1109/ICRA.2019.8793917},
timestamp = {Tue, 13 Aug 2019 20:25:20 +0200},
biburl = {https://dblp.org/rec/bib/conf/icra/SuchiPFV19},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

@proceedings{DBLP:conf/icra/2019,
title = {International Conference on Robotics and Automation, {ICRA} 2019,
Montreal, QC, Canada, May 20-24, 2019},
publisher = {{IEEE}},
year = {2019},
url = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=8780387},
isbn = {978-1-5386-6027-0},
timestamp = {Tue, 13 Aug 2019 20:23:21 +0200},
biburl = {https://dblp.org/rec/bib/conf/icra/2019},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contact & credits
For any questions or issues with the OCID-dataset, feel free to contact the authors:
Markus Suchi – email: suchi@acin.tuwien.ac.at
Tim Patten – email: patten@acin.tuwien.ac.at
For specific questions about the OCID-semantic crops data please contact:
Mohammad Reza Loghmani – email: loghmani@acin.tuwien.ac.at
References
[1] Loghmani, Mohammad Reza et al. "Recognizing Objects in-the-Wild: Where do we Stand?" 2018 IEEE International Conference on Robotics and Automation (ICRA) (2018): 2170-2177.
[2] Berk Calli, Arjun Singh, James Bruce, Aaron Walsman, Kurt Konolige, Siddhartha Srinivasa, Pieter Abbeel, Aaron M Dollar, Yale-CMU-Berkeley dataset for robotic manipulation research, The International Journal of Robotics Research, vol. 36, Issue 3, pp. 261 – 268, April 2017.
Distinct mechanisms of germ cell factor regulation for an inductive germ...
b2find.eudat.eu
Updated Feb 8, 2025
Cite
(2025). Distinct mechanisms of germ cell factor regulation for an inductive germ cell fate - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/1c33ed55-a9ab-5712-a582-512d2d081388
Dataset updated
Feb 8, 2025
Description
Here we employed single cell RNA sequencing to identify the transcriptional program of Nanos and Vasa positive cells and their changes during development. Our single cell sequencing analysis of six developmental stages in P. miniata revealed cell types derived from the three germ layers and expression of the germ cell genes Nanos and Vasa. We used these datasets to parse out 20 cell lineages of the embryo identified by this approach and to focus on the key transitions of germ cell gene expression and test their coexpression with key signaling components.
Overall design: Adult Patiria miniata animals were collected by either Peter Halmay (PeterHalmay@gmail.com) or Josh Ross (info@scbiomarine.com) off the Californian coast. Embryos were cultured essentially as described previously (Fresques et al., 2016). Embryos were cultured in filtered (0.2 micron) sea water collected at the Marine Biological laboratories in Woods Hole MA, until the appropriate stage for dissociation. All embryos used in the study resulted from mating of one male and one female. Multiple fertilizations were initiated in this study and timed such that the appropriate stages of embryonic development were reached at a common endpoint. The embryos were then collected and washed twice with calcium-free sea water, and then suspended in hyalin-extraction media (HEM) for 10-15 minutes, depending on the stage of dissociation. When cells were beginning to dissociate, the embryos were collected and washed in 0.5M NaCl, gently sheared with a pipette, run through a 40 micron Nitex mesh, counted on a hemocytometer, and diluted to reach the appropriate concentration for the scRNA-seq protocol. Equal numbers of embryos were used in each time point and at no time were cells or embryos pelleted in a centrifuge (Oulhen et al., 2019).
DataSheet_2_Lineage tracing of T cell differentiation from T-iPSC by 2D...
frontiersin.figshare.com
pdf
Updated Dec 15, 2023
Cite
Yoshitaka Ishiguro; Shoichi Iriguchi; Shinya Asano; Tokuyuki Shinohara; Sara Shiina; Suguru Arima; Yoshiaki Kassai; Yoshiharu Sakai; Kazutaka Obama; Shin Kaneko (2023). DataSheet_2_Lineage tracing of T cell differentiation from T-iPSC by 2D feeder-free culture and 3D organoid culture.pdf [Dataset]. http://doi.org/10.3389/fimmu.2023.1303713.s002
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fimmu.2023.1303713.s002
Dataset updated
Dec 15, 2023
Dataset provided by
Frontiers
Authors
Yoshitaka Ishiguro; Shoichi Iriguchi; Shinya Asano; Tokuyuki Shinohara; Sara Shiina; Suguru Arima; Yoshiaki Kassai; Yoshiharu Sakai; Kazutaka Obama; Shin Kaneko
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: T cells induced from induced pluripotent stem cells (iPSCs) derived from antigen-specific T cells (T-iPS-T cells) are an attractive tool for T cell immunotherapy. The induction of cytotoxic T-iPS-T cells is well established in feeder-free conditions for the aim of off-the-shelf production; however, the induction of helper T-iPS-T cells remains challenging.
Methods: We analyzed T-iPS-T cells matured in 3D organoid culture at different steps in the culture process at the single-cell level. T-iPS-T cell datasets were merged with an available human thymocyte dataset based on single-cell RNA sequencing (scRNA-seq). In particular, we searched for genes crucial for generating CD4+ T-iPS-T cells by comparing T-iPS-T cells established in 2D feeder-free or 3D organoid culture.
Results: The scRNA-seq data indicated that T-iPS-T cells are similar to T cells transitioning to human thymocytes, with SELENOW, GIMAP4, 7, SATB1, SALMF1, IL7R, SYTL2, S100A11, STAT1, IFITM1, LZTFL1 and SOX4 identified as candidate genes for the 2D feeder-free induction of CD4+ T-iPS-T cells.
Discussion: This study provides single cell transcriptome datasets of iPS-T cells and leads to further analysis for CD4+ T cell generation from T-iPSCs.
Raw gene counts
figshare.com
txt
Updated Oct 6, 2020
Cite
Geoff Stanley (2020). Raw gene counts [Dataset]. http://doi.org/10.6084/m9.figshare.12089430.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12089430.v1
Dataset updated
Oct 6, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Geoff Stanley
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Non-small cell lung cancer (NSCLC) metastatic to the brain leptomeninges (LMD) is rapidly fatal and cannot be biopsied, and the number of cancer cells in the cerebrospinal fluid (CSF) is small; therefore, the tissue samples available for research and the development of effective treatments are severely limited. We overcame these obstacles by using LMD patient CSF to analyze cell-free RNA signatures with massively parallel qPCR (n=14) and to perform single-cell RNA sequencing (scRNAseq; n=197 cells from 4 patients).
3k PBMCs from a healthy donor
figshare.com
hdf
Updated Feb 14, 2025
Cite
Songqi Duan (2025). 3k PBMCs from a healthy donor [Dataset]. http://doi.org/10.6084/m9.figshare.28414916.v1
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.28414916.v1
Dataset updated
Feb 14, 2025
Dataset provided by
figshare
Authors
Songqi Duan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains single-cell RNA sequencing (scRNA-seq) data of 3,000 peripheral blood mononuclear cells (PBMCs) from a healthy donor, processed using the 10x Genomics Chromium platform. The raw data was obtained from 10x Genomics and subsequently aligned using Cell Ranger 8.0.1 with the GENCODE Release 47 (GRCh38.p14) reference genome.
The dataset includes the following output files from the Cell Ranger pipeline:
filtered_feature_bc_matrix.h5 – Filtered count matrix in HDF5 format
filtered_feature_bc_matrix – Filtered gene-barcode matrix in directory format
raw_feature_bc_matrix – Raw gene-barcode matrix in directory format
raw_feature_bc_matrix.h5 – Raw count matrix in HDF5 format
This dataset is valuable for researchers studying single-cell transcriptomics, immune cell profiling, and bioinformatics pipeline benchmarking.
File format: HDF5 and Matrix Market (MTX)
Reference Genome: GENCODE Release 47 (GRCh38.p14)
Processing Pipeline: Cell Ranger 8.0.1
For any questions or collaborations, please feel free to contact the uploader.
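Since the description lists Matrix Market (MTX) as one of the file formats, a minimal sketch of reading a Matrix Market coordinate file may help when inspecting the directory-format matrices. It assumes integer counts in coordinate layout with rows as features and columns as barcodes, and it parses an in-memory toy string rather than a real file, which would typically be gzip-compressed. The function `parse_mtx` is hypothetical; established libraries are the better choice for real analyses.

```python
def parse_mtx(text):
    """Parse Matrix Market coordinate text into ((n_rows, n_cols), entries).

    `entries` maps zero-based (row, col) to an integer count; Matrix Market
    itself uses one-based indices.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header")
    # Lines starting with '%' after the header are comments.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    n_rows, n_cols, n_nonzero = (int(x) for x in body[0].split())
    entries = {}
    for ln in body[1:]:
        r, c, v = ln.split()
        entries[(int(r) - 1, int(c) - 1)] = int(v)
    if len(entries) != n_nonzero:
        raise ValueError("entry count does not match the size line")
    return (n_rows, n_cols), entries


# Toy matrix: 3 features x 2 barcodes with 3 non-zero counts (made-up values).
toy = """%%MatrixMarket matrix coordinate integer general
% toy example, not real data
3 2 3
1 1 5
3 1 2
2 2 7
"""
shape, entries = parse_mtx(toy)
print(shape, entries)
```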
Data_Sheet_1_Identification and Validation of a Malignant Cell Subset...
frontiersin.figshare.com
xls
Updated Jun 6, 2023
Cite
Qiyuan Zou; Yufeng Lv; Zuhuan Gan; Shulan Liao; Zhonghui Liang (2023). Data_Sheet_1_Identification and Validation of a Malignant Cell Subset Marker-Based Polygenic Risk Score in Stomach Adenocarcinoma Through Integrated Analysis of Bulk and Single-Cell RNA Sequencing Data.xls [Dataset]. http://doi.org/10.3389/fcell.2021.720649.s001
Available download formats: xls
Unique identifier
https://doi.org/10.3389/fcell.2021.720649.s001
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers
Authors
Qiyuan Zou; Yufeng Lv; Zuhuan Gan; Shulan Liao; Zhonghui Liang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objectives: The aim of the present study was to construct a polygenic risk score (PRS) for poor survival among patients with stomach adenocarcinoma (STAD) based on expression of malignant cell markers.
Methods: Integrated analyses of bulk and single-cell RNA sequencing (scRNA-seq) of STAD and normal stomach tissues were conducted to identify malignant and non-malignant markers. Analyses of the scRNA-seq profile from early STAD were used to explore intratumoral heterogeneity (ITH) of the malignant cell subpopulations. Dimension reduction, cell clustering, pseudotime, and gene set enrichment analyses were performed. The marker genes of each malignant tissue and cell clusters were screened to create a PRS using Cox regression analyses. Combined with the PRS and routine clinicopathological characteristics, a nomogram tool was generated to predict prognosis of patients with STAD. The prognostic power of the PRS was validated in two independent external datasets.
Results: The malignant and non-malignant cells were identified according to 50 malignant and non-malignant cell markers. The malignant cells were divided into nine clusters with different marker genes and biological characteristics. Pseudotime analysis showed the potential differentiation trajectory of these nine malignant cell clusters and identified genes that affect cell differentiation. Ten malignant cell markers were selected to generate a PRS: RGS1, AADAC, NPC2, COL10A1, PRKCSH, RAMP1, PRR15L, TUBA1A, CXCR6, and UPP1. The PRS was associated with both overall and progression-free survival (PFS) and proved to be a prognostic factor independent of routine clinicopathological characteristics. PRS could successfully divide patients with STAD in three datasets into high- or low-risk groups. In addition, we combined PRS and the tumor clinicopathological characteristics into a nomogram tool to help predict the survival of patients with STAD.
Conclusion: We revealed limited but significant intratumoral heterogeneity in STAD and proposed a malignant cell subset marker-based PRS through integrated analysis of bulk sequencing and scRNA-seq data.
DataSheet_1_Bulk and single-cell RNA-sequencing analyses along with abundant...
frontiersin.figshare.com
pdf
Updated Jun 2, 2023
Cite
Yuyao Liu; Haoxue Zhang; Yan Mao; Yangyang Shi; Xu Wang; Shaomin Shi; Delin Hu; Shengxiu Liu (2023). DataSheet_1_Bulk and single-cell RNA-sequencing analyses along with abundant machine learning methods identify a novel monocyte signature in SKCM.pdf [Dataset]. http://doi.org/10.3389/fimmu.2023.1094042.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fimmu.2023.1094042.s001
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Yuyao Liu; Haoxue Zhang; Yan Mao; Yangyang Shi; Xu Wang; Shaomin Shi; Delin Hu; Shengxiu Liu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Global patterns of immune cell communications in the immune microenvironment of skin cutaneous melanoma (SKCM) haven’t been well understood. Here we recognized signaling roles of immune cell populations and main contributive signals. We explored how multiple immune cells and signal paths coordinate with each other and established a prognosis signature based on the key specific biomarkers with cellular communication.
Methods: The single-cell RNA sequencing (scRNA-seq) dataset was downloaded from the Gene Expression Omnibus (GEO) database, in which various immune cells were extracted and re-annotated according to cell markers defined in the original study to identify their specific signs. We computed immune-cell communication networks by calculating the linking number or summarizing the communication probability to visualize the cross-talk tendency in different immune cells. Combining abundant analyses of communication networks and identifications of communication modes, all networks were quantitatively characterized and compared. Based on the bulk RNA sequencing data, we trained specific markers of hub communication cells through integration programs of machine learning to develop new immune-related prognostic combinations.
Results: An eight-gene monocyte-related signature (MRS) has been built, confirmed as an independent risk factor for disease-specific survival (DSS). MRS has great predictive values in progression free survival (PFS) and possesses better accuracy than traditional clinical variables and molecular features. The low-risk group has better immune functions, infiltrated with more lymphocytes and M1 macrophages, with higher expressions of HLA, immune checkpoints, chemokines and costimulatory molecules. The pathway analysis based on seven databases confirms the biological uniqueness of the two risk groups. Additionally, the regulon activity profiles of 18 transcription factors highlight possible differential regulatory patterns between the two risk groups, suggesting epigenetic event-driven transcriptional networks may be an important distinction. MRS has been identified as a powerful tool to benefit SKCM patients. Moreover, the IFITM3 gene has been identified as the key gene, validated to express highly at the protein level via the immunohistochemical assay in SKCM.
Conclusion: MRS is accurate and specific in evaluating SKCM patients’ clinical outcomes. IFITM3 is a potential biomarker. Moreover, they are promising to improve the prognosis of SKCM patients.
Table 1_Identification of novel molecular subtypes and construction of a...
frontiersin.figshare.com
xlsx
Updated Jul 21, 2025
Cite
Ke Ma; Jie Xu; Congyue Wang; Xu Cao; Wenjie Yu; Jingjing Xi; Xuan Zhang; Jiamin Zhan; Yang Liu; Aoyang Yu; Shuhan Liu; Yanhua Liu; Chong Chen; Xiaoli Mai (2025). Table 1_Identification of novel molecular subtypes and construction of a prognostic signature via multi-omics analysis and machine learning in lung adenocarcinoma.xlsx [Dataset]. http://doi.org/10.3389/fonc.2025.1590216.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fonc.2025.1590216.s001
Dataset updated
Jul 21, 2025
Dataset provided by
Frontiers
Authors
Ke Ma; Jie Xu; Congyue Wang; Xu Cao; Wenjie Yu; Jingjing Xi; Xuan Zhang; Jiamin Zhan; Yang Liu; Aoyang Yu; Shuhan Liu; Yanhua Liu; Chong Chen; Xiaoli Mai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: The development of high-throughput sequencing technologies and targeted therapeutic strategies has significantly improved the prognosis of lung adenocarcinoma (LUAD) patients with sensitive gene mutations. However, patients harboring rare or no actionable mutations rarely benefit from these targeted therapies. This study aimed to identify novel molecular subtypes and construct a prognostic signature to enhance the stratification of LUAD prognosis.
Materials and methods: Novel molecular subtypes of LUAD patients were identified by applying 10 distinct clustering algorithms on multi-omics data. Single-cell RNA-sequencing (scRNA-seq) data were integrated to characterize subtype-specific immune microenvironments. A multi-omics and machine learning-driven prognostic signature (MO-MLPS) was constructed in The Cancer Genome Atlas (TCGA) LUAD dataset using ten machine learning algorithms and subsequently validated across six independent datasets from the Gene Expression Omnibus (GEO) database. The robustness of the model was assessed using the concordance index (C-index), Kaplan-Meier survival analyses, receiver operating characteristic (ROC) curves, and both univariate and multivariate Cox regression analyses. We further confirmed the effects of ANLN knockdown and the expression of a domain-negative anillin protein (dnANLN) via western blotting, cell proliferation assays, flow cytometry, and transwell migration assays in vitro.
Results: Our analysis revealed that the novel molecular subtypes exhibited differences in prognoses, biological functions, and immune infiltration profiles in LUAD. The MO-MLPS was successfully established and validated across TCGA-LUAD cohorts, six independent GEO datasets, and their composite meta-cohort. Higher risk scores from the MO-MLPS correlated with poorer prognosis in LUAD, with AUC values exceeding 0.5 at 1, 3, and 5 years across various cohorts. The signature outperformed 49 previously published prognostic signatures. Furthermore, patients classified as high risk exhibited significantly worse overall and progression-free survival than those classified as low risk. Notably, ANLN knockdown and dnANLN expression significantly inhibited cell proliferation and migration in vitro and enhanced the efficacy of docetaxel.
Conclusion: A comprehensive analysis of multi-omics data redefines the molecular subtype of LUAD patients. The MO-MLPS derived from subtype characteristics has the potential to serve as a clinically valuable prognostic tool. Furthermore, ANLN emerges as a promising novel therapeutic target in the treatment of LUAD.
Evaluation of GraphFP’s performance on quantifying the stochastic dynamics...
plos.figshare.com
xls
Updated Jun 10, 2023
Cite
Qi Jiang; Shuo Zhang; Lin Wan (2023). Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell-type frequencies with cell-cell interaction term (W ≠ 0) and without cell-cell interaction term (W = 0) on the murine cerebral cortex dataset. [Dataset]. http://doi.org/10.1371/journal.pcbi.1009821.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1009821.t001
Dataset updated
Jun 10, 2023
Dataset provided by
PLOS Computational Biology
Authors
Qi Jiang; Shuo Zhang; Lin Wan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell-type frequencies with cell-cell interaction term (W ≠ 0) and without cell-cell interaction term (W = 0) on the murine cerebral cortex dataset.
Table_4_S100A12 as Biomarker of Disease Severity and Prognosis in Patients...
frontiersin.figshare.com
xlsx
Updated Jun 16, 2023
+ more versions
Cite
Yupeng Li; Yaowu He; Shibin Chen; Qi Wang; Yi Yang; Danting Shen; Jing Ma; Zhe Wen; Shangwei Ning; Hong Chen (2023). Table_4_S100A12 as Biomarker of Disease Severity and Prognosis in Patients With Idiopathic Pulmonary Fibrosis.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2022.810338.s018
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fimmu.2022.810338.s018
Dataset updated
Jun 16, 2023
Dataset provided by
Frontiers
Authors
Yupeng Li; Yaowu He; Shibin Chen; Qi Wang; Yi Yang; Danting Shen; Jing Ma; Zhe Wen; Shangwei Ning; Hong Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Idiopathic pulmonary fibrosis (IPF) is an interstitial lung disease (ILD) with poor prognosis. S100 calcium binding protein A12 (S100A12) has been reported as a prognostic serum biomarker in IPF, but its correlation with IPF in lung tissue and bronchoalveolar lavage fluid (BALF) remains unclear.
Methods: Datasets were collected from the Gene Expression Omnibus (GEO) database. Analyses included the Pearson correlation coefficient, Kaplan–Meier analysis, Cox regression analysis, and functional enrichment analysis, among others. Single-cell RNA-sequencing (scRNA-seq) analysis was also used to explore the role of S100A12 and related genes in IPF.
Results: S100A12 was mainly and highly expressed in monocytes, and its expression was downregulated in the lungs of patients with IPF according to scRNA-seq and transcriptome analysis. However, S100A12 expression was upregulated in both the blood and BALF of patients with IPF. In addition, 10 genes were found to interact with S100A12 according to the protein–protein interaction (PPI) network, and the first four transcription factors (TFs) targeting these genes were identified using the hTFtarget database. The two most significantly co-expressed genes of S100A12 were S100A8 and S100A9. These 3 genes were significantly negatively associated with lung function and positively associated with St. George’s Respiratory Questionnaire (SGRQ) scores in the lungs of patients with IPF. High expression of the 3 genes was associated with higher mortality in BALF, and with shorter transplant-free survival (TFS) and progression-free survival (PFS) in blood. The prognostic predictive value of S100A12 was superior to that of S100A8 and S100A9 in patients with IPF, and the composite variable [S100A12 + GAP index (gender, age, and physiological index)] may be a more effective predictive index.
Conclusion: These results imply that S100A12 might be an efficient disease severity and prognostic biomarker in patients with IPF.