15 datasets found
  1. Fast model-free integration and transfer learning via MASI for single-cell...

    • figshare.com
    hdf
    Updated Jan 23, 2022
    Cite
    Yang Xu (2022). Fast model-free integration and transfer learning via MASI for single-cell expression data [Dataset]. http://doi.org/10.6084/m9.figshare.18866264.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Jan 23, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yang Xu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed data for the MASI manuscript

  2. Table 1_Integrative single-cell and cell-free plasma RNA transcriptomics...

    • frontiersin.figshare.com
    xlsx
    Updated May 30, 2025
    + more versions
    Cite
    Li Wu; Renxin Zhang; Yichao Wang; Shaoxing Dai; Naixue Yang (2025). Table 1_Integrative single-cell and cell-free plasma RNA transcriptomics identifies biomarkers for early non-invasive AD screening.xlsx [Dataset]. http://doi.org/10.3389/fnagi.2025.1571783.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 30, 2025
    Dataset provided by
    Frontiers
    Authors
    Li Wu; Renxin Zhang; Yichao Wang; Shaoxing Dai; Naixue Yang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Data-driven omics approaches have rapidly advanced our understanding of the molecular heterogeneity of Alzheimer’s disease (AD). However, limited by the unavailability of brain tissue, there is an urgent need for a non-invasive tool to detect alterations in the AD brain. Cell-free RNA (cfRNA), which crosses the blood-brain barrier, could reflect AD brain pathology and serve as a diagnostic biomarker.

    Methods: Here, we integrated plasma-derived cfRNA-seq data from 337 samples (172 AD patients and 165 age-matched controls) with brain-derived single cell RNA-seq (scRNA-seq) data from 88 samples (46 AD patients and 42 controls) to explore the potential of cfRNA profiling for AD diagnosis. A systematic comparative analysis of cfRNA and brain scRNA-seq datasets was conducted to identify dysregulated genes linked to AD pathology. Machine learning models, including support vector machine, random forest, and logistic regression, were trained using cfRNA expression patterns of the identified gene set to predict AD diagnosis and classify disease progression stages. Model performance was rigorously evaluated using area under the receiver operating characteristic curve (AUC), with robustness assessed through cross-validation and independent validation cohorts.

    Results: Notably, we identified 34 dysregulated genes with consistent expression changes in both cfRNA and scRNA-seq. Machine learning models based on the cfRNA expression patterns of these 34 genes can accurately predict AD patients (the highest AUC = 89%) and effectively distinguish patients at early stage of AD. Furthermore, classifiers developed based on the expression of 34 genes in brain transcriptome data demonstrated robust predictive performance for assessing the risk of AD in the population (the highest AUC = 94%).

    Discussion: This multi-omics approach overcomes limitations of invasive brain biomarkers and noisy blood-based signatures. The 34-gene panel provides non-invasive molecular insights into AD pathogenesis and early screening. While cfRNA stability challenges clinical translation, our framework highlights the potential for precision diagnostics and personalized therapeutic monitoring in AD.

  3. scRNA-seq data of lung cancer

    • scidb.cn
    Updated Jul 21, 2022
    Cite
    Weimin Li (2022). scRNA-seq data of lung cancer [Dataset]. http://doi.org/10.57760/sciencedb.02028
    Explore at:
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    Science Data Bank
    Authors
    Weimin Li
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We collected 40 tumor and adjacent normal tissue samples from 19 pathologically diagnosed NSCLC patients (10 LUAD and 9 LUSC) during surgical resection, rapidly digested the tissues to obtain single-cell suspensions, and constructed cDNA libraries from these samples within 24 hours using the 10x Genomics protocol. The libraries were sequenced on the Illumina NovaSeq 6000 platform. Raw gene expression matrices were generated using CellRanger (version 3.0.1) and processed in R (version 3.6.0) using the Seurat R package (version 2.3.4).

  4. scRNAseq of patients with chronic graft-versus-host-disease

    • ega-archive.org
    Cite
    scRNAseq of patients with chronic graft-versus-host-disease [Dataset]. https://ega-archive.org/datasets/EGAD00001012121
    Explore at:
    License

    https://ega-archive.org/dacs/EGAC00001003458

    Description

    This dataset contains 10 samples from 9 patients with chronic graft-versus-host disease (GVHD). Each sample is analysed with Chromium V(D)J and 5' Gene Expression Platform v1.1 (10X Genomics). The raw data includes fastq files for Gene expression and fastq files for V(D)J Expression. The processed data have been deposited in the ArrayExpress database at EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-13419.

  5. Data from: Single-cell RNA-seq provides insight into the underdeveloped...

    • scidb.cn
    Updated Apr 21, 2025
    Cite
    Yifei SHENG; Xiaodong FANG (2025). Single-cell RNA-seq provides insight into the underdeveloped immune system of germ-free mice [Dataset]. http://doi.org/10.57760/sciencedb.j00139.00203
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Yifei SHENG; Xiaodong FANG
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This RDS file contains processed single-cell RNA sequencing (scRNA-seq) data comparing immune cell populations from germ-free (GF) and specific-pathogen-free (SPF) mice. The dataset includes:

    • Samples: Peripheral blood (PB) and bone marrow (BM) from GF and SPF mice
    • Cell counts (raw): 21,827 cells (PB) and 19,940 cells (BM)
    • Cell counts (quality-filtered): 18,344 high-quality cells (PB) and 16,537 high-quality cells (BM)
    • Gene coverage: Median 1,426 genes per cell (PB) and 1,391 genes per cell (BM)
    • Cell classifications: 18 major cell identities further divided into 25 subpopulations
    • Annotation: Cells identified using established marker genes for blood cells

  6. OCID – Object Clutter Indoor Dataset

    • researchdata.tuwien.at
    application/gzip
    Updated Jul 3, 2025
    Cite
    Jean-Baptiste Nicolas Weibel; Markus Suchi (2025). OCID – Object Clutter Indoor Dataset [Dataset]. http://doi.org/10.48436/pcbjd-4wa12
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    TU Wien
    Authors
    Jean-Baptiste Nicolas Weibel; Markus Suchi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 20, 2019
    Description

    OCID – Object Clutter Indoor Dataset

    Developing robot perception systems for handling objects in the real-world requires computer vision algorithms to be carefully scrutinized with respect to the expected operating domain. This demands large quantities of ground truth data to rigorously evaluate the performance of algorithms.

    The Object Clutter Indoor Dataset is an RGBD dataset containing point-wise labeled point-clouds for each object. The data was captured using two ASUS-PRO Xtion cameras positioned at different heights. It captures diverse settings of objects, background, context, sensor-to-scene distance, viewpoint angle and lighting conditions. The main purpose of OCID is to allow systematic comparison of existing object segmentation methods in scenes with an increasing amount of clutter. In addition, OCID also provides ground-truth data for other vision tasks such as object classification and recognition.

    OCID comprises 96 fully built-up cluttered scenes. Each scene is a sequence of labeled pointclouds created by building an increasingly cluttered scene incrementally, adding one object after another. The first item in a sequence contains no objects, the second contains one object, and so on up to the final count of added objects.

    Dataset

    The dataset uses 89 different objects chosen as representatives of the Autonomous Robot Indoor Dataset (ARID) [1] classes and of the YCB Object and Model Set (YCB) [2] dataset objects.

    The ARID20 subset contains scenes including up to 20 objects from ARID. The ARID10 and YCB10 subsets include cluttered scenes with up to 10 objects from ARID and the YCB objects respectively. The scenes in each subset are composed of objects from only one set at a time to maintain separation between datasets. Scene variation includes different floor (plastic, wood, carpet) and table textures (wood, orange striped sheet, green patterned sheet). The complete set of data provides 2346 labeled point-clouds.

    OCID subsets are structured so that specific real-world factors can be individually assessed.

    ARID20-structure

    • location: floor, table
    • view: bottom, top
    • scene: sequence-id
    • free: clearly separated (objects 1-9 in corresponding sequence)
    • touching: physically touching (objects 10-16 in corresponding sequence)
    • stacked: on top of each other (objects 17-20 in corresponding sequence)
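
    The ARID20 grouping above can be read as a mapping from the order in which an object was added to a sequence to its placement type. The following is a minimal Python sketch of that mapping, using only the ranges listed above. The function name `arid20_placement` is hypothetical and is not part of the OCID distribution or of EasyLabel.

```python
def arid20_placement(object_number: int) -> str:
    """Return the ARID20 placement type for the n-th object added to a sequence.

    Ranges follow the ARID20 structure listed above: objects 1-9 are free,
    10-16 are touching, and 17-20 are stacked. The function name is
    hypothetical; it is not part of the OCID distribution.
    """
    if 1 <= object_number <= 9:
        return "free"
    if 10 <= object_number <= 16:
        return "touching"
    if 17 <= object_number <= 20:
        return "stacked"
    raise ValueError("ARID20 sequences contain objects numbered 1 to 20")


# Placement type of every object in a full 20-object sequence.
placements = [arid20_placement(n) for n in range(1, 21)]
```

    Because the first item of each sequence contains no objects, the item at zero-based position k in a sequence would contain k objects, and `arid20_placement(k)` gives the placement of the object added most recently.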

    ARID10-structure

    • location: floor, table
    • view: bottom, top
    • box: objects with sharp edges (e.g. cereal-boxes)
    • curved: objects with smooth curved surfaces (e.g. ball)
    • mixed: objects from both the box and curved
    • fruits: fruit and vegetables
    • non-fruits: mixed objects without fruits
    • scene: sequence-id

    YCB10-structure

    • location: floor, table
    • view: bottom, top
    • box: objects with sharp edges (e.g. cereal-boxes)
    • curved: objects with smooth curved surfaces (e.g. ball)
    • mixed: objects from both the box and curved
    • scene: sequence-id

    Structure:

    You can find all labeled pointclouds of the ARID20 dataset for the first sequence on a table recorded with the lower mounted camera in this directory:

    ./ARID20/table/bottom/seq01/pcd/

    In addition to labeled organized point-cloud files, corresponding depth, RGB and 2d-label-masks are available:

    • pcd: 640×480 organized XYZRGBL-pointcloud file with ground truth
    • rgb: 640×480 RGB png-image
    • depth: 640×480 16-bit png-image with depth in mm
    • label: 640×480 16-bit png-image with unique integer-label for each object at each pixel
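
    The directory layout and the depth encoding above can be sketched in Python as follows. This is an illustration under stated assumptions, not tooling shipped with OCID: only the path ./ARID20/table/bottom/seq01/pcd/ is confirmed by the description, so the two-digit `seqNN` naming for other sequences and the use of `rgb`, `depth` and `label` as sibling directory names are assumptions, and both function names are hypothetical.

```python
from pathlib import PurePosixPath

# Values taken from the ARID20 structure described above.
LOCATIONS = {"floor", "table"}
VIEWS = {"bottom", "top"}
# Assumption: each data type listed above lives in a directory of the same name.
KINDS = {"pcd", "rgb", "depth", "label"}


def arid20_dir(location: str, view: str, seq_id: int, kind: str = "pcd") -> PurePosixPath:
    """Build a relative ARID20 directory path such as ARID20/table/bottom/seq01/pcd.

    The zero-padded two-digit sequence name is inferred from the single
    example path in the description and may not hold for every sequence.
    """
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    if kind not in KINDS:
        raise ValueError(f"unknown data kind: {kind}")
    return PurePosixPath("ARID20") / location / view / f"seq{seq_id:02d}" / kind


def depth_mm_to_m(depth_mm: int) -> float:
    """Convert one 16-bit depth value in millimetres to metres."""
    return depth_mm / 1000.0


example_dir = arid20_dir("table", "bottom", 1)
```

    Reading the image files themselves would need a PNG library that preserves 16-bit values; that step is left out here because the description does not name one.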

    Dataset creation using EasyLabel:

    OCID was created using EasyLabel – a semi-automatic annotation tool for RGBD-data. EasyLabel processes recorded sequences of organized point-cloud files and exploits incrementally built-up scenes, where in each take one additional object is placed. The recorded point-cloud data is then accumulated, and the depth difference between two consecutive recordings is used to label new objects. The code is available here.

    OCID data for instance recognition/classification

    For ARID10 and ARID20 there is additional data available usable for object recognition and classification tasks. It contains semantically annotated RGB and depth image crops extracted from the OCID dataset.

    The structure is as follows:

    • type: depth, RGB
    • class name: e.g. banana, kleenex, …
    • class instance: e.g. banana_1, banana_2, kleenex_1, kleenex_2, …

    The data is provided by Mohammad Reza Loghmani.

    Research paper

    If you found our dataset useful, please cite the following paper:

    @inproceedings{DBLP:conf/icra/SuchiPFV19,
      author    = {Markus Suchi and
                   Timothy Patten and
                   David Fischinger and
                   Markus Vincze},
      title     = {EasyLabel: {A} Semi-Automatic Pixel-wise Object Annotation Tool for Creating Robotic {RGB-D} Datasets},
      booktitle = {International Conference on Robotics and Automation, {ICRA} 2019, Montreal, QC, Canada, May 20-24, 2019},
      pages     = {6678--6684},
      year      = {2019},
      crossref  = {DBLP:conf/icra/2019},
      url       = {https://doi.org/10.1109/ICRA.2019.8793917},
      doi       = {10.1109/ICRA.2019.8793917},
      timestamp = {Tue, 13 Aug 2019 20:25:20 +0200},
      biburl    = {https://dblp.org/rec/bib/conf/icra/SuchiPFV19},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }

    @proceedings{DBLP:conf/icra/2019,
      title     = {International Conference on Robotics and Automation, {ICRA} 2019, Montreal, QC, Canada, May 20-24, 2019},
      publisher = {{IEEE}},
      year      = {2019},
      url       = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=8780387},
      isbn      = {978-1-5386-6027-0},
      timestamp = {Tue, 13 Aug 2019 20:23:21 +0200},
      biburl    = {https://dblp.org/rec/bib/conf/icra/2019},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }

    Contact & credits

    For any questions or issues with the OCID-dataset, feel free to contact the authors:

    • Markus Suchi – email: suchi@acin.tuwien.ac.at
    • Tim Patten – email: patten@acin.tuwien.ac.at

    For specific questions about the OCID-semantic crops data please contact:

    • Mohammad Reza Loghmani – email: loghmani@acin.tuwien.ac.at

    References

    [1] Loghmani, Mohammad Reza et al. "Recognizing Objects in-the-Wild: Where do we Stand?" 2018 IEEE International Conference on Robotics and Automation (ICRA) (2018): 2170-2177.

    [2] Berk Calli, Arjun Singh, James Bruce, Aaron Walsman, Kurt Konolige, Siddhartha Srinivasa, Pieter Abbeel, Aaron M Dollar, Yale-CMU-Berkeley dataset for robotic manipulation research, The International Journal of Robotics Research, vol. 36, Issue 3, pp. 261 – 268, April 2017.

  7. Distinct mechanisms of germ cell factor regulation for an inductive germ...

    • b2find.eudat.eu
    Updated Feb 8, 2025
    + more versions
    Cite
    (2025). Distinct mechanisms of germ cell factor regulation for an inductive germ cell fate - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/1c33ed55-a9ab-5712-a582-512d2d081388
    Explore at:
    Dataset updated
    Feb 8, 2025
    Description

    Here we employed single cell RNA sequencing to identify the transcriptional program of Nanos and Vasa positive cells and their changes during development. Our single cell sequencing analysis of six developmental stages in P. miniata revealed cell types derived from the three germ layers and expression of the germ cell genes Nanos and Vasa. We used these datasets to parse out 20 cell lineages of the embryo identified by this approach, to focus on the key transitions of germ cell gene expression, and to test their coexpression with key signaling components.

    Overall design: Adult Patiria miniata animals were collected by either Peter Halmay (PeterHalmay@gmail.com) or Josh Ross (info@scbiomarine.com) off the Californian coast. Embryos were cultured essentially as described previously (Fresques et al., 2016), in filtered (0.2 micron) sea water collected at the Marine Biological laboratories in Woods Hole, MA, until the appropriate stage for dissociation. All embryos used in the study resulted from mating of one male and one female. Multiple fertilizations were initiated in this study and timed such that the appropriate stages of embryonic development were reached at a common endpoint. The embryos were then collected, washed twice with calcium-free sea water, and suspended in hyalin-extraction media (HEM) for 10-15 minutes, depending on the stage of dissociation. When cells were beginning to dissociate, the embryos were collected and washed in 0.5 M NaCl, gently sheared with a pipette, run through a 40 micron Nitex mesh, counted on a hemocytometer, and diluted to reach the appropriate concentration for the scRNA-seq protocol. Equal numbers of embryos were used in each time point, and at no time were cells or embryos pelleted in a centrifuge (Oulhen et al., 2019).

  8. DataSheet_2_Lineage tracing of T cell differentiation from T-iPSC by 2D...

    • frontiersin.figshare.com
    pdf
    Updated Dec 15, 2023
    + more versions
    Cite
    Yoshitaka Ishiguro; Shoichi Iriguchi; Shinya Asano; Tokuyuki Shinohara; Sara Shiina; Suguru Arima; Yoshiaki Kassai; Yoshiharu Sakai; Kazutaka Obama; Shin Kaneko (2023). DataSheet_2_Lineage tracing of T cell differentiation from T-iPSC by 2D feeder-free culture and 3D organoid culture.pdf [Dataset]. http://doi.org/10.3389/fimmu.2023.1303713.s002
    Explore at:
    Available download formats: pdf
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Yoshitaka Ishiguro; Shoichi Iriguchi; Shinya Asano; Tokuyuki Shinohara; Sara Shiina; Suguru Arima; Yoshiaki Kassai; Yoshiharu Sakai; Kazutaka Obama; Shin Kaneko
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: T cells induced from induced pluripotent stem cells (iPSCs) derived from antigen-specific T cells (T-iPS-T cells) are an attractive tool for T cell immunotherapy. The induction of cytotoxic T-iPS-T cells is well established in feeder-free conditions with the aim of off-the-shelf production; however, the induction of helper T-iPS-T cells remains challenging.

    Methods: We analyzed T-iPS-T cells matured in 3D organoid culture at different steps in the culture process at the single-cell level. T-iPS-T cell datasets were merged with an available human thymocyte dataset based on single-cell RNA sequencing (scRNA-seq). In particular, we searched for genes crucial for generating CD4+ T-iPS-T cells by comparing T-iPS-T cells established in 2D feeder-free or 3D organoid culture.

    Results: The scRNA-seq data indicated that T-iPS-T cells are similar to T cells transitioning to human thymocytes, with SELENOW, GIMAP4, 7, SATB1, SALMF1, IL7R, SYTL2, S100A11, STAT1, IFITM1, LZTFL1 and SOX4 identified as candidate genes for the 2D feeder-free induction of CD4+ T-iPS-T cells.

    Discussion: This study provides single-cell transcriptome datasets of iPS-T cells and leads to further analysis of CD4+ T cell generation from T-iPSCs.

  9. Raw gene counts

    • figshare.com
    txt
    Updated Oct 6, 2020
    Cite
    Geoff Stanley (2020). Raw gene counts [Dataset]. http://doi.org/10.6084/m9.figshare.12089430.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 6, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Geoff Stanley
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Non-small cell lung cancer (NSCLC) metastatic to the brain leptomeninges (LMD) is rapidly fatal and cannot be biopsied, and cancer cells in the cerebrospinal fluid (CSF) are few; therefore, the tissue samples available for research and the development of effective treatments are severely limited. We overcame these obstacles by using CSF from LMD patients to perform massively parallel qPCR analysis of cell-free RNA signatures (n=14) and single cell RNA sequencing (scRNAseq; n=197 cells from 4 patients).

  10. 3k PBMCs from a healthy donor

    • figshare.com
    hdf
    Updated Feb 14, 2025
    Cite
    Songqi Duan (2025). 3k PBMCs from a healthy donor [Dataset]. http://doi.org/10.6084/m9.figshare.28414916.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    figshare
    Authors
    Songqi Duan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains single-cell RNA sequencing (scRNA-seq) data of 3,000 peripheral blood mononuclear cells (PBMCs) from a healthy donor, processed using the 10x Genomics Chromium platform. The raw data was obtained from 10x Genomics and subsequently aligned using Cell Ranger 8.0.1 with the GENCODE Release 47 (GRCh38.p14) reference genome.

    The dataset includes the following output files from the Cell Ranger pipeline:

    • filtered_feature_bc_matrix.h5 – Filtered count matrix in HDF5 format
    • filtered_feature_bc_matrix – Filtered gene-barcode matrix in directory format
    • raw_feature_bc_matrix – Raw gene-barcode matrix in directory format
    • raw_feature_bc_matrix.h5 – Raw count matrix in HDF5 format

    This dataset is valuable for researchers studying single-cell transcriptomics, immune cell profiling, and bioinformatics pipeline benchmarking.

    • File format: HDF5 and Matrix Market (MTX)
    • Reference Genome: GENCODE Release 47 (GRCh38.p14)
    • Processing Pipeline: Cell Ranger 8.0.1

    For any questions or collaborations, please feel free to contact the uploader.
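
    The Matrix Market (MTX) format named in the description stores a sparse matrix as a header line, optional comment lines starting with %, a size line (rows, columns, number of non-zero entries), and then one 1-based "row column value" triple per entry. The following is a minimal, dependency-free Python sketch of reading that coordinate layout. The function name `parse_mtx` is hypothetical, integer values are assumed because the entries are counts, and the convention that rows are features (genes) and columns are barcodes (cells) is an assumption about Cell Ranger output that the description itself does not state.

```python
def parse_mtx(lines):
    """Parse Matrix Market coordinate text.

    Returns (n_rows, n_cols, entries), where entries maps zero-based
    (row, col) pairs to integer values. The function name is hypothetical.
    """
    it = iter(lines)
    header = next(it).strip()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate header")

    shape = None
    entries = {}
    for line in it:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        parts = line.split()
        if shape is None:
            # First non-comment line: rows, columns, non-zero count.
            shape = (int(parts[0]), int(parts[1]), int(parts[2]))
            continue
        # Matrix Market indices are 1-based; convert to 0-based.
        row, col, value = int(parts[0]) - 1, int(parts[1]) - 1, int(parts[2])
        entries[(row, col)] = value

    if shape is None:
        raise ValueError("missing size line")
    n_rows, n_cols, nnz = shape
    if len(entries) != nnz:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


# Tiny made-up example: 3 features by 2 barcodes with 3 non-zero counts.
example = """%%MatrixMarket matrix coordinate integer general
% illustrative data, not taken from the dataset
3 2 3
1 1 5
3 1 2
2 2 7
""".splitlines()

n_rows, n_cols, counts = parse_mtx(example)
```

    For real files, an established reader such as a sparse-matrix library would usually be preferable; the sketch only shows how the text layout maps to matrix entries.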

  11. Data_Sheet_1_Identification and Validation of a Malignant Cell Subset...

    • frontiersin.figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Qiyuan Zou; Yufeng Lv; Zuhuan Gan; Shulan Liao; Zhonghui Liang (2023). Data_Sheet_1_Identification and Validation of a Malignant Cell Subset Marker-Based Polygenic Risk Score in Stomach Adenocarcinoma Through Integrated Analysis of Bulk and Single-Cell RNA Sequencing Data.xls [Dataset]. http://doi.org/10.3389/fcell.2021.720649.s001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Qiyuan Zou; Yufeng Lv; Zuhuan Gan; Shulan Liao; Zhonghui Liang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objectives: The aim of the present study was to construct a polygenic risk score (PRS) for poor survival among patients with stomach adenocarcinoma (STAD) based on expression of malignant cell markers.

    Methods: Integrated analyses of bulk and single-cell RNA sequencing (scRNA-seq) of STAD and normal stomach tissues were conducted to identify malignant and non-malignant markers. Analyses of the scRNA-seq profile from early STAD were used to explore intratumoral heterogeneity (ITH) of the malignant cell subpopulations. Dimension reduction, cell clustering, pseudotime, and gene set enrichment analyses were performed. The marker genes of each malignant tissue and cell clusters were screened to create a PRS using Cox regression analyses. Combined with the PRS and routine clinicopathological characteristics, a nomogram tool was generated to predict prognosis of patients with STAD. The prognostic power of the PRS was validated in two independent external datasets.

    Results: The malignant and non-malignant cells were identified according to 50 malignant and non-malignant cell markers. The malignant cells were divided into nine clusters with different marker genes and biological characteristics. Pseudotime analysis showed the potential differentiation trajectory of these nine malignant cell clusters and identified genes that affect cell differentiation. Ten malignant cell markers were selected to generate a PRS: RGS1, AADAC, NPC2, COL10A1, PRKCSH, RAMP1, PRR15L, TUBA1A, CXCR6, and UPP1. The PRS was associated with both overall and progression-free survival (PFS) and proved to be a prognostic factor independent of routine clinicopathological characteristics. PRS could successfully divide patients with STAD in three datasets into high- or low-risk groups. In addition, we combined PRS and the tumor clinicopathological characteristics into a nomogram tool to help predict the survival of patients with STAD.

    Conclusion: We revealed limited but significant intratumoral heterogeneity in STAD and proposed a malignant cell subset marker-based PRS through integrated analysis of bulk sequencing and scRNA-seq data.

  12. DataSheet_1_Bulk and single-cell RNA-sequencing analyses along with abundant...

    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Yuyao Liu; Haoxue Zhang; Yan Mao; Yangyang Shi; Xu Wang; Shaomin Shi; Delin Hu; Shengxiu Liu (2023). DataSheet_1_Bulk and single-cell RNA-sequencing analyses along with abundant machine learning methods identify a novel monocyte signature in SKCM.pdf [Dataset]. http://doi.org/10.3389/fimmu.2023.1094042.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Yuyao Liu; Haoxue Zhang; Yan Mao; Yangyang Shi; Xu Wang; Shaomin Shi; Delin Hu; Shengxiu Liu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Global patterns of immune cell communications in the immune microenvironment of skin cutaneous melanoma (SKCM) haven’t been well understood. Here we recognized signaling roles of immune cell populations and main contributive signals. We explored how multiple immune cells and signal paths coordinate with each other and established a prognosis signature based on the key specific biomarkers with cellular communication.

    Methods: The single-cell RNA sequencing (scRNA-seq) dataset was downloaded from the Gene Expression Omnibus (GEO) database, in which various immune cells were extracted and re-annotated according to cell markers defined in the original study to identify their specific signs. We computed immune-cell communication networks by calculating the linking number or summarizing the communication probability to visualize the cross-talk tendency in different immune cells. Combining abundant analyses of communication networks and identifications of communication modes, all networks were quantitatively characterized and compared. Based on the bulk RNA sequencing data, we trained specific markers of hub communication cells through integration programs of machine learning to develop new immune-related prognostic combinations.

    Results: An eight-gene monocyte-related signature (MRS) has been built, confirmed as an independent risk factor for disease-specific survival (DSS). MRS has great predictive values in progression free survival (PFS) and possesses better accuracy than traditional clinical variables and molecular features. The low-risk group has better immune functions, infiltrated with more lymphocytes and M1 macrophages, with higher expressions of HLA, immune checkpoints, chemokines and costimulatory molecules. The pathway analysis based on seven databases confirms the biological uniqueness of the two risk groups. Additionally, the regulon activity profiles of 18 transcription factors highlight possible differential regulatory patterns between the two risk groups, suggesting epigenetic event-driven transcriptional networks may be an important distinction. MRS has been identified as a powerful tool to benefit SKCM patients. Moreover, the IFITM3 gene has been identified as the key gene, validated to express highly at the protein level via the immunohistochemical assay in SKCM.

    Conclusion: MRS is accurate and specific in evaluating SKCM patients’ clinical outcomes. IFITM3 is a potential biomarker. Moreover, they are promising to improve the prognosis of SKCM patients.

  13. Table 1_Identification of novel molecular subtypes and construction of a...

    • frontiersin.figshare.com
    xlsx
    Updated Jul 21, 2025
    + more versions
    Cite
    Ke Ma; Jie Xu; Congyue Wang; Xu Cao; Wenjie Yu; Jingjing Xi; Xuan Zhang; Jiamin Zhan; Yang Liu; Aoyang Yu; Shuhan Liu; Yanhua Liu; Chong Chen; Xiaoli Mai (2025). Table 1_Identification of novel molecular subtypes and construction of a prognostic signature via multi-omics analysis and machine learning in lung adenocarcinoma.xlsx [Dataset]. http://doi.org/10.3389/fonc.2025.1590216.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    Frontiers
    Authors
    Ke Ma; Jie Xu; Congyue Wang; Xu Cao; Wenjie Yu; Jingjing Xi; Xuan Zhang; Jiamin Zhan; Yang Liu; Aoyang Yu; Shuhan Liu; Yanhua Liu; Chong Chen; Xiaoli Mai
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The development of high-throughput sequencing technologies and targeted therapeutic strategies has significantly improved the prognosis of lung adenocarcinoma (LUAD) patients with sensitive gene mutations. However, patients harboring rare or no actionable mutations rarely benefit from these targeted therapies. This study aimed to identify novel molecular subtypes and construct a prognostic signature to enhance the stratification of LUAD prognosis.

    Materials and methods: Novel molecular subtypes of LUAD patients were identified by applying 10 distinct clustering algorithms on multi-omics data. Single-cell RNA-sequencing (scRNA-seq) data were integrated to characterize subtype-specific immune microenvironments. A multi-omics and machine learning-driven prognostic signature (MO-MLPS) was constructed in The Cancer Genome Atlas (TCGA) LUAD dataset using ten machine learning algorithms and subsequently validated across six independent datasets from the Gene Expression Omnibus (GEO) database. The robustness of the model was assessed using the concordance index (C-index), Kaplan-Meier survival analyses, receiver operating characteristic (ROC) curves, and both univariate and multivariate Cox regression analyses. We further confirmed the effects of ANLN knockdown and the expression of a domain-negative anillin protein (dnANLN) via western blotting, cell proliferation assays, flow cytometry, and transwell migration assays in vitro.

    Results: Our analysis revealed that the novel molecular subtypes exhibited differences in prognoses, biological functions, and immune infiltration profiles in LUAD. The MO-MLPS was successfully established and validated across TCGA-LUAD cohorts, six independent GEO datasets, and their composite meta-cohort. Higher risk scores from the MO-MLPS correlated with poorer prognosis in LUAD, with AUC values exceeding 0.5 at 1, 3, and 5 years across various cohorts. The signature outperformed 49 previously published prognostic signatures. Furthermore, patients classified as high risk exhibited significantly worse overall and progression-free survival than those classified as low risk. Notably, ANLN knockdown and dnANLN expression significantly inhibited cell proliferation and migration in vitro and enhanced the efficacy of docetaxel.

    Conclusion: A comprehensive analysis of multi-omics data redefines the molecular subtype of LUAD patients. The MO-MLPS derived from subtype characteristics has the potential to serve as a clinically valuable prognostic tool. Furthermore, ANLN emerges as a promising novel therapeutic target in the treatment of LUAD.

  14. Evaluation of GraphFP’s performance on quantifying the stochastic dynamics...

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Qi Jiang; Shuo Zhang; Lin Wan (2023). Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell-type frequencies with cell-cell interaction term (W ≠ 0) and without cell-cell interaction term (W = 0) on the murine cerebral cortex dataset. [Dataset]. http://doi.org/10.1371/journal.pcbi.1009821.t001
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Qi Jiang; Shuo Zhang; Lin Wan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell-type frequencies with cell-cell interaction term (W ≠ 0) and without cell-cell interaction term (W = 0) on the murine cerebral cortex dataset.

  15. Table_4_S100A12 as Biomarker of Disease Severity and Prognosis in Patients...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 16, 2023
    Cite
    Yupeng Li; Yaowu He; Shibin Chen; Qi Wang; Yi Yang; Danting Shen; Jing Ma; Zhe Wen; Shangwei Ning; Hong Chen (2023). Table_4_S100A12 as Biomarker of Disease Severity and Prognosis in Patients With Idiopathic Pulmonary Fibrosis.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2022.810338.s018
    Available download formats: xlsx
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Frontiers
    Authors
    Yupeng Li; Yaowu He; Shibin Chen; Qi Wang; Yi Yang; Danting Shen; Jing Ma; Zhe Wen; Shangwei Ning; Hong Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Idiopathic pulmonary fibrosis (IPF) is one of the interstitial lung diseases (ILDs) with a poor prognosis. S100 calcium binding protein A12 (S100A12) has been reported as a prognostic serum biomarker in IPF, but its correlation with IPF in lung tissue and bronchoalveolar lavage fluid (BALF) remains unclear.

    Methods: Datasets were collected from the Gene Expression Omnibus (GEO) database. Analyses included the Pearson correlation coefficient, Kaplan–Meier analysis, Cox regression analysis, and functional enrichment analysis, among others. Single-cell RNA-sequencing (scRNA-seq) analysis was also used to explore the role of S100A12 and related genes in IPF.

    Results: S100A12 was mainly and highly expressed in monocytes, and its expression was downregulated in the lungs of patients with IPF according to scRNA-seq and transcriptome analysis. However, S100A12 expression was upregulated in both blood and BALF of patients with IPF. In addition, 10 genes were found to interact with S100A12 according to the protein–protein interaction (PPI) network, and the first four transcription factors (TFs) targeting these genes were identified from the hTFtarget database. The two most significant co-expressed genes of S100A12 were S100A8 and S100A9. The three genes were significantly negatively associated with lung function and positively associated with St. George’s Respiratory Questionnaire (SGRQ) scores in the lungs of patients with IPF. High expression of the three genes was associated with higher mortality in BALF and with shorter transplant-free survival (TFS) and progression-free survival (PFS) in blood. The prognostic predictive value of S100A12 was superior to that of S100A8 and S100A9 in patients with IPF, and the composite variable [S100A12 + GAP index (gender, age, and physiological index)] may be a more effective predictive index.

    Conclusion: These results imply that S100A12 might be an efficient disease severity and prognostic biomarker in patients with IPF.

