5 datasets found
  1. DataSheet1_Benchmarking automated cell type annotation tools for single-cell...

    • frontiersin.figshare.com
    docx
    Updated Jun 21, 2023
    + more versions
    Cite
    Yuge Wang; Xingzhi Sun; Hongyu Zhao (2023). DataSheet1_Benchmarking automated cell type annotation tools for single-cell ATAC-seq data.docx [Dataset]. http://doi.org/10.3389/fgene.2022.1063233.s001
    Available download formats: docx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Yuge Wang; Xingzhi Sun; Hongyu Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As single-cell chromatin accessibility profiling methods advance, scATAC-seq has become ever more important in the study of candidate regulatory genomic regions and their roles underlying developmental, evolutionary, and disease processes. At the same time, cell type annotation is critical in understanding the cellular composition of complex tissues and identifying potential novel cell types. However, most existing methods that can perform automated cell type annotation are designed to transfer labels from an annotated scRNA-seq data set to another scRNA-seq data set, and it is not clear whether these methods are adaptable to annotate scATAC-seq data. Several methods have recently been proposed for label transfer from scRNA-seq data to scATAC-seq data, but benchmarking studies of their performance are lacking. Here, we evaluated the performance of five scATAC-seq annotation methods on both their classification accuracy and scalability using publicly available single-cell datasets from mouse and human tissues including brain, lung, kidney, PBMC, and BMMC. Using the BMMC data as a basis, we further investigated the performance of these methods across different data sizes, mislabeling rates, sequencing depths and the number of cell types unique to scATAC-seq. Bridge integration, which is the only method that requires additional multimodal data and does not need gene activity calculation, was overall the best method and robust to changes in data size, mislabeling rate and sequencing depth. Conos was the most time- and memory-efficient method but performed the worst in terms of prediction accuracy. scJoint tended to assign cells to similar cell types and performed relatively poorly for complex datasets with deep annotations but performed better for datasets with only major label annotations. The performance of scGCN and Seurat v3 was moderate, but scGCN was the most time-consuming method and had the most similar performance to random classifiers for cell types unique to scATAC-seq.

  2. Data from: A single-cell atlas characterizes dysregulation of the bone...

    • zenodo.org
    Updated Jan 14, 2025
    Cite
    William Pilcher (2025). A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma [Dataset]. http://doi.org/10.5281/zenodo.14624955
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    William Pilcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 8, 2024
    Description

    This repository contains R Seurat objects associated with our study titled "A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma".

    Single cell data contained within this object comes from MMRF Immune Atlas Consortium work.

    Each .rds file contains a Seurat object saved with version 4.3. It can be loaded in R with the readRDS command.

    Two .RDS files are included in this version of the release.

    • Discovery object: MMRF_ImmuneAtlas_Full_With_Corrected_Censored_Metadata.rds contains all aliquots belonging to the 'discovery' cohort as used in the initial paper. This represents the dataset used for initial clustering, cell annotation, and analysis.

    • Discovery + Validation object: COMBINED_VALIDATION_MMRF_ImmuneAtlas_Full_Censored_Metadata.rds contains both aliquots belonging to the initial 'discovery' cohort, and aliquots belonging to the 'validation' cohort. The group each cell is derived from is listed under the 'cohort' variable. Labels related to cell annotation, including doublet status, are derived from a label transfer process as described in the paper. Labels for the original 'discovery' cohort are unchanged. UMAPs have been reconstructed with both the discovery and validation cohorts integrated.

    --

    The discovery object contains two assays:

    • "RNA" - The raw count matrix
    • "RNA_Batch_Corrected" - Counts adjusted for the combination of 'Study_Site' and 'Batch'.
      • Analysis should prefer the original RNA assay, unless using pipelines which do not support adjusting for technical covariates.

    Currently, the validation object only includes the uncorrected RNA assay.

    --

    The object contains two UMAPs in the reduction slot:

    • umap - contains the UMAP embeddings for the full object with all cells.
    • umap.sub - contains the UMAP embeddings for individual 'compartments', as indicated by 'subcluster_V03072023_compartment'

    --

    Each sample has three different identifiers:

    • public_id
      • Indicates a specific patient (n=263).
      • MMRF_####
      • This is a standard identifier which is used across all MMRF CoMMpass datasets
      • public_ids can map to multiple d_visit_specimen_ids and aliquot_ids
      • As of now, all public_ids have a single sample collected at Baseline.
        • This can be accessed by filtering for 'collection_event' %in% c("Baseline", "Screening") or VJ_INTERVAL == 'Baseline'
    • d_visit_specimen_id
      • Indicates a specific visit by a patient (n=358)
      • MMRF_####_Y
        • Y is a number indicating that this is the Y-th sample obtained from the patient. It does not correspond to a specific timepoint.
      • This is a standard identifier, which is used across all MMRF CoMMpass datasets
      • The purpose of the visit is indicated in 'collection_event' (Baseline, Relapse, Remission, etc.). The approximate interval the visit corresponds to is in 'VJ_INTERVAL'.
      • d_visit_specimen_id uniquely maps to one public_id
      • d_visit_specimen_id can map to multiple aliquot_ids
    • aliquot_id
      • Refers to the specific bone marrow aliquot sample processed (n=361)
      • MMRFA-######
      • This is a unique identifier for each processed scRNA-seq sample.
      • As of now, this uniquely maps to a combination of d_visit_specimen_id, Study_Site, and Batch
      • As of now, is an identifier specific to the MMRF ImmuneAtlas
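
The nesting of the three identifier formats above can be sketched in Python. This is a minimal sketch rather than part of the dataset's tooling. The regular expressions assume that each '#' in the documented patterns stands for exactly one digit, and that 'Y' is a run of digits. The helper names `classify_identifier` and `public_id_from_visit` are hypothetical. The rule that a public_id is the d_visit_specimen_id with its trailing '_Y' removed is also an assumption: the formats suggest it, but the description does not state it outright.

```python
import re

# Assumed patterns: each '#' in the description is read as one digit.
PUBLIC_ID = re.compile(r"MMRF_\d{4}")
VISIT_ID = re.compile(r"(MMRF_\d{4})_(\d+)")
ALIQUOT_ID = re.compile(r"MMRFA-\d{6}")


def classify_identifier(value):
    """Return the name of the documented format a string matches, or None."""
    if VISIT_ID.fullmatch(value):
        return "d_visit_specimen_id"
    if PUBLIC_ID.fullmatch(value):
        return "public_id"
    if ALIQUOT_ID.fullmatch(value):
        return "aliquot_id"
    return None


def public_id_from_visit(visit_id):
    """Derive the patient-level public_id from a visit ID (assumed rule)."""
    match = VISIT_ID.fullmatch(visit_id)
    if match is None:
        raise ValueError(f"not a d_visit_specimen_id: {visit_id!r}")
    return match.group(1)


# Made-up identifiers, for illustration only.
print(classify_identifier("MMRF_0001"))
print(classify_identifier("MMRFA-000123"))
print(public_id_from_visit("MMRF_0001_2"))
```

Checking the visit pattern before the patient pattern keeps the classification unambiguous, because every visit identifier begins with a valid public_id.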

    Each cell has the following annotation information:

    • subcluster_V03072023
      • These refer to an individual cluster derived from 'Seurat'.
      • Format is 'Compartment'.'Compartment-cluster'.'Compartment-subcluster'
        • For example, 'NkT.2.2' indicates that the cell is in the 'Natural Killer + T Cell' compartment, was originally part of 'Cluster 2', and was then further separated into the refined subcluster '2.2'
        • If a parent cluster did not need to be further separated, the 'Compartment-subcluster' part is omitted (e.g., 'NkT.6')
      • As of now, this uniquely maps to a specific cellID_short annotation.
      • Clustering was done on a per compartment basis
        • For most immune cell types, clustering was based on embeddings corrected for 'siteXbatch'. For Plasma, clustering was performed on embeddings corrected on a per-sample basis.
      • In the combined validation object, DISCOVERY.subcluster_V03072023 will contain values only for the discovery cohort, and have NA values for validation samples.
    • subcluster_V03072023_compartment
      • These refer to one of five major compartments as identified roughly on the original UMAP. Clustering was performed on a per-compartment basis following a first pass rough annotation.
      • The possible compartments are
        • NkT (T cell + Natural Killer Cells)
        • Myeloid (Monocytes, Macrophages, Dendritic cells, Neutrophil/Granulocyte populations)
        • BEry (B Cell, Erythroblasts, bone marrow progenitor populations, pDCs)
        • Ery (Erythrocyte population)
        • Plasma (Plasma cell populations)
      • Each compartment has its own UMAP, which can be accessed in the 'umap.sub' reduction
      • One cluster was isolated from all other populations, and was not assigned to a compartment. This cluster is labeled as 'Full.23'.
      • In the combined validation object, DISCOVERY.subcluster_V03072023_compartment will contain values only for the discovery cohort, and have NA values for validation samples.
    • cellID_short
      • This is the individual annotation for each cluster.
      • Please see the 'Cell Population Annotation Dictionary' for further details.
      • If different Seurat clusters were assigned similar annotations, the cell type annotation is suffixed with a distinct cluster gene, or with '_b', '_c'
    • lineage_group
      • This is an annotation driven grouping of clusters into major immune populations, as shown in Figure 2.
      • This includes "CD8", "CD4", "M" (Myeloid), "B" (B cell), "E" (Erythroid), "P" (Plasma), "Other" (HSC, Fibro, pDC_a), "LQ" (Doublet)
    • isDoublet
      • This is a binary 'True' or 'False' derived from manual review of clusters following doublet analysis, as described in the paper.
      • True indicates the cluster was determined to be a doublet population.
      • This is derived from 'doublet_pred', in which 'dblet_cluster' and 'poss_dblet_cluster' were flagged as doublet populations for subsequent analysis.
      • In the validation object, the doublet status of new samples was inferred from whether label transfer from the discovery cohort mapped the cell to one of the previously identified doublet populations. The raw doublet scores from DoubletFinder, Pegasus, or Scrublet are not included in this release.
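
The 'Compartment'.'Compartment-cluster'.'Compartment-subcluster' label format described for subcluster_V03072023 can be split into its parts as follows. This is a minimal sketch on plain strings, not code from the repository. The names `SubclusterLabel` and `parse_subcluster` are hypothetical, and it assumes the cluster and subcluster parts are always integers, as they are in the examples given ('NkT.2.2', 'NkT.6', 'Full.23').

```python
from typing import NamedTuple, Optional


class SubclusterLabel(NamedTuple):
    compartment: str
    cluster: int
    subcluster: Optional[int]  # None when the parent cluster was not split


def parse_subcluster(label):
    """Split 'Compartment.cluster[.subcluster]' into its documented parts."""
    parts = label.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected subcluster label: {label!r}")
    compartment = parts[0]
    cluster = int(parts[1])  # assumes numeric cluster IDs
    subcluster = int(parts[2]) if len(parts) == 3 else None
    return SubclusterLabel(compartment, cluster, subcluster)


# Labels taken from the description above.
for label in ("NkT.2.2", "NkT.6", "Full.23"):
    print(label, "->", parse_subcluster(label))
```

The unassigned cluster 'Full.23' parses like any other label, with 'Full' in the compartment position.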

    --

    Each sample has the following information indicating shipment batches, for batch correction

    • Study_Site
      • The center which processed a specific aliquot_id
      • EMORY, MSSM, WashU, MAYO
    • Batch
      • The shipment batch the sample was associated with
      • Valued 1 to 3 for EMORY, MSSM, MAYO, and 1 to 4 for WashU
    • siteXbatch
      • A combination of the above two variables, to be used for batch correction
    • (Combined Validation Object only): cohort
      • Indicates if the sample was involved in the 'discovery' cohort, or 'validation' cohort. Samples in the 'validation' cohort will have labels inferred from label mapping
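
As a rough illustration of how Study_Site and Batch combine into one batch-correction variable, the sketch below enumerates every site-by-batch level implied by the ranges listed above. The helper `site_batch_key` and the underscore separator are assumptions for illustration. The description does not say how siteXbatch values are actually spelled in the object.

```python
# Number of shipment batches per study site, as listed in the description.
BATCHES_PER_SITE = {"EMORY": 3, "MSSM": 3, "MAYO": 3, "WashU": 4}


def site_batch_key(study_site, batch, sep="_"):
    """Combine Study_Site and Batch into one key (separator is an assumption)."""
    if study_site not in BATCHES_PER_SITE:
        raise ValueError(f"unknown study site: {study_site!r}")
    if not 1 <= batch <= BATCHES_PER_SITE[study_site]:
        raise ValueError(f"batch {batch} out of range for {study_site}")
    return f"{study_site}{sep}{batch}"


all_keys = [
    site_batch_key(site, batch)
    for site, n_batches in BATCHES_PER_SITE.items()
    for batch in range(1, n_batches + 1)
]
print(len(all_keys))  # 3 + 3 + 3 + 4 = 13 possible levels
print(all_keys)
```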

    --

    Each public_id has limited demographic information based on publicly available information in the MMRF CoMMpass study.

    • d_pt_sex
      • Patient sex (not self-identified). Male or Female
    • d_pt_race_1
      • Patient self-identified race
    • d_pt_ethnicity
      • Patient self-identified ethnicity
    • d_dx_amm_age
      • Patient age at diagnosis.
      • Not reported for patients above 90 at diagnosis
    • d_dx_amm_bmi
      • Patient BMI at diagnosis
    • d_pt_height_cm
      • Patient height at diagnosis, in centimeters.
    • d_dx_amm_weight_kg
      • Patient weight at diagnosis, in kilograms

    Each d_visit_specimen_id has two data points providing limited information about the visit:

    • collection_event
      • Description of why the sample was collected
        • e.g., 'Baseline' and 'Screening' indicate the sample was obtained prior to therapy
        • 'Relapse/Progression' indicates the sample was collected due to disease progression based on clinical assessment
        • 'Remission/Response' indicates the sample was collected due to patient entering remission based on clinical assessment
        • Samples may be collected for reasons independent of the above, such as 'Pre' or 'Post' ASCT, or for other unspecified reasons
    • VJ_INTERVAL
      • Indicates the rough interval following start of therapy the sample is assigned to
        • "Baseline", "Month 3", "Year 2", etc.
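
The baseline filter given in the description ('collection_event' %in% c("Baseline", "Screening") or VJ_INTERVAL == 'Baseline') is written in R. The sketch below expresses the same condition in Python over plain dictionaries. The metadata rows are made up for illustration, and `is_baseline` is a hypothetical helper, not part of the release.

```python
def is_baseline(sample):
    """True if collection_event is Baseline/Screening or VJ_INTERVAL is Baseline."""
    return (
        sample.get("collection_event") in {"Baseline", "Screening"}
        or sample.get("VJ_INTERVAL") == "Baseline"
    )


# Hypothetical per-visit metadata rows (values invented for illustration).
samples = [
    {"d_visit_specimen_id": "MMRF_0001_1",
     "collection_event": "Baseline", "VJ_INTERVAL": "Baseline"},
    {"d_visit_specimen_id": "MMRF_0001_2",
     "collection_event": "Relapse/Progression", "VJ_INTERVAL": "Year 2"},
    {"d_visit_specimen_id": "MMRF_0002_1",
     "collection_event": "Screening", "VJ_INTERVAL": "Baseline"},
]

baseline_ids = [s["d_visit_specimen_id"] for s in samples if is_baseline(s)]
print(baseline_ids)
```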

    All the single-cell raw data, along with outcome and cytogenetic information, are available at MMRF’s VLAB shared resource. Requests to access these data will be reviewed by the data access committee at MMRF, and any data shared will be released under a data transfer agreement that protects the identities of patients involved in the study. Other information from the CoMMpass trial can also generally be

  3. Data from: Immunothrombolytic monocyte-neutrophil axes dominate the...

    • zenodo.org
    bin, text/x-python +1
    Updated Apr 17, 2025
    + more versions
    Cite
    Markus Joppich; Kami Alexander Pekayvaz; Sophia Brambs; Viktoria Knottenberg; Luke Eivers; Martin Dichgans; Steffen Tiedt; Ralf Zimmer; Leo Nicolai; Konstantin Stark (2025). Immunothrombolytic monocyte-neutrophil axes dominate the single-cell landscape of human thrombosis [Dataset]. http://doi.org/10.5281/zenodo.14050429
    Available download formats: bin, tsv, text/x-python
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Markus Joppich; Kami Alexander Pekayvaz; Sophia Brambs; Viktoria Knottenberg; Luke Eivers; Martin Dichgans; Steffen Tiedt; Ralf Zimmer; Leo Nicolai; Konstantin Stark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Original Data for the manuscript:

    Immunothrombolytic monocyte-neutrophil axes dominate the single-cell landscape of human thrombosis and correlate with thrombus resolution

    The manuscript is accessible at https://doi.org/10.1016/j.immuni.2025.03.020 . Please cite the journal publication when using any of this data.

    Processed scRNA-seq data
    seurat_humancombined_libintSubsetGroups.Rds -> fully processed Seurat objects
    objGroups_libint_conditions.tsv -> annotations for main Seurat object
    integrated_mouse_thrombus.rds -> mouse thrombus dataset
    arb_obj_integrated.Rds -> ARB Thrombus (Artificial thrombus, Real Thrombus, Blood) Seurat object

    Original Data (scRNA-seq)
    pat1_raw_feature_bc_matrix.h5 -> Sample 21053_0001
    pat2_raw_feature_bc_matrix.h5 -> Sample 21053_0003
    pat3_raw_feature_bc_matrix.h5 -> Sample sample_Thr3
    pat4_5_raw_feature_bc_matrix.h5 -> Sample sample_Thr4
    pat6_raw_feature_bc_matrix.h5 -> Sample sample_Thr5
    pat7_raw_feature_bc_matrix.h5 -> Sample samples_ATTHR
    mt_raw_feature_bc_matrix.h5 -> Mouse
    arb_raw_feature_bc_matrix.h5 -> ARB Thrombus (Artificial thrombus, Real Thrombus, Blood)
    Original Data (bulk)
    coagulation_classical_counts.tsv
    coagulation_nonclassical_counts.tsv
    hypoxia_classical_counts.tsv
    hypoxia_nonclassical_counts.tsv
    monocytes_blood_thrombus_counts.tsv
    neutrophils_blood_thrombus_counts.tsv


    Original Data (spliced/unspliced by velocyto; matching above sample names):
    pat1_velocyto.loom -> Sample 21053_0001
    pat2_velocyto.loom -> Sample 21053_0003
    pat3_velocyto.loom -> Sample sample_Thr3
    pat4_5_velocyto.loom -> Sample sample_Thr4
    pat6_velocyto.loom -> Sample sample_Thr5
    pat7_velocyto.loom -> Sample samples_ATTHR
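
Because the raw count matrices and the velocyto files share the same file prefixes, the pairing can be written down as a small lookup. The mapping below is copied from the two listings above. The helper `files_for` is hypothetical and only builds file names, it does not read any data.

```python
# File prefix -> sample name, copied from the listings above.
SAMPLES = {
    "pat1": "21053_0001",
    "pat2": "21053_0003",
    "pat3": "sample_Thr3",
    "pat4_5": "sample_Thr4",
    "pat6": "sample_Thr5",
    "pat7": "samples_ATTHR",
}


def files_for(prefix):
    """Return the sample name plus raw-count and velocyto file names for a prefix."""
    if prefix not in SAMPLES:
        raise KeyError(f"unknown sample prefix: {prefix!r}")
    return {
        "sample": SAMPLES[prefix],
        "counts": f"{prefix}_raw_feature_bc_matrix.h5",
        "velocity": f"{prefix}_velocyto.loom",
    }


for prefix in SAMPLES:
    print(files_for(prefix))
```

The mouse (mt) and ARB matrices are left out of the lookup because no velocyto file is listed for them.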


    Scripts:
    functions.R -> script with helper utilities
    mt_process.R -> script for processing the mouse data

    process.R -> script for processing the main data set

    process_de_obj.R -> script for performing DE analysis
    process_label_transfer.R -> script for doing label transfer between human and mouse dataset and further analyses
    process_label_transfer.ipynb -> script for doing label transfer between human and mouse dataset and further analyses
    process_monocle.R -> script for doing monocle-based analyses
    process_wgcna.R -> script for doing wgcna-based analyses
    subset_velocities_step1.R -> Seurat to file-based data
    2_velocities_monos.py -> Mono-subset velocity analysis
    2_velocities_neutros.py -> Neutro-subset velocity analysis
    analysis_arb_process.ipynb -> Seurat processing of ARB Thrombus

  4. Dermomyotome-derived endothelial cells migrate to the dorsal aorta to...

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated Oct 4, 2023
    Cite
    David Traver; Pankaj Sahai-Hernandez; Claire Pouget; Shai Eyal; Ondrej Svoboda; Jose Chacon; Lin Grimm; Tor Gjøen (2023). Dermomyotome-derived endothelial cells migrate to the dorsal aorta to support hematopoietic stem cell emergence [Dataset]. http://doi.org/10.6075/J0GB22J0
    Available download formats: zip
    Dataset updated
    Oct 4, 2023
    Dataset provided by
    University of California, San Diego
    University of Oslo
    Authors
    David Traver; Pankaj Sahai-Hernandez; Claire Pouget; Shai Eyal; Ondrej Svoboda; Jose Chacon; Lin Grimm; Tor Gjøen
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Development of the dorsal aorta is a key step in the establishment of the adult blood-forming system since hematopoietic stem and progenitor cells (HSPCs) arise from ventral aortic endothelium in all vertebrate animals studied. Work in zebrafish has demonstrated that arterial and venous endothelial precursors arise from distinct subsets of lateral plate mesoderm. Here, we profile the transcriptome of the earliest detectable endothelial cells (ECs) during zebrafish embryogenesis to demonstrate that tissue-specific EC programs initiate much earlier than previously appreciated, by the end of gastrulation. Classic studies in the chick embryo showed that paraxial mesoderm generates a subset of somite-derived endothelial cells (SDECs) that incorporate into the dorsal aorta to replace HSPCs as they exit the aorta and enter circulation. We describe a conserved program in the zebrafish, where a rare population of endothelial precursors delaminates from the dermomyotome to incorporate exclusively into the developing dorsal aorta. Although SDECs lack hematopoietic potential, they act as a local niche to support the emergence of HSPCs from neighboring hemogenic endothelium. Thus, at least three subsets of ECs contribute to the developing dorsal aorta: vascular ECs, hemogenic ECs, and SDECs. Taken together, our findings indicate that the distinct spatial origins of endothelial precursors dictate different cellular potentials within the developing dorsal aorta.

    Methods

    Single-cell RNA sample preparation

    After FACS, total cell concentration and viability were ascertained using a TC20 Automated Cell Counter (Bio-Rad). Samples were then resuspended in 1X PBS with 10% BSA at a concentration between 800-3000 per ml. Samples were loaded on the 10X Chromium system and processed as per manufacturer’s instructions (10X Genomics). Single cell libraries were prepared as per the manufacturer’s instructions using the Single Cell 3’ Reagent Kit v2 (10X Genomics). Single cell RNA-seq libraries and barcode amplicons were sequenced on an Illumina HiSeq platform.

    Single-cell RNA sequencing analysis

    The Chromium 3’ sequencing libraries were generated using Chromium Single Cell 3’ Chip kit v3 and sequenced (the instrument is not stated). The Illumina FASTQ files were used to generate filtered matrices using CellRanger (10X Genomics) with default parameters and imported into R for exploration and statistical analysis using the Seurat package (La Manno et al., 2018). Counts were normalized according to total expression, multiplied by a scale factor (10,000), and log-transformed. For cell cluster identification and visualization, gene expression values were also scaled according to highly variable genes after controlling for unwanted variation generated by sample identity. Cell clusters were identified based on UMAP of the first 14 principal components of PCA using Seurat’s FindClusters method with the original Louvain algorithm and a resolution parameter of 0.5. To find cluster marker genes, Seurat’s FindAllMarkers method was used. Only genes with a significant (adjusted p-value < 0.05) minimal average absolute log2-fold change of 0.2 between each cluster and the rest of the dataset were considered differentially expressed. To merge individual datasets and to remove batch effects, Seurat v3 Integration and Label Transfer standard workflow (Stuart et al., 2019)
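
Two numeric steps in the methods above, the count normalization and the marker-gene thresholds, can be illustrated with plain Python. This is a sketch under stated assumptions and not the authors' code, which used Seurat in R. The text says only "log-transformed", so the use of log1p (natural log of 1 + x) is an assumption here. Reading "minimal average absolute log2-fold change of 0.2" as an inclusive cutoff (>= 0.2) is also an assumption. The function names `log_normalize` and `is_marker` are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Normalize one cell's counts by its total, scale, then log-transform.

    log1p is an assumption; the description says only 'log-transformed'.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


def is_marker(adj_p_value, avg_log2_fc, p_cutoff=0.05, fc_cutoff=0.2):
    """Apply the stated thresholds: adjusted p < 0.05 and |log2FC| >= 0.2."""
    return adj_p_value < p_cutoff and abs(avg_log2_fc) >= fc_cutoff


cell = [0, 3, 7]  # toy UMI counts for three genes in one cell
print(log_normalize(cell))
print(is_marker(0.01, -0.5))  # significant and large enough in magnitude
print(is_marker(0.01, 0.1))   # significant but below the fold-change cutoff
```

Taking the absolute value of the fold change means that both up- and down-regulated genes can pass the filter, which matches the "absolute log2-fold change" wording.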

  5. Tubuloid kidney organoid - single cell RNA-seq

    • figshare.com
    tar
    Updated May 16, 2022
    Cite
    Javier Perales Patón; Rafael Kramann (2022). Tubuloid kidney organoid - single cell RNA-seq [Dataset]. http://doi.org/10.6084/m9.figshare.11786238.v1
    Available download formats: tar
    Dataset updated
    May 16, 2022
    Dataset provided by
    figshare
    Authors
    Javier Perales Patón; Rafael Kramann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record includes data derived from the processing of single-cell and single-nuclei RNA-seq from several samples (see below). The data correspond to the input and intermediate output files from https://github.com/saezlab/Xu_tubuloid .

    Data

    The data include:

    • Binary sparse matrices for the UMI gene expression quantification from cellranger (filtered feature-barcode matrices). These are TAR archive files named with the name of the sample.
    • Seurat Objects with normalized data, embeddings of dimensionality reduction, clustering and cell cluster annotation. These are TAR archive files including final objects, grouped by sample type: SeuratObjects_[SortedCells | Organoids | Human Kidney Tissue]. The HumanKidneyTissue also includes the SeuratObject after Harmony integration.
    • Exported barcode idents from unsupervised clustering and manual annotation ("barcodeIdents*.csv" files).
    • Label transfer via Symphony mapping of tubuloid cells from each organoid to an integrated reference atlas of human kidney tissue (SymphonyMapped*.csv).

    Samples

    The data correspond to the following samples, which were profiled at single-cell resolution:

    • CK5 early organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at an early stage.
    • CK119 late organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at a late stage.
    • JX1 late organoid (Healthy). Organoid generated following Hans Clevers' protocol for kidney organoids.
    • JX2 PKD1-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD1 was gene-edited to reproduce PKD phenotype, developed at a late stage.
    • JX3 PKD2-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD2 was gene-edited to reproduce PKD phenotype, developed at a late stage.
    • CK120 CD13. CD13+ sorted cells from human adult kidney tissue.
    • CK121 CD24. CD24+ sorted cells from human adult kidney tissue.

    In addition, human adult kidney tissue samples were profiled in the context of ADPKD:

    • CK224: human specimen with ADPKD (PKD2- genotype).
    • CK225: human specimen with ADPKD (PKD1- genotype).
    • ADPKD3: human specimen with ADPKD (ND genotype).
    • Control1: human specimen with healthy tissue.
    • Control2: human specimen with healthy tissue.
