18 datasets found
  1. Cluster tendency assessment in neuronal spike data

    • plos.figshare.com
    pdf
    Updated Jun 5, 2023
    Cite
    Sara Mahallati; James C. Bezdek; Milos R. Popovic; Taufik A. Valiante (2023). Cluster tendency assessment in neuronal spike data [Dataset]. http://doi.org/10.1371/journal.pone.0224547
    Available download formats: pdf
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sara Mahallati; James C. Bezdek; Milos R. Popovic; Taufik A. Valiante
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sorting spikes from extracellular recordings into clusters associated with distinct single units (putative neurons) is a fundamental step in analyzing neuronal populations. Such spike sorting is intrinsically unsupervised, as the number of neurons is not known a priori. Therefore, any spike sorting is an unsupervised learning problem that requires one of two approaches: specification of a fixed value k for the number of clusters to seek, or generation of candidate partitions for several possible values of k, followed by selection of a best candidate based on various post-clustering validation criteria. In this paper, we investigate the first approach and evaluate the utility of several methods for providing lower dimensional visualization of the cluster structure and for subsequent spike clustering. We also introduce a visualization technique called improved visual assessment of cluster tendency (iVAT) to estimate possible cluster structures in data without the need for dimensionality reduction. Experiments are conducted on two datasets with ground truth labels. In data with a relatively small number of clusters, iVAT is beneficial in estimating the number of clusters to inform the initialization of clustering algorithms. With larger numbers of clusters, iVAT gives a useful estimate of the coarse cluster structure but sometimes fails to indicate the presumptive number of clusters. We show that noise associated with recording extracellular neuronal potentials can disrupt computational clustering schemes, highlighting the benefit of probabilistic clustering models. Our results show that t-Distributed Stochastic Neighbor Embedding (t-SNE) provides representations of the data that yield more accurate visualization of potential cluster structure to inform the clustering stage. Moreover, the clusters obtained using t-SNE features were more reliable than the clusters obtained using the other methods, which indicates that t-SNE can potentially be used for both visualization and to extract features to be used by any clustering algorithm.
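
The cluster-tendency idea in the description above can be illustrated with a small sketch. It follows the VAT reordering and the recursive iVAT transform as they are commonly described in the literature; it is not the authors' code. The function names (`pairwise_distances`, `vat_order`, `reorder`, `ivat`) and the toy points are assumptions made for illustration, and Euclidean distance is an assumed choice of dissimilarity.

```python
import math


def pairwise_distances(points):
    """Euclidean dissimilarity matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def vat_order(D):
    """Return a VAT ordering of the objects described by matrix D."""
    n = len(D)
    # Start from a row that contains the largest dissimilarity.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    # nearest[j]: distance from unordered object j to the ordered set.
    nearest = {j: D[start][j] for j in range(n) if j != start}
    while nearest:
        # Prim-style step: append the unordered object closest to the ordered set.
        j = min(nearest, key=lambda q: (nearest[q], q))
        order.append(j)
        del nearest[j]
        for q in nearest:
            nearest[q] = min(nearest[q], D[j][q])
    return order


def reorder(D, order):
    """Permute rows and columns of D according to order."""
    return [[D[a][b] for b in order] for a in order]


def ivat(Dv):
    """Minimax path-distance transform of a VAT-reordered matrix Dv."""
    n = len(Dv)
    out = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        # Closest earlier object in the VAT ordering.
        j = min(range(r), key=lambda k: Dv[r][k])
        for c in range(r):
            out[r][c] = Dv[r][j] if c == j else max(Dv[r][j], out[j][c])
            out[c][r] = out[r][c]
    return out


# Toy data (made up): two well-separated groups, interleaved in the input.
points = [(0, 0), (10, 10), (0, 1), (11, 10), (1, 0), (10, 11)]
D = pairwise_distances(points)
order = vat_order(D)
image = ivat(reorder(D, order))
print(order)
```

Rendering `image` as a grayscale picture would show dark square blocks along the diagonal, one per apparent cluster; counting those blocks is the visual estimate of the number of clusters.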

  2. DrCyZ: Techniques for analyzing and extracting useful information from CyZ.

    • data.niaid.nih.gov
    Updated Jan 19, 2022
    Cite
    de Zarzà, I. (2022). DrCyZ: Techniques for analyzing and extracting useful information from CyZ. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5816857
    Dataset updated
    Jan 19, 2022
    Dataset provided by
    de Curtò, J.
    de Zarzà, I.
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    DrCyZ: Techniques for analyzing and extracting useful information from CyZ.

    Samples from NASA Perseverance and a set of GAN-generated synthetic images from Neural Mars.

    Repository: https://github.com/decurtoidiaz/drcyz

    Subset of samples from (includes tools to visualize and analyse the dataset):

    CyZ: MARS Space Exploration Dataset. [https://doi.org/10.5281/zenodo.5655473]

    Images from NASA missions of the celestial body.

    Repository: https://github.com/decurtoidiaz/cyz

    Authors:

    J. de Curtò c@decurto.be

    I. de Zarzà z@dezarza.be

    File Information from DrCyZ-1.1

    • Subset of samples from Perseverance (drcyz/c).
      ∙ png (drcyz/c/png).
        PNG files (5025) selected from NASA Perseverance (CyZ-1.1) after t-SNE and K-means Clustering. 
      ∙ csv (drcyz/c/csv).
        CSV file.
    
    
    • Resized samples from Perseverance (drcyz/c+).
      ∙ png 64x64; 128x128; 256x256; 512x512; 1024x1024 (drcyz/c+/drcyz_64-1024).
        PNG files resized at the corresponding size. 
      ∙ TFRecords 64x64; 128x128; 256x256; 512x512; 1024x1024 (drcyz/c+/tfr_drcyz_64-1024).
        TFRecord files resized to the corresponding size for import into TensorFlow.
    
    
    • Synthetic images from Neural Mars generated using Stylegan2-ada (drcyz/drcyz+).
      ∙ png 100; 1000; 10000 (drcyz/drcyz+/drcyz_256_100-10000)
        PNG files subset of 100, 1000 and 10000 at size 256x256.
    
    
    • Network Checkpoint from Stylegan2-ada trained at size 256x256 (drcyz/model_drcyz).
      ∙ network-snapshot-000798-drcyz.pkl
    
    
    • Notebooks in Python to analyse the original dataset and reproduce the experiments: K-means Clustering, t-SNE, PCA, synthetic generation using Stylegan2-ada and instance segmentation using Deeplab (https://github.com/decurtoidiaz/drcyz/tree/main/dr_cyz+).
      ∙ clustering_curiosity_de_curto_and_de_zarza.ipynb
        K-means Clustering and PCA(2) with images from Curiosity.
      ∙ clustering_perseverance_de_curto_and_de_zarza.ipynb
        K-means Clustering and PCA(2) with images from Perseverance.
      ∙ tsne_curiosity_de_curto_and_de_zarza.ipynb
        t-SNE and PCA (components selected to explain 99% of variance) with images from Curiosity.
      ∙ tsne_perseverance_de_curto_and_de_zarza.ipynb
        t-SNE and PCA (components selected to explain 99% of variance) with images from Perseverance.
      ∙ Stylegan2-ada_de_curto_and_de_zarza.ipynb
        Stylegan2-ada trained on a subset of images from NASA Perseverance (DrCyZ).
      ∙ statistics_perseverance_de_curto_and_de_zarza.ipynb
        Compute statistics from synthetic samples generated by Stylegan2-ada (DrCyZ) and images from NASA Perseverance (CyZ).
      ∙ DeepLab_TFLite_ADE20k_de_curto_and_de_zarza.ipynb
        Example of instance segmentation using Deeplab with a sample from NASA Perseverance (DrCyZ).
    
  3. Supplementary Table 2: All Genes TSNE-clusters

    • figshare.com
    xlsx
    Updated Feb 18, 2019
    Cite
    Amarjit Saini; Linda Björkhem-Bergman; Johan Boström; Mats Lilja; Michael Melin; Lena Ekström; Peter Bergman; Mikael Altun; Eric Rullman; Thomas Gustafsson (2019). Supplementary Table 2: All Genes TSNE-clusters [Dataset]. http://doi.org/10.6084/m9.figshare.7732187.v1
    Available download formats: xlsx
    Dataset updated
    Feb 18, 2019
    Dataset provided by
    figshare
    Authors
    Amarjit Saini; Linda Björkhem-Bergman; Johan Boström; Mats Lilja; Michael Melin; Lena Ekström; Peter Bergman; Mikael Altun; Eric Rullman; Thomas Gustafsson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary Table 2: All Genes TSNE-clusters RNA-seq data

  4. Replication Data for the "Keratoconus severity identification using...

    • search.dataone.org
    Updated Nov 22, 2023
    Cite
    Yousefi, Siamak (2023). Replication Data for the "Keratoconus severity identification using unsupervised machine learning", Siamak Yousefi, Ebrahim Yousefi, Hidenori Takahashi, Takahiko Hayashi, Hironobu Tampo, Satoru Inoda, Yusuke Arai, and Penny Asbell, PLOS One 2018 [Dataset]. http://doi.org/10.7910/DVN/G2CRMO
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Yousefi, Siamak
    Description

    Dataset and labels for the article "Keratoconus severity identification using unsupervised machine learning" by Siamak Yousefi et al.

  5. Additional file 5 of GECO: gene expression clustering optimization app for...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    A. N. Habowski; T. J. Habowski; M. L. Waterman (2023). Additional file 5 of GECO: gene expression clustering optimization app for non-linear data visualization of patterns [Dataset]. http://doi.org/10.6084/m9.figshare.13642382.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    A. N. Habowski; T. J. Habowski; M. L. Waterman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 5: CSV file of bulk RNA-seq data of F. nucleatum infection time course used for GECO UMAP generation.

  6. Scripts for Analysis

    • figshare.com
    txt
    Updated Jul 18, 2018
    Cite
    Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
    Available download formats: txt
    Dataset updated
    Jul 18, 2018
    Dataset provided by
    figshare
    Authors
    Sneddon Lab UCSF
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Scripts used for analysis of V1 and V2 Datasets.

    • seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
    • merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
    • subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
    • seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
    • clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
    • subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
    • seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
    • merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
    • prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
    • monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
    • monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
    • monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
    • CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
    • CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.

  7. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +3more
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
      File Name: PBMC7_AllCells.zip
      Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
      ∙ matrix of gene counts* (matrix.mtx.gx)
      ∙ gene names (features.tsv.gz)
      ∙ cell IDs (barcodes.tsv.gz)
      *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
      File Name: PBMC7_AllCells_meta.csv
      Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
      ∙ nCount_RNA = the number of transcripts detected in a cell
      ∙ nFeature_RNA = the number of genes detected in a cell
      ∙ Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
      ∙ prcntMito = percent mitochondrial reads in a cell
      ∙ Scrublet = doublet probability score assigned to a cell
      ∙ seurat_clusters = cluster ID assigned to a cell
      ∙ PaperIDs = sample ID for a cell
      ∙ celltypes = cell type ID assigned to a cell

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
      File Name: PBMC7_AllCells_PCAcoord.csv
      Resource Description: .csv file containing first 100 PCA coordinates for cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
      File Name: PBMC7_AllCells_tSNEcoord.csv
      Resource Description: .csv file containing t-SNE coordinates for all cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
      File Name: PBMC7_AllCells_UMAPcoord.csv
      Resource Description: .csv file containing UMAP coordinates for all cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
      File Name: PBMC7_CD4only_tSNEcoord.csv
      Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
      File Name: PBMC7_CD4only_UMAPcoord.csv
      Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
      File Name: PBMC7_GDonly_UMAPcoord.csv
      Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
      File Name: PBMC7_GDonly_tSNEcoord.csv
      Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
      File Name: UnfilteredGeneInfo.txt
      Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
      File Name: PBMC7.tar
      Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().

  8. Additional file 4 of GECO: gene expression clustering optimization app for...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    A. N. Habowski; T. J. Habowski; M. L. Waterman (2023). Additional file 4 of GECO: gene expression clustering optimization app for non-linear data visualization of patterns [Dataset]. http://doi.org/10.6084/m9.figshare.13642379.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    A. N. Habowski; T. J. Habowski; M. L. Waterman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4: CSV file of colon crypt bulk RNA-seq data used for GECO UMAP generation.

  9. Table_2_A Novel Computational Framework for Precision Diagnosis and Subtype...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 15, 2023
    Cite
    Fei Xia; Xiaojun Xie; Zongqin Wang; Shichao Jin; Ke Yan; Zhiwei Ji (2023). Table_2_A Novel Computational Framework for Precision Diagnosis and Subtype Discovery of Plant With Lesion.XLSX [Dataset]. http://doi.org/10.3389/fpls.2021.789630.s003
    Available download formats: xlsx
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Fei Xia; Xiaojun Xie; Zongqin Wang; Shichao Jin; Ke Yan; Zhiwei Ji
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Plants are often attacked by various pathogens during their growth, which may cause environmental pollution, food shortages, or economic losses in a certain area. Integration of high throughput phenomics data and computer vision (CV) provides a great opportunity to realize plant disease diagnosis in the early stage and uncover the subtype or stage patterns in the disease progression. In this study, we proposed a novel computational framework for plant disease identification and subtype discovery through a deep-embedding image-clustering strategy, Weighted Distance Metric and the t-distributed stochastic neighbor embedding algorithm (WDM-tSNE). To verify its effectiveness, we applied our method to four public datasets of images. The results demonstrated that the newly developed tool is capable of identifying the plant disease and further uncovering the underlying subtypes associated with pathogenic resistance. In summary, the current framework provides great clustering performance for the root or leaf images of diseased plants with pronounced disease spots or symptoms.

  10. GERDA datasets including NGS and SGA data

    • data.mendeley.com
    Updated Apr 26, 2023
    Cite
    Fabian Otte (2023). GERDA datasets including NGS and SGA data [Dataset]. http://doi.org/10.17632/8c4zbxfvwk.3
    Dataset updated
    Apr 26, 2023
    Authors
    Fabian Otte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets linked to the publication "Revealing viral and cellular dynamics of HIV-1 at the single-cell level during early treatment periods", Otte et al. 2023, published in Cell Reports Methods. Pre-ART (antiretroviral therapy) cryo-conserved and whole blood specimens were sampled for HIV-1 virus reservoir determination in HIV-1 positive individuals from the Swiss HIV Study Cohort. Patients were monitored for proviral (DNA), poly-A transcripts (RNA), late protein translation (Gag and Envelope reactivation co-detection assay, GERDA) and intact viruses (gold standard: viral outgrowth assay, VOA). In this dataset we deposited the pipeline for the multidimensional data analysis of our newly established GERDA method, using DBSCAN and tSNE. For further comprehension, NGS and Sanger sequencing data were attached as processed and raw data (GenBank).

    Resubmitted to Cell Reports Methods (Jan-2023), accepted in principle (Mar-2023)

    GERDA is a new detection method to decipher the HIV-1 cellular reservoir in blood (tissue or any other specimen). It integrates HIV-1 Gag and Env co-detection along with cellular surface markers to reveal 1) which cells still contain HIV-1 translation-competent virus and 2) which markers the respective infected cells express. The phenotypic marker repertoire of the cells allows predictions to be made on potential homing and an assessment of the HIV-1 (tissue) reservoir. All FACS data were acquired on an LSRFortessa BD FACS machine (markers: CCR7, CD45RA, CD28, CD4, CD25, PD1, IntegrinB7, CLA, HIV-1 Env, HIV-1 Gag). Raw FACS data (pre-gated CD4CD3+ T-cells) were arcsin transformed and dimensionally reduced using optsne. Data were further clustered using DBSCAN, and either individual clusters were further analyzed for individual marker expression or expression profiles of all relevant clusters were analyzed by heatmaps. Sequences before/after therapy initiation and during viral outgrowth cultures were monitored for individuals P01-46 and P04-56 by next-generation sequencing (NGS of HIV-1 Envelope V3 loop only) and by Sanger sequencing (single genome amplification, SGA).
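
The transformation step mentioned above can be sketched as follows, under a loudly stated assumption: that "arcsin transformed" refers to the inverse hyperbolic sine (arcsinh) scaling that is widely applied to cytometry intensities before dimensionality reduction. The description does not confirm this, and the deposited normalization code should be treated as the reference. The function name `arcsinh_transform`, the cofactor of 150 and the sample intensities are hypothetical.

```python
import math


def arcsinh_transform(rows, cofactor=150.0):
    """Apply asinh(x / cofactor) to every intensity in a list of rows.

    The cofactor is an assumed, illustrative value; it controls where the
    scale switches from roughly linear to roughly logarithmic.
    """
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [[math.asinh(x / cofactor) for x in row] for row in rows]


# Made-up intensities for two events and three markers.
raw = [[0.0, 150.0, -30.0], [1200.0, 15.0, 9000.0]]
scaled = arcsinh_transform(raw)
print(scaled)
```

Unlike a plain logarithm, this scaling is defined for zero and negative values, which compensated cytometry data often contain.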

    • data normalization code (by Julian Spagnuolo)
    • FACS normalized data as CSV (XXX_arcsin.csv)
    • OMIQ conText file (_OMIQ-context_XXX)
    • arcsin normalized FACS data after optsne dimension reduction with OMIQ.ai as CSV file (XXXarcsin.csv.csv)
    • R pipeline with codes (XXX_commented.R)
    • P01_46-NGS and Sanger sequences
    • P04_56-NGS and Sanger sequences

  11. clustering and annotation metadata

    • figshare.com
    zip
    Updated Apr 7, 2020
    Cite
    Geoff Stanley (2020). clustering and annotation metadata [Dataset]. http://doi.org/10.6084/m9.figshare.12093471.v1
    Available download formats: zip
    Dataset updated
    Apr 7, 2020
    Dataset provided by
    figshare
    Authors
    Geoff Stanley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There are 3 files in seurat_results.zip: one containing the principal component values used for dimensionality reduction and clustering of all MSN, one containing the computed tSNE values, and one containing the louvain clusters. The metadata_final.csv file contains the annotated major cell types and subtypes.

  12. Dataset name, reference, dimensions and cell type composition.

    • plos.figshare.com
    xls
    Updated Dec 13, 2024
    Cite
    Yuta Hozumi; Guo-Wei Wei (2024). Dataset name, reference, dimensions and cell type composition. [Dataset]. http://doi.org/10.1371/journal.pone.0311791.t001
    Available download formats: xls
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yuta Hozumi; Guo-Wei Wei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset name, reference, dimensions and cell type composition.

  13. Cell and gene data for testicular single-cell RNA-Seq

    • figshare.com
    xlsx
    Updated Sep 18, 2019
    Cite
    Soeren Lukassen; Elisabeth Bosch; Arif B. Ekici; Andreas Winterpacht (2019). Cell and gene data for testicular single-cell RNA-Seq [Dataset]. http://doi.org/10.6084/m9.figshare.6139469.v2
    Available download formats: xlsx
    Dataset updated
    Sep 18, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Soeren Lukassen; Elisabeth Bosch; Arif B. Ekici; Andreas Winterpacht
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tables containing additional information on genes and cells obtained from single-cell RNA-Seq analysis of mouse testis data. SuppTableCells.xlsx contains the 10X Barcode as identifier, the replicate ID, position in the t-SNE plot, UMI and gene count per cell, the proportion of mitochondrial transcripts, cluster ID obtained by k-Means clustering (k=9), inferred cell type, and pseudotime information obtained using monocle and Scrat. SuppTableGenes contains the average expression value, fold-change compared to other cell types, and p-value relative to other cell types for each gene that was expressed in the dataset. Throughout the data, the following abbreviations for cell types are used: Spg=spermatogonia, SC=spermatocytes, RS=round spermatids, ES=elongating spermatids, CS=condensed/condensing spermatids. In cases where several clusters were identified per cell type, the earlier cluster was designated as 1.

  14. Top 10 GO biological processes for BRCA1 coexpressed genes from an EnrichR...

    • figshare.com
    xls
    Updated Oct 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Miguel-Angel Cortes-Guzman; Víctor Treviño (2024). Top 10 GO biological processes for BRCA1 coexpressed genes from an EnrichR analysis from top 100 genes coexpressed at the tissue-level or at the system-level. [Dataset]. http://doi.org/10.1371/journal.pone.0309961.t002
    Available download formats: xls
    Dataset updated
    Oct 4, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Miguel-Angel Cortes-Guzman; Víctor Treviño
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Top 10 GO biological processes for BRCA1 coexpressed genes from an EnrichR analysis from top 100 genes coexpressed at the tissue-level or at the system-level.

  15. Cell Atlas of the Xenopus Laevis at Single-Cell Resolution

    • figshare.com
    zip
    Updated Jul 23, 2022
    Cite
    Guoji Guo; Xiaoping Han (2022). Cell Atlas of the Xenopus Laevis at Single-Cell Resolution [Dataset]. http://doi.org/10.6084/m9.figshare.19152839.v2
    Available download formats: zip
    Dataset updated
    Jul 23, 2022
    Dataset provided by
    figshare
    Authors
    Guoji Guo; Xiaoping Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    • dge_raw_data.zip: contains separate raw count matrices for 17 adult Xenopus tissues (bladder, bone marrow, brain, eye, heart, intestine, kidney, liver, lung, muscle, ovary, oviduct, pancreas, skin, spleen, stomach and testis) and 4 larva stages (St48, St54, St59 and St66).
    • dge_cell_info.zip: contains the cell annotations corresponding to dge_raw_data.zip, including tSNE coordinates, clusters, stages and cell-type annotations.
    • Xenopus_Figure1.h5ad: contains both raw count and normalized datasets (contains only highly variable genes) for 501,358 cells in XCL, which can be processed into the python environment with SCANPY directly.
    • Xenopus_Figure1_cell_info.csv: contains the cell annotations for 501,358 cells in XCL, including tSNE coordinates, clusters, tissue origins, stages and cell-type annotations.
    • Larva_Figure4.h5ad: contains both raw count and normalized datasets (contains only highly variable genes) for 188,020 larva cells in XCL, which can be processed into the python environment with SCANPY directly.
    • Larva_Figure4_cell_info.csv: contains the cell annotations for 188,020 larva cells in XCL, including tSNE coordinates, clusters, stages and cell-type annotations.

  16. Details of overall clustering results.

    • plos.figshare.com
    xls
    Updated Dec 19, 2024
    Cite
    Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana (2024). Details of overall clustering results. [Dataset]. http://doi.org/10.1371/journal.pone.0313890.t005
    Available download formats: xls
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Perimeter Intrusion Detection Systems (PIDS) are crucial for protecting physical locations by detecting and responding to intrusions around their perimeters. Despite the availability of several PIDS, challenges remain in detection accuracy and precise activity classification. To address these challenges, a new machine learning model is developed. This model utilizes the pre-trained InceptionV3 for feature extraction on a PID intrusion image dataset, followed by t-SNE for dimensionality reduction and subsequent clustering. When handling high-dimensional data, the existing Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm faces efficiency issues due to its complexity and varying densities. To overcome these limitations, this research enhances the traditional DBSCAN algorithm. In the enhanced DBSCAN, distances between minimal points are determined using an estimation for the epsilon values with the Manhattan distance formula. The effectiveness of the proposed model is evaluated by comparing it to state-of-the-art techniques found in the literature. The analysis reveals that the proposed model achieved a silhouette score of 0.86, while comparative techniques failed to produce similar results. This research contributes to societal security by improving location perimeter protection, and future researchers can utilize the developed model for human activity recognition from image datasets.
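
To make the quantities named in this description concrete, here is a minimal sketch of two generic building blocks: sorted k-th nearest-neighbour distances under the Manhattan metric (a common heuristic for choosing a DBSCAN epsilon) and the mean silhouette score. This is not the enhanced DBSCAN from the paper, whose details are not given here. The function names, the toy points and the labels are assumptions for illustration only.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbour."""
    out = []
    for i, p in enumerate(points):
        d = sorted(manhattan(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return sorted(out)


def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(manhattan(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(manhattan(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (made up): two tight groups with hand-assigned labels.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
print(k_distances(points, 1), score)
```

A sharp bend in the sorted k-distance curve is often read as a candidate epsilon. Points that DBSCAN marks as noise would need to be dropped before scoring, since this sketch treats every label as a cluster.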

  17. f

    t-SNE embedding illustrates the distribution of cell populations.

    • plos.figshare.com
    zip
    Updated Jun 26, 2025
    Cite
    Changhai Long; Biao Ma; Xingshun Zhong; Mingzhi Zou; Kai Li; Sijing Liu (2025). t-SNE embedding illustrates the distribution of cell populations. [Dataset]. http://doi.org/10.1371/journal.pone.0326872.s007
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 26, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Changhai Long; Biao Ma; Xingshun Zhong; Mingzhi Zou; Kai Li; Sijing Liu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    t-SNE embedding illustrates the distribution of cell populations.

  18. f

    Parameters for the t-distributed Stochastic Neighbor Embedding (t-SNE).

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Doina Bucur (2023). Parameters for the t-distributed Stochastic Neighbor Embedding (t-SNE). [Dataset]. http://doi.org/10.1371/journal.pone.0272270.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Doina Bucur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameters for the t-distributed Stochastic Neighbor Embedding (t-SNE).

Cite
Sara Mahallati; James C. Bezdek; Milos R. Popovic; Taufik A. Valiante (2023). Cluster tendency assessment in neuronal spike data [Dataset]. http://doi.org/10.1371/journal.pone.0224547

Cluster tendency assessment in neuronal spike data

Explore at:
10 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: pdf
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Sara Mahallati; James C. Bezdek; Milos R. Popovic; Taufik A. Valiante
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Sorting spikes from extracellular recordings into clusters associated with distinct single units (putative neurons) is a fundamental step in analyzing neuronal populations. Such spike sorting is intrinsically unsupervised, as the number of neurons is not known a priori. Therefore, spike sorting is an unsupervised learning problem that requires one of two approaches: specification of a fixed value c for the number of clusters to seek, or generation of candidate partitions for several possible values of c, followed by selection of a best candidate based on various post-clustering validation criteria. In this paper, we investigate the first approach and evaluate the utility of several methods for providing lower dimensional visualization of the cluster structure and their effect on subsequent spike clustering. We also introduce a visualization technique called improved visual assessment of cluster tendency (iVAT) to estimate possible cluster structures in data without the need for dimensionality reduction. Experiments are conducted on two datasets with ground truth labels. In data with a relatively small number of clusters, iVAT is beneficial in estimating the number of clusters to inform the initialization of clustering algorithms. With larger numbers of clusters, iVAT gives a useful estimate of the coarse cluster structure but sometimes fails to indicate the presumptive number of clusters. We show that noise associated with recording extracellular neuronal potentials can disrupt computational clustering schemes, highlighting the benefit of probabilistic clustering models. Our results show that t-Distributed Stochastic Neighbor Embedding (t-SNE) provides representations of the data that yield more accurate visualization of potential cluster structure to inform the clustering stage.
Moreover, the clusters obtained using t-SNE features were more reliable than the clusters obtained using the other methods, which indicates that t-SNE can potentially be used both for visualization and for extracting features to be used by any clustering algorithm.
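
The second approach named in the description (candidate partitions for several values of c, then selection by a validation criterion) can be illustrated with a small sketch. The description does not name a clustering algorithm or a validation index, so k-means with a deterministic farthest-first start and the mean silhouette width are assumptions chosen for brevity; `kmeans`, `silhouette`, and the toy `POINTS` are hypothetical and stand in for real spike features.

```python
import math


def kmeans(points, c, iters=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centres = [points[0]]
    while len(centres) < c:
        # Next centre: the point farthest from all centres chosen so far.
        centres.append(max(points,
                           key=lambda p: min(math.dist(p, m) for m in centres)))
    labels = None
    for _ in range(iters):
        new = [min(range(c), key=lambda k: math.dist(p, centres[k]))
               for p in points]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        for k in range(c):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; needs at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy feature vectors: three well-separated groups of three points each.
POINTS = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (0, 10), (0, 11), (1, 10)]

# One candidate partition per value of c, scored after clustering.
scores = {c: silhouette(POINTS, kmeans(POINTS, c)) for c in range(2, 5)}
best_c = max(scores, key=scores.get)
print(best_c)  # -> 3
```

On data this clean the index peaks at the true number of groups; with noisy recordings, the description's own caveat applies and the selected c should be treated as a suggestion.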
