22 datasets found
  1. UMAP plots split by dataset and sample

    • springernature.figshare.com
    zip
    Updated Feb 15, 2024
    Cite
    Ziteng Li; Hena Zhang; Qin Li; Yan Li; Zhixiang Hu; Xichun Hu; Xiaodong Zhu; Shenglin Huang (2024). UMAP plots split by dataset and sample [Dataset]. http://doi.org/10.6084/m9.figshare.22300675.v1
    Available download formats: zip
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ziteng Li; Hena Zhang; Qin Li; Yan Li; Zhixiang Hu; Xichun Hu; Xiaodong Zhu; Shenglin Huang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The file “UMAP plots split by dataset and sample” provides a comparison of UMAP plots at the dataset and sample levels, colored by major cell type.

  2. Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute...

    • figshare.com
    bin
    Updated Jun 16, 2025
    Cite
    Emin Onur Karakaslar (2025). Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute Myeloid Leukemia samples [Dataset]. http://doi.org/10.6084/m9.figshare.26189771.v1
    Available download formats: bin
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Emin Onur Karakaslar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TLDR: Seurat object of the 16 NPM1-mutated AML samples (n = 83,162 cells).

    AML samples
    All sixteen peripheral blood and bone marrow samples were obtained from patients with AML at diagnosis (n = 15) or relapse after chemotherapy (n = 1) with written informed consent according to the Declaration of Helsinki. Mononuclear cells were isolated by Ficoll-Isopaque density gradient centrifugation and cryopreserved in the Leiden University Medical Center (LUMC) Biobank for Hematological Diseases after approval by the LUMC Institutional Review Board (protocol no. B18.047).

    Upstream processing pipeline
    CellRanger v7.0.0 was run on all samples with the human reference genome hg38. Seurat v4 was used for all QC [15]. Our QC pipeline had three steps per sample: 1) soft filtering, 2) low-quality cluster removal, and 3) doublet detection. In soft filtering, Seurat objects were created with cells expressing at least 200 genes and with genes expressed in at least 3 cells. Then the standard Seurat command list with default parameters was run to detect low-quality clusters. Clusters with >15% mitochondrial and 15% mitochondrial mRNA. We used standard Seurat commands to scale and normalize the data on integrated features. The first 30 principal components were used to create UMAP plots. We used clustree to determine the optimal cluster number, based on FindClusters with resolutions sweeping from 0 to 1.2. We chose res=0.5, as clusters became stable. Next, we merged two clusters (CC5 and CC12) into one GMP-like cluster, as one of these clusters (CC12) had high expression of HSP genes yet still retained its cell-type-specific properties.

    Note: The file was processed with Seurat v4 but the object is updated for v5. It is uploaded in the .qs file format for faster reading. To read the file: qs::qread("path/to/data.qs")

    This data is available for research use only and cannot be used for commercial purposes.

    For further queries please refer to our paper:
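
The soft-filtering step in the description above (cells expressing at least 200 genes, genes expressed in at least 3 cells) can be illustrated with a minimal pure-Python sketch. This is not the authors' code, which used Seurat in R. The function name `soft_filter`, the nested-dictionary input layout, and the order of the two filters (cells first, then genes) are assumptions made for illustration.

```python
def soft_filter(counts, min_genes=200, min_cells=3):
    """Filter a sparse count table given as {cell_id: {gene: count}}.

    Hypothetical helper: keeps cells with at least `min_genes` detected
    genes, then keeps genes detected in at least `min_cells` of the
    remaining cells. The filter order is an assumption.
    """
    # Drop zero counts so that "detected" means count > 0.
    detected = {
        cell: {gene: n for gene, n in genes.items() if n > 0}
        for cell, genes in counts.items()
    }
    # Step 1: keep cells with enough detected genes.
    kept_cells = {
        cell: genes for cell, genes in detected.items() if len(genes) >= min_genes
    }
    # Step 2: count, per gene, the number of remaining cells that express it.
    cells_per_gene = {}
    for genes in kept_cells.values():
        for gene in genes:
            cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {gene for gene, n in cells_per_gene.items() if n >= min_cells}
    # Restrict every remaining cell to the retained genes.
    return {
        cell: {gene: n for gene, n in genes.items() if gene in kept_genes}
        for cell, genes in kept_cells.items()
    }
```

With the thresholds from the description, the call would be `soft_filter(counts, min_genes=200, min_cells=3)`; smaller thresholds are convenient for checking the logic on toy data.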

  3. Data from: Data related to Panzer: A Machine Learning Based Approach to...

    • darus.uni-stuttgart.de
    Updated Nov 27, 2024
    Cite
    Tim Panzer (2024). Data related to Panzer: A Machine Learning Based Approach to Analyze Supersecondary Structures of Proteins [Dataset]. http://doi.org/10.18419/DARUS-4576
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    DaRUS
    Authors
    Tim Panzer
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4576

    Time period covered
    Nov 1, 1976 - Feb 29, 2024
    Dataset funded by
    DFG
    Description

    This entry contains the data used for the bachelor thesis, which investigated how embeddings can be used to analyze supersecondary structures.

    Abstract of the thesis: This thesis analyzes the behavior of supersecondary structures in the context of embeddings. For this purpose, data from the Protein Topology Graph Library was provided with embeddings. This resulted in a structured graph database, which will be used for future work and analyses. In addition, different projections were made into two-dimensional space to analyze how the embeddings behave there.

    The Jupyter Notebook 1_data_retrival.ipynb contains the download process for the graph files from the Protein Topology Graph Library (https://ptgl.uni-frankfurt.de). The downloaded .gml files can also be found in graph_files.zip. They form graphs that represent the relationships of supersecondary structures in the proteins and are the data basis for further analyses.

    These graph files are then processed in the Jupyter Notebook 2_data_storage_and_embeddings.ipynb and entered into a graph database. The sequences of the supersecondary and secondary structures from the PTGL can be found in fastas.zip. The embeddings were calculated using the ESM model of the Facebook Research Group (huggingface.co/facebook/esm2_t12_35M_UR50D) and can be found in three .h5 files; they are subsequently added to the graph database. The whole process in this notebook serves to build up the database, which can then be searched using Cypher queries.

    In the Jupyter Notebook 3_data_science.ipynb, different visualizations and analyses are carried out with the help of UMAP.

    To install all dependencies, it is recommended to create a Conda environment and install all packages there. To use the project, PyEED should be installed using the snapshot of the original repository (source repository: https://github.com/PyEED/pyeed). The best way to install PyEED is to execute the pip install -e . command in the pyeed_BT folder. The dependencies can also be installed using poetry and the .toml file. In addition, seaborn, h5py and umap-learn are required. These can be installed using the following commands:

        pip install h5py==3.12.1
        pip install seaborn==0.13.2 umap-learn==0.5.7
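
After running the install commands, it can help to confirm that the environment matches the pinned versions. The following is a small sketch using only the Python standard library. The helper name `check_pins` and the `PINS` dictionary are hypothetical and not part of the dataset; the version numbers are the ones quoted in the description.

```python
from importlib import metadata

# Versions quoted in the dataset description.
PINS = {"h5py": "3.12.1", "seaborn": "0.13.2", "umap-learn": "0.5.7"}


def check_pins(pins):
    """Return {package: (installed_version_or_None, matches_pin)}.

    Hypothetical helper: looks up each distribution in the current
    environment and compares it with the requested version string.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_pins(PINS).items():
        status = "ok" if ok else "mismatch or missing"
        print(f"{package}: installed={installed} ({status})")
```

A package that is missing is reported with `None` as its installed version, so the same call works before and after the Conda environment is populated.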

  4. Additional file 6 of Gossypetin ameliorates 5xFAD spatial learning and...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Oct 22, 2022
    Cite
    Choi, Yoon Ha; Kim, Jong Kyoung; Jo, Kyung Won; Oh, Eunji; Kim, Somi; Kim, Kyong-Tai; Gon Cha, Dong; Park, Eun Seo; Lee, Dohyun (2022). Additional file 6 of Gossypetin ameliorates 5xFAD spatial learning and memory through enhanced phagocytosis against Aβ [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000395905
    Dataset updated
    Oct 22, 2022
    Authors
    Choi, Yoon Ha; Kim, Jong Kyoung; Jo, Kyung Won; Oh, Eunji; Kim, Somi; Kim, Kyong-Tai; Gon Cha, Dong; Park, Eun Seo; Lee, Dohyun
    Description

    Additional file 6:

    Fig. S1 Gossypetin does not affect expression of β- and γ-secretases or activity of β-secretase. (A to G) Time-dependent β-secretase activity of mouse hippocampal lysate was measured in Relative Fluorescence Units (RFU). Fluorescence excitation and emission wavelengths were 335 nm and 495 nm, respectively (A). Bar graphs of RFU at each time point: 10 min (B), 20 min (C), 30 min (D), 40 min (E), 50 min (F), 60 min (G). (n = 10~12 mice per group) (H to L) Representative images of Western blot analysis for β- and γ-secretase subunits and GAPDH (H). Bar graphs represent relative protein expression levels of BACE1 (I), Nicastrin (J), APH-1 (K), and PEN2 (L). (n = 12~15 mice per group) (M to P) Bar graphs represent relative mRNA expression levels of the β- and γ-secretase subunits bace1 (M), ncstn (N), aph1 (O), pen2 (P). (n = 9~10 mice per group) Error bars represent the mean ± SD, p < 0.05, ns = not significant, two-way ANOVA followed by Tukey’s multiple comparisons test.

    Fig. S2 Cell type classification of brain samples. (A) UMAP plot showing all cells from the brain samples, colored by their cell types. (B) Heatmap illustrating the Z-scores of average normalized expressions of cell type markers. (C) Violin plots displaying the log-scaled number of detected genes (top), Unique Molecular Identifiers (UMIs) (middle), and the percentage of mitochondrial gene expression (bottom) per cell for each cell type. (D) UMAP plots showing all cells from the brain samples, colored by their sampled region (left), mouse strain (middle), or drug administration condition (right).

    Fig. S3 Detailed subtyping of the microglial population. (A) UMAP plots showing all microglial cells from the cortex region. The cells are colored by their cell types (left). Heatmap showing the Z-scores of average normalized expressions of representative DEGs for each cell type from the cortex region (right). (B) UMAP plots showing microglial cells from cortex (left) or hippocampus (right), colored by the combination of mouse strain and drug administration condition. (C) UMAP plots illustrating microglial cells from cortex (left) or hippocampus (right), colored by their inferred cell cycle. (D) Bar plots for the fraction of cortex (left) or hippocampus (right) microglial cells by sample condition (the combination of mouse strain and drug administration) for each microglial subtype.

    Fig. S4 Differential gene expression between vehicle- and gossypetin-treated microglia. (A) Scatter plot showing GOBP terms that are upregulated or downregulated by 5xFAD construction or gossypetin administration for each microglial subtype from cortex. Significant (Fisher’s exact test, P < 0.01) terms associated with antigen presentation are colored by their biological keywords. (B) GSEA plots showing significant (P < 0.05) GOBP terms for the gossypetin administration condition against vehicle treatment within 5xFAD homeostatic microglia from the hippocampus region. Related to Fig. 3D. (C) Volcano plots illustrating the DEGs selected by the comparison between wild type and 5xFAD (left), or vehicle- and gossypetin-treated 5xFAD (right), from the homeostatic microglial population of the cortex region.

    Fig. S5 Transcriptomic transition in cortex microglia and measurement of DAM signature score. (A) Volcano plots showing significant (p < 0.05) DEGs selected by the comparison between cortex homeostatic microglia in vehicle-treated wild type and 5xFAD (top left), or vehicle- and gossypetin-treated 5xFAD (top right). Volcano plots illustrating the comparison of the gossypetin administration condition against vehicle treatment within 5xFAD stage 1 DAM (bottom left) or stage 2 DAM (bottom right) from cortex are also presented. (B) Violin plot illustrating module scores for the DAM-related genes from previous studies. Cells are grouped by the combination of their mouse strain and treatment condition. (P < 0.001)

    Fig. S6 Gossypetin ameliorates gliosis in microglia and astrocytes. (A to D) Representative images of hippocampus (A) and cortex (C) stained with Hoechst and Iba-1. Scale bar corresponds to 200 μm. Bar graphs represent quantification of the Iba-1 positive area in the dentate gyrus of the hippocampus (n = 9~12 mice per group, 3~6 slices per brain) (B) and cortex (n = 9~12 mice per group, 3~6 slices per brain) (D). (E to H) Representative images of hippocampus (E) and cortex (G) stained with Hoechst and GFAP. Scale bar corresponds to 200 μm. Bar graphs represent quantification of the GFAP positive area in the dentate gyrus of the hippocampus (n = 9~12 mice per group, 3~6 slices per brain) (F) and cortex (n = 9~12 mice per group, 3~5 slices per brain) (H). The error bars represent the mean ± SEM. ****p < 0.0001, ***p < 0.001, **p < 0.01, ns = not significant, two-way ANOVA followed by Tukey’s multiple comparisons test (B, D, F and H).

    Fig. S7 Gossypetin increases Aβ phagocytic capacity and dynamics of the BV2 microglial cell line. (A) Representative images of BV2 cells treated with 488-Aβ and stained with Hoechst and Iba-1. Gossypetin (25 μM) was pretreated for 24 h before 488-Aβ treatment. Scale bar corresponds to 100 μm. (B) Bar graph represents quantification of the area of internalized 488-Aβ in BV2 (n = 3 per group, 253~656 cells per sample). (C) Line graph represents quantification of the fluorescent area generated by internalized 488-Aβ in BV2 in a time-dependent manner (n = 3 per group, 107~347 cells per sample). The error bars represent the mean ± SEM. ****p < 0.0001, *p < 0.05, two-way ANOVA followed by Tukey’s multiple comparisons test (C), Student’s t test (B).

  5. sclerobase_data

    • huggingface.co
    Updated Jan 14, 2025
    Cite
    Natalie Chan (2025). sclerobase_data [Dataset]. https://huggingface.co/datasets/nfc22/sclerobase_data
    Dataset updated
    Jan 14, 2025
    Authors
    Natalie Chan
    Description

    Monitoring Progression of Scleroderma

      Project Description
    

    This is a website for visualising datasets to study protein expression in Scleroderma patients. The website is able to generate the following plots: correlation plot, boxplot, UMAP plot, volcano plot, and violin plot.

      Introduction
    

    Scleroderma is an autoimmune disease that can cause thickened areas of skin and connective tissues. To gain a deeper understanding of this condition, analysing the expression of… See the full description on the dataset page: https://huggingface.co/datasets/nfc22/sclerobase_data.

  6. DCASE2021 UAD-S UMAP Data

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Aug 23, 2021
    Cite
    Fernandez Rodriguez, Andres; Plumbley, Mark D. (2021). DCASE2021 UAD-S UMAP Data [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5123023
    Dataset updated
    Aug 23, 2021
    Dataset provided by
    University of Surrey
    Authors
    Fernandez Rodriguez, Andres; Plumbley, Mark D.
    License

    Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
    License information was derived automatically

    Description

    Support data for our paper:

    USING UMAP TO INSPECT AUDIO DATA FOR UNSUPERVISED ANOMALY DETECTION UNDER DOMAIN-SHIFT CONDITIONS

    ArXiv preprint can be found here. Code for the experiment software pipeline described in the paper can be found here. The pipeline requires and generates different forms of data. Here we provide the following:

    AudioSet_wav_fragments.zip: This is a custom selection of 39437 wav files (32kHz, mono, 10 seconds) randomly extracted from AudioSet (originally released under CC-BY). In addition to this custom subset, the paper also uses the following ones, which can be downloaded at their respective websites:

    DCASE2021 Task 2 Development Dataset

    DCASE2021 Task 2 Additional Training Dataset

    Fraunhofer's IDMT-ISA-ELECTRIC-ENGINE Dataset

    dcase2021_uads_umaps.zip: To compute the UMAPs, first the log-STFT, log-mel and L3 representations must be extracted, and then the UMAPs must be computed. This can take a substantial amount of time and resources. For convenience, we provide here the 72 UMAPs discussed in the paper.

    dcase2021_uads_umap_plots.zip: Also for convenience, we provide here the 198 high-resolution scatter plots rendered from the UMAPs.

    For a comprehensive visual inspection of the computed representations, it is sufficient to download the plots only. Users interested in exploring the plots interactively will need to download all the audio datasets and compute the log-STFT, log-mel and L3 representations as well as the UMAPs themselves (code provided in the GitHub repository). UMAPs for further representations can also be computed and plotted.

  7. Interactive UMAP plot of the Australia recordings.

    • plos.figshare.com
    html
    Updated May 9, 2025
    Cite
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones (2025). Interactive UMAP plot of the Australia recordings. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013029.s005
    Available download formats: html
    Dataset updated
    May 9, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    Interactive UMAP plot of the Australia recordings.

  8. Interactive UMAP plot of the French Polynesia recordings.

    • figshare.com
    html
    Updated May 9, 2025
    Cite
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones (2025). Interactive UMAP plot of the French Polynesia recordings. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013029.s006
    Available download formats: html
    Dataset updated
    May 9, 2025
    Dataset provided by
    PLOS Computational Biology
    Authors
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French Polynesia
    Description

    Interactive UMAP plot of the French Polynesia recordings.

  9. Acoustic features as a tool to visualize and explore marine soundscapes:...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Feb 15, 2024
    Cite
    Simone Cominelli; Nicolo' Bellin; Carissa D. Brown; Jack Lawson (2024). Acoustic features as a tool to visualize and explore marine soundscapes: Applications illustrated using marine mammal Passive Acoustic Monitoring datasets [Dataset]. http://doi.org/10.5061/dryad.3bk3j9kn8
    Available download formats: zip
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    Fisheries and Oceans Canada
    Memorial University of Newfoundland
    University of Parma
    Authors
    Simone Cominelli; Nicolo' Bellin; Carissa D. Brown; Jack Lawson
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Passive Acoustic Monitoring (PAM) is emerging as a solution for monitoring species and environmental change over large spatial and temporal scales. However, drawing rigorous conclusions based on acoustic recordings is challenging, as there is no consensus over which approaches and indices are best suited for characterizing marine and terrestrial acoustic environments. Here, we describe the application of multiple machine-learning techniques to the analysis of a large PAM dataset. We combine pre-trained acoustic classification models (VGGish, NOAA & Google Humpback Whale Detector), dimensionality reduction (UMAP), and balanced random forest algorithms to demonstrate how machine-learned acoustic features capture different aspects of the marine environment. The UMAP dimensions derived from VGGish acoustic features exhibited good performance in separating marine mammal vocalizations according to species and locations. RF models trained on the acoustic features performed well for labelled sounds in the 8 kHz range; however, low- and high-frequency sounds could not be classified using this approach. The workflow presented here shows how acoustic feature extraction, visualization, and analysis allow for establishing a link between ecologically relevant information and PAM recordings at multiple scales. The datasets and scripts provided in this repository allow replicating the results presented in the publication.

    Methods

    Data acquisition and preparation
    We collected all records available on the Watkins Marine Mammal Database (WMD) website listed under the “all cuts” page. For each audio file in the WMD, the associated metadata included a label for the sound sources present in the recording (biological, anthropogenic, and environmental), as well as information related to the location and date of recording. To minimize the presence of unwanted sounds in the samples, we only retained audio files with a single source listed in the metadata.
    We then labelled the selected audio clips according to taxonomic group (Odontoceti, Mysticeti) and species. We limited the analysis to 12 marine mammal species by discarding data when a species had less than 60 s of audio available, had a vocal repertoire extending beyond the resolution of the acoustic classification model (VGGish), or was recorded in a single country. To determine if a species was suited for analysis using VGGish, we inspected the Mel-spectrograms of 3-s audio samples and only retained species with vocalizations that could be captured in the Mel-spectrogram (Appendix S1). The vocalizations of species that produce very low-frequency or very high-frequency sounds were not captured by the Mel-spectrogram, so we removed them from the analysis. To ensure that records included the vocalizations of multiple individuals for each species, we only considered species with records from two or more different countries. Lastly, to avoid overrepresentation of sperm whale vocalizations, we excluded 30,000 sperm whale recordings collected in the Dominican Republic. The resulting dataset consisted of 19,682 audio clips with a duration of 960 milliseconds each (0.96 s) (Table 1).

    The Placentia Bay Database (PBD) includes recordings collected by Fisheries and Oceans Canada in Placentia Bay (Newfoundland, Canada) in 2019. The dataset consisted of two months of continuous recordings (1230 hours), starting on July 1st, 2019, and ending on August 31st, 2019. The data was collected using an AMAR G4 hydrophone (sensitivity: -165.02 dB re 1V/µPa at 250 Hz) deployed at 64 m of depth. The hydrophone was set to operate following 15 min cycles, with the first 60 s sampled at 512 kHz and the remaining 14 min sampled at 64 kHz. For the purpose of this study, we limited the analysis to the 64 kHz recordings.

    Acoustic feature extraction
    The audio files from the WMD and PBD databases were used as input for VGGish (Abu-El-Haija et al., 2016; Chung et al., 2018), a CNN developed and trained to perform general acoustic classification. VGGish was trained on the YouTube-8M dataset, containing more than two million user-labelled audio-video files. Rather than focusing on the final output of the model (i.e., the assigned labels), here the model was used as a feature extractor (Sethi et al., 2020). VGGish converts audio input into a semantically meaningful vector consisting of 128 features. The model returns features at multiple resolutions: ~1 s (960 ms); ~5 s (4,800 ms); ~1 min (59,520 ms); ~5 min (299,520 ms). All of the visualizations and results pertaining to the WMD were prepared using the finest feature resolution of ~1 s. The visualizations and results pertaining to the PBD were prepared using the ~5 s features for the humpback whale detection example, and were then averaged to an interval of 30 min in order to match the temporal resolution of the environmental measures available for the area.

    UMAP ordination and visualization
    UMAP is a non-linear dimensionality reduction algorithm based on the concept of topological data analysis which, unlike other dimensionality reduction techniques (e.g., tSNE), preserves both the local and global structure of multivariate datasets (McInnes et al., 2018). To allow for data visualization and to reduce the 128 features to two dimensions for further analysis, we applied Uniform Manifold Approximation and Projection (UMAP) to both datasets and inspected the resulting plots. The UMAP algorithm generates a low-dimensional representation of a multivariate dataset while maintaining the relationships between points in the global dataset structure (i.e., the 128 features extracted from VGGish).
    Each point in a UMAP plot in this paper represents an audio sample with a duration of ~1 second (WMD dataset), ~5 seconds (PBD dataset, humpback whale detections), or 30 minutes (PBD dataset, environmental variables). Each point in the two-dimensional UMAP space also represents a vector of 128 VGGish features. The nearer two points are in the plot space, the nearer the two points are in the 128-dimensional space, and thus the distance between two points in UMAP reflects the degree of similarity between two audio samples in our datasets. Areas with a high density of samples in UMAP space should, therefore, contain sounds with similar characteristics, and such similarity should decrease with increasing point distance. Previous studies illustrated how VGGish and UMAP can be applied to the analysis of terrestrial acoustic datasets (Heath et al., 2021; Sethi et al., 2020). The visualizations and classification trials presented here illustrate how the two techniques (VGGish and UMAP) can be used together for marine ecoacoustics analysis. UMAP visualizations were prepared using the umap-learn package for the Python programming language (version 3.10). All UMAP visualizations presented in this study were generated using the algorithm’s default parameters.
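
The sample durations mentioned above (~1 s, ~5 s, 30 min) differ only in how many consecutive feature vectors are combined. The description states that the ~5 s features were averaged to 30-min intervals; the sketch below shows one way such averaging could be done. The function name `average_frames` and the group sizes derived from contiguous frames are assumptions for illustration, not the authors' implementation.

```python
FRAME_MS = 960                 # finest VGGish resolution quoted in the description
FIVE_SECOND_MS = 4800          # "~5 s" resolution quoted in the description
THIRTY_MIN_MS = 30 * 60 * 1000

# Group sizes, assuming frames are contiguous in time (an assumption).
frames_per_5s = FIVE_SECOND_MS // FRAME_MS          # 960 ms frames per ~5 s
frames_per_30min = THIRTY_MIN_MS // FIVE_SECOND_MS  # ~5 s frames per 30 min


def average_frames(frames, group_size):
    """Average consecutive feature vectors in groups of `group_size`.

    `frames` is a list of equal-length lists of floats (for example,
    128 VGGish features each). A trailing incomplete group is dropped.
    """
    averaged = []
    for start in range(0, len(frames) - group_size + 1, group_size):
        group = frames[start:start + group_size]
        dim = len(group[0])
        averaged.append(
            [sum(vector[d] for vector in group) / group_size for d in range(dim)]
        )
    return averaged
```

Under these assumptions, `average_frames(five_second_features, frames_per_30min)` would yield one 128-dimensional vector per 30-minute interval, where `five_second_features` is a hypothetical list of ~5 s feature vectors.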

    Labelling sound sources
    The labels for the WMD records (i.e., taxonomic group, species, location) were obtained from the database metadata. For the PBD recordings, we obtained measures of wind speed, surface temperature, and current speed from an oceanographic buoy located in proximity of the recorder (Fig 1). We chose these three variables for their different contributions to background noise in marine environments. Wind speed contributes to underwater background noise at multiple frequencies, ranging from 500 Hz to 20 kHz (Hildebrand et al., 2021). Sea surface temperature contributes to background noise at frequencies between 63 Hz and 125 Hz (Ainslie et al., 2021), while ocean currents contribute to ambient noise at frequencies below 50 Hz (Han et al., 2021). Prior to analysis, we categorized the environmental variables and assigned the categories as labels to the acoustic features (Table 2). Humpback whale vocalizations in the PBD recordings were processed using the humpback whale acoustic detector created by NOAA and Google (Allen et al., 2021), providing a model score for every ~5 s sample. This model was trained on a large dataset (14 years and 13 locations) using humpback whale recordings annotated by experts (Allen et al., 2021). The model returns scores ranging from 0 to 1 indicating the confidence in the predicted humpback whale presence. We used the results of this detection model to label the PBD samples according to presence of humpback whale vocalizations. To verify the model results, we inspected all audio files that contained a 5 s sample with a model score higher than 0.9 for the month of July. If the presence of a humpback whale was confirmed, we labelled the segment as a model detection. We labelled any additional humpback whale vocalization present in the inspected audio files as a visual detection, while we labelled other sources and background noise samples as absences. In total, we labelled 4.6 hours of recordings.
    We reserved the recordings collected in August to test the precision of the final predictive model.

    Label prediction performance
    We used Balanced Random Forest (BRF) models provided in the imbalanced-learn Python package (Lemaître et al., 2017) to predict humpback whale presence and environmental conditions from the acoustic features generated by VGGish. We chose BRF because it is suited to datasets characterized by class imbalance. The BRF algorithm undersamples the majority class prior to prediction, which helps overcome class imbalance (Lemaître et al., 2017). For each model run, the PBD dataset was split into training (80%) and testing (20%) sets. The training datasets were used to fine-tune the models through a nested k-fold cross-validation approach with ten folds in the outer loop and five folds in the inner loop. We selected nested cross-validation as it allows optimizing model hyperparameters and performing model evaluation in a single step. We used the default parameters of the BRF algorithm, except for the ‘n_estimators’ hyperparameter, for which we tested
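
The splitting scheme described in this entry (an 80/20 train/test split, then nested cross-validation with ten outer folds and five inner folds on the training set) can be sketched with the standard library alone. This only builds the index plan; it does not fit a Balanced Random Forest, and it is not stratified by class. The function names, the shuffling, and the seed are assumptions for illustration.

```python
import random


def k_fold_indices(indices, k, rng):
    """Yield (kept, held_out) index lists for k folds over `indices`."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        held_out = folds[i]
        kept = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield kept, held_out


def nested_cv_splits(n_samples, test_fraction=0.2, outer_k=10, inner_k=5, seed=0):
    """Return (train, test, plan) for a hold-out split plus nested k-fold CV.

    `plan` has one entry per outer fold: (outer_train, outer_validation,
    inner_folds). The inner folds would be used for hyperparameter tuning
    and the outer validation set for evaluation.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    test, train = indices[:n_test], indices[n_test:]
    plan = []
    for outer_train, outer_val in k_fold_indices(train, outer_k, rng):
        inner_folds = list(k_fold_indices(outer_train, inner_k, rng))
        plan.append((outer_train, outer_val, inner_folds))
    return train, test, plan
```

In a full pipeline, each candidate hyperparameter value would be scored on the inner folds, and the best setting would then be evaluated on the corresponding outer validation fold.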

  10. Human Protein Atlas - Single cell type

    • proteinatlas.org
    • v21.proteinatlas.org
    Updated Nov 19, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2020). Human Protein Atlas - Single cell type [Dataset]. http://www.proteinatlas.org/ENSG00000163254-CRYGC/celltype/colon
    Dataset updated
    Nov 19, 2020
    License

    https://www.proteinatlas.org/about/licence

    Description

    This section contains Single Cell Type information based on single cell RNA sequencing (scRNAseq) data from 25 human tissues and peripheral blood mononuclear cells (PBMCs), together with in-house generated immunohistochemically stained tissue sections visualizing the corresponding spatial protein expression patterns. The scRNAseq analysis was based on publicly available genome-wide expression data and comprises all protein-coding genes in 444 individual cell type clusters corresponding to 15 different cell type groups. A specificity and distribution classification was performed to determine the number of genes elevated in these single cell types, and the number of genes detected in one, several or all cell types, respectively. The genes expressed in each of the cell types can be explored in interactive UMAP plots and bar charts, with links to corresponding immunohistochemical stainings in human tissues.
    More information about the specific content and the generation and analysis of the data in the section can be found on the Methods Summary. Learn about:

    • mRNA and protein expression in single cell types
    • if a gene is enriched in a particular cell type (specificity)
    • which genes have a similar expression profile across cell types (expression cluster)

  11. Interactive plot colored by EDSS.

    • figshare.com
    html
    Updated Jun 5, 2023
    Cite
    Alexandre Bois; Brian Tervil; Albane Moreau; Aliénor Vienne-Jumeau; Damien Ricard; Laurent Oudre (2023). Interactive plot colored by EDSS. [Dataset]. http://doi.org/10.1371/journal.pone.0268475.s002
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexandre Bois; Brian Tervil; Albane Moreau; Aliénor Vienne-Jumeau; Damien Ricard; Laurent Oudre
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactive version of the UMAP plot from Fig 8. (HTML)

  12. Joint embedding of vertebrate brain single-cell RNA-Seq using sequence or...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Aug 18, 2023
    Cite
    Sun, Dennis (2023). Joint embedding of vertebrate brain single-cell RNA-Seq using sequence or structure [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7838975
    Explore at:
    Dataset updated
    Aug 18, 2023
    Dataset provided by
    Arcadia Science LLC
    Authors
    Sun, Dennis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Embeddings of single-cell RNA-Seq data from three adult vertebrate brain datasets into Orthogroup feature space or Structural cluster feature space. Orthogroups were generated using OrthoFinder v5.5.0; Structural clusters were assigned by using FoldSeek to cluster AlphaFold-v4 structural predictions.

    The three datasets used as the basis for these embeddings were:

    sample "Brain8" from the Jiang et al. 2021 zebrafish cell atlas (files beginning with GSM3768152)

    sample "Brain1" from the Han et al. 2018 mouse cell atlas (files beginning with GSM2906405)

    sample "Xenopus_brain_COL65" from the Liao et al. 2022 Xenopus laevis adult cell atlas (files beginning with GSM6214268)

    For each dataset, we also generated a standardized cell type annotation file based on the author's originally provided cell type annotation data. The first column is the cell barcode for that species and the second column is the original study's cell type annotation for that cell.

    For the Xenopus brain data, we removed around 18k cells that were not annotated in the original data to simplify data analyses; these are reflected in the files with the "subsampled" suffix. Subsampled versions of the data are also available for the joint embedding space (prefixed with "DrerMmusXlae").

    For the final datasets used in our analyses, we also provide features x cell matrices as .h5ad files for smaller file sizes and faster loading using Scanpy.

    For visualizing our UMAP plots of our top200 embedding space, we provide ".tsv" files with a variety of metrics and the x and y positions of each cell in the UMAP. See "DrerMmusXlae_adultbrain_FoldSeek_plotlydata.tsv" and "DrerMmusXlae_adultbrain_OrthoFinder_plotlydata.tsv"

    These data are part of the Arcadia Science Pub titled "Comparing gene expression across species based on protein structure instead of sequence".

  13. Human Protein Atlas - Celltype Atlas

    • v20.proteinatlas.org
    Updated Nov 19, 2020
    Cite
    (2020). Human Protein Atlas - Celltype Atlas [Dataset]. https://v20.proteinatlas.org/ENSG00000213626-LBH/celltype
    Explore at:
    Dataset updated
    Nov 19, 2020
    License

    https://www.proteinatlas.org/about/licence

    Description

    The Single Cell Type Atlas contains single cell RNA sequencing (scRNAseq) data from 13 different human tissues, together with in-house generated immunohistochemically stained tissue sections visualizing the corresponding spatial protein expression patterns. The scRNAseq analysis was based on publicly available genome-wide expression data and comprises all protein-coding genes in 192 individual cell type clusters corresponding to 12 different cell type groups. A specificity and distribution classification was performed to determine the number of genes elevated in these single cell types, and the number of genes detected in one, several or all cell types, respectively. The genes expressed in each of the cell types can be explored in interactive UMAP plots and bar charts, with links to corresponding immunohistochemical stainings in human tissues.

  14. fnsys-16-975989_Dimensionality reduction and recurrence analysis reveal...

    • frontiersin.figshare.com
    xml
    Updated Jun 21, 2023
    Cite
    Miguel Serrano-Reyes; Jesús Esteban Pérez-Ortega; Brisa García-Vilchis; Antonio Laville; Aidán Ortega; Elvira Galarraga; Jose Bargas (2023). fnsys-16-975989_Dimensionality reduction and recurrence analysis reveal hidden structures of striatal pathological states.xmltable [Dataset]. http://doi.org/10.3389/fnsys.2022.975989.s001
    Explore at:
    Available download formats: xml
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Miguel Serrano-Reyes; Jesús Esteban Pérez-Ortega; Brisa García-Vilchis; Antonio Laville; Aidán Ortega; Elvira Galarraga; Jose Bargas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A pipeline is proposed here to describe different features to study brain microcircuits on a histological scale using multi-scale analyses, including the uniform manifold approximation and projection (UMAP) dimensional reduction technique and modularity algorithm to identify neuronal ensembles, Runs tests to show significant ensembles activation, graph theory to show trajectories between ensembles, and recurrence analyses to describe how regular or chaotic ensembles dynamics are. The data set includes ex-vivo NMDA-activated striatal tissue in control conditions as well as experimental models of disease states: decorticated, dopamine depleted, and L-DOPA-induced dyskinetic rodent samples. The goal was to separate neuronal ensembles that have correlated activity patterns. The pipeline allows for the demonstration of differences between disease states in a brain slice. First, the ensembles were projected in distinctive locations in the UMAP space. Second, graphs revealed functional connectivity between neurons comprising neuronal ensembles. Third, the Runs test detected significant peaks of coactivity within neuronal ensembles. Fourth, significant peaks of coactivity were used to show activity transitions between ensembles, revealing recurrent temporal sequences between them. Fifth, recurrence analysis shows how deterministic, chaotic, or recurrent these circuits are. We found that all revealed circuits had recurrent activity except for the decorticated circuits, which tended to be divergent and chaotic. The Parkinsonian circuits exhibit fewer transitions, becoming rigid and deterministic, exhibiting a predominant temporal sequence that disrupts transitions found in the controls, thus resembling the clinical signs of rigidity and paucity of movements. Dyskinetic circuits display a higher recurrence rate between neuronal ensembles transitions, paralleling clinical findings: enhancement in involuntary movements. 
These findings confirm that looking at neuronal circuits at the histological scale, recording dozens of neurons simultaneously, can show clear differences between control and diseased striatal states: “fingerprints” of the disease states. Therefore, the present analysis is coherent with previous ones of striatal disease states, showing that data obtained from the tissue are robust. At the same time, it adds heuristic ways to interpret circuitry activity in different states.

  15. GAMMA: Galactic Attributes of Mass, Metallicity, and Age Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Nov 3, 2023
    Cite
    Ufuk Çakır (2023). GAMMA: Galactic Attributes of Mass, Metallicity, and Age Dataset [Dataset]. http://doi.org/10.5281/zenodo.8375344
    Explore at:
    Available download formats: bin
    Dataset updated
    Nov 3, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ufuk Çakır
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce the GAMMA (Galactic Attributes of Mass, Metallicity, and Age) dataset, a comprehensive collection of galaxy data tailored for Machine Learning applications. This dataset offers detailed 2D maps and 3D cubes of 11 727 galaxies, capturing essential attributes: stellar age, metallicity, and mass.

    Together with the dataset we publish our code to extract any other stellar or gaseous property from the raw simulation suite to extend the dataset beyond these initial properties, ensuring versatility for various computational tasks. Ideal for feature extraction, clustering, and regression tasks, GAMMA offers a unique lens for exploring galactic structures through computational methods and is a bridge between astrophysical simulations and the field of scientific machine learning (ML).

    As a first benchmark, we apply Principal Component Analysis (PCA) on this dataset. We find that PCA effectively captures the key morphological features of galaxies with a small number of components. We achieve a dimensionality reduction by a factor of ∼200 (∼3650) for 2D images (3D cubes) with a reconstruction accuracy below 5%.

    We calculate UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) on the lower dimensional PCA scores of the 2D images to visualize the image space. An interactive version of this plot can be accessed using an online Dashboard (hover over a point to see the galaxy image and the IllustrisTNG Subhalo ID).

    All the code to generate this dataset and load the data structure is publicly available on GitHub, with an additional documentation page hosted on ReadTheDocs.

  16. R-script for single-cell RNA-seq data analysis

    • figshare.com
    Updated Oct 24, 2025
    Cite
    Alaullah Sheikh (2025). R-script for single-cell RNA-seq data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.30307015.v1
    Explore at:
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alaullah Sheikh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R script for single-cell RNA-seq data analysis. The code includes steps for quality control, normalization using SCTransform, dimensional reduction (PCA and UMAP), clustering, differential gene expression analysis, and visualization of marker genes. Integration workflows were performed to combine control and LT-treated organoid datasets, followed by annotation of epithelial subtypes based on established marker genes. Additional scripts generate figures such as UMAP projections, heatmaps, dot plots, and violin plots.

  17. Additional file 1 of Choice of pre-processing pipeline influences clustering...

    • springernature.figshare.com
    zip
    Updated Jun 5, 2023
    Cite
    Inbal Shainer; Manuel Stemmer (2023). Additional file 1 of Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets [Dataset]. http://doi.org/10.6084/m9.figshare.16620628.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Inbal Shainer; Manuel Stemmer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: Fig. S1 Total gene detection of all datasets compared after processing with either kallisto or Cell Ranger. The Venn diagrams show commonly detected number of genes by both pipelines and uniquely detected genes. Fig. S2 Violin-plots showing distribution of gene and UMI detection per cell of all the analyzed datasets (Table 1) run with the Cell Ranger pipeline. Fig. S3 Violin-plots showing distribution of gene and UMI detection per cell of all the analyzed datasets (Table 1) run with the kallisto pipeline. Fig. S4 Cell counts of all datasets compared after processing with either kallisto forced or Cell Ranger. The Venn diagrams show commonly detected cell barcodes by both pipelines and uniquely detected cell barcodes. Fig. S5 Alignment results of all datasets (Table 1) run with either Cell Ranger or kallisto forced against Ensembl reference. a Percent alignment rates of all reads against the reference transcriptome. b Total gene detection. c Median gene counts over all cells per dataset. d Median UMI counts over all cells per dataset. e Total cell counts of each dataset. Fig. S6 Total gene detection of all datasets compared after processing with either kallisto forced or Cell Ranger. The Venn diagrams show commonly detected number of genes by both pipelines and uniquely detected genes. Fig. S7 Violin-plots showing distribution of gene and UMI detection per cell of all the analyzed datasets (Table 1) run with the kallisto forced pipeline. Fig. S8 Violin-plots showing distribution of gene and UMI detection per cell of the dr_pineal_s2 dataset after additional filtering for downstream analysis. Run with either Cell Ranger (a), kallisto (b) or kallisto forced (c). Fig. S9 Downstream analysis of dr_pineal_s2 before cluster merging. a 2D visualization using UMAP of Cell Ranger analyzed clusters before merging, with resolution equal to 0.9. Each point represents a single cell, colored according to cell type. The cells were clustered into 21 types. 
b Expression profile of marker genes according to cluster [7] of (a). Clusters 0, 1, 8 and 18 are all rod-like PhRs subclusters. They expressed rod-like PhR markers (exorh, gant1, gngt1), but the expression levels differed and resulted in their separation. For simplicity, they were merged and referred to as a single rod-like PhRs cluster in the main text. Similarly, clusters 7 and 12 were merged into a single Müller-glia like cluster, clusters 2, 5, 16 were merged into a single RPE-like cluster, clusters 3 and 10 were merged into a single habenula kiss1 cluster and clusters 11 and 19 were merged into a single leukocytes cluster. c 2D visualization using UMAP of Cell Ranger analyzed clusters, with resolution equal to 2. The cells were clustered into 31 types. However, the two different cone-like PhR cell types are still not distinguished from one another. d Expression profile of marker genes according to cluster of (c). e 2D visualization using UMAP of kallisto analyzed dr_pineal_s2 clusters before merging, with resolution equal to 0.9. The cells were clustered into 24 types. f Expression profile of marker genes according to cluster of (c). Similar to those described above, clusters 1, 2, 3, 7 and 21 were merged into a single rod-like PhRs cluster, clusters 0, 9, 17 were merged into a single RPE-like cluster, clusters 11 and 12 were merged into a single Müller-glia like cluster, clusters 4, 5 and 20 were merged into a single habenula kiss1 cluster and clusters 13 and 22 were merged into a single leukocytes cluster. g 2D visualization using UMAP of kallisto forced analyzed dr_pineal_s2 clusters, with resolution equal to 1.2. The cells were clustered into 27 types. h Expression profile of marker genes according to cluster of (g). The col14a1b gene was only detected in the kallisto and kallisto forced datasets and is the strongest DE marker within the red cone-like cluster (f, h). Fig. S10 Heatmap of genes with higher counts in kallisto pre-processed pineal data. 
All the UMI counts for both kallisto and Cell Ranger were summed, and the diff_ratio value \(\frac{kallisto\_counts - CellRanger\_counts}{kallisto\_counts + CellRanger\_counts}\) was calculated for each gene (Additional file 1: Fig. S10). The top 80 diff_ratio genes, as well as the top 20 genes uniquely identified in kallisto, were plotted according to the average scaled expression per cluster. Fig. S11 Heatmap of genes with higher counts in Cell Ranger pre-processed pineal data. All the UMI counts for both kallisto and Cell Ranger were summed, and the diff_ratio value \(\frac{kallisto\_counts - CellRanger\_counts}{kallisto\_counts + CellRanger\_counts}\) was calculated for each gene (Additional file 1: Fig. S11). The top 80 diff_ratio genes, as well as the top 20 genes uniquely identified in Cell Ranger, were plotted according to the average scaled expression per cluster.
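
The diff_ratio described for Figs. S10 and S11 can be illustrated with a minimal Python sketch. This is not the authors' code: the gene names and counts are made up for illustration, and returning 0.0 when a gene has no counts in either pipeline is an assumption the description does not specify.

```python
def diff_ratio(kallisto_counts, cellranger_counts):
    """(kallisto - CellRanger) / (kallisto + CellRanger) for one gene.

    Ranges from -1 (gene counted only by Cell Ranger)
    to +1 (gene counted only by kallisto).
    """
    total = kallisto_counts + cellranger_counts
    if total == 0:
        # Assumption: a gene with no counts in either pipeline gets 0.0.
        return 0.0
    return (kallisto_counts - cellranger_counts) / total


# Hypothetical summed UMI counts per gene: (kallisto, Cell Ranger).
summed_counts = {
    "gene_a": (120, 80),
    "gene_b": (0, 50),
    "gene_c": (30, 30),
}

ratios = {
    gene: diff_ratio(k, c) for gene, (k, c) in summed_counts.items()
}

# Genes ranked from most kallisto-biased to most Cell Ranger-biased.
ranked = sorted(ratios, key=ratios.get, reverse=True)
print(ranked)
```

Taking the head of this ranking would give the genes with relatively higher counts in kallisto, and the tail those with relatively higher counts in Cell Ranger.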

  18. Structural Classification of PFAS using Molecular Fingerprints and Graph...

    • figshare.com
    zip
    Updated Jun 9, 2025
    Cite
    Arpan Mukherjee (2025). Structural Classification of PFAS using Molecular Fingerprints and Graph Networks [Dataset]. http://doi.org/10.6084/m9.figshare.29128016.v6
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Arpan Mukherjee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project provides a network-based structural classification resource for 13,028 per- and polyfluoroalkyl substances (PFAS) from the U.S. EPA’s 2024 PFAS8a7v3 list. Using eight molecular fingerprinting methods and K-Nearest Neighbor Graphs (K-NNGs), we assign Proximity Classes to 288 previously unclassified PFAS compounds. The dataset includes Proximity Class outputs, UMAP coordinates, edge lists, and interactive 3D network visualizations across multiple neighborhood sizes. All files are designed for reuse in regulatory prioritization, chemical substitution, and machine learning applications.

  19. Comparison of machine-learning methods by different measurements for CyTOF...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Lijun Cheng; Pratik Karkhanis; Birkan Gokbag; Yueze Liu; Lang Li (2023). Comparison of machine-learning methods by different measurements for CyTOF Dataset 1 (13 biomarkers, 24 labeled cell types). [Dataset]. http://doi.org/10.1371/journal.pcbi.1008885.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lijun Cheng; Pratik Karkhanis; Birkan Gokbag; Yueze Liu; Lang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of machine-learning methods by different measurements for CyTOF Dataset 1 (13 biomarkers, 24 labeled cell types).

  20. Comparison of methods for averaging performance in the identification of...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Lijun Cheng; Pratik Karkhanis; Birkan Gokbag; Yueze Liu; Lang Li (2023). Comparison of methods for averaging performance in the identification of known cell types in training and testing data by different measurements for CyTOF1 and CyTOF2 datasets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1008885.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lijun Cheng; Pratik Karkhanis; Birkan Gokbag; Yueze Liu; Lang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of methods for averaging performance in the identification of known cell types in training and testing data by different measurements for CyTOF1 and CyTOF2 datasets.
