25 datasets found
  1. f

    UMAP plots split by dataset and sample

    • springernature.figshare.com
    zip
    Updated Feb 15, 2024
    Cite
    Ziteng Li; Hena Zhang; Qin Li; Yan Li; Zhixiang Hu; Xichun Hu; Xiaodong Zhu; Shenglin Huang (2024). UMAP plots split by dataset and sample [Dataset]. http://doi.org/10.6084/m9.figshare.22300675.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    figshare
    Authors
    Ziteng Li; Hena Zhang; Qin Li; Yan Li; Zhixiang Hu; Xichun Hu; Xiaodong Zhu; Shenglin Huang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File “UMAP plots split by dataset and sample” supplied the comparison of UMAP plots at dataset or sample level colored by major cell types.

  2. D

    Data from: Data related to Panzer: A Machine Learning Based Approach to...

    • darus.uni-stuttgart.de
    Updated Nov 27, 2024
    + more versions
    Cite
    Tim Panzer (2024). Data related to Panzer: A Machine Learning Based Approach to Analyze Supersecondary Structures of Proteins [Dataset]. http://doi.org/10.18419/DARUS-4576
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    DaRUS
    Authors
    Tim Panzer
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4576

    Time period covered
    Nov 1, 1976 - Feb 29, 2024
    Dataset funded by
    DFG
    Description

    This entry contains the data used to implement the bachelor thesis, which investigated how embeddings can be used to analyze supersecondary structures.

    Abstract of the thesis: This thesis analyzes the behavior of supersecondary structures in the context of embeddings. For this purpose, data from the Protein Topology Graph Library (PTGL) was provided with embeddings. This resulted in a structured graph database, which will be used for future work and analyses. In addition, different projections were made into two-dimensional space to analyze how the embeddings behave there.

    The Jupyter Notebook 1_data_retrival.ipynb contains the download process of the graph files from the Protein Topology Graph Library (https://ptgl.uni-frankfurt.de). The downloaded .gml files can also be found in graph_files.zip. They form graphs that represent the relationships of supersecondary structures in the proteins and are the data basis for further analyses.

    These graph files are then processed in the Jupyter Notebook 2_data_storage_and_embeddings.ipynb and entered into a graph database. The sequences of the supersecondary and secondary structures from the PTGL can be found in fastas.zip. The embeddings were calculated using the ESM model of the Facebook Research Group (huggingface.co/facebook/esm2_t12_35M_UR50D) and can be found in three .h5 files. They are subsequently added to the database. The whole process in this notebook serves to build up the database, which can then be searched using Cypher queries.

    In the Jupyter Notebook 3_data_science.ipynb, different visualizations and analyses are then carried out with the help of UMAP.

    For the installation of all dependencies, it is recommended to create a Conda environment and then install all packages there. To use the project, PyEED should be installed using the snapshot of the original repository (source repository: https://github.com/PyEED/pyeed). The best way to install PyEED is to execute the pip install -e . command in the pyeed_BT folder. The dependencies can also be installed by using poetry and the .toml file. In addition, seaborn, h5py and umap-learn are required. These can be installed using the following commands:

    pip install h5py==3.12.1
    pip install seaborn==0.13.2 umap-learn==0.5.7
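As a minimal sketch, assuming only the three version pins quoted in the description, the following standard-library check reports whether the active environment matches them. The helper name `check_pins` is illustrative and is not part of the dataset or of PyEED.

```python
from importlib import metadata

# Version pins copied from the dataset description.
PINNED = {"h5py": "3.12.1", "seaborn": "0.13.2", "umap-learn": "0.5.7"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: status}; status is 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_pins(PINNED).items():
        print(f"{name}=={PINNED[name]}: {status}")
```

Running the script inside the Conda environment prints one line per package, which makes it easy to spot a missing or differently versioned dependency before opening the notebooks.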

  3. Z

    DCASE2021 UAD-S UMAP Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 23, 2021
    Cite
    Plumbley, Mark D. (2021). DCASE2021 UAD-S UMAP Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5123023
    Explore at:
    Dataset updated
    Aug 23, 2021
    Dataset provided by
    Plumbley, Mark D.
    Fernandez Rodriguez, Andres
    License

    Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
    License information was derived automatically

    Description

    Support data for our paper:

    USING UMAP TO INSPECT AUDIO DATA FOR UNSUPERVISED ANOMALY DETECTION UNDER DOMAIN-SHIFT CONDITIONS

    ArXiv preprint can be found here. Code for the experiment software pipeline described in the paper can be found here. The pipeline requires and generates different forms of data. Here we provide the following:

    AudioSet_wav_fragments.zip: This is a custom selection of 39437 wav files (32kHz, mono, 10 seconds) randomly extracted from AudioSet (originally released under CC-BY). In addition to this custom subset, the paper also uses the following ones, which can be downloaded at their respective websites:

    DCASE2021 Task 2 Development Dataset

    DCASE2021 Task 2 Additional Training Dataset

    Fraunhofer's IDMT-ISA-ELECTRIC-ENGINE Dataset

    dcase2021_uads_umaps.zip: To compute the UMAPs, first the log-STFT, log-mel and L3 representations must be extracted, and then the UMAPs must be computed. This can take a substantial amount of time and resources. For convenience, we provide here the 72 UMAPs discussed in the paper.

    dcase2021_uads_umap_plots.zip: Also for convenience, we provide here the 198 high-resolution scatter plots rendered from the UMAPs.

    For a comprehensive visual inspection of the computed representations, it is sufficient to download the plots only. Users interested in exploring the plots interactively will need to download all the audio datasets and compute the log-STFT, log-mel and L3 representations as well as the UMAPs themselves (code provided in the GitHub repository). UMAPs for further representations can also be computed and plotted.

  4. m

    Data from: A multiplex single-cell RNA-Seq pharmacotranscriptomics pipeline...

    • data.mendeley.com
    Updated Oct 22, 2024
    Cite
    Alice Dini (2024). A multiplex single-cell RNA-Seq pharmacotranscriptomics pipeline for drug discovery [Dataset]. http://doi.org/10.17632/j9j4mdm9yr.1
    Explore at:
    Dataset updated
    Oct 22, 2024
    Authors
    Alice Dini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We developed a single-cell transcriptomics pipeline for high-throughput pharmacotranscriptomic screening. We explored the transcriptional landscape of three HGSOC models (JHOS2, a representative cell line; PDC2 and PDC3, two patient-derived samples) after treating their cells for 24 hours with 45 drugs representing 13 distinct classes of mechanism of action. Our work establishes a new precision oncology framework for the study of molecular mechanisms activated by a broad array of drug responses in cancer.

    .
    ├── 3D UMAPs/ → Interactive 3D UMAPs of cells treated with the 45 drugs used for multiplexed scRNA-seq. Related to Figure 4. Coordinates: x = UMAP 1; y = UMAP 2; z = UMAP 3. Legend: green = PDC1; blue = PDC2; red = JHOS2.
    │   ├── DMSO_3D_UMAP_Dini.et.al.html → 3D UMAP of untreated cells.
    │   └── drug_3D_UMAP_Dini.et.al.html → 3D UMAP of cells treated with (drug).
    ├── QC_plots/ → Diagnostic plots. Related to Figures 2–4.
    │   ├── model_QC_violin_plot_2023.pdf → Violin plots of the QC metrics used to filter the data.
    │   ├── model_col_HTO or model_row_HTO before and after filt → Heatmaps of the row or column HTO expression in each cell.
    │   └── model_counts_histogram_2023.pdf → Histogram of the distribution of the total counts per cell after filtering for high-quality cells.
    ├── scRNAseq/ → scRNA-seq data. Related to Figures 2–4.
    │   ├── AllData_subsampled_DGE_edgeR.csv.gz → Differential gene expression analysis results between treated and untreated cells via pseudobulk of aggregate subsamples, for each of the three models. Related to Figure 3.
    │   └── All_vs_all_RNAclusters_DEG_signif.txt → Differential gene expression analysis results (p.adj < 0.05) of FindAllMarkers for the Leiden/RNA clusters.
    ├── PDCs.transcript.counts.tsv → Bulk RNA-seq count data for PDCs 1–3 processed by Kallisto. Related to Figure S6.
    └── PDCs.transcript.TPM.tsv → Bulk RNA-seq TPM data for PDCs 1–3 processed by Kallisto. Related to Figure S6.

  5. f

    Data_Sheet_1_Manifold learning for fMRI time-varying functional...

    • frontiersin.figshare.com
    docx
    Updated Jul 11, 2023
    Cite
    Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini (2023). Data_Sheet_1_Manifold learning for fMRI time-varying functional connectivity.docx [Dataset]. http://doi.org/10.3389/fnhum.2023.1134012.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Frontiers
    Authors
    Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers often seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs)—namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies—are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate what is the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as their robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. 
To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
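The abstract stresses feature normalization when combining data across subjects but does not say here which normalization was used. As an illustration only, the sketch below assumes per-subject z-scoring of each feature; the function name and the data layout are hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_subject(data):
    """Z-score every feature within each subject separately.

    data: {subject_id: [sample, ...]}, each sample a list of feature values.
    Features that are constant within a subject are mapped to 0.0.
    """
    out = {}
    for subject, samples in data.items():
        columns = list(zip(*samples))           # one tuple per feature
        mus = [mean(c) for c in columns]
        sds = [pstdev(c) for c in columns]
        out[subject] = [
            [(v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, mus, sds)]
            for row in samples
        ]
    return out
```

After this step, subjects with very different raw value ranges share a common scale, so an embedding fitted on the pooled data is less likely to separate samples by subject offset alone.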

  6. f

    Additional file 6 of Gossypetin ameliorates 5xFAD spatial learning and...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jun 5, 2023
    Cite
    Kyung Won Jo; Dohyun Lee; Dong Gon Cha; Eunji Oh; Yoon Ha Choi; Somi Kim; Eun Seo Park; Jong Kyoung Kim; Kyong-Tai Kim (2023). Additional file 6 of Gossypetin ameliorates 5xFAD spatial learning and memory through enhanced phagocytosis against Aβ [Dataset]. http://doi.org/10.6084/m9.figshare.21382874.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    figshare
    Authors
    Kyung Won Jo; Dohyun Lee; Dong Gon Cha; Eunji Oh; Yoon Ha Choi; Somi Kim; Eun Seo Park; Jong Kyoung Kim; Kyong-Tai Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 6: Fig.S1 Gossypetin does not affect expression of β-, and γ-secretases and activity of β-secretase. (A to G) Time dependent β-secretase activity of mouse hippocampal lysate was measured with Relative Fluorescence Unit (RFU). Fluorescence excitation and emission wavelength was 335 nm and 495 nm respectively (A). Bar graph of RFU at each time point of 10 min (B), 20 min (C), 30 min (D), 40 min (E), 50 min (F), 60 min (G). (n = 10~12 mice per group) (H to L) Representative images of Western blot analysis for β-, γ-secretase subunits, and GAPDH (H). Bar graphs represent relative protein expression levels of BACE1 (I), Nicastrin (J), APH-1 (K), and PEN2 (L). (n = 12~15 mice per group) (M to P) Bar graphs represent relative mRNA expression level of β-, and γ-secretase subunits bace1 (M), ncstn (N), aph1 (O), pen2 (P). (n = 9~10 mice per group) Error bars represent the mean ± SD, p < 0.05, ns = not significant, two-way ANOVA followed by Tukey’s multiple comparisons test. Fig. S2 Cell type classification of brain samples. (A) UMAP plot showing all cells from the brain samples, colored by their cell types. (B) Heatmap illustrating the Z-scores of average normalized expressions of cell type markers. (C) Violin plots displaying the log-scaled number of detected genes (top), Unique Molecular Identifiers (UMIs) (middle), and the percentage of mitochondrial gene expressions (bottom) per cell for each cell type. (D) UMAP plots showing all cells from the brain samples, colored by their sampled region (left), mouse strain (middle), or drug administration (right) condition. Fig. S3 Detailed subtyping of the microglial population. (A) UMAP plots showing all microglial cells from cortex region. The cells are colored by their celltypes (left). Heatmap showing the Z-scores of average normalized expressions of representative DEGs for each cell type from cortex region (right). 
(B) UMAP plots showing microglial cells from cortex (left) or hippocampus (right), colored by combination of mouse strain and drug administration condition. (C) UMAP plots illustrating microglial cells from cortex (left) or hippocampus (right), colored by their inferred cell cycle. (D) Bar plots for the fraction of cortex (left) or hippocampus (right) microglial cells by sample conditions, which are the combination of mouse strain and drug administration, for each microglial subtype. Fig. S4 Differential gene expressions between vehicle- and gossypetin-treated microglia. (A) Scatter plot showing GOBP terms that are upregulated or downregulated by 5xFAD construction or gossypetin administration for each microglial subtype from cortex. Significant (Fisher’s exact test, P < 0.01) terms associated with antigen presentation are colored by their biological keywords. (B) GSEA plots showing significant (P < 0.05) GOBP terms for gossypetin administration condition against vehicle treatment within 5xFAD homeostatic microglia from hippocampus region. Related to Fig. 3D. (C) Volcano plot illustrating the DEGs selected by the comparison between wild type and 5xFAD (left), or vehicle and gossypetin treated 5xFAD (right) from homeostatic microglial population of cortex region. Fig. S5 Transcriptomic transition in cortex microglia and measurement of DAM signature score. (A) Volcano plot showing significant (p < 0.05) DEGs selected by the comparison between cortex homeostatic microglia in vehicle treated wild type and 5xFAD (top left), or vehicle and gossypetin treated 5xFAD (top right). Volcano plots illustrating comparison between gossypetin administration condition against vehicle treatment within 5xFAD stage 1 DAM (bottom left) or stage 2 DAM (bottom right) from cortex are also presented. (B) Violin plot illustrating module scores for the DAM-related genes from previous studies. Cells are grouped by the combination of their mouse strain and treatment condition. (P < 0.001) Fig. S6 Gossypetin ameliorates gliosis in microglia and astrocytes. (A to D) Representative images of hippocampus (A) and cortex (C) stained with Hoechst and Iba-1. Scale bar corresponds to 200 μm. Bar graph represents quantification of Iba-1 positive area in dentate gyrus of hippocampus (n = 9~12 mice per group, 3~6 slices per brain) (B) and cortex (n = 9~12 mice per group, 3~6 slices per brain) (D). (E to H) Representative images of hippocampus (E) and cortex (G) stained with Hoechst and GFAP. Scale bar corresponds to 200 μm. Bar graph represents quantification of GFAP positive area in dentate gyrus of hippocampus (n = 9~12 mice per group, 3~6 slices per brain) (F) and cortex (n = 9~12 mice per group, 3~5 slices per brain) (H). The error bars represent the mean ± SEM.**p

  7. Research data supporting: "Relevant, hidden, and frustrated information in...

    • zenodo.org
    zip
    Updated May 20, 2025
    Cite
    Chiara Lionello; Matteo Becchi; Simone Martino; Giovanni M. Pavan (2025). Research data supporting: "Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise" [Dataset]. http://doi.org/10.5281/zenodo.14529457
    Explore at:
    Available download formats: zip
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Chiara Lionello; Matteo Becchi; Simone Martino; Giovanni M. Pavan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the set of data shown in the paper "Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise", published on arXiv (DOI: 10.48550/arXiv.2412.09412).

    The scripts contained herein are:

    1. PCA-Analysis.py: Python script to calculate the SOAP descriptor, denoise it, and compute the Principal Component Analysis
    2. SOAP-Component-Analysis.py: Python script to calculate the variance of the single SOAP components
    3. Hierarchical-Clustering.py: Python script to compute the hierarchical clustering and plot the dataset
    4. OnionClustering-1d.py: script to compute the Onion clustering on a single SOAP component or principal component
    5. OnionClustering-2d.py: script to compute bi-dimensional Onion clustering
    6. OnionClustering-plot.py: script to plot the Onion plot, removing clusters with population <1%
    7. UMAP.py: script to compute the UMAP dimensionality reduction technique

    To reproduce the data of this work, start from SOAP-Component-Analysis.py to calculate the SOAP descriptor and select the components of interest. Then compute the PCA with PCA-Analysis.py and apply the clustering that suits your needs (OnionClustering-1d.py, OnionClustering-2d.py, Hierarchical-Clustering.py). The Onion plot can be modified further with OnionClustering-plot.py. UMAP can be computed with UMAP.py.
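The plotting script is described as removing clusters with population below 1%. The sketch below shows one way such a filter could look; it is not taken from OnionClustering-plot.py, and the function name, the `noise_label` convention, and the input format (one integer label per data point) are assumptions.

```python
from collections import Counter


def drop_small_clusters(labels, min_fraction=0.01, noise_label=-1):
    """Relabel points of clusters holding less than min_fraction of all points.

    labels: list with one cluster label per data point.
    Returns a new list; points in small clusters receive noise_label.
    """
    counts = Counter(labels)
    total = len(labels)
    keep = {c for c, n in counts.items() if n / total >= min_fraction}
    return [lab if lab in keep else noise_label for lab in labels]
```

For example, with 990 points in cluster 0 and five points each in clusters 1 and 2, the two small clusters fall below the 1% threshold and their ten points are relabelled as noise.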

    Additional data contained herein are:

    1. starting-configuration.gro: gromacs file with the initial configuration of the ice-water system
    2. traj-ice-water-50ns-sampl4ps.xtc: trajectory of the ice-water system sampled every 4 ps
    3. traj-ice-water-50ns-sampl40ps.xtc: trajectory of the ice-water system sampled every 40 ps
    4. some files containing the SOAP descriptor of the ice-water system: ice-water-50ns-sampl40ps.hdf5, ice-water-50ns-sampl40ps_soap.hdf5, ice-water-50ns-sampl40ps_soap.npy, ice-water-50ns-sampl40ps_soap-spavg.npy
    5. PCA-results: folder that contains some example results of the PCA
    6. UMAP-results: folder that contains some example results of UMAP

    The data related to the Quincke rollers can be found here: https://zenodo.org/records/10638736

  8. h

    sclerobase_data

    • huggingface.co
    Updated Jan 14, 2025
    Cite
    Natalie Chan (2025). sclerobase_data [Dataset]. https://huggingface.co/datasets/nfc22/sclerobase_data
    Explore at:
    Dataset updated
    Jan 14, 2025
    Authors
    Natalie Chan
    Description

    Monitoring Progression of Scleroderma

      Project Description
    

    This is a website for visualising datasets to study protein expression in Scleroderma patients. The website is able to generate the following plots:

    • Correlation plot
    • Boxplot
    • UMAP plot
    • Volcano plot
    • Violin plot

      Introduction
    

    Scleroderma is an autoimmune disease that can cause thickened areas of skin and connective tissues. To gain a deeper understanding of this condition, analysing the expression of… See the full description on the dataset page: https://huggingface.co/datasets/nfc22/sclerobase_data.

  9. f

    Table1_Influence of single-cell RNA sequencing data integration on the...

    • frontiersin.figshare.com
    docx
    Updated Jun 13, 2023
    Cite
    Tomasz Kujawa; Michał Marczyk; Joanna Polanska (2023). Table1_Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis.docx [Dataset]. http://doi.org/10.3389/fgene.2022.1009316.s002
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Frontiers
    Authors
    Tomasz Kujawa; Michał Marczyk; Joanna Polanska
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large-scale comprehensive single-cell experiments are often resource-intensive and require the involvement of many laboratories and/or taking measurements at various times. This inevitably leads to batch effects, and systematic variations in the data that might occur due to different technology platforms, reagent lots, or handling personnel. Such technical differences confound biological variations of interest and need to be corrected during the data integration process. Data integration is a challenging task due to the overlapping of biological and technical factors, which makes it difficult to distinguish their individual contribution to the overall observed effect. Moreover, the choice of integration method may impact the downstream analyses, including searching for differentially expressed genes. From the existing data integration methods, we selected only those that return the full expression matrix. We evaluated six methods in terms of their influence on the performance of differential gene expression analysis in two single-cell datasets with the same biological study design that differ only in the way the measurement was done: one dataset manifests strong batch effects due to the measurements of each sample at a different time. Integrated data were visualized using the UMAP method. The evaluation was done both on individual gene level using parametric and non-parametric approaches for finding differentially expressed genes and on gene set level using gene set enrichment analysis. As an evaluation metric, we used two correlation coefficients, Pearson and Spearman, of the obtained test statistics between reference, test, and corrected studies. Visual comparison of UMAP plots highlighted ComBat-seq, limma, and MNN, which reduced batch effects and preserved differences between biological conditions. Most of the tested methods changed the data distribution after integration, which negatively impacts the use of parametric methods for the analysis. 
Two algorithms, MNN and Scanorama, gave very poor results in terms of differential analysis on gene and gene set levels. Finally, we highlight ComBat-seq as it led to the highest correlation of test statistics between reference and corrected dataset among others. Moreover, it does not distort the original distribution of gene expression data, so it can be used in all types of downstream analyses.
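The evaluation metric described above, Pearson and Spearman correlation between test statistics of a reference and a corrected study, can be sketched in plain Python as follows. This is an illustration of the two coefficients only, not the authors' evaluation code; the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson is sensitive to changes in the scale and shape of the statistics, while Spearman only depends on their ordering, which is one reason to report both when an integration method alters the data distribution.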

  10. n

    Acoustic features as a tool to visualize and explore marine soundscapes:...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Feb 15, 2024
    Cite
    Simone Cominelli; Nicolo' Bellin; Carissa D. Brown; Jack Lawson (2024). Acoustic features as a tool to visualize and explore marine soundscapes: Applications illustrated using marine mammal Passive Acoustic Monitoring datasets [Dataset]. http://doi.org/10.5061/dryad.3bk3j9kn8
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    University of Parma
    Memorial University of Newfoundland
    Fisheries and Oceans Canada
    Authors
    Simone Cominelli; Nicolo' Bellin; Carissa D. Brown; Jack Lawson
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Passive Acoustic Monitoring (PAM) is emerging as a solution for monitoring species and environmental change over large spatial and temporal scales. However, drawing rigorous conclusions based on acoustic recordings is challenging, as there is no consensus over which approaches and indices are best suited for characterizing marine and terrestrial acoustic environments. Here, we describe the application of multiple machine-learning techniques to the analysis of a large PAM dataset. We combine pre-trained acoustic classification models (VGGish, NOAA & Google Humpback Whale Detector), dimensionality reduction (UMAP), and balanced random forest algorithms to demonstrate how machine-learned acoustic features capture different aspects of the marine environment. The UMAP dimensions derived from VGGish acoustic features exhibited good performance in separating marine mammal vocalizations according to species and locations. RF models trained on the acoustic features performed well for labelled sounds in the 8 kHz range; however, low and high-frequency sounds could not be classified using this approach. The workflow presented here shows how acoustic feature extraction, visualization, and analysis allow for establishing a link between ecologically relevant information and PAM recordings at multiple scales. The datasets and scripts provided in this repository allow replicating the results presented in the publication.

    Methods

    Data acquisition and preparation

    We collected all records available in the Watkins Marine Mammal Database website listed under the “all cuts” page. For each audio file in the WMD the associated metadata included a label for the sound sources present in the recording (biological, anthropogenic, and environmental), as well as information related to the location and date of recording. To minimize the presence of unwanted sounds in the samples, we only retained audio files with a single source listed in the metadata.
    We then labelled the selected audio clips according to taxonomic group (Odontocetae, Mysticetae), and species. We limited the analysis to 12 marine mammal species by discarding data when a species: had less than 60 s of audio available, had a vocal repertoire extending beyond the resolution of the acoustic classification model (VGGish), or was recorded in a single country. To determine if a species was suited for analysis using VGGish, we inspected the Mel-spectrograms of 3-s audio samples and only retained species with vocalizations that could be captured in the Mel-spectrogram (Appendix S1). The vocalizations of species that produce very low or very high frequencies were not captured by the Mel-spectrogram, thus we removed them from the analysis. To ensure that records included the vocalizations of multiple individuals for each species, we only considered species with records from two or more different countries. Lastly, to avoid overrepresentation of sperm whale vocalizations, we excluded 30,000 sperm whale recordings collected in the Dominican Republic. The resulting dataset consisted of 19,682 audio clips with a duration of 960 milliseconds each (0.96 s) (Table 1). The Placentia Bay Database (PBD) includes recordings collected by Fisheries and Oceans Canada in Placentia Bay (Newfoundland, Canada), in 2019. The dataset consisted of two months of continuous recordings (1230 hours), starting on July 1st, 2019, and ending on August 31st, 2019. The data was collected using an AMAR G4 hydrophone (sensitivity: -165.02 dB re 1V/µPa at 250 Hz) deployed at 64 m of depth. The hydrophone was set to operate following 15 min cycles, with the first 60 s sampled at 512 kHz, and the remaining 14 min sampled at 64 kHz. For the purpose of this study, we limited the analysis to the 64 kHz recordings.
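The species selection rules above (at least 60 s of audio, vocalizations visible in the Mel-spectrogram, and records from two or more countries) can be expressed as a small filter. The sketch below is illustrative only: the record keys `species`, `duration_s`, `country`, and `in_mel_range` are hypothetical, and the `in_mel_range` flag stands in for the manual spectrogram inspection described in the text.

```python
def eligible_species(records, min_seconds=60.0, min_countries=2):
    """Return the sorted species names that pass all three selection rules.

    records: iterable of dicts with hypothetical keys
      'species', 'duration_s', 'country', 'in_mel_range'.
    """
    totals, countries, in_range = {}, {}, {}
    for rec in records:
        sp = rec["species"]
        totals[sp] = totals.get(sp, 0.0) + rec["duration_s"]
        countries.setdefault(sp, set()).add(rec["country"])
        # A species is kept only if every inspected clip was in range.
        in_range[sp] = in_range.get(sp, True) and rec["in_mel_range"]
    return sorted(
        sp for sp in totals
        if totals[sp] >= min_seconds
        and len(countries[sp]) >= min_countries
        and in_range[sp]
    )
```

Keeping the thresholds as parameters makes it straightforward to test how sensitive the final species list is to the 60 s and two-country cut-offs.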
    Acoustic feature extraction

    The audio files from the WMD and PBD databases were used as input for VGGish (Abu-El-Haija et al., 2016; Chung et al., 2018), a CNN developed and trained to perform general acoustic classification. VGGish was trained on the Youtube8M dataset, containing more than two million user-labelled audio-video files. Rather than focusing on the final output of the model (i.e., the assigned labels), here the model was used as a feature extractor (Sethi et al., 2020). VGGish converts audio input into a semantically meaningful vector consisting of 128 features. The model returns features at multiple resolutions: ~1 s (960 ms); ~5 s (4,800 ms); ~1 min (59,520 ms); ~5 min (299,520 ms). All of the visualizations and results pertaining to the WMD were prepared using the finest feature resolution of ~1 s. The visualizations and results pertaining to the PBD were prepared using the ~5 s features for the humpback whale detection example, and were then averaged to an interval of 30 min in order to match the temporal resolution of the environmental measures available for the area.

    UMAP ordination and visualization

    UMAP is a non-linear dimensionality reduction algorithm based on the concept of topological data analysis which, unlike other dimensionality reduction techniques (e.g., tSNE), preserves both the local and global structure of multivariate datasets (McInnes et al., 2018). To allow for data visualization and to reduce the 128 features to two dimensions for further analysis, we applied Uniform Manifold Approximation and Projection (UMAP) to both datasets and inspected the resulting plots. The UMAP algorithm generates a low-dimensional representation of a multivariate dataset while maintaining the relationships between points in the global dataset structure (i.e., the 128 features extracted from VGGish).
Each point in a UMAP plot in this paper represents an audio sample with a duration of ~1 second (WMD dataset), ~5 seconds (PBD dataset, humpback whale detections), or 30 minutes (PBD dataset, environmental variables). Each point in the two-dimensional UMAP space also represents a vector of 128 VGGish features. The nearer two points are in the plot space, the nearer they are in the 128-dimensional space, and thus the distance between two points in UMAP reflects the degree of similarity between two audio samples in our datasets. Areas with a high density of samples in UMAP space should, therefore, contain sounds with similar characteristics, and such similarity should decrease with increasing distance between points. Previous studies illustrated how VGGish and UMAP can be applied to the analysis of terrestrial acoustic datasets (Heath et al., 2021; Sethi et al., 2020). The visualizations and classification trials presented here illustrate how the two techniques (VGGish and UMAP) can be used together for marine ecoacoustics analysis. UMAP visualizations were prepared using the umap-learn package for the Python programming language (version 3.10). All UMAP visualizations presented in this study were generated using the algorithm’s default parameters.
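The 30-minute points mentioned above are averages of consecutive ~5 s feature vectors. The following is a minimal sketch of that averaging step, not the authors' code: the function name `average_features`, the choice to drop a trailing partial window, and the assumption of gap-free 4,800 ms frames are all illustrative.

```python
from statistics import fmean


def average_features(frames, frames_per_window):
    """Average consecutive feature vectors into fixed-size windows.

    frames: list of equal-length feature vectors (lists of floats).
    frames_per_window: number of consecutive frames per averaged window.
    A trailing partial window is dropped (an assumption of this sketch).
    """
    if frames_per_window <= 0:
        raise ValueError("frames_per_window must be positive")
    averaged = []
    last_start = len(frames) - frames_per_window
    for start in range(0, last_start + 1, frames_per_window):
        window = frames[start:start + frames_per_window]
        # zip(*window) iterates over feature columns across the window
        averaged.append([fmean(column) for column in zip(*window)])
    return averaged


# 30 min expressed in 4,800 ms frames (assumes frames are contiguous)
FRAMES_PER_30_MIN = (30 * 60 * 1000) // 4800
```

With real data, each inner list would hold the 128 VGGish features of one ~5 s frame, and `average_features(frames, FRAMES_PER_30_MIN)` would return one 128-value vector per 30 min window.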
Labelling sound sources

The labels for the WMD records (i.e., taxonomic group, species, location) were obtained from the database metadata. For the PBD recordings, we obtained measures of wind speed, surface temperature, and current speed from an oceanographic buoy located in proximity of the recorder (Fig 1). We chose these three variables for their different contributions to background noise in marine environments. Wind speed contributes to underwater background noise at multiple frequencies, ranging from 500 Hz to 20 kHz (Hildebrand et al., 2021). Sea surface temperature contributes to background noise at frequencies between 63 Hz and 125 Hz (Ainslie et al., 2021), while ocean currents contribute to ambient noise at frequencies below 50 Hz (Han et al., 2021). Prior to analysis, we categorized the environmental variables and assigned the categories as labels to the acoustic features (Table 2).

Humpback whale vocalizations in the PBD recordings were processed using the humpback whale acoustic detector created by NOAA and Google (Allen et al., 2021), providing a model score for every ~5 s sample. This model was trained on a large dataset (14 years and 13 locations) using humpback whale recordings annotated by experts (Allen et al., 2021). The model returns scores ranging from 0 to 1, indicating the confidence in the predicted humpback whale presence. We used the results of this detection model to label the PBD samples according to the presence of humpback whale vocalizations. To verify the model results, we inspected all audio files from the month of July that contained a 5 s sample with a model score higher than 0.9. If the presence of a humpback whale was confirmed, we labelled the segment as a model detection. We labelled any additional humpback whale vocalization present in the inspected audio files as a visual detection, while we labelled other sources and background noise samples as absences. In total, we labelled 4.6 hours of recordings.
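The selection of files for manual verification can be sketched as follows. This is an illustration only: the function name `files_to_inspect`, the dictionary layout, and the file names are hypothetical, and only the 0.9 threshold comes from the text.

```python
def files_to_inspect(scores_by_file, threshold=0.9):
    """Return the names of files with at least one sample score above threshold.

    scores_by_file: dict mapping a file name to the list of detector scores
    (one score per ~5 s sample, each between 0 and 1).
    """
    return sorted(
        name
        for name, scores in scores_by_file.items()
        if any(score > threshold for score in scores)
    )


# Hypothetical detector output for three files
example_scores = {
    "file_a.wav": [0.10, 0.95, 0.30],
    "file_b.wav": [0.90, 0.20],  # 0.90 is not strictly higher than 0.9
    "file_c.wav": [],
}
selected = files_to_inspect(example_scores)
```

The comparison is strict because the text specifies scores "higher than 0.9"; a file whose best score is exactly 0.9 is therefore not selected.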
We reserved the recordings collected in August to test the precision of the final predictive model.

Label prediction performance

We used Balanced Random Forest (BRF) models provided in the imbalanced-learn Python package (Lemaître et al., 2017) to predict humpback whale presence and environmental conditions from the acoustic features generated by VGGish. We chose BRF because it is suited to datasets characterized by class imbalance. The BRF algorithm undersamples the majority class prior to prediction, which helps to overcome class imbalance (Lemaître et al., 2017). For each model run, the PBD dataset was split into training (80%) and testing (20%) sets. The training datasets were used to fine-tune the models through a nested k-fold cross-validation approach with ten folds in the outer loop and five folds in the inner loop. We selected nested cross-validation as it allows optimizing model hyperparameters and performing model evaluation in a single step. We used the default parameters of the BRF algorithm, except for the ‘n_estimators’ hyperparameter, for which we tested
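The idea of undersampling the majority class can be illustrated with a simplified, standard-library sketch. It is not the imbalanced-learn implementation, which resamples separately for each tree in the forest; the function name `undersample` and the fixed seed are assumptions made for this example.

```python
import random
from collections import defaultdict


def undersample(labels, seed=0):
    """Return sorted sample indices in which every class is reduced,
    by random sampling without replacement, to the size of the rarest class."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    smallest = min(len(indices) for indices in indices_by_class.values())
    kept = []
    for indices in indices_by_class.values():
        kept.extend(rng.sample(indices, smallest))
    return sorted(kept)


# Hypothetical imbalanced labels: 8 absences and 2 presences
labels = ["absent"] * 8 + ["present"] * 2
balanced_indices = undersample(labels)
```

After this step both classes contribute the same number of samples, so a classifier trained on the selected indices is not dominated by the majority class.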

  11. Human Protein Atlas - Single cell type

    • proteinatlas.org
    • v21.proteinatlas.org
    Updated Nov 18, 2021
    + more versions
    Cite
    (2021). Human Protein Atlas - Single cell type [Dataset]. http://www.proteinatlas.org/ENSG00000006059-KRT33A/celltype
    Explore at:
    Dataset updated
    Nov 18, 2021
    License

    https://www.proteinatlas.org/about/licence

    Description

    This section contains Single Cell Type information based on single cell RNA sequencing (scRNAseq) data from 25 human tissues and peripheral blood mononuclear cells (PBMCs), together with in-house generated immunohistochemically stained tissue sections visualizing the corresponding spatial protein expression patterns. The scRNAseq analysis was based on publicly available genome-wide expression data and comprises all protein-coding genes in 444 individual cell type clusters corresponding to 15 different cell type groups. A specificity and distribution classification was performed to determine the number of genes elevated in these single cell types, and the number of genes detected in one, several or all cell types, respectively. The genes expressed in each of the cell types can be explored in interactive UMAP plots and bar charts, with links to corresponding immunohistochemical stainings in human tissues.
    More information about the specific content and the generation and analysis of the data in the section can be found on the Methods Summary. Learn about:

    • mRNA and protein expression in single cell types
    • if a gene is enriched in a particular cell type (specificity)
    • which genes have a similar expression profile across cell types (expression cluster)

  12. Data from: Constrained chromatin accessibility in PU.1-mutated...

    • data.niaid.nih.gov
    Updated Jul 2, 2021
    Cite
    Gonzalez M; Le Coz C; Garifallou J; Romberg N (2021). Constrained chromatin accessibility in PU.1-mutated agammaglobulinemia patients [Dataset]. https://data.niaid.nih.gov/resources?id=gse165645
    Explore at:
    Dataset updated
    Jul 2, 2021
    Dataset provided by
    Children's Hospital of Philadelphia
    Authors
    Gonzalez M; Le Coz C; Garifallou J; Romberg N
    Description

    Using CITE-seq, we measured expression of 132 proteins on the cell surfaces of single human bone marrow aspirate cells. Expression of each protein was normalized with an isotype-specific control on a single-cell basis. Principal component analysis of normalized proteins was used to produce UMAP plots that clustered like cell types. After identifying pro-B cells on UMAP plots and further refining these populations by filtering on CD19 expression and absent CD20/IgM expression, we identified differentially expressed genes between patient and control pro-B cells. Two CITE-seq data sets were analyzed.

  13. Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 7, 2023
    Cite
    Pavanish Kumar (2023). Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug Resistance in Chronic Myeloid Leukemia [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5118610
    Explore at:
    Dataset updated
    Sep 7, 2023
    Dataset provided by
    John Ouyang
    Pavanish Kumar
    Prasanna Nori Venkatesh
    Alice Man Sze Cheung
    Meera Makheja
    Lee Kian Leong
    Sudipto Bari
    Salvatore Albani
    Zahid Nawaz
    Sin Tiong Ong
    William Ying Khee Hwang
    Vaidehi Krishnan
    Shyam Prabhakar
    Owen Rackham
    Florian Schmidt
    Charles Chuah
    Chan Zhu En
    Ahmad Lajam
    Description

    This archive contains data of scRNAseq and CyTOF in form of Seurat objects, txt and csv files as well as R scripts for data analysis and Figure generation.

    A summary of the content is provided in the following.

    R scripts

    Script to run machine learning models predicting group-specific marker genes: CML_Find_Markers_Zenodo.R

    Script to reproduce the majority of Main and Supplementary Figures shown in the manuscript: CML_Paper_Figures_Zenodo.R

    Script to run inferCNV analysis: inferCNV_Zenodo.R

    Script to plot NATMI analysis results: NATMI_CvsA_FC0.32_Updown_Column_plot_Zenodo.R

    Script to conduct sub-clustering and filtering of NK cells: NK_Marker_Detection_Zenodo.R

    Helper scripts for plotting and DEG calculation: ComputePairWiseDE_v2.R, Seurat_DE_Heatmap_RCA_Style.R

    RDS files

    General scRNA-seq Seurat objects:

    scRNA-seq Seurat object after QC and cell type annotation, used for most analyses in the manuscript: DUKE_DataSet_Doublets_Removed_Relabeled.RDS

    scRNA-seq including findings e.g. from NK analysis used in the shiny app: DUKE_final_for_Shiny_App.rds

    Neighborhood enrichment score computed for group A across all HSPCs: Enrichment_score_global_groupA.RDS

    UMAP coordinates used in the article: Layout_2D_nNeighbours_25_Metric_cosine_TCU_removed.RDS

    SCENIC files:

    Regulon set used in SCENIC: 2.6_regulons_asGeneSet.Rds

    AUC values computed for regulons: 3.4_regulonAUC.Rds

    Metadata used in SCENIC: cellInfo.Rds

    Group specific regulons for LSC: groupSpecificRegulonsBCRAblP.RDS

    Patient specific regulons for LSC: patientSpecificRegulonsBCRAblP.RDS

    Patient specificity score for LSC: PatientSpecificRegulonSpecificityScoreBCRAblP.RDS

    Regulon specificity score for LSC: RegulonSpecificityScoreBCRAblP.RDS

    BCR-ABL1 inference:

    HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label.RDS

    UMAP for HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label_UMAP.RDS

    HSPCs with BCR-ABL1 module scores: HSPC_metacluster_74K_with_modscore_27thmay.RDS

    NK sub-clustering and filtering:

    NK object with module scores: NK_8617cells_with_modscore_1stjune.RDS

    Feature genes for NK cells computed with DubStepR: NK_Cells_DubStepR

    NK cells Seurat object excluding contaminating T and B cells: NK_cells_T_B_17_removed.RDS

    NK Seurat object including neighbourhood enrichment score calculations: NK_seurat_object_with_enrichment_labels_V2.RDS

    txt and csv files:

    Proportions per cluster calculated from CyTOF: CyTOF_Proportions.txt

    Correlation between scRNAseq and CyTOF cell type abundance: scRNAseq_Cor_Cytof.txt

    Correlation between manual gating and FlowSOM clustering: Manual_vs_FlowSOM.txt

    GSEA results:

    HSPC, HSC and LSC results: FINAL_GSEA_DATA_For_GGPLOT.txt

    NK: NK_For_Plotting.txt

    TFRC and HLA expression: TFRC_and_HLA_Values.txt

    NATMI result files:

    UP-regulated_mean.csv

    DOWN-regulated_mean.csv

    Gene position file used in inferCNV: inferCNV_gene_positions_hg38.txt

    Module scores for NK subclusters per cell: NK_Supplementary_Module_Scores.csv

    Compressed folders:

    All CyTOF raw data files: CyTOF_Data_raw.zip

    Results of the patient-based classifier: PatientwiseClassifier.zip

    Results of the single-cell based classifier: SingleCellClassifierResults.zip

    For new data analysis approaches, we recommend that readers use the Seurat object stored in DUKE_final_for_Shiny_App.rds or use the shiny app (http://scdbm.ddnetbio.com/) and perform further analysis from there.

    RAW data is available at EGA upon request using Study ID: EGAS00001005509

    Revision

    The for_CML_manuscript_revision.tar.gz folder contains scripts and data for the paper revision including 1) Detection of the BCR-ABL fusion with long read sequencing; 2) Identification of BCR-ABL junction reads with scRNAseq; 3) Detection of expressed mutations using scRNAseq.

  14. Additional file 1 of MSCsDB: a database of single-cell transcriptomic...

    • springernature.figshare.com
    zip
    Updated Mar 7, 2024
    Cite
    Miao Yu; Ke Sui; Zheng Wang; Xi Zhang (2024). Additional file 1 of MSCsDB: a database of single-cell transcriptomic profiles and in-depth comprehensive analyses of human mesenchymal stem cells [Dataset]. http://doi.org/10.6084/m9.figshare.25357793.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Miao Yu; Ke Sui; Zheng Wang; Xi Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: Figure S1. The information on MSC atlas taxonomy. (A) UMAP of all MSCs with cluster annotations, (B) UMAP of MSCs color-labelled by tissue, (C) Cell counts of MSCs from different tissues in each cluster, and (D) Cell counts of MSCs from different samples in each cluster. Figure S2. Differentiation scoring of MSCs on five differentiation directions. (A) Scoring of osteogenesis, chondrogenesis, adipogenesis, myogenesis and neurogenesis. (B) Scoring of representative gene expression for MSC differentiation. Figure S3. Home page of MSCsDB, which includes website introduction, functionality overview, gene cloud, and website update news. Figure S4. Module of Dataset and link to the module of Explore. Users can view the metadata of each sample dataset, such as the original article, data repository and sequencing technology. Users can also click on the “Explore” button to view the sample’s clustering annotation, gene expression level analysis, pathway enrichment analysis, copy number variation analysis, and pseudotime analysis results. Figure S5. Functionality in the module of Atlas. (A) UMAP of MSCs with cluster annotations. Users can select specific clusters to view their distribution. The MSC atlas can also be classified by tissue or batch and shown separately. (B) Gene signature of MSCs. Users can analyze the cell percentage of all genes and click on the “View” button to view the gene expression levels in cells and clusters. The Gene Card database is also linked for users to view gene information. Users can also enter a specific gene in the search box to retrieve relevant information. Figure S6. An example of functionality in the module of Atlas. (A) Pathway enrichment analysis of MSCs from different databases. Users can switch between different databases. Users can also select specific clusters and pathways to view their enrichment status. (B) Copy number variation analysis of MSCs using copyKat and InferCNVpy packages. 
The copyKat software can predict whether the cells are normal cells (diploid) or tumor cells (aneuploid). The InferCNVpy package gives prediction values, so we provide chromosome heatmaps based on CNV clustering for users to distinguish between normal cells and tumor cells. (C) Pseudotime analysis of MSCs using PAGA method. We show the cell trajectory inference plot and cluster UMAP plot for a single sample. (D) Transcription factor network analysis of MSCs using pyscenic package. We provide the transcription factor network analysis result table and heatmap for a single sample’s cluster. Users can click on the “View” button in the table to view the target genes regulated by that transcription factor. Figure S7. De novo analysis for clustering, pathway enrichment, and quality evaluation. (A) UMAP plot of MSC clustering and annotation using Scanpy package for a sample dataset. (B) Pathway enrichment analysis using Clusterprofiler package for a sample dataset. (C) Copy number variation analysis using CopyKat and InferCNVpy packages for a sample dataset. Figure S8. De novo analysis for pseudotime and gene regulatory network analysis. (A) Pseudotime analysis using PAGA method for a sample dataset. (B) Gene regulatory network analysis using pyscenic package for a sample dataset. Table S1. Marker genes used for potency score analysis. Table S2. Scoring for each cluster using geneset.

  15. Joint embedding of vertebrate brain single-cell RNA-Seq using sequence or...

    • zenodo.org
    bin, tsv
    Updated Aug 18, 2023
    Cite
    Dennis Sun; Dennis Sun (2023). Joint embedding of vertebrate brain single-cell RNA-Seq using sequence or structure [Dataset]. http://doi.org/10.5281/zenodo.7838976
    Explore at:
    Available download formats: bin, tsv
    Dataset updated
    Aug 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dennis Sun; Dennis Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Embeddings of single-cell RNA-Seq data from three adult vertebrate brain datasets into Orthogroup feature space or Structural cluster feature space. Orthogroups were generated using OrthoFinder v5.5.0; Structural clusters were assigned by using FoldSeek to cluster AlphaFold-v4 structural predictions.

    The three datasets used as the basis for these embeddings were:

    For each dataset, we also generated a standardized cell type annotation file based on the author's originally provided cell type annotation data. The first column is the cell barcode for that species and the second column is the original study's cell type annotation for that cell.

    For the Xenopus brain data, we removed ~18k cells that were not annotated in the original data to simplify data analyses; these are reflected in the files with the "subsampled" suffix. Subsampled versions of the data are also available for the joint embedding space (prefixed with "DrerMmusXlae").

    For the final datasets used in our analyses, we also provide features x cell matrices as .h5ad files for smaller file sizes and faster loading using Scanpy.

    For visualizing our UMAP plots of our top200 embedding space, we provide ".tsv" files with a variety of metrics and the x and y positions of each cell in the UMAP. See "DrerMmusXlae_adultbrain_FoldSeek_plotlydata.tsv" and "DrerMmusXlae_adultbrain_OrthoFinder_plotlydata.tsv"

    These data are part of the Arcadia Science Pub titled "Comparing gene expression across species based on protein structure instead of sequence".

  16. Data from: Sphingosine-1-phosphate signaling regulates the ability of Müller...

    • datadryad.org
    zip
    Updated May 1, 2025
    + more versions
    Cite
    Olivia Taylor; Nicholas Degroff; Heithem El-Hodiri; Chengyu Gao; Andy Fischer (2025). Sphingosine-1-phosphate signaling regulates the ability of Müller glia to become neurogenic, proliferating progenitor-like cells [Dataset]. http://doi.org/10.5061/dryad.tdz08kq8t
    Explore at:
    Available download formats: zip
    Dataset updated
    May 1, 2025
    Dataset provided by
    Dryad
    Authors
    Olivia Taylor; Nicholas Degroff; Heithem El-Hodiri; Chengyu Gao; Andy Fischer
    Time period covered
    Mar 3, 2025
    Description

    Sphingosine-1-phosphate signaling regulates the ability of Müller glia to become neurogenic, proliferating progenitor-like cells

    https://doi.org/10.5061/dryad.tdz08kq8t

    General Information

    Dataset Overview

    A detailed description of the general framework and specific methodology can be found in the relevant publication (https://doi.org/10.7554/eLife.102151.4).

    For each dataset, barcode, feature, and matrix file from CellRanger output are provided. These files serve as inputs for preparing the Seurat objects used in this study. Barcode files contain a list of cell barcodes. Feature files contain gene names from the reference used for CellRanger and include 3 columns: ENSEMBL number, gene name, and the type of assay run ("GENE EXPRESSION"). Matrix files contain the sparse matrix containing UMI counts for each library.

    Dissociated cells were loaded onto the 10X Chromium Cell Controller ...

  17. Interactive UMAP plot of the Australia recordings.

    • plos.figshare.com
    html
    Updated May 9, 2025
    Cite
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones (2025). Interactive UMAP plot of the Australia recordings. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013029.s005
    Explore at:
    Available download formats: html
    Dataset updated
    May 9, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    Interactive UMAP plot of the Australia recordings.

  18. Interactive UMAP plot of the French Polynesia recordings.

    • figshare.com
    html
    Updated May 9, 2025
    Cite
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones (2025). Interactive UMAP plot of the French Polynesia recordings. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013029.s006
    Explore at:
    Available download formats: html
    Dataset updated
    May 9, 2025
    Dataset provided by
    PLOS Computational Biology
    Authors
    Ben Williams; Santiago M. Balvanera; Sarab S. Sethi; Timothy A.C. Lamont; Jamaluddin Jompa; Mochyudho Prasetya; Laura Richardson; Lucille Chapuis; Emma Weschke; Andrew Hoey; Ricardo Beldade; Suzanne C. Mills; Anne Haguenauer; Frederic Zuberer; Stephen D. Simpson; David Curnick; Kate E. Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French Polynesia
    Description

    Interactive UMAP plot of the French Polynesia recordings.

  19. Human Protein Atlas - Celltype Atlas

    • v20.proteinatlas.org
    Updated Nov 19, 2020
    Cite
    (2020). Human Protein Atlas - Celltype Atlas [Dataset]. https://v20.proteinatlas.org/ENSG00000213626-LBH/celltype
    Explore at:
    Dataset updated
    Nov 19, 2020
    License

    https://www.proteinatlas.org/about/licence

    Description

    The Single Cell Type Atlas contains single cell RNA sequencing (scRNAseq) data from 13 different human tissues, together with in-house generated immunohistochemically stained tissue sections visualizing the corresponding spatial protein expression patterns. The scRNAseq analysis was based on publicly available genome-wide expression data and comprises all protein-coding genes in 192 individual cell type clusters corresponding to 12 different cell type groups. A specificity and distribution classification was performed to determine the number of genes elevated in these single cell types, and the number of genes detected in one, several or all cell types, respectively. The genes expressed in each of the cell types can be explored in interactive UMAP plots and bar charts, with links to corresponding immunohistochemical stainings in human tissues.

  20. Oncogenic signalling is coupled to colorectal cancer cell differentiation...

    • data.niaid.nih.gov
    Updated Apr 11, 2023
    + more versions
    Cite
    Sell, Thomas (2023). Oncogenic signalling is coupled to colorectal cancer cell differentiation state [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6400082
    Explore at:
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Sell, Thomas
    Astaburuaga-García, Rosario
    Fischer, Matthias M.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mass cytometry and single-cell RNA-sequencing data as well as R Markdown reports to reproduce the figures of our publication.

    Raw MC data were saved post de-convolution, spillover-compensation, and removal of calibration bead events. Gates for singlets and non-dead cells (low_Pt) are included as logical columns and should be applied prior to usage.

    As we performed random sampling to equalise cell numbers across conditions, batch normalisation, and used non-linear dimensionality reduction techniques (UMAP and Diffusion Maps), resulting plots may differ slightly from the published figures, yet still support the drawn conclusions. Already normalised and/or sampled data as well as pre-computed UMAP and Diffusion Map coordinates are included in this data set to reproduce the manuscript figures exactly, as shown in the included report “figures_only”. For all details on the batch normalisation and data analysis steps performed, please consult the report “data_analysis” instead.
