5 datasets found
  1. GAMMA: Galactic Attributes of Mass, Metallicity, and Age Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 3, 2023
    Cite
    Çakır, Ufuk (2023). GAMMA: Galactic Attributes of Mass, Metallicity, and Age Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8375343
    Dataset updated
    Nov 3, 2023
    Dataset authored and provided by
    Çakır, Ufuk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce the GAMMA (Galactic Attributes of Mass, Metallicity, and Age) dataset, a comprehensive collection of galaxy data tailored for Machine Learning applications. This dataset offers detailed 2D maps and 3D cubes of 11 727 galaxies, capturing essential attributes: stellar age, metallicity, and mass. Together with the dataset we publish our code to extract any other stellar or gaseous property from the raw simulation suite to extend the dataset beyond these initial properties, ensuring versatility for various computational tasks. Ideal for feature extraction, clustering, and regression tasks, GAMMA offers a unique lens for exploring galactic structures through computational methods and is a bridge between astrophysical simulations and the field of scientific machine learning (ML). As a first benchmark, we apply Principal Component Analysis (PCA) on this dataset. We find that PCA effectively captures the key morphological features of galaxies with a small number of components. We achieve a dimensionality reduction by a factor of ∼200 (∼3650) for 2D images (3D cubes) with a reconstruction accuracy below 5%. We calculate UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) on the lower dimensional PCA scores of the 2D images to visualize the image space. An interactive version of this plot can be accessed using an online Dashboard (hover over a point to see the galaxy image and the IllustrisTNG Subhalo ID). All the code to generate this dataset and load the data structure is publicly available on GitHub, with an additional documentation page hosted on ReadTheDocs.
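The PCA benchmark described above can be illustrated with a minimal sketch. It does not use the GAMMA data or the authors' published code: the array sizes, the synthetic low-rank "images", and the helper name `pca_compress` are all assumptions made for illustration. It shows how a compression factor (input dimensions divided by number of retained components) and a relative reconstruction error might be computed.

```python
import numpy as np


def pca_compress(X, n_components):
    """Project the rows of X onto the top principal components and reconstruct them."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T               # low-dimensional representation
    X_rec = scores @ components + mean       # back-projection to pixel space
    return scores, X_rec


# Hypothetical stand-in for flattened galaxy maps: low-rank signal plus small noise.
rng = np.random.default_rng(0)
n_samples, n_pixels, n_latent = 200, 64 * 64, 10
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_pixels))
X = latent @ mixing + 0.01 * rng.normal(size=(n_samples, n_pixels))

scores, X_rec = pca_compress(X, n_components=n_latent)

compression_factor = n_pixels / scores.shape[1]
rel_error = np.linalg.norm(X - X_rec) / np.linalg.norm(X)

print(f"compression factor: {compression_factor:.1f}")
print(f"relative reconstruction error: {rel_error:.4f}")
```

The relative Frobenius-norm error used here is only one possible definition of reconstruction quality; the metric behind the "below 5%" figure in the description may differ.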

  2. Supplementary Table S2 Markers genes for UMAP clusters of day 14 pig taste...

    • figshare.com
    csv
    Updated Aug 4, 2025
    Cite
    Maire Doyle (2025). Supplementary Table S2 Markers genes for UMAP clusters of day 14 pig taste organoids [Dataset]. http://doi.org/10.6084/m9.figshare.29128955.v1
    Available download formats: csv
    Dataset updated
    Aug 4, 2025
    Dataset provided by
    figshare
    Authors
    Maire Doyle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Marker genes for each cluster identified in the scRNAseq analysis of pig taste organoids harvested on Day 14. See Figure 1E for the UMAP representation of genes found in the scRNAseq analysis of pig taste organoids harvested on Day 14 (n=2), colored by cell-type cluster. See Table 1 for cell cluster assignments.

  3. Data from: A multiplex single-cell RNA-Seq pharmacotranscriptomics pipeline...

    • data.mendeley.com
    Updated Oct 22, 2024
    Cite
    Alice Dini (2024). A multiplex single-cell RNA-Seq pharmacotranscriptomics pipeline for drug discovery [Dataset]. http://doi.org/10.17632/j9j4mdm9yr.1
    Dataset updated
    Oct 22, 2024
    Authors
    Alice Dini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We developed a single-cell transcriptomics pipeline for high-throughput pharmacotranscriptomic screening. We explored the transcriptional landscape of three HGSOC models (JHOS2, a representative cell line; PDC2 and PDC3, two patient-derived samples) after treating their cells for 24 hours with 45 drugs representing 13 distinct classes of mechanism of action. Our work establishes a new precision oncology framework for the study of molecular mechanisms activated by a broad array of drug responses in cancer.

    .
    ├── 3D UMAPs/ → Interactive 3D UMAPs of cells treated with the 45 drugs used for multiplexed scRNA-seq. Related to Figure 4. Coordinates: x = UMAP 1; y = UMAP 2; z = UMAP 3. Legend: green = PDC1; blue = PDC2; red = JHOS2.
    │   ├── DMSO_3D_UMAP_Dini.et.al.html → 3D UMAP of untreated cells.
    │   └── drug_3D_UMAP_Dini.et.al.html → 3D UMAP of cells treated with (drug).
    ├── QC_plots/ → Diagnostic plots. Related to Figures 2–4.
    │   ├── model_QC_violin_plot_2023.pdf → Violin plots of the QC metrics used to filter the data.
    │   ├── model_col_HTO or model_row_HTO before and after filt → Heatmaps of the row or column HTO expression in each cell.
    │   └── model_counts_histogram_2023.pdf → Histogram of the distribution of the total counts per cell after filtering for high-quality cells.
    ├── scRNAseq/ → scRNA-seq data. Related to Figures 2–4.
    │   ├── AllData_subsampled_DGE_edgeR.csv.gz → Differential gene expression analyses results between treated and untreated cells via pseudobulk of aggregate subsamples, for each of the three models. Related to Figure 3.
    │   └── All_vs_all_RNAclusters_DEG_signif.txt → Differential gene expression analysis results (p.adj < 0.05) of FindAllMarkers for the Leiden/RNA clusters.
    ├── PDCs.transcript.counts.tsv → Bulk RNA-seq count data for PDCs 1–3 processed by Kallisto. Related to Figure S6.
    └── PDCs.transcript.TPM.tsv → Bulk RNA-seq TPM data for PDCs 1–3 processed by Kallisto. Related to Figure S6.

  4. Supplementary file 1_Multiscale topology of the spectroscopic mixing space:...

    • frontiersin.figshare.com
    pdf
    Updated May 29, 2025
    Cite
    Christopher Small; Daniel Sousa (2025). Supplementary file 1_Multiscale topology of the spectroscopic mixing space: crystalline substrates.pdf [Dataset]. http://doi.org/10.3389/frsen.2025.1551139.s001
    Available download formats: pdf
    Dataset updated
    May 29, 2025
    Dataset provided by
    Frontiers
    Authors
    Christopher Small; Daniel Sousa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The statistical and topological properties of spectral feature spaces are direct expressions of the populations of spectra they represent. Characterization of the topology and dimensionality of spectral feature spaces provides both quantitative and qualitative insight into their information content. Understanding the characteristics and information content of a spectral feature space is essential to modeling and interpretation of the target properties of spectra. The reflectance of crystalline substrates, specifically sands and evaporites, is of immediate relevance to remote sensing of the diversity of soils and terrestrial substrates more generally. The objective of this analysis is to characterize the topology and spectral dimensionality of spectroscopic feature spaces composed of a diversity of co-occurring sands and evaporites worldwide. To achieve this, we construct a composite spectral feature space as a mosaic of 30 desert environments imaged by NASA’s EMIT spaceborne imaging spectrometer and compare the global and local structure of the aggregate spectral feature space using a combination of linear and nonlinear dimensionality reduction. The 3D (>99%) variance partition of the EMIT mosaic indicates that the spectral diversity of sand and evaporite reflectances is determined primarily by albedo and spectral continuum, related to mineralogy, moisture content and illumination geometry. The spectral feature space defined by the low order principal components clearly distinguishes low and high albedo sand endmembers with multiple internal clusters indicating distinct spectral continuum shapes. The same feature space also contains a continuum of evaporite endmembers with no apparent clustering but a strong dependence of albedo and continuum curvature on moisture content. In contrast, 2D and 3D UMAP embeddings of the same feature space clearly distinguish at least 18 spectrally separable clusters interspersed amidst two continua of tendrils.
    One continuum is associated with multiple sand albedo gradients in the Gobi Desert while the other corresponds to a variety of low albedo basement outcrops in multiple granules. Together, these observations indicate that the EMIT spectrometer is able to clearly distinguish spectrally separable reflectance features in both the spectral continuum and narrowband absorptions, suggesting that the geographically distinct crystalline substrates included in the study are mineralogically distinct and completely spectrally separable.

  5. Data_Sheet_1_Manifold learning for fMRI time-varying functional...

    • frontiersin.figshare.com
    docx
    Updated Jul 11, 2023
    Cite
    Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini (2023). Data_Sheet_1_Manifold learning for fMRI time-varying functional connectivity.docx [Dataset]. http://doi.org/10.3389/fnhum.2023.1134012.s001
    Available download formats: docx
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Frontiers
    Authors
    Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers often seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs)—namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies—are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate what is the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as their robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. 
    To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
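The point about feature normalization when combining data across subjects can be sketched as follows. The description does not say which normalization the authors used, so per-subject z-scoring is an assumption here, as are the toy values, the subject labels, and the helper name `zscore_columns`. The idea shown is that each subject's features are standardized separately before pooling, so that between-subject offsets and scale differences do not dominate a later embedding.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each feature (column) using the mean and std of these rows only."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


# Hypothetical toy data: subject -> list of time windows, each a small feature vector.
subjects = {
    "sub-01": [[0.1, 0.5], [0.3, 0.7], [0.2, 0.9]],
    "sub-02": [[1.1, 5.0], [1.5, 7.0], [1.3, 9.0]],
}

# Normalize within each subject, then pool the windows from all subjects.
pooled = []
for subject_id, windows in subjects.items():
    pooled.extend(zscore_columns(windows))

print(len(pooled), "windows pooled across", len(subjects), "subjects")
```

In this toy case the two subjects differ only in offset and scale, so their windows coincide after normalization; real tvFC data would retain within-subject structure such as task-related differences.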

