30 datasets found
  1. R script and datasets - Cluster Analysis and Heat maps

    • figshare.com
    txt
    Updated May 30, 2020
    Cite
    Chui Pin Leaw; Po Teen Lim; Li Keat Lee (2020). R script and datasets - Cluster Analysis and Heat maps [Dataset]. http://doi.org/10.6084/m9.figshare.12387242.v2
    Available download formats: txt
    Dataset updated
    May 30, 2020
    Dataset provided by
    figshare
    Authors
    Chui Pin Leaw; Po Teen Lim; Li Keat Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains the R scripts and data sets used to generate the clustering dendrogram and heat maps shown in Fig. 3.

  2. ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1

    • figshare.com
    application/gzip
    Updated Jun 29, 2023
    Cite
    Massimo Andreatta; Santiago Carmona (2023). ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1 [Dataset]. http://doi.org/10.6084/m9.figshare.12478571.v2
    Available download formats: application/gzip
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    figshare
    Authors
    Massimo Andreatta; Santiago Carmona
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space as the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection.

    To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from ArrayExpress. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively.

    Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non-T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3.

    For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of ties, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat 3, providing the anchor set determined by STACAS, and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets.

    Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction="umap", k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat 3 using, respectively, the prcomp function from the base R package "stats" and the "umap" R package (https://github.com/tkonopka/umap).
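
    The prevalence filter described above (dropping genes expressed in less than 0.1% or more than 90% of the cells of a dataset) can be illustrated with a minimal Python sketch. This is not the authors' code, which is based on R and Seurat 3; the function name, the dictionary layout, and the toy counts (including the placeholder gene "GeneX") are assumptions made only for illustration.

```python
def filter_genes_by_prevalence(counts, min_frac=0.001, max_frac=0.90):
    """Return genes detected in at least min_frac and at most max_frac of cells.

    counts: dict mapping gene name -> list of per-cell counts
            (a hypothetical layout chosen for this sketch).
    """
    kept = []
    for gene, per_cell in counts.items():
        n_cells = len(per_cell)
        if n_cells == 0:
            continue
        # Fraction of cells in which the gene has a non-zero count.
        frac = sum(1 for c in per_cell if c > 0) / n_cells
        if min_frac <= frac <= max_frac:
            kept.append(gene)
    return kept


# Toy example with 10 cells (made-up counts, not taken from the atlas).
toy_counts = {
    "Cd8a": [3, 0, 1, 0, 0, 2, 0, 0, 5, 0],   # detected in 40% of cells
    "Actb": [9, 8, 7, 9, 6, 9, 8, 7, 9, 9],   # detected in 100% of cells
    "GeneX": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # never detected
}
kept_genes = filter_genes_by_prevalence(toy_counts)
```

    In this toy input only Cd8a falls inside the 0.1% to 90% window, so it is the only gene retained.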

  3. CalCENv1 co-expression network UMAP clusters

    • data.niaid.nih.gov
    Updated Dec 6, 2020
    Cite
    Matthew O'Meara (2020). CalCENv1 co-expression network UMAP clusters [Dataset]. https://data.niaid.nih.gov/resources?id=ds_4a1633821f
    Dataset updated
    Dec 6, 2020
    Dataset provided by
    Matthew O'Meara
    Teresa O'Meara
    Description

    CalCENv1 co-expression network was projected to two dimensions using UMAP and 18 clusters were identified and annotated through gene set enrichment analysis.

  4. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +3 more
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format
      • File name: PBMC7_AllCells.zip
      • Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
        • matrix of gene counts* (matrix.mtx.gz)
        • gene names (features.tsv.gz)
        • cell IDs (barcodes.tsv.gz)
      • *The 'raw' count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata
      • File name: PBMC7_AllCells_meta.csv
      • Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
        • nCount_RNA = the number of transcripts detected in a cell
        • nFeature_RNA = the number of genes detected in a cell
        • Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
        • prcntMito = percent mitochondrial reads in a cell
        • Scrublet = doublet probability score assigned to a cell
        • seurat_clusters = cluster ID assigned to a cell
        • PaperIDs = sample ID for a cell
        • celltypes = cell type ID assigned to a cell
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates
      • File name: PBMC7_AllCells_PCAcoord.csv
      • Description: .csv file containing first 100 PCA coordinates for cells.
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates
      • File name: PBMC7_AllCells_tSNEcoord.csv
      • Description: .csv file containing t-SNE coordinates for all cells.
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates
      • File name: PBMC7_AllCells_UMAPcoord.csv
      • Description: .csv file containing UMAP coordinates for all cells.
    • Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates
      • File name: PBMC7_CD4only_tSNEcoord.csv
      • Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
    • Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates
      • File name: PBMC7_CD4only_UMAPcoord.csv
      • Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
    • Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates
      • File name: PBMC7_GDonly_UMAPcoord.csv
      • Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
    • Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates
      • File name: PBMC7_GDonly_tSNEcoord.csv
      • Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
    • Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information
      • File name: UnfilteredGeneInfo.txt
      • Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat
      • File name: PBMC7.tar
      • Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
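
    As a rough illustration of how the metadata file could be used outside R, the sketch below selects CD4 T cell barcodes by cluster ID with Python's standard csv module. The column names (Loupe, seurat_clusters, celltypes, prcntMito) and the CD4 cluster IDs (0, 3, 4, 28) come from the description above; the barcodes, cell type labels, and numeric values in the sample are made up, and the real file may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in the layout described for PBMC7_AllCells_meta.csv.
# Only the column names are taken from the description; all values are invented.
SAMPLE_META = """Loupe,seurat_clusters,celltypes,prcntMito
AAAC-1,0,CD4 T cells,2.1
AAAG-1,3,CD4 T cells,1.7
AACT-1,6,Gamma delta T cells,3.0
AAGT-1,12,B cells,2.4
"""

# Cluster IDs listed in the description for CD4 T cells (kept as strings,
# because csv.DictReader returns every field as text).
CD4_CLUSTERS = {"0", "3", "4", "28"}


def load_meta(handle):
    """Read the metadata table into a list of dicts, one per cell."""
    return list(csv.DictReader(handle))


rows = load_meta(io.StringIO(SAMPLE_META))
cd4_barcodes = [r["Loupe"] for r in rows if r["seurat_clusters"] in CD4_CLUSTERS]
cells_per_type = Counter(r["celltypes"] for r in rows)
```

    With a real download, `io.StringIO(SAMPLE_META)` would be replaced by an open file handle, and the selected barcodes could then be matched against the coordinate files.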

  5. 2011-2020 HSIP Pedestrian Cluster

    • geo-massdot.opendata.arcgis.com
    • gis.data.mass.gov
    • +3 more
    Updated May 10, 2023
    + more versions
    Cite
    Massachusetts geoDOT (2023). 2011-2020 HSIP Pedestrian Cluster [Dataset]. https://geo-massdot.opendata.arcgis.com/maps/2011-2020-hsip-pedestrian-cluster
    Dataset updated
    May 10, 2023
    Dataset authored and provided by
    Massachusetts geoDOT
    Description

    The top locations where reported collisions occurred between pedestrians and motor vehicles have been identified. The crash cluster analysis methodology for the top pedestrian clusters uses a fixed search distance of 100 meters (328 ft.) to merge crash clusters together. Located crashes between motor vehicles and pedestrians were identified using the non-motorist type code as well as first harmful events and most harmful events within the CDS database. Furthermore, the methodology uses the Equivalent Property Damage Only (EPDO) weighting to rank the clusters. EPDO assigns any type of injury crash (including fatal, incapacitating, non-incapacitating and possible) a weighting of 21, compared to a weighting of 1 for a property damage only crash. Because of the relatively small number of reported pedestrian crashes in the crash data file, the clustering analysis used crashes from the ten-year period 2010-2019. Additionally, due to the larger geographic area encompassed by the pedestrian crash clusters, they were difficult to name and were left unnamed, but they can be viewed spatially.
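
    The EPDO ranking described above reduces to a weighted sum per cluster. The following Python sketch uses the two weights stated in the description (21 for any injury crash, 1 for a property damage only crash); the function names and the crash counts are hypothetical and are not taken from the MassDOT data.

```python
INJURY_WEIGHT = 21  # any injury crash (fatal through possible injury)
PDO_WEIGHT = 1      # property damage only crash


def epdo_score(injury_crashes, pdo_crashes):
    """Equivalent Property Damage Only score for one cluster."""
    return INJURY_WEIGHT * injury_crashes + PDO_WEIGHT * pdo_crashes


def rank_clusters(clusters):
    """Rank clusters by EPDO score, highest first.

    clusters: dict of cluster id -> (injury_crashes, pdo_crashes).
    """
    scored = {cid: epdo_score(*c) for cid, c in clusters.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up crash counts for three clusters.
example = {"A": (2, 10), "B": (3, 1), "C": (0, 30)}
ranking = rank_clusters(example)
```

    In this toy input, cluster B outranks cluster C even though C has far more crashes, because each injury crash counts 21 times as much as a property damage only crash.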

  6. Data from: A single-cell atlas characterizes dysregulation of the bone...

    • zenodo.org
    Updated Jan 14, 2025
    Cite
    William Pilcher (2025). A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma [Dataset]. http://doi.org/10.5281/zenodo.14624955
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    William Pilcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 8, 2024
    Description

    This repository contains R Seurat objects associated with our study titled "A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma".

    Single cell data contained within this object comes from MMRF Immune Atlas Consortium work.

    Each .rds file contains a Seurat object saved with version 4.3. It can be loaded in R with the readRDS command.

    Two .RDS files are included in this version of the release.

    • Discovery object: MMRF_ImmuneAtlas_Full_With_Corrected_Censored_Metadata.rds contains all aliquots belonging to the 'discovery' cohort as used in the initial paper. This represents the dataset used for initial clustering, cell annotation, and analysis.

    • Discovery + Validation object: COMBINED_VALIDATION_MMRF_ImmuneAtlas_Full_Censored_Metadata.rds contains both aliquots belonging to the initial 'discovery' cohort, and aliquots belonging to the 'validation' cohort. The group each cell is derived from is listed under the 'cohort' variable. Labels related to cell annotation, including doublet status, are derived from a label transfer process as described in the paper. Labels for the original 'discovery' cohort are unchanged. UMAPs have been reconstructed with both the discovery and validation cohorts integrated.

    --

    The discovery object contains two assays:

    • "RNA" - The raw count matrix
    • "RNA_Batch_Corrected" - Counts adjusted for the combination of 'Study_Site' and 'Batch'.
      • Analysis should prefer the original RNA assay, unless using pipelines which do not support adjusting for technical covariates.

    Currently, the validation object only includes the uncorrected RNA assay.

    --

    The object contains two umaps in the reduction slot:

    • umap - will render the UMAP for the full object with all cells.
    • umap.sub - contains the UMAP embeddings for individual 'compartments', as indicated by 'subcluster_V03072023_compartment'

    --

    Each sample has three different identifiers:

    • public_id
      • Indicates a specific patient (n=263).
      • MMRF_####
      • This is a standard identifier which is used across all MMRF CoMMpass datasets
      • public_ids can map to multiple d_visit_specimen_ids and aliquot_ids
      • As of now, all public_ids have a single sample collected at Baseline.
        • This can be accessed by filtering for 'collection_event' %in% c("Baseline", "Screening") or VJ_INTERVAL == 'Baseline'
    • d_visit_specimen_id
      • Indicates a specific visit by a patient (n=358)
      • MMRF_####_Y
        • Y is a number indicating that this is the Y-th sample obtained from the patient. It does not correspond to a specific timepoint.
      • This is a standard identifier, which is used across all MMRF CoMMpass datasets
      • The purpose of the visit is indicated in 'collection_event' (Baseline, Relapse, Remission, etc.). The approximate interval the visit corresponds to is in "VJ_INTERVAL"
      • d_visit_specimen_id uniquely maps to one public_id
      • d_visit_specimen_id can map to multiple aliquot_ids
    • aliquot_id
      • Refers to the specific bone marrow aliquot sample processed (n=361)
      • MMRFA-######
      • This is a unique identifier for each processed scRNA-seq sample.
      • As of now, this uniquely maps to a combination of d_visit_specimen_id, Study_Site, and Batch
      • As of now, is an identifier specific to the MMRF ImmuneAtlas
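
    The three identifier formats listed above can be told apart with simple pattern matching. The Python sketch below assumes that each '#' in the templates stands for a decimal digit and that the public_id is the leading part of a d_visit_specimen_id; neither assumption is stated explicitly in the description, and the function names and example IDs are hypothetical.

```python
import re

# Patterns inferred from the templates MMRF_####, MMRF_####_Y and MMRFA-######.
# Digit counts are left open (\d+) because the exact lengths are an assumption.
PUBLIC_ID = re.compile(r"^MMRF_(\d+)$")
VISIT_ID = re.compile(r"^MMRF_(\d+)_(\d+)$")
ALIQUOT_ID = re.compile(r"^MMRFA-(\d+)$")


def classify_identifier(value):
    """Return which of the three identifier types a string looks like."""
    if PUBLIC_ID.match(value):
        return "public_id"
    if VISIT_ID.match(value):
        return "d_visit_specimen_id"
    if ALIQUOT_ID.match(value):
        return "aliquot_id"
    return "unknown"


def public_id_of_visit(visit_id):
    """Derive the patient-level ID from a visit ID (assumed to share a prefix)."""
    match = VISIT_ID.match(visit_id)
    if match is None:
        raise ValueError(f"not a d_visit_specimen_id: {visit_id!r}")
    return "MMRF_" + match.group(1)
```

    For example, `classify_identifier("MMRF_1234_2")` would report a visit-level ID, and `public_id_of_visit("MMRF_1234_2")` would give `"MMRF_1234"`. Aliquot IDs carry no patient number in their template, so mapping them back to a visit requires the object's metadata rather than string parsing.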

    Each cell has the following annotation information:

    • subcluster_V03072023
      • These refer to an individual cluster derived from 'Seurat'.
      • Format is 'Compartment'.'Compartment-cluster'.'Compartment-subcluster'
        • For example, 'NkT.2.2' indicates that the cell is in the 'Natural Killer + T Cell' compartment, was originally part of 'Cluster 2', and was then further separated into the refined subcluster '2.2'
        • If a parent cluster did not need to be further separated, the 'Compartment-subcluster' part is omitted (e.g., 'NkT.6')
      • As of now, this uniquely maps to a specific cellID_short annotation.
      • Clustering was done on a per compartment basis
        • For most immune cell types, clustering was based on embeddings corrected for 'siteXbatch'. For Plasma, clustering was performed on embeddings corrected on a per-sample basis.
      • In the combined validation object, DISCOVERY.subcluster_V03072023 will contain values only for the discovery cohort, and have NA values for validation samples.
    • subcluster_V03072023_compartment
      • These refer to one of five major compartments as identified roughly on the original UMAP. Clustering was performed on a per-compartment basis following a first pass rough annotation.
      • The possible compartments are
        • NkT (T cell + Natural Killer Cells)
        • Myeloid (Monocytes, Macrophages, Dendritic cells, Neutrophil/Granulocyte populations)
        • BEry (B Cell, Erythroblasts, bone marrow progenitor populations, pDCs)
        • Ery (Erythrocyte population)
        • Plasma (Plasma cell populations)
      • Each compartment has its own UMAP generated, which can be accessed in the 'umap.sub' reduction
      • One cluster was isolated from all other populations, and was not assigned to a compartment. This cluster is labeled as 'Full.23'.
      • In the combined validation object, DISCOVERY.subcluster_V03072023_compartment will contain values only for the discovery cohort, and have NA values for validation samples.
    • cellID_short
      • This is the individual annotation for each cluster.
      • Please see the 'Cell Population Annotation Dictionary' for further details.
      • If different seurat clusters were assigned similar annotations, the celltype annotation will be appended with a distinct cluster gene, or with '_b', '_c'
    • lineage_group
      • This is an annotation driven grouping of clusters into major immune populations, as shown in Figure 2.
      • This includes "CD8", "CD4", "M" (Myeloid), "B" (B cell), "E" (Erythroid), "P" (Plasma), "Other" (HSC, Fibro, pDC_a), "LQ" (Doublet)
    • isDoublet
      • This is a binary 'True' or 'False' derived from manual review of clusters following doublet analysis, as described in the paper.
      • True indicates the cluster was determined to be a doublet population.
      • This is derived from 'doublet_pred', in which 'dblet_cluster' and 'poss_dblet_cluster' were flagged as doublet populations for subsequent analysis.
      • In the validation object, the doublet status of cells from new samples was inferred from label transfer: a cell is flagged as a doublet if label transfer from the discovery cohort mapped it to one of the previously identified doublet populations. The raw doublet scores from DoubletFinder, Pegasus, or Scrublet are not included in this release.
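
    The subcluster_V03072023 label format described above ('Compartment'.'Compartment-cluster'.'Compartment-subcluster', with the last part optional) can be split into its components as in the following Python sketch. The class and function names are hypothetical; only the example labels 'NkT.2.2', 'NkT.6' and 'Full.23' come from the description.

```python
from typing import NamedTuple, Optional


class SubclusterLabel(NamedTuple):
    compartment: str           # e.g. 'NkT', 'Myeloid', or 'Full' for the unassigned cluster
    cluster: str               # cluster number within the compartment
    subcluster: Optional[str]  # refinement index, or None if the cluster was not split


def parse_subcluster(label):
    """Split a label such as 'NkT.2.2' or 'NkT.6' into its parts."""
    parts = label.split(".")
    if len(parts) == 2:
        return SubclusterLabel(parts[0], parts[1], None)
    if len(parts) == 3:
        return SubclusterLabel(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected subcluster label: {label!r}")
```

    Under this reading, `parse_subcluster("NkT.2.2")` yields compartment 'NkT', cluster '2' and subcluster '2', while `parse_subcluster("NkT.6")` has no subcluster part.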

    --

    Each sample has the following information indicating shipment batches, for batch correction

    • Study_Site
      • The center which processed a specific aliquot_id
      • EMORY, MSSM, WashU, MAYO
    • Batch
      • The shipment batch the sample was associated with
      • Valued 1 to 3 for EMORY, MSSM, MAYO, and 1 to 4 for WashU
    • siteXbatch
      • A combination of the above two variables, to be used for batch correction
    • (Combined Validation Object only): cohort
      • Indicates if the sample was involved in the 'discovery' cohort, or 'validation' cohort. Samples in the 'validation' cohort will have labels inferred from label mapping

    --

    Each public_id has limited demographic information based on publicly available information in the MMRF CoMMpass study.

    • d_pt_sex
      • Patient sex (not self-identified). Male or Female
    • d_pt_race_1
      • Patient self-identified race
    • d_pt_ethnicity
      • Patient self-identified ethnicity
    • d_dx_amm_age
      • Patient age at diagnosis.
      • Not reported for patients above 90 at diagnosis
    • d_dx_amm_bmi
      • Patient BMI at diagnosis
    • d_pt_height_cm
      • Patient height at diagnosis, in centimeters.
    • d_dx_amm_weight_kg
      • Patient weight at diagnosis, in kilograms

    Each d_visit_specimen_id has two data points providing limited information about the visit

    • collection_event
      • Description of why the sample was collected
        • e.g., 'Baseline' and 'Screening' indicates the sample was obtained prior to therapy
        • 'Relapse/Progression' indicates the sample was collected due to disease progression based on clinical assessment
        • 'Remission/Response' indicates the sample was collected due to patient entering remission based on clinical assessment
        • Samples may be collected for reasons independent of the above, such as 'Pre' or 'Post' ASCT, or for other unspecified reasons
    • VJ_INTERVAL
      • Indicates the rough interval following start of therapy the sample is assigned to
        • "Baseline", "Month 3", "Year 2", etc.

    All the single-cell raw data, along with outcome and cytogenetic information, are available at MMRF’s VLAB shared resource. Requests to access these data will be reviewed by the data access committee at MMRF, and any data shared will be released under a data transfer agreement that will protect the identities of patients involved in the study. Other information from the CoMMpass trial can also generally be

  7. 2019-2021 HSIP Cluster

    • hub.arcgis.com
    • geodot-massdot.hub.arcgis.com
    • +2 more
    Updated Jul 2, 2024
    + more versions
    Cite
    Massachusetts geoDOT (2024). 2019-2021 HSIP Cluster [Dataset]. https://hub.arcgis.com/maps/MassDOT::2019-2021-hsip-cluster-
    Dataset updated
    Jul 2, 2024
    Dataset authored and provided by
    Massachusetts geoDOT
    Description

    The top locations where reported collisions occurred at intersections have been identified. The crash cluster analysis methodology for the top intersection clusters uses a fixed search distance of 25 meters (82 ft.) to merge crash clusters together. This analysis was based on crashes where a police officer specified one of the following junction types: four-way intersection, T-intersection, Y-intersection, five-point or more. Furthermore, the methodology uses the Equivalent Property Damage Only (EPDO) weighting to rank the clusters. EPDO assigns any type of injury crash (including fatal, incapacitating, non-incapacitating and possible) a weighting of 21, compared to a weighting of 1 for a property damage only crash. The clustering analysis used crashes from the three-year period 2019-2021. The crash cluster may cover a larger area than just the intersection, so it is critical to view the clusters spatially.
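
    The description does not spell out how the fixed search distance is applied. One plausible reading, shown here only as a sketch, is single-linkage merging: any two crashes within 25 meters of each other end up in the same cluster, directly or through a chain of neighbours. The function name, the planar (x, y) coordinates in meters, and the sample points are assumptions, not the MassDOT procedure.

```python
import math


def merge_by_distance(points, search_distance_m=25.0):
    """Group points that are linked by hops of at most search_distance_m.

    points: list of (x, y) coordinates in meters in a projected coordinate system.
    Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= search_distance_m:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up crash locations along a road, in meters.
crashes = [(0.0, 0.0), (20.0, 0.0), (40.0, 0.0), (200.0, 0.0)]
clusters = merge_by_distance(crashes)
```

    The first three points form one cluster even though the outer two are 40 m apart, which shows how chained merging can make a cluster cover more area than a single intersection.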

  8. Environmental variables and multivariate seabed classification maps via k...

    • search.dataone.org
    • doi.pangaea.de
    • +1 more
    Updated Jan 7, 2018
    Cite
    Jerosch, Kerstin; Scharf, Frauke Katharina; Pehlke, Hendrik; Weber, Lukas; Abele, Doris (2018). Environmental variables and multivariate seabed classification maps via k means clustering of Potter Cove, Antarctica, link to input and result files in GeoTIFF format [Dataset]. http://doi.org/10.1594/PANGAEA.856971
    Dataset updated
    Jan 7, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Jerosch, Kerstin; Scharf, Frauke Katharina; Pehlke, Hendrik; Weber, Lukas; Abele, Doris
    Description

    This study subdivides the Potter Cove, King George Island, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis includes in total 42 different environmental variables, interpolated based on samples taken during the austral summer seasons 2010/2011 and 2011/2012. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with changing settings have been compared and the most reasonable method has been applied. The multivariate mathematical procedures used are regionalized classification via k-means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for the identification of the optimum number of clusters have been tested and 4, 7, 10 as well as 12 were identified as reasonable numbers for clustering the Potter Cove. Especially the results of 10 and 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and the ones dominated by river discharge.
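
    For readers unfamiliar with k-means, the following is a minimal Python sketch of the standard Lloyd iteration on a toy input. It does not reproduce the study's workflow: the software used is not named in the description, the real analysis works on 42 interpolated environmental variables per grid cell, and the function name, starting centroids, and sample points here are invented.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm with caller-supplied starting centroids.

    points: list of equal-length numeric tuples (one per grid cell).
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Two made-up variables per cell, two obvious groups.
cells = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(cells, centroids=[(0, 0), (10, 10)])
```

    Choosing the number of clusters (4, 7, 10 or 12 in the study) is a separate step that this sketch does not address.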

  9. Data_Sheet_1_Physical Activity-Related Profiles of Female Sixth-Graders...

    • frontiersin.figshare.com
    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Joachim Bachner; David J. Sturm; Xavier García-Massó; Javier Molina-García; Yolanda Demetriou (2023). Data_Sheet_1_Physical Activity-Related Profiles of Female Sixth-Graders Regarding Motivational Psychosocial Variables: A Cluster Analysis Within the CReActivity Project.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2020.580563.s001
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Joachim Bachner; David J. Sturm; Xavier García-Massó; Javier Molina-García; Yolanda Demetriou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Adolescents’ physical activity (PA) behavior can be driven by several psychosocial determinants at the same time. Most analyses use a variable-based approach that examines relations between PA-related determinants and PA behavior on the between-person level. Using this approach, possible coexistences of different psychosocial determinants within one person cannot be examined. Therefore, by applying a person-oriented approach, this study examined (a) which profiles regarding PA-related psychosocial variables typically occur in female sixth-graders, (b) if these profiles deliver a self-consistent picture according to theoretical assumptions, and (c) if the profiles contribute to the explanation of PA.

    Materials and Methods: The sample comprised 475 female sixth-graders. Seventeen PA-related variables were assessed: support for autonomy, competence and relatedness in PE as well as their satisfaction in PE and leisure-time; behavioral regulation of exercise (five subscales); self-efficacy and social support from friends and family (two subscales). Moderate-to-vigorous PA was measured using accelerometers. Data were analyzed using the self-organizing maps (SOM) analysis, a cluster analysis including an unsupervised algorithm for non-linear models.

    Results: According to the respective level of psychosocial resources, a positive, a medium and a negative cluster were identified. This superordinate cluster solution represented a self-consistent picture that was in line with theoretical assumptions. The three-cluster solution contributed to the explanation of PA behavior, with the positive cluster accumulating an average of 6 min more moderate-to-vigorous PA per day than the medium cluster and 10 min more than the negative cluster. Additionally, SOM detected a subgroup within the positive cluster that benefited from a specific combination of intrinsic and external regulations with regard to PA.

    Discussion: The results underline the relevance of the assessed psychosocial determinants of PA behavior in female sixth-graders. The results further indicate that the different psychosocial resources within a given person do not develop independently of one another, which supports the use of a person-oriented approach. In addition, the SOM analysis identified subgroups with specific characteristics, which would have remained undetected using variable-based approaches. Thus, this approach offers the possibility to reduce data complexity without overlooking subgroups with special demands that go beyond the superordinate cluster solution.

  10. Research data supporting: "Relevant, hidden, and frustrated information in...

    • zenodo.org
    zip
    Updated May 20, 2025
    Cite
    Chiara Lionello; Matteo Becchi; Simone Martino; Giovanni M. Pavan (2025). Research data supporting: "Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise" [Dataset]. http://doi.org/10.5281/zenodo.14529457
    Explore at:
    Available download formats: zip
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chiara Lionello; Matteo Becchi; Simone Martino; Giovanni M. Pavan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the set of data shown in the paper "Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise", published on arXiv (DOI: 10.48550/arXiv.2412.09412).

    The scripts contained herein are:

    1. PCA-Analysis.py: Python script to calculate the SOAP descriptor, denoise it, and compute the Principal Component Analysis
    2. SOAP-Component-Analysis.py: python script to calculate the variance of the single SOAP components
    3. Hierarchical-Clustering.py: python script to compute the hierarchical clustering and plot the dataset
    4. OnionClustering-1d.py: script to compute the Onion clustering on a single SOAP component or principal component
    5. OnionClustering-2d.py: script to compute bi-dimensional Onion clustering
    6. OnionClustering-plot.py: script to plot the Onion plot, removing clusters with population <1%
    7. UMAP.py: script to compute the UMAP dimensionality reduction technique

    To reproduce the data of this work, start from SOAP-Component-Analysis.py to calculate the SOAP descriptor and select the components of interest. Then calculate the PCA with PCA-Analysis.py and apply the clustering that suits your needs (OnionClustering-1d.py, OnionClustering-2d.py, Hierarchical-Clustering.py). The Onion plot can be modified further with OnionClustering-plot.py. UMAP can be calculated with UMAP.py.
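The component-selection step can be illustrated with a minimal sketch. It is not the authors' SOAP-Component-Analysis.py: the toy `descriptor` matrix and the helper `component_variances` are hypothetical, and the sketch only assumes that each row is one sample and each column one descriptor component.

```python
from statistics import pvariance

# Hypothetical descriptor matrix (assumption): rows are samples,
# columns are descriptor components.
descriptor = [
    [0.10, 1.0, 5.0],
    [0.12, 3.0, 5.0],
    [0.11, 5.0, 5.0],
    [0.09, 7.0, 5.0],
]


def component_variances(rows):
    """Return the population variance of every column (component)."""
    columns = list(zip(*rows))
    return [pvariance(col) for col in columns]


variances = component_variances(descriptor)

# Component indices ordered from highest to lowest variance.
ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)

for j in ranked:
    print(f"component {j}: variance = {variances[j]:.6f}")
```

Components with near-zero variance carry little information about differences between samples, so a ranking like this is one plausible way to decide which components to pass on to PCA or clustering.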

    Additional data contained herein are:

    1. starting-configuration.gro: GROMACS file with the initial configuration of the ice-water system
    2. traj-ice-water-50ns-sampl4ps.xtc: trajectory of the ice-water system sampled every 4 ps
    3. traj-ice-water-50ns-sampl40ps.xtc: trajectory of the ice-water system sampled every 40 ps
    4. some files containing the SOAP descriptor of the ice-water system: ice-water-50ns-sampl40ps.hdf5, ice-water-50ns-sampl40ps_soap.hdf5, ice-water-50ns-sampl40ps_soap.npy, ice-water-50ns-sampl40ps_soap-spavg.npy
    5. PCA-results: folder that contains some example results of the PCA
    6. UMAP-results: folder that contains some example results of UMAP

    The data related to the Quincke rollers can be found here: https://zenodo.org/records/10638736

  11. ProjecTILs murine reference atlas of virus-specific CD8 T cells, version 2

    • figshare.com
    application/gzip
    Updated Jul 26, 2023
    + more versions
    Cite
    Massimo Andreatta; Santiago Carmona (2023). ProjecTILs murine reference atlas of virus-specific CD8 T cells, version 2 [Dataset]. http://doi.org/10.6084/m9.figshare.23764572.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    figshare
    Authors
    Massimo Andreatta; Santiago Carmona
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space of the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection. Single-cell data to build the virus-specific CD8 T cell reference map were downloaded from GEO under the following entries: GSE131535, GSE134139 and GSE119943, selecting only samples in wild type conditions. Data for the Ptpn2-KO, Tox-KO and CD4-depletion projections were obtained from entries GSE134139, GSE119943, and GSE137007 and were not included in the construction of the reference map. To construct the LCMV reference map, we split the dataset into five batches that displayed strong batch effects, and applied STACAS (https://github.com/carmonalab/STACAS) to mitigate its confounding effects. We computed 800 variable genes per batch, excluding cell cycling genes, ribosomal and mitochondrial genes, and computed pairwise anchors using 200 integration genes, and otherwise default STACAS parameters. Anchors were filtered at the default threshold 0.8 percentile, and integration was performed with the IntegrateData Seurat3 function with the guide tree suggested by STACAS. 
Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.4, reduction="pca", k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat 3, using the prcomp function from the base R package "stats" and the "umap" R package (https://github.com/tkonopka/umap), respectively.

  12. Data from: Neo-taphonomic analysis of the Misiam leopard lair

    • data.niaid.nih.gov
    • produccioncientifica.ucm.es
    • +2 more
    zip
    Updated Aug 18, 2022
    Cite
    Manuel Domínguez-Rodrigo (2022). Neo-taphonomic analysis of the Misiam leopard lair [Dataset]. http://doi.org/10.5061/dryad.34tmpg4n2
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 18, 2022
    Dataset provided by
    Rice University
    Authors
    Manuel Domínguez-Rodrigo
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The data set presented here contains the %MAU data for the selected hyena-made and leopard-made faunal assemblages with which the Misiam assemblage is compared. Misiam is a recently discovered modern faunal accumulation found at Olduvai Gorge (Tanzania), interpreted as a palimpsest resulting from the action of leopards (main transporting agents) and hyenas (secondary scavengers). It is the first reported open-air leopard-made faunal accumulation. Defining the anatomical and taphonomic characteristics of such an assemblage is important for the interpretation of prehistoric faunal assemblages created by carnivores. It is also relevant for modern ecological studies. In this particular case, the bulk of the assemblage is composed of wildebeests. These are not usually the target of leopards; however, their seasonal abundance during the wildebeest migration on the plains adjacent to Olduvai Gorge prompts this rather exceptional, highly specialized behavior by usually eclectic leopards. In the present work, a thorough taphonomic analysis is carried out and the main taxonomic, anatomical and taphonomic characteristics of this felid-hyenid modified assemblage are described. The analytical approach adopted uses the data presented here.

    Methods: The Misiam data were collected in the field. The bone assemblage lay on the surface of a densely vegetated ravine. Bones were simply collected, and in one particular area an excavation was made to retrieve bones sub-surficially. In order to compare skeletal profiles in felid and hyenid assemblages, we will use some of the most representative assemblages in the literature. For spotted hyena dens, we will use data from the Koobi Fora Hyena Den 1 (KFHD1), the Amboseli den, the Maasai Mara den, and the Syokimau den, all of them in Kenya, and the Eyasi (Kisima Ngeda) Hyena Den 2 (KND2) (Tanzania). We used these assemblages also because they are either dominated by size 3 carcasses or these make up a significant part of the assemblage.

    When comparing long bone shaft breakage patterns, we also used additional hyena-made assemblages: Dumali, Heraide, Yangula Ari, Oboley (spotted hyenas), Datagabou (striped hyena, Djibouti), and Uniab (brown hyena, Namibia). These assemblages are almost completely dominated by very small fauna (Capra hircus), and several of them constitute significantly smaller sample sizes than the hyena dens mentioned above.

    The leopard lairs used for comparison are: Portsmut and Hakos River (Namibia), and WU/BA-001 (South Africa). Portsmut and Hakos River show a low density of remains, probably also modified by porcupines or other agents. The remains belonging to larger animals show an interesting contrast with those documented in hyena dens: the presence of axial and compact bones is high. These latter bones are also well represented in smaller carcasses. This characteristic is more marked in WU/BA-001, the least altered leopard lair documented to date. This lair was monitored for 7 years.

    All the comparative assemblages were transformed into %MAU to account for differential inter-assemblage quantitative representation. First, they were analyzed using Generalized Low Rank Models (GLRM) as an exploratory method. Then, we used Uniform Manifold Approximation and Projection (UMAP) to classify leopards' and hyenas' bone assemblages, especially according to each feature. Lastly, we used a cluster analysis with a variance-dependent phylogenetic tree to show the actual distances among all the assemblages compared.

    GLRM are a series of methods for dimensionality reduction that use several loss function types and can implement regularization functions. Whereas principal component analysis (PCA) is based on orthogonal projections of linear relationships, in cases where relationships are non-linear, the PCA underperforms compared to other more flexible methods. GLRM decomposes a table into two distinctive matrices X and Y. X contains the same number of rows as the original table, but all variables are condensed into k factors. Y has k rows and the same number of columns as features (i.e., variables) in the original table. Each of the rows is an archetypal feature derived from the columns (i.e., variables) of the original table. Each row of X corresponds to a row of the original table projected into this reduced dimension feature space. Data are compressed by the low-rank representation derived from k feature reduction. An advantage of GLRM over PCA is that it can handle mixed datasets containing numeric, categorical and Boolean data. GLRM admits several types of loss functions: Huber, Poisson, quadratic, periodic or hinge. It also allows the use of regularization functions, including: Lasso, Ridge, OneSparse, Simplex, UnitOneSparse, and quadratic. Loss functions are used to select the optimal archetypal values. Regularization is used to limit X and Y archetypal values. This impacts the effect of negative data, multicollinearity and overfitting. In the present analysis, GLRM was performed with the “h2o” R library (www.r-project.org).
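The decomposition described above can be written compactly. The following is a sketch in commonly used GLRM notation; the symbols $A$ (the original $m \times n$ table), $L_{ij}$ (the loss chosen for entry $(i,j)$), $r$ and $\tilde{r}$ (the regularizers on $X$ and $Y$) and $\Omega$ (the set of observed entries) are introduced here for illustration and do not appear in the dataset description.

```latex
A \approx XY, \qquad X \in \mathbb{R}^{m \times k}, \quad Y \in \mathbb{R}^{k \times n}

\min_{X,\,Y} \;\; \sum_{(i,j) \in \Omega} L_{ij}\!\left(x_i y_j,\; A_{ij}\right)
\;+\; \sum_{i=1}^{m} r(x_i) \;+\; \sum_{j=1}^{n} \tilde{r}(y_j)
```

Here $x_i$ is the $i$-th row of $X$ and $y_j$ the $j$-th column of $Y$. With a quadratic loss and no regularization, this problem reduces to PCA, which is why GLRM can be read as a generalization of it.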

    UMAP is a non-linear dimension-reduction method based on finding inter-case distances in a low-dimensional feature space. The key advantage of UMAP over other non-linear dimension-reduction methods, like t-distributed stochastic neighbor embedding (t-SNE), is that distances are generated along a "manifold". A manifold is an n-dimensional geometric shape constituted of the path(s) among the points. Every point is referenced according to a small two-dimensional neighborhood around it. The UMAP algorithm searches for a multi-dimensional space delimited by the location of points. UMAP uses a nearest-neighbor approach, eventually connecting all the points along its search regions. This forces a uniform distribution of points. The distances of points along this manifold are then derived through Euclidean distances. Several optimization methods can be used to reproduce inter-point distances. For the latter process, the UMAP approach that we will use is based on a cross-entropy loss function. For the UMAP analysis, we have used the "umap" R library (www.r-project.org). We have also used a search grid combining ranges of values for number of neighbors, minimal distance between neighbors, distance metric, and number of epochs (i.e., iterations of the optimization process).

    Finally, a hierarchical cluster analysis, using a Euclidean distance matrix on the %MAU dataset, was carried out. The method used was the "average" linkage, which represents the average distance between the points. The combination of the three methods was used to study agent-specific variability in inter-assemblage element representation.
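Average linkage on Euclidean distances can be sketched in a few lines. This is an illustrative, unoptimized implementation and not the authors' R code; the four two-element profiles in `profiles` are made-up numbers standing in for %MAU vectors, and `average_linkage` is a hypothetical helper name.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where the distance is the mean pairwise distance between members.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            total = sum(
                euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # Replace the two merged clusters with their union.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Made-up profiles (assumption): each row is one assemblage.
profiles = [[100, 50], [95, 55], [10, 100], [20, 90]]
merges = average_linkage(profiles)

for left, right, dist in merges:
    print(left, right, round(dist, 2))
```

The order of the merges and their distances are what a dendrogram draws: assemblages joined early at small distances have similar element representation.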

  13. Data from: Taxonomic revision of Stigmatomma Roger (Hymenoptera: Formicidae)...

    • datadryad.org
    • data.niaid.nih.gov
    • +2 more
    zip
    Updated Jun 6, 2017
    Cite
    Flavia A. Esteves; Brian L. Fisher (2017). Taxonomic revision of Stigmatomma Roger (Hymenoptera: Formicidae) in the Malagasy region [Dataset]. http://doi.org/10.5061/dryad.m7340
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 6, 2017
    Dataset provided by
    Dryad
    Authors
    Flavia A. Esteves; Brian L. Fisher
    Time period covered
    2017
    Area covered
    Madagascar, Seychelles
    Description

    R script for clustering specimens based on measurement data
    Script for performing UPGMA hierarchical cluster analysis on the R platform.
    File: R script for clustering.pdf

    R script for Principal Component Analysis (PCA): specimens on a morphometric ordination space
    Script for performing Principal Component Analysis (PCA) on the R platform.
    File: R script for PCA.pdf

    Script for mapping the distribution of Stigmatomma species in Madagascar and Seychelles
    R code for making species distribution maps. Note: Our maps use the ecoregion outlines of Madagascar, which were based on the vector data disclosed by the Terrestrial Ecoregions of the World (available at the WWF website). However, the original outlines slightly mismatched the relief of Madagascar. To solve this, we combined the original ecoregion data with data from the Remaining Primary Vegetation of Madagascar (available at the Kew Royal Botanic Gardens website), which has more natural outlines.

    Linear morphometry of Stigmatomma species in the Malaga...

  14. Stacked species distribution models of deep-sea corals and sponges off the...

    • gimi9.com
    • catalog.data.gov
    Updated Jun 3, 2025
    Cite
    (2025). Stacked species distribution models of deep-sea corals and sponges off the United States west coast (NCEI Accession 0303081) [Dataset]. https://gimi9.com/dataset/data-gov_4df66e89eb2e5e5f719dccc3dedf419046f8f9a1
    Explore at:
    Dataset updated
    Jun 3, 2025
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    West Coast of the United States, United States
    Description

    These data are a set of raster maps of community-level predictions of deep-sea coral and sponge taxa distributions off the continental U.S. west coast, spanning depths from 50 to 1200 m. The raster files come in two versions: one where predicted suitability ranges from 0 to 1, and one where predicted suitability is classified into five classes: very low (0–0.2), low (0.21–0.40), moderate (0.41–0.60), high (0.61–0.80) and very high (0.81–1.00). These raster maps were derived from 40 genus- and species-level habitat suitability models (HSMs) produced by Poti et al. (2020). A cluster analysis of the original individually-modeled taxa identified 10 groups whose member HSMs were stacked and averaged to produce a stacked species distribution model (S-SDM). Further details about the generation of the S-SDMs and their interpretation can be found in Shantharam et al. (2025).
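The stacking and classification steps can be sketched as follows. This is an illustration only and not the procedure published with the dataset: the two 2 x 2 grids are made-up suitability values, `stack_and_average` and `classify` are hypothetical helpers, and treating each class boundary as upper-inclusive (for example, 0.20 counts as very low) is an assumption about values that fall between the listed ranges.

```python
# Class upper bounds taken from the five ranges in the description;
# upper-inclusive boundaries are an assumption.
CLASS_BOUNDS = [
    (0.20, "very low"),
    (0.40, "low"),
    (0.60, "moderate"),
    (0.80, "high"),
    (1.00, "very high"),
]


def stack_and_average(rasters):
    """Cell-wise mean of equally shaped 2-D grids of suitability values."""
    n = len(rasters)
    rows = len(rasters[0])
    cols = len(rasters[0][0])
    return [
        [sum(r[i][j] for r in rasters) / n for j in range(cols)]
        for i in range(rows)
    ]


def classify(value):
    """Map a suitability value in [0, 1] to one of the five classes."""
    for upper, label in CLASS_BOUNDS:
        if value <= upper:
            return label
    raise ValueError(f"suitability out of range: {value}")


# Made-up suitability grids for two member models of one group.
hsm_a = [[0.10, 0.50], [0.90, 0.70]]
hsm_b = [[0.20, 0.40], [0.80, 0.70]]

stacked = stack_and_average([hsm_a, hsm_b])
classes = [[classify(v) for v in row] for row in stacked]

for row in classes:
    print(row)
```

Averaging before classifying keeps the continuous version and the five-class version consistent with each other, which matches the two raster versions described above.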

  15. Statistical separation of UMAP clusters between each real dataset and its...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sam T. M. Ball; Numan Celik; Elaheh Sayari; Lina Abdul Kadir; Fiona O’Brien; Richard Barrett-Jolley (2023). Statistical separation of UMAP clusters between each real dataset and its GAN simulated equivalent. [Dataset]. http://doi.org/10.1371/journal.pone.0267452.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sam T. M. Ball; Numan Celik; Elaheh Sayari; Lina Abdul Kadir; Fiona O’Brien; Richard Barrett-Jolley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistical separation of UMAP clusters between each real dataset and its GAN simulated equivalent.

  16. Statistical separation of UMAP clusters between real datasets.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Sam T. M. Ball; Numan Celik; Elaheh Sayari; Lina Abdul Kadir; Fiona O’Brien; Richard Barrett-Jolley (2023). Statistical separation of UMAP clusters between real datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0267452.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sam T. M. Ball; Numan Celik; Elaheh Sayari; Lina Abdul Kadir; Fiona O’Brien; Richard Barrett-Jolley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistical separation of UMAP clusters between real datasets.

  17. Statistical separation of UMAP clusters between GAN generated datasets.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Sam T. M. Ball; Numan Celik; Elaheh Sayari; Lina Abdul Kadir; Fiona O’Brien; Richard Barrett-Jolley (2023). Statistical separation of UMAP clusters between GAN generated datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0267452.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sam T. M. Ball; Numan Celik; Elaheh Sayari; Lina Abdul Kadir; Fiona O’Brien; Richard Barrett-Jolley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistical separation of UMAP clusters between GAN generated datasets.

  18. Cluster descriptions.

    • plos.figshare.com
    xls
    Updated Nov 30, 2023
    + more versions
    Cite
    Waad R. Alolayan; Jana M. Rieger; Minn N. Yoon (2023). Cluster descriptions. [Dataset]. http://doi.org/10.1371/journal.pone.0294712.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Waad R. Alolayan; Jana M. Rieger; Minn N. Yoon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the increasing focus on patient-centred care, this study sought to understand priorities considered by patients and healthcare providers from their experience with head and neck cancer treatment, and to examine how patients’ priorities compare with healthcare providers’ priorities. Group concept mapping was used to actively identify priorities from participants (patients and healthcare providers) in two phases. In phase one, participants brainstormed statements reflecting considerations related to their experience with head and neck cancer treatment. In phase two, statements were sorted based on their similarity in theme and rated in terms of their priority. Multidimensional scaling and cluster analysis were performed to produce multidimensional maps to visualize the findings. Two hundred fifty statements were generated by participants in the brainstorming phase and finalized to 94 statements that were included in phase two. From the sorting activity, a two-dimensional map with a stress value of 0.2213 was generated, and eight clusters were created to encompass all statements. Timely care, education, and person-centred care were the highest rated priorities for patients and healthcare providers. Overall, there was a strong correlation between patients’ and healthcare providers’ ratings (r = 0.80). Our findings support the complexity of the treatment planning process in head and neck cancer, evident in the complex maps and highly interconnected statements related to the experience of treatment. Implications for improving the quality of care delivered and the care experience of head and neck cancer are discussed.

  19. Metabolic heat maps of tea and coffee variants

    • search.dataone.org
    • doi.pangaea.de
    Updated Jan 5, 2018
    Cite
    Montero-Vargas, Josaphat Miguel; Gonzáles-Gonzáles, Lindbergh Humberto; Galvez-Ponce, Eligio; Ramírez-Chávez, Enrique; Molina-Torres, Jorge; Chagolla, Alicia; Montagnon, Christophe; Winkler, Robert (2018). Metabolic heat maps of tea and coffee variants [Dataset]. http://doi.org/10.1594/PANGAEA.774218
    Explore at:
    Dataset updated
    Jan 5, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Montero-Vargas, Josaphat Miguel; Gonzáles-Gonzáles, Lindbergh Humberto; Galvez-Ponce, Eligio; Ramírez-Chávez, Enrique; Molina-Torres, Jorge; Chagolla, Alicia; Montagnon, Christophe; Winkler, Robert
    Description

    High-throughput metabolic phenotyping is a challenge, but it provides an alternative and comprehensive access to the rapid and accurate characterization of plants. In addition to the technical issues of obtaining quantitative data of plenty of metabolic traits from numerous samples, a suitable data processing and statistical evaluation strategy must be developed. We present a simple, robust and highly scalable strategy for the comparison of multiple chemical profiles from coffee and tea leaf extracts, based on direct-injection electrospray mass spectrometry (DIESI-MS) and hierarchical cluster analysis (HCA). More than 3500 individual Coffea canephora and Coffea arabica trees from experimental fields in Mexico were sampled and processed using this method. Our strategy permits the classification of trees according to their metabolic fingerprints and the screening for families with desired characteristics, such as extraordinarily high or low caffeine content in their leaves.

  20. Characteristics of the ecocentric vs. social-ecological clusters identified...

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Céline Fromont; Julien Blanco; Christian Culas; Emmanuel Pannier; Mireille Razafindrakoto; François Roubaud; Stéphanie M. Carrière (2023). Characteristics of the ecocentric vs. social-ecological clusters identified in respondents’ individual cognitive maps (ICMs). [Dataset]. http://doi.org/10.1371/journal.pone.0272223.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Céline Fromont; Julien Blanco; Christian Culas; Emmanuel Pannier; Mireille Razafindrakoto; François Roubaud; Stéphanie M. Carrière
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Characteristics of the ecocentric vs. social-ecological clusters identified in respondents’ individual cognitive maps (ICMs).
