37 datasets found
  1. Pseudocode for the PCA-K-medoids clustering algorithm.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jan 3, 2025
    Cite
    Lin Qi; Yunjie Xie; Qianqian Zhang; Jian Zhang; Yanhong Ma (2025). Pseudocode for the PCA-K-medoids clustering algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0316277.t003
    Available download formats: xls
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Lin Qi; Yunjie Xie; Qianqian Zhang; Jian Zhang; Yanhong Ma
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pseudocode for the PCA-K-medoids clustering algorithm.

  2. Fuzzy Clustering of Interval Time Series

    • researchdata.edu.au
    Updated Dec 9, 2019
    Cite
    Elizabeth Ann Maharaj (2019). Fuzzy Clustering of Interval Time Series [Dataset]. http://doi.org/10.26180/5dcccac222d76
    Dataset updated
    Dec 9, 2019
    Dataset provided by
    Monash University
    Authors
    Elizabeth Ann Maharaj
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Simulation Data sets

    Set 1: c
    Set 2: d
    Set 3: g
    Set 4: i

  3. Data for "A Quantum Definition of Molecular Structure"

    • explore.openaire.eu
    Updated Oct 9, 2023
    Cite
    Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen (2023). Data for "A Quantum Definition of Molecular Structure" [Dataset]. http://doi.org/10.5281/zenodo.8421051
    Dataset updated
    Oct 9, 2023
    Authors
    Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen
    Description

    This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 101025672. This work was supported by the Research Council of Norway through its Centres of Excellence scheme, project no. 262695. T.B.P acknowledges the support of the Centre for Advanced Study in Oslo, Norway, which funded and hosted the CAS research project Attosecond Quantum Dynamics Beyond the Born-Oppenheimer Approximation during the academic year 2021-2022. Partial support from the National Science Foundation (grant no. 1856702) is also acknowledged. Supplemental data for our article "A Quantum Definition of Molecular Structure". Version 1.1.0 contains data for additional k-medoids runs performed on different subsets of the complete sample.

  4. Identification of groups in k-medoids clustering approach.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    M. Mofizul Islam; Jose M. Valderas; Laurann Yen; Paresh Dawda; Tanisha Jowsey; Ian S. McRae (2023). Identification of groups in k-medoids clustering approach. [Dataset]. http://doi.org/10.1371/journal.pone.0083783.t004
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    M. Mofizul Islam; Jose M. Valderas; Laurann Yen; Paresh Dawda; Tanisha Jowsey; Ian S. McRae
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identification of groups in k-medoids clustering approach.

  5. Data for "A Quantum Definition of Molecular Structure"

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Nov 22, 2023
    Cite
    Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen (2023). Data for "A Quantum Definition of Molecular Structure" [Dataset]. http://doi.org/10.5281/zenodo.10182902
    Available download formats: application/gzip
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplemental data for our article "A Quantum Definition of Molecular Structure".

    Version 1.1.0 contains data for additional k-medoids runs performed on different subsets of the complete sample.

  6. A universal probe set for targeted sequencing of 353 nuclear genes from any...

    • search.dataone.org
    • explore.openaire.eu
    • +2 more
    Updated Apr 18, 2025
    Cite
    Matthew G. Johnson; Lisa Pokorny; Steven Dodsworth; Laura R. Botigue; Robyn S. Cowan; Alison Devault; Wolf L. Eiserhardt; Niroshini Epitawalage; Félix Forest; Jan T. Kim; James Leebens-Mack; Ilia J. Leitch; Olivier Maurin; Doug Soltis; Pamela S. Soltis; Gane Ka-Shu Wong; William J. Baker; Norman Wickett (2025). A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering [Dataset]. http://doi.org/10.5061/dryad.s3h9r6j
    Dataset updated
    Apr 18, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Matthew G. Johnson; Lisa Pokorny; Steven Dodsworth; Laura R. Botigue; Robyn S. Cowan; Alison Devault; Wolf L. Eiserhardt; Niroshini Epitawalage; Félix Forest; Jan T. Kim; James Leebens-Mack; Ilia J. Leitch; Olivier Maurin; Doug Soltis; Pamela S. Soltis; Gane Ka-Shu Wong; William J. Baker; Norman Wickett
    Time period covered
    Jan 1, 2018
    Description

    Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to repr...

  7. Data underlying the MSc thesis: City Clustering based on topology, activity...

    • data.4tu.nl
    zip
    Updated May 23, 2025
    Cite
    Tian Zwart (2025). Data underlying the MSc thesis: City Clustering based on topology, activity distribution, and mobility - A study on 32 European cities using K-Means, K-Medoids, and Ward's Method [Dataset]. http://doi.org/10.4121/89625b76-8ff4-45df-90ff-def3ef1986ad.v2
    Available download formats: zip
    Dataset updated
    May 23, 2025
    Dataset provided by
    4TU.ResearchData
    Authors
    Tian Zwart
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2020 - 2025
    Description

    In this dataset, 17 indicators have been collected and/or calculated for 32 European cities. For certain characteristics, plots have been made and included in this dataset. Finally, the urban borders and the cluster assignments for each city are also included for reproducibility.

  8. Clustering of 10 datasets (generated after performing 10 attribute weighting...

    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Mansour Ebrahimi; Amir Lakizadeh; Parisa Agha-Golzadeh; Esmaeil Ebrahimie; Mahdi Ebrahimi (2023). Clustering of 10 datasets (generated after performing 10 attribute weighting algorithms) into T (mesophile) and F (thermophile) classes by four different unsupervised clustering algorithms (K-Means, K-Medoids, SVC and EMC). [Dataset]. http://doi.org/10.1371/journal.pone.0023146.t002
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mansour Ebrahimi; Amir Lakizadeh; Parisa Agha-Golzadeh; Esmaeil Ebrahimie; Mahdi Ebrahimi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The actual numbers of T (mesostable) and F (thermostable) classes in the original datasets were 1544 and 513, respectively. The highest accuracy (100%) was observed when the EMC clustering method was applied to datasets generated by the Correlation and Uncertainty attribute weighting algorithms, which are highlighted in the table.

  9. Data from: K-medoids clustering.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Giordano Mancini; Costantino Zazza (2023). K-medoids clustering. [Dataset]. http://doi.org/10.1371/journal.pone.0137075.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Giordano Mancini; Costantino Zazza
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For a given nC (first row) the Dunn index (DI), the Davies-Bouldin index (DBI) and the size of the various clusters are shown. The column showing the best nC value is typed in italics. Frame set #1 was used in the analysis.
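
For readers unfamiliar with the Dunn index mentioned in this entry, the following is a minimal Python sketch of its classic form (smallest between-cluster distance divided by largest within-cluster diameter). It is an illustration only, not the authors' code: the function name `dunn_index` and the toy data are assumptions, and other variants of the index exist.

```python
import numpy as np

def dunn_index(distances, labels):
    """Classic Dunn index from an (n, n) pairwise distance matrix.

    Assumes at least two clusters and at least one cluster with two
    distinct points (otherwise the diameter would be zero).
    """
    d = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    # True where both points of a pair carry the same cluster label.
    same = labels[:, None] == labels[None, :]
    # Largest distance between two points of the same cluster.
    max_diameter = d[same].max()
    # Smallest distance between two points of different clusters.
    min_separation = d[~same].min()
    return min_separation / max_diameter

# Hypothetical toy data: six points on a line forming two tight groups.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy = np.abs(x[:, None] - x[None, :])
print(dunn_index(toy, [0, 0, 0, 1, 1, 1]))  # well-separated partition
print(dunn_index(toy, [0, 0, 1, 1, 1, 1]))  # poorer partition, lower value
```

A higher value indicates compact, well-separated clusters, which is why a table like the one described here can use it to pick the best number of clusters.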

  10. Results of ANOVA and eta squared comparing proportion of variance in child...

    • figshare.com
    xls
    Updated May 30, 2023
    Cite
    Lauren Eyler; Alan Hubbard; Catherine Juillard (2023). Results of ANOVA and eta squared comparing proportion of variance in child height-for-age Z-score, women’s literacy score, and proportion of children who are deceased accounted for by Cameroonian DHS Wealth Index and EconomicClusters models. [Dataset]. http://doi.org/10.1371/journal.pone.0217197.t001
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Lauren Eyler; Alan Hubbard; Catherine Juillard
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Cameroon
    Description

    Results of ANOVA and eta squared comparing proportion of variance in child height-for-age Z-score, women’s literacy score, and proportion of children who are deceased accounted for by Cameroonian DHS Wealth Index and EconomicClusters models.

  11. 5th level results.

    • plos.figshare.com
    xls
    Updated Mar 28, 2024
    + more versions
    Cite
    Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman (2024). 5th level results. [Dataset]. http://doi.org/10.1371/journal.pone.0300444.t006
    Available download formats: xls
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper presents a novel sound event detection (SED) system for rare events occurring in an open environment. Wavelet multiresolution analysis (MRA) is used to decompose the input audio clip of 30 seconds into five levels. Wavelet denoising is then applied on the third and fifth levels of MRA to filter out the background. Significant transitions, which may represent the onset of a rare event, are then estimated in these two levels by combining the peak-finding algorithm with the K-medoids clustering algorithm. The small portions of one-second duration, called ‘chunks’ are cropped from the input audio signal corresponding to the estimated locations of the significant transitions. Features from these chunks are extracted by the wavelet scattering network (WSN) and are given as input to a support vector machine (SVM) classifier, which classifies them. The proposed SED framework produces an error rate comparable to the SED systems based on convolutional neural network (CNN) architecture. Also, the proposed algorithm is computationally efficient and lightweight as compared to deep learning models, as it has no learnable parameter. It requires only a single epoch of training, which is 5, 10, 200, and 600 times lesser than the models based on CNNs and deep neural networks (DNNs), CNN with long short-term memory (LSTM) network, convolutional recurrent neural network (CRNN), and CNN respectively. The proposed model neither requires concatenation with previous frames for anomaly detection nor any additional training data creation needed for other comparative deep learning models. It needs to check almost 360 times fewer chunks for the presence of rare events than the other baseline systems used for comparison in this paper. All these characteristics make the proposed system suitable for real-time applications on resource-limited devices.

  12. Data Augmentation for learning mechanical digital twins of voids in welding...

    • zenodo.org
    zip
    Updated Mar 16, 2022
    Cite
    Ryckelynck (2022). Data Augmentation for learning mechanical digital twins of voids in welding joints [Dataset]. http://doi.org/10.5281/zenodo.6355692
    Available download formats: zip
    Dataset updated
    Mar 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ryckelynck
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In Source-2_Data_Augmentation:

    Exercice1_augmentation.ipynb Jupyter Notebook for data warping of defect images.

    Exercice2_augmentation_multimodale.ipynb Jupyter Notebook for multimodal data augmentation (defect images and mechanical fields) via oversampling.

    Exercice3_clustering.ipynb Data clustering using the k-medoids algorithm applied to mechanical dissimilarity of the defects.

    k_medoids.py is a Python implementation of a k-medoids algorithm.

    In Data:

    All_images.npy (numpy file) contains the defect images.

    All_Stresses.npy (numpy) contains mechanical fields, All_Stresses[k,i,j,ic,it] is the instance number k of the component ic of the Cauchy stress tensor at time it. The mechanical problem is described in 〈10.5802/crmeca.51〉. 〈hal-03113503〉.

    New_images_1.npy and New_Stresses_1.npy are augmented data for k=1.

    New_images_87.npy and New_Stresses_87.npy are augmented data for k=87.

    Dissimilarity_Stress.npy is the Frobenius norm of the distances between stress tensors (All_Stresses.npy).
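
As an illustration of the kind of clustering this entry describes (k-medoids applied to a precomputed dissimilarity matrix such as Dissimilarity_Stress.npy), here is a minimal Python sketch. It is not the k_medoids.py shipped with the dataset: the function name, its parameters, and the toy data are assumptions, and it uses a simple alternating update rather than the full PAM swap procedure.

```python
import numpy as np

def k_medoids(dissimilarity, k, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n, n) dissimilarity matrix.

    Returns the medoid indices and the cluster label of every point.
    """
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = np.argmin(d[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # New medoid: the member with the smallest total
            # dissimilarity to the other members of its cluster.
            within = d[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    labels = np.argmin(d[:, medoids], axis=1)
    return medoids, labels

# Hypothetical toy matrix standing in for a real dissimilarity file,
# which one might instead load with np.load("Dissimilarity_Stress.npy").
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy = np.abs(x[:, None] - x[None, :])
medoids, labels = k_medoids(toy, k=2)
print(sorted(medoids.tolist()), labels.tolist())
```

Because the medoids are always actual data points, the method needs only the pairwise dissimilarities, which suits data such as stress fields where an averaged "centroid" may not be meaningful.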

  13. Data from: Simple Measures of Individual Cluster-Membership Certainty for...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 9, 2018
    Cite
    Graham, Jinko; Liu, Dongmeng (2018). Simple Measures of Individual Cluster-Membership Certainty for Hard Partitional Clustering [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000690043
    Dataset updated
    Jul 9, 2018
    Authors
    Graham, Jinko; Liu, Dongmeng
    Description

    We propose two probability-like measures of individual cluster-membership certainty which can be applied to a hard partition of the sample such as that obtained from the Partitioning Around Medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual's tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition using these measures. We evaluate the performance of both measures in individuals with ambiguous cluster membership, using simulated binary datasets that have been partitioned by the PAM algorithm or continuous datasets that have been partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft-clustering algorithms such as soft analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior-probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher's classic data set on irises.

  14. USNM_CURC_CAM – Multi-Family Beetle Label Dataset for Computer Vision...

    • zenodo.org
    csv, json
    Updated Jul 9, 2025
    + more versions
    Cite
    Anonymous (2025). USNM_CURC_CAM – Multi-Family Beetle Label Dataset for Computer Vision Applications [Dataset]. http://doi.org/10.7479/chx2-y845
    Available download formats: csv, json
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Anonymous
    Authors
    Anonymous
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The USNM_COL_CAM dataset includes 912 high-resolution JPEG images (3030 × 2080 pixels) of specimen labels from diverse Coleoptera families, including Buprestidae, Carabidae, Cerambycidae, Chrysomelidae, Curculionidae, and Scarabaeidae. All images were digitized by the Smithsonian National Museum of Natural History and are annotated with multi-label information. The dataset represents specimens collected across South and Central America over various historical periods and supports research in coleopterology, biodiversity informatics, and computer vision.

    The dataset is augmented with two derived data resources:

    OCR_USNM_COL_CAM.json: Transcribed label content generated using the Google Cloud Vision API, enabling automatic text extraction, content indexing, and structured metadata retrieval.

    Clustering_0.9_USNM_COL_CAM.csv: K-Medoids clustering output based on a 0.9 textual similarity threshold, useful for identifying duplicate records, grouping related specimens, and supporting scalable label processing workflows.

  15. Distinguishing hydraulically-distinct floodplain types from high resolution...

    • hydroshare.org
    • beta.hydroshare.org
    • +1 more
    zip
    Updated Mar 10, 2025
    Cite
    Scott Lawson (2025). Distinguishing hydraulically-distinct floodplain types from high resolution topography with implications for broad-scale flood routing (data) [Dataset]. http://doi.org/10.4211/hs.d0c0122256244124acaf8e46a1f4b3a6
    Available download formats: zip (259.9 MB)
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    HydroShare
    Authors
    Scott Lawson
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Floodplains can have a significant impact on the routing of flood waves across the landscape, yet their representation in broad-scale water resource and flood prediction models are limited. To identify hydraulically-relevant floodplains at scale, we develop a workflow that automates the extraction of reach-averaged morphologic features from high resolution topographic data hypothesized to define a zone within the floodplain that conveys floodwaters distinctly from the surrounding landscape. This zone is identified from departures in hydraulic geometry with stage. Working in the topographically diverse Lake Champlain Basin in Vermont, USA, we apply the workflow to 2,629 reaches and use the extracted features to cluster settings similar in their proposed ability to route floodwaters. In total we identified eight clusters of reach types, two that were pre-sorted and largely lack a floodplain, and six that reflect variability in floodplain features, which were parsed out from the K-medoids clustering analysis. Clusters of floodplain types had distinct impact on the routing of synthetically-derived hydrographs, evaluated using the Muskingum-Cunge routing model. From these clusters we propose a Hydraulic Floodplain Classification, which is comparable to other geographically-defined systems but unique in its focus on the potential of the landscape to influence flood routing. The automated workflow may be repeated in other regions with high resolution topographic datasets, offering an improvement in the functionality of continental to global floodplain mapping efforts. Identification of hydraulically-effective zones has implications for improved watershed management to meet flood resiliency goals, and to improve flood predictions and warnings.

  16. Cluster assignment—proteomics.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 10, 2020
    Cite
    Panicucci, Brian; Levin, Michal; Barrett, Michael P.; Regnault, Clément; Dejung, Mario; Butter, Falk; Doleželová, Eva; Kunzová, Michaela; Janzen, Christian J.; Zíková, Alena (2020). Cluster assignment—proteomics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000513432
    Dataset updated
    Jun 10, 2020
    Authors
    Panicucci, Brian; Levin, Michal; Barrett, Michael P.; Regnault, Clément; Dejung, Mario; Butter, Falk; Doleželová, Eva; Kunzová, Michaela; Janzen, Christian J.; Zíková, Alena
    Description

    Gene IDs belonging to six different clusters from time-course expression profiling based on K-medoids. GO enrichment analyses performed using GO Term annotations TriTrypDB-36_TbruceiLister427_GO.gaf from TriTrypDB version 36 and Fisher’s exact test. GO, Gene Ontology. (XLSX)

  17. Cluster assignment—transcriptomics.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 10, 2020
    Cite
    Dejung, Mario; Janzen, Christian J.; Panicucci, Brian; Doleželová, Eva; Regnault, Clément; Butter, Falk; Barrett, Michael P.; Levin, Michal; Kunzová, Michaela; Zíková, Alena (2020). Cluster assignment—transcriptomics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000513403
    Dataset updated
    Jun 10, 2020
    Authors
    Dejung, Mario; Janzen, Christian J.; Panicucci, Brian; Doleželová, Eva; Regnault, Clément; Butter, Falk; Barrett, Michael P.; Levin, Michal; Kunzová, Michaela; Zíková, Alena
    Description

    Gene IDs belonging to four different clusters from time-course expression profiling based on K-medoids. GO enrichment analyses performed using GO Term annotations TriTrypDB-36_TbruceiLister427_GO.gaf from TriTrypDB version 36 and Fisher’s exact test. GO, Gene Ontology. (XLSX)

  18. BOMEC 15 Class

    • hub.arcgis.com
    Updated Nov 14, 2018
    Cite
    National Institute of Water and Atmospheric Research (2018). BOMEC 15 Class [Dataset]. https://hub.arcgis.com/maps/NIWA::bomec-15-class/explore
    Dataset updated
    Nov 14, 2018
    Dataset authored and provided by
    National Institute of Water and Atmospheric Research
    Description

    Distributional data for eight taxonomic groups (asteroids, bryozoans, benthic foraminiferans, octocorals, polychaetes, matrix-forming scleractinian corals, sponges, and benthic fish) have been used to train an environmental classification for those parts of New Zealand's 200 n. mile Exclusive Economic Zone (EEZ) with depths of 3000 m or less. A variety of environmental variables were used as input to this process, including estimates of depth, temperature, salinity, sea surface temperature gradient, surface water productivity, suspended sediments, tidal currents, and seafloor sediments and slope. These variables were transformed using results averaged across eight Generalised Dissimilarity Modelling analyses that indicate relationships between species turnover and environment for each species group. The matrix of transformed variables was then classified using k-medoids clustering to identify an initial set of 300 groups of cells based on their environmental similarities, with relationships between these groups then described using agglomerative hierarchical clustering. Groups at a fifteen group level of classification appropriate for use at a whole-of-EEZ scale are described; the classification can also be used at other levels of detail, for example when higher levels of classification detail are required to discriminate variation within study areas of more limited extent. Although not formally tested in this analysis, we expect the analytical process used here to increase the biological discrimination of the environmental classification. That is, the resulting environmental groups are more likely to have similar biological characteristics than when the input environmental variables are selected, weighted, and perhaps transformed using qualitative methods. As a consequence, they are more likely to be reliable when used as "habitat classes" for the management of biological values than groups defined using alternative approaches.

    Item Page Created: 2018-11-14 00:08
    Item Page Last Modified: 2025-04-05 16:28
    Owner: NIWA_OpenData
    BOMEC 15 Class
    No data edit dates available
    Fields: ID, GRIDCODE

  19. Duration of rare events in DCASE dataset.

    • figshare.com
    xls
    Updated Mar 28, 2024
    Cite
    Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman (2024). Duration of rare events in DCASE dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0300444.t001
    Available download formats: xls
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper presents a novel sound event detection (SED) system for rare events occurring in an open environment. Wavelet multiresolution analysis (MRA) is used to decompose the input audio clip of 30 seconds into five levels. Wavelet denoising is then applied on the third and fifth levels of MRA to filter out the background. Significant transitions, which may represent the onset of a rare event, are then estimated in these two levels by combining the peak-finding algorithm with the K-medoids clustering algorithm. The small portions of one-second duration, called ‘chunks’ are cropped from the input audio signal corresponding to the estimated locations of the significant transitions. Features from these chunks are extracted by the wavelet scattering network (WSN) and are given as input to a support vector machine (SVM) classifier, which classifies them. The proposed SED framework produces an error rate comparable to the SED systems based on convolutional neural network (CNN) architecture. Also, the proposed algorithm is computationally efficient and lightweight as compared to deep learning models, as it has no learnable parameter. It requires only a single epoch of training, which is 5, 10, 200, and 600 times lesser than the models based on CNNs and deep neural networks (DNNs), CNN with long short-term memory (LSTM) network, convolutional recurrent neural network (CRNN), and CNN respectively. The proposed model neither requires concatenation with previous frames for anomaly detection nor any additional training data creation needed for other comparative deep learning models. It needs to check almost 360 times fewer chunks for the presence of rare events than the other baseline systems used for comparison in this paper. All these characteristics make the proposed system suitable for real-time applications on resource-limited devices.

  20. Observer Function Database - Asano (2015)

    • zenodo.org
    txt, xls
    Updated Nov 30, 2020
    Cite
    Yuta Asano (2020). Observer Function Database - Asano (2015) [Dataset]. http://doi.org/10.5281/zenodo.3252742
    Available download formats: xls, txt
    Dataset updated
    Nov 30, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yuta Asano
    Description

    Source URL: https://www.rit.edu/cos/colorscience/re_AsanoObserverFunctions.php
    Source DOI: 10.1371/journal.pone.0145671

    Categorical observers

    Categorical observers are observer functions that would represent color-normal populations. They are finite and discrete as opposed to observer functions generated from the individual colorimetric observer model. Thus, they would offer more convenient and practical approaches for the personalized color imaging workflow and color matching analyses. Categorical observers were derived in two steps. At the first step, 10,000 observer functions were generated from the individual colorimetric observer model using Monte Carlo simulation. At the second step, the cluster analysis, a modified k-medoids algorithm, was applied to the 10,000 observers minimizing the squared Euclidean distance in cone fundamentals space, and categorical observers were derived iteratively. Since the proposed categorical observers are defined by their physiological parameters and ages, their CMFs can be derived for any target field size.

    Categorical observers were ordered by importance; the first categorical observer was the average observer, equivalent to CIEPO06 at an age of 38 years for a given field size, followed by the second most important categorical observer, the third, and so on.

    The color matching analyses showed that ten categorical observers are good for general use and convenience to represent color normal populations. On average, the prediction error improvement was small after adding the tenth categorical observer, and the prediction errors fell to one third when ten observers were introduced. Nevertheless, readers should be aware that the number of required categorical observers varies depending on the application (a pair of spectra viewed by observers). For example, the simulation revealed that as many as 50 categorical observers would be required to predict individual observers’ matches satisfactorily when a laser projector is viewed.

    Matlab code for the categorical observers and CMFs as well as model parameters for ten categorical observers are available for download below.

    151 color-normal observers

    CMFs of 151 color-normal observers were estimated by combining the individual colorimetric observer model and the color matching proposed in Asano’s PhD dissertation. The color matching consisted of five color matches aimed to highlight and detect inter-observer variability among color-normals. To obtain a set of CMFs for a given human observer, at first, the observer performed the five color matches with three repetitions. Then, his/her eight physiological parameters (used in the individual colorimetric observer model) were estimated from the color matching results by a non-linear optimization. The objective function was to optimize the eight physiological parameters such that the color differences between the human observer results and model predictions were minimized. Finally, the CMFs were reconstructed from the estimated physiological parameters and the observer's real age.

    The estimated CMFs for 151 color-normal human observers, the corresponding model parameters, and other information such as gender, experience in color-related subjective experiments, ethnic origin, color deficiency in family, diabetes, and intra-observer variability (Mean Color Difference from the Mean using CIEDE2000) for each of the 151 observers are available for download
