37 datasets found

Pseudocode for the PCA-K-medoids clustering algorithm.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jan 3, 2025
Cite
Lin Qi; Yunjie Xie; Qianqian Zhang; Jian Zhang; Yanhong Ma (2025). Pseudocode for the PCA-K-medoids clustering algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0316277.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316277.t003
Dataset updated
Jan 3, 2025
Dataset provided by
PLOS ONE
Authors
Lin Qi; Yunjie Xie; Qianqian Zhang; Jian Zhang; Yanhong Ma
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pseudocode for the PCA-K-medoids clustering algorithm.
Fuzzy Clustering of Interval Time Series
researchdata.edu.au
Updated Dec 9, 2019
Cite
Elizabeth Ann Maharaj (2019). Fuzzy Clustering of Interval Time Series [Dataset]. http://doi.org/10.26180/5dcccac222d76
Explore at:
Unique identifier
https://doi.org/10.26180/5dcccac222d76
Dataset updated
Dec 9, 2019
Dataset provided by
Monash University
Authors
Elizabeth Ann Maharaj
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Simulation Data sets
Set 1 : c
Set 2: d
Set 3: g
Set 4 : i
Data for "A Quantum Definition of Molecular Structure"
explore.openaire.eu
Updated Oct 9, 2023
Cite
Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen (2023). Data for "A Quantum Definition of Molecular Structure" [Dataset]. http://doi.org/10.5281/zenodo.8421051
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.8421051
Dataset updated
Oct 9, 2023
Authors
Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen
Description
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 101025672. This work was supported by the Research Council of Norway through its Centres of Excellence scheme, project no. 262695. T.B.P acknowledges the support of the Centre for Advanced Study in Oslo, Norway, which funded and hosted the CAS research project Attosecond Quantum Dynamics Beyond the Born-Oppenheimer Approximation during the academic year 2021-2022. Partial support from the National Science Foundation (grant no. 1856702) is also acknowledged. Supplemental data for our article "A Quantum Definition of Molecular Structure". Version 1.1.0 contains data for additional k-medoids runs performed on different subsets of the complete sample.
Identification of groups in k-medoids clustering approach.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
M. Mofizul Islam; Jose M. Valderas; Laurann Yen; Paresh Dawda; Tanisha Jowsey; Ian S. McRae (2023). Identification of groups in k-medoids clustering approach. [Dataset]. http://doi.org/10.1371/journal.pone.0083783.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0083783.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
M. Mofizul Islam; Jose M. Valderas; Laurann Yen; Paresh Dawda; Tanisha Jowsey; Ian S. McRae
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identification of groups in k-medoids clustering approach.
Data for "A Quantum Definition of Molecular Structure"
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Nov 22, 2023
Cite
Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen (2023). Data for "A Quantum Definition of Molecular Structure" [Dataset]. http://doi.org/10.5281/zenodo.10182902
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10182902
Dataset updated
Nov 22, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lucas Lang; Henrique Musseli Cezar; Ludwik Adamowicz; Thomas Bondo Pedersen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplemental data for our article "A Quantum Definition of Molecular Structure".
Version 1.1.0 contains data for additional k-medoids runs performed on different subsets of the complete sample.
A universal probe set for targeted sequencing of 353 nuclear genes from any...
search.dataone.org
explore.openaire.eu
+2 more
Updated Apr 18, 2025
Cite
Matthew G. Johnson; Lisa Pokorny; Steven Dodsworth; Laura R. Botigue; Robyn S. Cowan; Alison Devault; Wolf L. Eiserhardt; Niroshini Epitawalage; Félix Forest; Jan T. Kim; James Leebens-Mack; Ilia J. Leitch; Olivier Maurin; Doug Soltis; Pamela S. Soltis; Gane Ka-Shu Wong; William J. Baker; Norman Wickett (2025). A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering [Dataset]. http://doi.org/10.5061/dryad.s3h9r6j
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.s3h9r6j
Dataset updated
Apr 18, 2025
Dataset provided by
Dryad Digital Repository
Authors
Matthew G. Johnson; Lisa Pokorny; Steven Dodsworth; Laura R. Botigue; Robyn S. Cowan; Alison Devault; Wolf L. Eiserhardt; Niroshini Epitawalage; Félix Forest; Jan T. Kim; James Leebens-Mack; Ilia J. Leitch; Olivier Maurin; Doug Soltis; Pamela S. Soltis; Gane Ka-Shu Wong; William J. Baker; Norman Wickett
Time period covered
Jan 1, 2018
Description
Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to repr...
Data underlying the MSc thesis: City Clustering based on topology, activity...
data.4tu.nl
zip
Updated May 23, 2025
Cite
Tian Zwart (2025). Data underlying the MSc thesis: City Clustering based on topology, activity distribution, and mobility - A study on 32 European cities using K-Means, K-Medoids, and Ward's Method [Dataset]. http://doi.org/10.4121/89625b76-8ff4-45df-90ff-def3ef1986ad.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/89625b76-8ff4-45df-90ff-def3ef1986ad.v2
Dataset updated
May 23, 2025
Dataset provided by
4TU.ResearchData
Authors
Tian Zwart
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2020 - 2025
Description
In this dataset, 17 indicators have been collected and/or calculated for 32 European cities. For certain characteristics, plots have been made and included in this dataset. Finally, the urban borders and the cluster assignments for each city are also included for reproducibility.
Clustering of 10 datasets (generated after performing 10 attribute weighting...
figshare.com
xls
Updated Jun 2, 2023
Cite
Mansour Ebrahimi; Amir Lakizadeh; Parisa Agha-Golzadeh; Esmaeil Ebrahimie; Mahdi Ebrahimi (2023). Clustering of 10 datasets (generated after performing 10 attribute weighting algorithms) into T (mesophile) and F (thermophile) classes by four different unsupervised clustering algorithms (K-Means, K-Medoids, SVC and EMC). [Dataset]. http://doi.org/10.1371/journal.pone.0023146.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0023146.t002
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Mansour Ebrahimi; Amir Lakizadeh; Parisa Agha-Golzadeh; Esmaeil Ebrahimie; Mahdi Ebrahimi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The actual numbers of T (mesostable) and F (thermostable) classes in the original datasets were 1544 and 513, respectively. The highest accuracy (100%) was observed when the EMC clustering method was applied to datasets generated by the Correlation and Uncertainty attribute weighting algorithms, which are highlighted in the table.
Data from: K-medoids clustering.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Giordano Mancini; Costantino Zazza (2023). K-medoids clustering. [Dataset]. http://doi.org/10.1371/journal.pone.0137075.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0137075.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Giordano Mancini; Costantino Zazza
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
For a given nC (first row) the Dunn index (DI), the Davies-Bouldin index (DBI) and the size of the various clusters are shown. The column showing the best nC value is typed in italics. Frame set #1 was used in the analysis.
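As an illustration of one of the validity indices named in this description, the Dunn index can be sketched as the smallest distance between points in different clusters divided by the largest within-cluster diameter, so that larger values suggest compact, well-separated clusters. The function name `dunn_index`, the Euclidean metric, and the toy `points` below are assumptions made for this sketch and are not taken from the dataset; the Davies-Bouldin index is not covered.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points; the index is undefined")

    # Smallest distance between two points in different clusters.
    min_separation = min(
        math.dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )
    return min_separation / max_diameter


# Hypothetical toy data: two tight pairs that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_partition = dunn_index(points, [0, 0, 1, 1])
poor_partition = dunn_index(points, [0, 1, 0, 1])
```

Comparing the index across candidate cluster counts, as the table does for nC, would amount to calling the function once per partition and keeping the partition with the largest value.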
Results of ANOVA and eta squared comparing proportion of variance in child...
figshare.com
xls
Updated May 30, 2023
Cite
Lauren Eyler; Alan Hubbard; Catherine Juillard (2023). Results of ANOVA and eta squared comparing proportion of variance in child height-for-age Z-score, women’s literacy score, and proportion of children who are deceased accounted for by Cameroonian DHS Wealth Index and EconomicClusters models. [Dataset]. http://doi.org/10.1371/journal.pone.0217197.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0217197.t001
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Lauren Eyler; Alan Hubbard; Catherine Juillard
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Cameroon
Description
Results of ANOVA and eta squared comparing proportion of variance in child height-for-age Z-score, women’s literacy score, and proportion of children who are deceased accounted for by Cameroonian DHS Wealth Index and EconomicClusters models.
5th level results.
plos.figshare.com
xls
Updated Mar 28, 2024
+ more versions
Cite
Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman (2024). 5th level results. [Dataset]. http://doi.org/10.1371/journal.pone.0300444.t006
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0300444.t006
Dataset updated
Mar 28, 2024
Dataset provided by
PLOS ONE
Authors
Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper presents a novel sound event detection (SED) system for rare events occurring in an open environment. Wavelet multiresolution analysis (MRA) is used to decompose the input audio clip of 30 seconds into five levels. Wavelet denoising is then applied on the third and fifth levels of MRA to filter out the background. Significant transitions, which may represent the onset of a rare event, are then estimated in these two levels by combining the peak-finding algorithm with the K-medoids clustering algorithm. The small portions of one-second duration, called ‘chunks’ are cropped from the input audio signal corresponding to the estimated locations of the significant transitions. Features from these chunks are extracted by the wavelet scattering network (WSN) and are given as input to a support vector machine (SVM) classifier, which classifies them. The proposed SED framework produces an error rate comparable to the SED systems based on convolutional neural network (CNN) architecture. Also, the proposed algorithm is computationally efficient and lightweight as compared to deep learning models, as it has no learnable parameter. It requires only a single epoch of training, which is 5, 10, 200, and 600 times lesser than the models based on CNNs and deep neural networks (DNNs), CNN with long short-term memory (LSTM) network, convolutional recurrent neural network (CRNN), and CNN respectively. The proposed model neither requires concatenation with previous frames for anomaly detection nor any additional training data creation needed for other comparative deep learning models. It needs to check almost 360 times fewer chunks for the presence of rare events than the other baseline systems used for comparison in this paper. All these characteristics make the proposed system suitable for real-time applications on resource-limited devices.
Data Augmentation for learning mechanical digital twins of voids in welding...
zenodo.org
zip
Updated Mar 16, 2022
Cite
Ryckelynck (2022). Data Augmentation for learning mechanical digital twins of voids in welding joints [Dataset]. http://doi.org/10.5281/zenodo.6355692
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6355692
Dataset updated
Mar 16, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ryckelynck
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In Source-2_Data_Augmentation:

Exercice1_augmentation.ipynb Jupyter Notebook for data warping of defect images.

Exercice2_augmentation_multimodale.ipynb Jupyter Notebook for multimodal data augmentation (defect images and mechanical fields) via oversampling.

Exercice3_clustering.ipynb Data clustering using the k-medoids algorithm applied to the mechanical dissimilarity of the defects.

k_medoids.py is a Python implementation of a k-medoids algorithm.

In Data:

All_images.npy (NumPy file) contains the defect images.

All_Stresses.npy (NumPy file) contains mechanical fields; All_Stresses[k,i,j,ic,it] is instance number k of component ic of the Cauchy stress tensor at time it. The mechanical problem is described in 〈10.5802/crmeca.51〉. 〈hal-03113503〉.

New_images_1.npy and New_Stresses_1.npy are augmented data for k=1.

New_images_87.npy and New_Stresses_87.npy are augmented data for k=87.

Dissimilarity_Stress.npy is the Frobenius norm of the distances between stress tensors (All_Stresses.npy).
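
The clustering step described above, k-medoids applied to a precomputed dissimilarity matrix, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions and not the `k_medoids.py` shipped with the dataset: the function names, the farthest-first initialisation, the alternating assignment/update scheme, and the toy 2x2 "tensors" are all hypothetical choices made for the example.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two equally shaped 2-D arrays."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def dissimilarity_matrix(items):
    """Symmetric matrix of pairwise Frobenius distances."""
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = frobenius_distance(items[i], items[j])
    return d


def init_medoids(d, k):
    """Assumed initialisation: most central item, then farthest-first picks."""
    n = len(d)
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    return medoids


def assign(d, medoids):
    """Label each item with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(d, k, max_iter=100):
    """Alternating assignment/update k-medoids on a precomputed matrix d."""
    medoids = init_medoids(d, k)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total dissimilarity
            # to the other members of its cluster.
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)


# Hypothetical toy data: six 2x2 "tensors" forming two well-separated groups.
tensors = [
    [[0.0, 0.0], [0.0, 0.0]], [[0.1, 0.0], [0.0, 0.0]], [[0.0, 0.1], [0.0, 0.0]],
    [[10.0, 10.0], [10.0, 10.0]], [[10.1, 10.0], [10.0, 10.0]], [[10.0, 10.0], [10.0, 10.1]],
]
medoids, labels = k_medoids(dissimilarity_matrix(tensors), k=2)
```

Because the algorithm only reads the matrix `d`, the same function would accept any precomputed dissimilarity, which is the main practical reason to prefer medoids over means when only pairwise dissimilarities are available.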
Data from: Simple Measures of Individual Cluster-Membership Certainty for...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Jul 9, 2018
Cite
Graham, Jinko; Liu, Dongmeng (2018). Simple Measures of Individual Cluster-Membership Certainty for Hard Partitional Clustering [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000690043
Explore at:
Dataset updated
Jul 9, 2018
Authors
Graham, Jinko; Liu, Dongmeng
Description
We propose two probability-like measures of individual cluster-membership certainty which can be applied to a hard partition of the sample such as that obtained from the Partitioning Around Medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual's tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition using these measures. We evaluate the performance of both measures in individuals with ambiguous cluster membership, using simulated binary datasets that have been partitioned by the PAM algorithm or continuous datasets that have been partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft-clustering algorithms such as soft analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior-probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher's classic data set on irises.
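
For orientation, the classic individual silhouette width that this description takes as its starting point can be computed directly from pairwise dissimilarities and a hard partition. The sketch below shows only that classic quantity, not the probability-like measures the authors propose; the function name `silhouette_widths` and the toy values are assumptions made for the example.

```python
def silhouette_widths(d, labels):
    """Classic silhouette width s(i) = (b - a) / max(a, b) for every item.

    d      -- square matrix of pairwise dissimilarities
    labels -- hard cluster label of each item (at least two clusters)
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the item's own cluster.
        a = sum(d[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to the members of any other cluster.
        b = min(sum(d[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        denominator = max(a, b)
        widths.append((b - a) / denominator if denominator > 0 else 0.0)
    return widths


# Hypothetical toy data: four values on a line, split into two obvious groups.
values = [0.0, 1.0, 10.0, 11.0]
d = [[abs(x - y) for y in values] for x in values]
widths = silhouette_widths(d, [0, 0, 1, 1])
```

Widths near 1 indicate items that sit firmly inside their cluster, while widths near 0 or below flag the ambiguous memberships that the description is concerned with.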
USNM_CURC_CAM – Multi-Family Beetle Label Dataset for Computer Vision...
zenodo.org
csv, json
Updated Jul 9, 2025
+ more versions
Cite
Anonymous (2025). USNM_CURC_CAM – Multi-Family Beetle Label Dataset for Computer Vision Applications [Dataset]. http://doi.org/10.7479/chx2-y845
Explore at:
Available download formats: csv, json
Unique identifier
https://doi.org/10.7479/chx2-y845
Dataset updated
Jul 9, 2025
Dataset provided by
Anonymous
Authors
Anonymous
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The USNM_COL_CAM dataset includes 912 high-resolution JPEG images (3030 × 2080 pixels) of specimen labels from diverse Coleoptera families, including Buprestidae, Carabidae, Cerambycidae, Chrysomelidae, Curculionidae, and Scarabaeidae. All images were digitized by the Smithsonian National Museum of Natural History and are annotated with multi-label information. The dataset represents specimens collected across South and Central America over various historical periods and supports research in coleopterology, biodiversity informatics, and computer vision.

The dataset is augmented with two derived data resources:

• OCR_USNM_COL_CAM.json: Transcribed label content generated using the Google Cloud Vision API, enabling automatic text extraction, content indexing, and structured metadata retrieval.

• Clustering_0.9_USNM_COL_CAM.csv: K-Medoids clustering output based on a 0.9 textual similarity threshold, useful for identifying duplicate records, grouping related specimens, and supporting scalable label processing workflows.
Distinguishing hydraulically-distinct floodplain types from high resolution...
hydroshare.org
beta.hydroshare.org
+1 more
zip
Updated Mar 10, 2025
Cite
Scott Lawson (2025). Distinguishing hydraulically-distinct floodplain types from high resolution topography with implications for broad-scale flood routing (data) [Dataset]. http://doi.org/10.4211/hs.d0c0122256244124acaf8e46a1f4b3a6
Explore at:
Available download formats: zip (259.9 MB)
Unique identifier
https://doi.org/10.4211/hs.d0c0122256244124acaf8e46a1f4b3a6
Dataset updated
Mar 10, 2025
Dataset provided by
HydroShare
Authors
Scott Lawson
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Floodplains can have a significant impact on the routing of flood waves across the landscape, yet their representation in broad-scale water resource and flood prediction models are limited. To identify hydraulically-relevant floodplains at scale, we develop a workflow that automates the extraction of reach-averaged morphologic features from high resolution topographic data hypothesized to define a zone within the floodplain that conveys floodwaters distinctly from the surrounding landscape. This zone is identified from departures in hydraulic geometry with stage. Working in the topographically diverse Lake Champlain Basin in Vermont, USA, we apply the workflow to 2,629 reaches and use the extracted features to cluster settings similar in their proposed ability to route floodwaters. In total we identified eight clusters of reach types, two that were pre-sorted and largely lack a floodplain, and six that reflect variability in floodplain features, which were parsed out from the K-medoids clustering analysis. Clusters of floodplain types had distinct impact on the routing of synthetically-derived hydrographs, evaluated using the Muskingum-Cunge routing model. From these clusters we propose a Hydraulic Floodplain Classification, which is comparable to other geographically-defined systems but unique in its focus on the potential of the landscape to influence flood routing. The automated workflow may be repeated in other regions with high resolution topographic datasets, offering an improvement in the functionality of continental to global floodplain mapping efforts. Identification of hydraulically-effective zones has implications for improved watershed management to meet flood resiliency goals, and to improve flood predictions and warnings.
Cluster assignment—proteomics.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jun 10, 2020
Cite
Panicucci, Brian; Levin, Michal; Barrett, Michael P.; Regnault, Clément; Dejung, Mario; Butter, Falk; Doleželová, Eva; Kunzová, Michaela; Janzen, Christian J.; Zíková, Alena (2020). Cluster assignment—proteomics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000513432
Explore at:
Dataset updated
Jun 10, 2020
Authors
Panicucci, Brian; Levin, Michal; Barrett, Michael P.; Regnault, Clément; Dejung, Mario; Butter, Falk; Doleželová, Eva; Kunzová, Michaela; Janzen, Christian J.; Zíková, Alena
Description
Gene IDs belonging to six different clusters from time-course expression profiling based on K-medoids. GO enrichment analyses performed using GO Term annotations TriTrypDB-36_TbruceiLister427_GO.gaf from TriTrypDB version 36 and Fisher’s exact test. GO, Gene Ontology. (XLSX)
Cluster assignment—transcriptomics.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jun 10, 2020
Cite
Dejung, Mario; Janzen, Christian J.; Panicucci, Brian; Doleželová, Eva; Regnault, Clément; Butter, Falk; Barrett, Michael P.; Levin, Michal; Kunzová, Michaela; Zíková, Alena (2020). Cluster assignment—transcriptomics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000513403
Explore at:
Dataset updated
Jun 10, 2020
Authors
Dejung, Mario; Janzen, Christian J.; Panicucci, Brian; Doleželová, Eva; Regnault, Clément; Butter, Falk; Barrett, Michael P.; Levin, Michal; Kunzová, Michaela; Zíková, Alena
Description
Gene IDs belonging to four different clusters from time-course expression profiling based on K-medoids. GO enrichment analyses performed using GO Term annotations TriTrypDB-36_TbruceiLister427_GO.gaf from TriTrypDB version 36 and Fisher’s exact test. GO, Gene Ontology. (XLSX)
BOMEC 15 Class
hub.arcgis.com
Updated Nov 14, 2018
Cite
National Institute of Water and Atmospheric Research (2018). BOMEC 15 Class [Dataset]. https://hub.arcgis.com/maps/NIWA::bomec-15-class/explore
Explore at:
Dataset updated
Nov 14, 2018
Dataset authored and provided by
National Institute of Water and Atmospheric Research
Description
Distributional data for eight taxonomic groups (asteroids, bryozoans, benthic foraminiferans, octocorals, polychaetes, matrix-forming scleractinian corals, sponges, and benthic fish) have been used to train an environmental classification for those parts of New Zealand's 200 n. mile Exclusive Economic Zone (EEZ) with depths of 3000 m or less. A variety of environmental variables were used as input to this process, including estimates of depth, temperature, salinity, sea surface temperature gradient, surface water productivity, suspended sediments, tidal currents, and seafloor sediments and slope. These variables were transformed using results averaged across eight Generalised Dissimilarity Modelling analyses that indicate relationships between species turnover and environment for each species group. The matrix of transformed variables was then classified using k-medoids clustering to identify an initial set of 300 groups of cells based on their environmental similarities, with relationships between these groups then described using agglomerative hierarchical clustering. Groups at a fifteen group level of classification appropriate for use at a whole-of-EEZ scale are described; the classification can also be used at other levels of detail, for example when higher levels of classification detail are required to discriminate variation within study areas of more limited extent. Although not formally tested in this analysis, we expect the analytical process used here to increase the biological discrimination of the environmental classification. That is, the resulting environmental groups are more likely to have similar biological characteristics than when the input environmental variables are selected, weighted, and perhaps transformed using qualitative methods. As a consequence, they are more likely to be reliable when used as "habitat classes" for the management of biological values than groups defined using alternative approaches.
Item Page Created: 2018-11-14 00:08
Item Page Last Modified: 2025-04-05 16:28
Owner: NIWA_OpenData
No data edit dates available
Fields: ID, GRIDCODE
Duration of rare events in DCASE dataset.
figshare.com
xls
Updated Mar 28, 2024
Cite
Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman (2024). Duration of rare events in DCASE dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0300444.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0300444.t001
Dataset updated
Mar 28, 2024
Dataset provided by
PLOS ONE
Authors
Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper presents a novel sound event detection (SED) system for rare events occurring in an open environment. Wavelet multiresolution analysis (MRA) is used to decompose the input audio clip of 30 seconds into five levels. Wavelet denoising is then applied on the third and fifth levels of MRA to filter out the background. Significant transitions, which may represent the onset of a rare event, are then estimated in these two levels by combining the peak-finding algorithm with the K-medoids clustering algorithm. The small portions of one-second duration, called ‘chunks’ are cropped from the input audio signal corresponding to the estimated locations of the significant transitions. Features from these chunks are extracted by the wavelet scattering network (WSN) and are given as input to a support vector machine (SVM) classifier, which classifies them. The proposed SED framework produces an error rate comparable to the SED systems based on convolutional neural network (CNN) architecture. Also, the proposed algorithm is computationally efficient and lightweight as compared to deep learning models, as it has no learnable parameter. It requires only a single epoch of training, which is 5, 10, 200, and 600 times lesser than the models based on CNNs and deep neural networks (DNNs), CNN with long short-term memory (LSTM) network, convolutional recurrent neural network (CRNN), and CNN respectively. The proposed model neither requires concatenation with previous frames for anomaly detection nor any additional training data creation needed for other comparative deep learning models. It needs to check almost 360 times fewer chunks for the presence of rare events than the other baseline systems used for comparison in this paper. All these characteristics make the proposed system suitable for real-time applications on resource-limited devices.
Observer Function Database - Asano (2015)
zenodo.org
txt, xls
Updated Nov 30, 2020
Cite
Yuta Asano (2020). Observer Function Database - Asano (2015) [Dataset]. http://doi.org/10.5281/zenodo.3252742
Explore at:
Available download formats: xls, txt
Unique identifier
https://doi.org/10.5281/zenodo.3252742
Dataset updated
Nov 30, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yuta Asano
Description
Source URL: https://www.rit.edu/cos/colorscience/re_AsanoObserverFunctions.php
Source DOI: 10.1371/journal.pone.0145671

Categorical observers

Categorical observers are observer functions that would represent color-normal populations. They are finite and discrete as opposed to observer functions generated from the individual colorimetric observer model. Thus, they would offer more convenient and practical approaches for the personalized color imaging workflow and color matching analyses. Categorical observers were derived in two steps. At the first step, 10,000 observer functions were generated from the individual colorimetric observer model using Monte Carlo simulation. At the second step, the cluster analysis, a modified k-medoids algorithm, was applied to the 10,000 observers minimizing the squared Euclidean distance in cone fundamentals space, and categorical observers were derived iteratively. Since the proposed categorical observers are defined by their physiological parameters and ages, their CMFs can be derived for any target field size.

Categorical observers were ordered by importance; the first categorical observer was the average observer, equivalent to CIEPO06 at age 38 for a given field size, followed by the second most important categorical observer, the third, and so on.

The color matching analyses showed that ten categorical observers are good for general use and convenience to represent color normal populations. On average, the prediction error improvement was small after adding tenth categorical observers, and the prediction errors became one-third by introducing ten observers. Nevertheless, readers should be aware that the number of required categorical observers varies depending on an application (a pair of spectra viewed by observers). For example, the simulation revealed that as many as 50 categorical observers would be required to predict individual observers’ matches satisfactorily when a laser projector is viewed.

Matlab code for the categorical observers and CMFs as well as model parameters for ten categorical observers are available for download below.

151 color-normal observers

CMFs of 151 color-normal observers were estimated by combining the individual colorimetric observer model and the color matching proposed in Asano’s PhD dissertation. The color matching consisted of five color matches aimed to highlight and detect inter-observer variability among color-normals. To obtain a set of CMFs for a given human observer, at first, the observer performed the five color matches with three repetitions. Then, his/her eight physiological parameters (used in the individual colorimetric observer model) were estimated from the color matching results by a non-linear optimization. The objective function was to optimize the eight physiological parameters such that the color differences between the human observer results and model predictions were minimized. Finally, the CMFs were reconstructed from the estimated physiological parameters and the observer's real age.

The estimated CMFs for 151 color-normal human observers, the corresponding model parameters, and other information such as gender, experience in color-related subjective experiments, ethnic origin, color deficiency in family, diabetes, and intra-observer variability (Mean Color Difference from the Mean using CIEDE2000) for each of the 151 observers are available for download
