Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison experiments by using IF.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance of DynGPE.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article proposes a new graphical tool, the magnitude-shape (MS) plot, for visualizing both the magnitude and shape outlyingness of multivariate functional data. The proposed tool builds on the recent notion of functional directional outlyingness, which measures the centrality of functional data by simultaneously considering the level and the direction of their deviation from the central region. The MS-plot intuitively presents not only levels but also directions of magnitude outlyingness on the horizontal axis or plane, and demonstrates shape outlyingness on the vertical axis. A dividing curve or surface is provided to separate nonoutlying data from the outliers. Both the simulated data and the practical examples confirm that the MS-plot is superior to existing tools for visualizing centrality and detecting outliers for functional data. Supplementary material for this article is available online.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identifying and dealing with outliers is an important part of data analysis. A new visualization, the O3 plot, is introduced to aid in the display and understanding of patterns of multivariate outliers. It uses the results of identifying outliers for every possible combination of dataset variables to provide insight into why particular cases are outliers. The O3 plot can be used to compare the results from up to six different outlier identification methods. An R package, OutliersO3, implements the plot. The article is illustrated with outlier analyses of German demographic and economic data. Supplementary materials for this article are available online.
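The core idea of running an outlier rule on every combination of variables can be sketched as follows. This is not the OutliersO3 implementation. It uses a single hypothetical rule, the classical squared Mahalanobis distance compared against a chi-square 0.999 cutoff, and the cutoffs are tabulated here for at most three variables. The simulated data contain one case that is an outlier only jointly in x and y.

```python
from itertools import combinations
import numpy as np

# Upper 0.999 quantiles of the chi-square distribution with 1, 2 and 3 degrees of freedom.
CHI2_999 = {1: 10.828, 2: 13.816, 3: 16.266}

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the column means."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

def outliers_by_subset(X, names):
    """Map every combination of variables to the row indices flagged in it."""
    flagged = {}
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            d2 = mahalanobis_sq(X[:, list(cols)])
            key = tuple(names[c] for c in cols)
            flagged[key] = np.flatnonzero(d2 > CHI2_999[k]).tolist()
    return flagged

# Hypothetical data: y tracks x closely; case 0 looks ordinary on x or y alone
# but breaks the x-y relationship.
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = x + 0.1 * rng.standard_normal(200)
z = rng.standard_normal(200)
X = np.column_stack([x, y, z])
X[0] = [2.0, -2.0, 0.0]
result = outliers_by_subset(X, ["x", "y", "z"])
for combo, rows in result.items():
    print(combo, 0 in rows)
```

Case 0 is flagged for ('x', 'y') and ('x', 'y', 'z') but not for x or y alone. This is the kind of pattern the O3 plot is meant to reveal.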
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
TopologyBench is a systematic graph theoretical approach to benchmarking optical network topologies. Network datasets are combined with their corresponding graph theoretical analysis to provide a systematic methodology for selecting diverse sets of optical networks for benchmarking. This topology benchmark consists of a network dataset and a systematic graph theoretical analysis. The dataset provides (a) 105 real optical networks and (b) synthetic topologies generated by the SNR-BA model. The synthetic topologies are divided into (i) Syn-small, with 900 synthetic networks, and (ii) Syn-large, with 270,000 synthetic networks. The systematic graph theoretical analysis identifies and analyses structural, spatial and spectral properties of both the real-world and synthetic networks. The graph theoretical correlation analysis reveals network design strategies that lead to sparse yet efficient networks. An outlier analysis identifies networks that deviate from standard network designs. The analysis also identifies the limitations of real data in terms of network diversity and provides a justification for using synthetic data to complement the real dataset. We conclude the paper by providing a systematic methodology to cluster networks based on unsupervised machine learning and to select a diverse set of topologies for benchmarking. TopologyBench is a novel, high-quality and unified benchmark designed to facilitate research collaborations in long-haul fibre infrastructure by providing a systematic graph theoretical approach to benchmarking optical networks.
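To illustrate the kind of structural properties such an analysis starts from, here is a small sketch that computes several basic metrics for an undirected topology given as an edge list: node and edge counts, average node degree, edge density and hop-count diameter. These metric choices are assumptions made for illustration. TopologyBench's actual property set, including its spatial and spectral measures, is not reproduced, and neither is the SNR-BA generator. The graph is assumed to be connected.

```python
from collections import deque

def bfs_hops(adj, source):
    """Hop distance from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def structural_summary(edges):
    """Basic structural metrics of a connected, undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) // 2
    diameter = max(max(bfs_hops(adj, s).values()) for s in adj)
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2 * m / n,
        "density": 2 * m / (n * (n - 1)),
        "hop_diameter": diameter,
    }

# Hypothetical six-node ring topology.
ring = [(i, (i + 1) % 6) for i in range(6)]
print(structural_summary(ring))
```

In a real optical network, link lengths would matter as well as hop counts. That is one reason the benchmark also considers spatial properties.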
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2.
This dataset represents the CHANGE in the number of jobs per industry category and sub-category from the previous month, not the raw counts of actual jobs. The data behind these monthly change values come from the Bureau of Labor Statistics (BLS) Current Employment Statistics (CES) program. CES data cover businesses and government agencies and provide detailed industry data on nonfarm payroll employment.
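The series holds month-over-month changes, so employment levels can only be recovered by adding the changes to a known starting level (for example, one taken from the CES level series). Here is a minimal sketch with hypothetical numbers in thousands of jobs:

```python
from itertools import accumulate

def levels_from_changes(base_level, monthly_changes):
    """Rebuild monthly levels from month-over-month changes.

    base_level is the level in the month before the first change.
    """
    return list(accumulate(monthly_changes, initial=base_level))[1:]

def changes_from_levels(levels):
    """Inverse operation: month-over-month differences of a level series."""
    return [b - a for a, b in zip(levels, levels[1:])]

levels = levels_from_changes(1000, [5, -3, 10])  # hypothetical values, thousands of jobs
print(levels)                                    # [1005, 1002, 1012]
print(changes_from_levels([1000] + levels))      # [5, -3, 10]
```

Summing the changes over a period gives the net change for that period. A level, however, always needs an external starting point.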
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metric multidimensional scaling (MDS) is a widely used multivariate method with applications in almost all scientific disciplines. Eigenvalues obtained in the analysis are usually reported in order to calculate the overall goodness-of-fit of the distance matrix. In this paper, we refine MDS goodness-of-fit calculations by proposing additional point and pairwise goodness-of-fit statistics that can be used to filter poorly represented observations in MDS maps. The proposed statistics are especially relevant for large data sets that contain outliers and typically many poorly fitted observations. They help improve MDS output and emphasize the most important features of the dataset. Several goodness-of-fit statistics are considered for both Euclidean and non-Euclidean distance matrices. Examples with data from demographic, genetic and geographic studies are shown.
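For orientation, here is a sketch of classical (metric) MDS with the usual overall goodness-of-fit: the share of positive eigenvalue mass captured by the first k dimensions. It also adds one possible per-point statistic, the fraction of each point's squared distance to the centroid that the k-dimensional map reproduces. That squared distance is the diagonal entry of the double-centered matrix B. The per-point formula is an assumption for illustration and is not necessarily one of the statistics proposed in the paper. It also assumes no point sits exactly at the centroid, where the ratio is undefined.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS of a distance matrix D with overall and per-point fit."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                      # double-centered inner products
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]                # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :k] * np.sqrt(np.maximum(eigvals[:k], 0.0))
    # Overall fit: share of the positive eigenvalue mass kept in k dimensions.
    overall_fit = eigvals[:k].sum() / eigvals[eigvals > 0].sum()
    # Hypothetical per-point fit: share of each point's squared distance to the
    # centroid (diagonal of B) reproduced by its k-dimensional coordinates.
    point_fit = (coords ** 2).sum(axis=1) / np.diag(B)
    return coords, overall_fit, point_fit

# Hypothetical data: 30 points in a plane plus one point lifted off the plane
# directly above the centroid of the others.
rng = np.random.default_rng(2)
pts = np.column_stack([3 * rng.standard_normal(30), 2 * rng.standard_normal(30), np.zeros(30)])
lifted = np.append(pts[:, :2].mean(axis=0), 3.0)
pts = np.vstack([pts, lifted])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
coords, overall_fit, point_fit = classical_mds(D, k=2)
print(round(overall_fit, 3), int(np.argmin(point_fit)))
```

In this example the lifted point gets a fit close to zero even though the overall fit is high. That makes it a candidate for filtering from the two-dimensional map.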
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid increase in large-scale datasets, biomedical data visualization faces new challenges. The data may be large, span different orders of magnitude, contain extreme values, or have an unclear distribution. Here we present an R package, ggbreak, that allows users to create broken axes using ggplot2 syntax. It makes effective use of the plotting area to handle large datasets (especially long sequential data), data with different magnitudes, and data containing outliers. The ggbreak package increases the available visual space for a better presentation of the data and detailed annotation, thus improving our ability to interpret the data. The ggbreak package is fully compatible with ggplot2, so users can easily superimpose additional layers and apply scales and themes to adjust the plot using ggplot2 syntax. The ggbreak package is open-source software released under the Artistic-2.0 license, and it is freely available on CRAN (https://CRAN.R-project.org/package=ggbreak) and GitHub (https://github.com/YuLab-SMU/ggbreak).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The crcc T2 Revised statistics.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Addition-point OLS matrix, B.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Pulp-fibre Dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Natural plant populations are often adapted to their local climate and environmental conditions, and populations of forest trees offer some of the best examples of this pattern. However, little empirical work has focused on the relative contribution of single-locus versus multilocus effects to the genetic architecture of local adaptation in plants and forest trees. Here, we employ eastern white pine (Pinus strobus) to test the hypothesis that inter-genic effects are the primary drivers of climate-induced local adaptation. The genetic structure of 29 range-wide natural populations of eastern white pine was determined in relation to local climatic factors. Two marker sets were used: a reference set of SSR markers and SNPs located in candidate genes putatively involved in adaptive response to climate. Comparisons were made between marker sets using standard single-locus outlier analysis, single-locus and multilocus environment association analyses, and a novel implementation of Population Graphs. Magnitudes of population structure were similar between the two marker sets. Outlier loci consistent with diversifying selection were rare for both SNPs and SSRs. However, genetic distances based on the multilocus among-population covariances (cGD) were significantly more correlated with climate for SNPs than for SSRs, even after correcting for spatial effects. Coalescent simulations confirmed that the differences in mutation rates between SSRs and SNPs did not affect the topologies of the Population Graphs. Hence they also did not affect the values of cGD or their correlations with associated climate variables. We conclude that the multilocus covariances among populations primarily reflect adaptation to local climate and environment in eastern white pine. This result highlights the complexity of the genetic architecture of adaptive traits, as well as the need to consider multilocus effects in studies of local adaptation.
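Relating a genetic distance matrix to climate is often done with a Mantel-type permutation test. The sketch below shows a plain Mantel test on hypothetical data for 29 populations. It is a generic technique, not the Population Graphs/cGD analysis used in the study. It also does not implement the correction for spatial effects; a partial Mantel test or a similar method would be needed for that.

```python
import numpy as np

def mantel(D1, D2, n_perm=999, seed=0):
    """Plain Mantel test: Pearson r between the upper triangles of two distance
    matrices, with a one-sided permutation p-value (r_perm >= r_obs)."""
    iu = np.triu_indices_from(D1, k=1)
    a = D1[iu]
    r_obs = np.corrcoef(a, D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                       # relabel populations in D2
        r_perm = np.corrcoef(a, D2[np.ix_(p, p)][iu])[0, 1]
        hits += r_perm >= r_obs
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical data: 29 populations, one climate variable, and "genetic"
# distances that track climate differences plus symmetric noise.
rng = np.random.default_rng(3)
climate = rng.uniform(0, 10, 29)
D_clim = np.abs(climate[:, None] - climate[None, :])
noise = rng.uniform(0, 1, (29, 29))
D_gen = D_clim + (noise + noise.T) / 2
np.fill_diagonal(D_gen, 0.0)
r, p = mantel(D_gen, D_clim)
print(round(r, 3), p)
```

Permuting whole rows and columns together keeps each matrix a valid distance matrix. That is why the test relabels populations rather than shuffling individual entries.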
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Few-shot Relation Classification identifies the relation between target entity pairs in unstructured natural language texts by training on a small number of labeled samples. Recent prototype network-based studies have focused on enhancing the prototype representation capability of models by incorporating external knowledge. However, most of these works constrain the representation of class prototypes implicitly through complex network structures such as multi-attention mechanisms, graph neural networks, and contrastive learning, which limits the model's ability to generalize. In addition, most models trained with triplet loss disregard intra-class compactness, which limits the model's ability to handle outlier samples with low semantic similarity. Therefore, this paper proposes a non-weighted prototype enhancement module that uses the feature-level similarity between prototypes and relation information as a gate to filter and complete features. Meanwhile, we design a class cluster loss that samples hard positive and negative samples and explicitly constrains both intra-class compactness and inter-class separability to learn a highly discriminative metric space. Extensive experiments on the publicly available FewRel 1.0 and FewRel 2.0 datasets show the effectiveness of the proposed model.
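Here is a rough sketch of the three ingredients named above. Class prototypes are mean support embeddings. A parameter-free gate, driven by the feature-level similarity between a prototype and a relation embedding, mixes the two. A loss combines intra-class compactness with a hardest-negative margin term. The gate and loss formulas are hypothetical simplifications for illustration, not the paper's exact definitions, and the embeddings are made up.

```python
import numpy as np

def class_prototypes(support, labels, n_classes):
    """Prototype of each class = mean of its support embeddings."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_classes)])

def gate_with_relation(protos, relation_emb):
    """Hypothetical parameter-free gate: a sigmoid of the elementwise product of
    prototype and relation features decides, per feature, how much of the
    prototype to keep versus fill in from the relation embedding."""
    gate = 1.0 / (1.0 + np.exp(-(protos * relation_emb)))
    return gate * protos + (1.0 - gate) * relation_emb

def class_cluster_loss(emb, labels, protos, margin=1.0):
    """Hypothetical loss: pull each embedding toward its own prototype
    (compactness) and keep the closest wrong-class prototype at least
    `margin` farther away (separability, using the hardest negative)."""
    d = np.linalg.norm(emb[:, None, :] - protos[None, :, :], axis=2)  # (n, C)
    idx = np.arange(len(labels))
    pos = d[idx, labels]
    d_neg = d.copy()
    d_neg[idx, labels] = np.inf
    hardest_neg = d_neg.min(axis=1)     # nearest prototype of another class
    compactness = np.mean(pos ** 2)
    separability = np.mean(np.maximum(0.0, margin + pos - hardest_neg))
    return compactness + separability

# Hypothetical 2-way, 2-shot episode with 2-dimensional embeddings.
support = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(support, labels, 2)
relation = np.array([[0.0, 0.2], [4.8, 5.2]])   # hypothetical relation embeddings
enhanced = gate_with_relation(protos, relation)
good = class_cluster_loss(support, labels, enhanced)
bad = class_cluster_loss(support, 1 - labels, enhanced)
print(round(good, 4), round(bad, 2))
```

The loss is small when embeddings sit near their own enhanced prototype and far from the others, and large when labels are swapped.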
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental results on FewRel 2.0 domain adaptation test set.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hyperparameters of the models built in our experiments.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Similarity matrix: $s_{ij} = (b_i - b_j)'\,\big(\hat{\Sigma}^{(B)}\big)^{-1}(b_i - b_j)$.
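Each similarity entry is a squared Mahalanobis distance between rows b_i and b_j, so larger values mean less similar rows. Below is a minimal sketch. It assumes that the covariance matrix in the formula is the sample covariance of the rows of B; the source does not say how that matrix is estimated.

```python
import numpy as np

def similarity_matrix(B):
    """s_ij = (b_i - b_j)' S^{-1} (b_i - b_j) for the rows b_i of B, with S the
    sample covariance of those rows (an assumed estimator)."""
    S_inv = np.linalg.inv(np.cov(B, rowvar=False))
    diff = B[:, None, :] - B[None, :, :]              # shape (n, n, p)
    return np.einsum("ijk,kl,ijl->ij", diff, S_inv, diff)

# Hypothetical 4 x 2 matrix: corners of the unit square.
B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = similarity_matrix(B)
print(S)
```

Here the sample covariance is diag(1/3, 1/3), so neighbouring corners get 3 and opposite corners get 6.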
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In modern Industry 4.0 applications, a huge amount of data is acquired during manufacturing processes and is often contaminated with outliers, which can seriously reduce the performance of control charting procedures, especially in complex and high-dimensional settings. In the context of profile monitoring, we propose a new framework that is referred to as robust multivariate functional control chart (RoMFCC) to monitor a multivariate functional quality characteristic while being robust to both functional casewise and componentwise outliers. In the former case, observations of the quality characteristic are contaminated in all functional variables or components, while, in the latter, the contamination affects one or more components independently. The RoMFCC relies on (I) a functional filter to identify componentwise outliers to be replaced by missing components; (II) a robust multivariate functional data imputation method; (III) a casewise robust dimensionality reduction; (IV) a monitoring strategy for the quality characteristic. Through a Monte Carlo simulation study, the RoMFCC is compared with competing schemes that have already appeared in the literature. A case study is finally presented where the proposed framework is used to monitor a resistance spot welding process in the automotive industry. RoMFCC is implemented in the R package funcharts, available online on CRAN.
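The RoMFCC stages can be illustrated with a much simpler, non-functional analogue on ordinary multivariate data. The sketch below is hypothetical. Robust z-scores stand in for the functional filter (I), and column medians stand in for the robust imputation (II). Stage (III) is skipped. A Hotelling-type statistic built from the cleaned Phase I sample plays the role of the monitoring strategy (IV). The sketch does not use or mirror the funcharts API.

```python
import numpy as np

def robust_z(X):
    """Columnwise robust z-scores (median / scaled MAD), ignoring NaNs."""
    med = np.nanmedian(X, axis=0)
    mad = 1.4826 * np.nanmedian(np.abs(X - med), axis=0)
    return (X - med) / mad

def filter_and_impute(X, cutoff=3.5):
    """(I) mark single outlying components as missing, (II) impute them."""
    Xf = np.asarray(X, dtype=float).copy()
    Xf[np.abs(robust_z(Xf)) > cutoff] = np.nan
    col_median = np.nanmedian(Xf, axis=0)
    rows, cols = np.nonzero(np.isnan(Xf))
    Xf[rows, cols] = col_median[cols]      # median imputation as a simple stand-in
    return Xf

def fit_monitor(phase1, cutoff=3.5):
    """(IV) Hotelling-type statistic built from the cleaned Phase I sample."""
    clean = filter_and_impute(phase1, cutoff)
    center = np.median(clean, axis=0)
    inv_cov = np.linalg.inv(np.cov(clean, rowvar=False))

    def t2(x):
        d = np.asarray(x, dtype=float) - center
        return float(d @ inv_cov @ d)

    return t2

rng = np.random.default_rng(4)
phase1 = rng.standard_normal((100, 3))
phase1[5, 1] = 50.0                        # componentwise outlier in one variable
t2 = fit_monitor(phase1)
print(t2(np.zeros(3)), t2(np.full(3, 6.0)))
```

Filtering only the offending component keeps the rest of that observation usable. This is the motivation for handling componentwise outliers separately from casewise ones.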
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Traditional subspace feature selection methods typically rely on a fixed distance to compute residuals between the original space and the feature reconstruction space. However, this approach struggles to adapt to diverse datasets and often fails to handle noise and outliers effectively. In this paper, we propose an unsupervised feature selection method named unsupervised feature selection algorithm based on -norm feature reconstruction (NFRFS). The method employs a flexible norm to measure the distance between the original space and the feature reconstruction space. This improves adaptability, and adjusting p broadens the method's applicability. Additionally, adaptive graph learning is integrated into the feature selection process to preserve the local geometric structure of the data. Features exhibiting sparsity and low redundancy are selected through the regularization constraint on the inner product in the feature selection matrix. To demonstrate the effectiveness of the method, numerical studies were conducted on 14 benchmark datasets. Our results indicate that the method outperforms 10 other unsupervised feature selection algorithms in terms of clustering performance.
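The role of p can be seen with a toy calculation. The exact norm symbol is missing from this text, so the sketch assumes a generic loss. For each sample it takes the Euclidean norm of the reconstruction residual `x_i - (X W H)_i`, raises it to the power p, and sums over samples. W, H and the residual values are hypothetical. Smaller p shrinks the share of the loss contributed by a badly reconstructed (outlying) sample.

```python
import numpy as np

def residual_norms(X, W, H):
    """Euclidean norm of each sample's reconstruction residual x_i - (X W H)_i."""
    return np.linalg.norm(X - X @ W @ H, axis=1)

def flexible_loss(norms, p):
    """Sum of residual norms raised to the power p (hypothetical form)."""
    return float(np.sum(norms ** p))

def largest_share(norms, p):
    """Fraction of the loss contributed by the worst-reconstructed sample."""
    powered = norms ** p
    return float(powered.max() / powered.sum())

norms = np.array([1.0, 1.0, 1.0, 10.0])   # hypothetical residuals; the last is an outlier
for p in (2.0, 1.0, 0.5):
    print(p, round(largest_share(norms, p), 3))
```

With these residual norms, the outlying sample accounts for about 97% of the loss at p = 2 but only about half at p = 0.5.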
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bolded values are those with P-values < 0.05.