Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided code is based on the work by A. Gooya et al. [1], which proposes a Gaussian mixture model (GMM)-based approach to training statistical shape models (SSMs). The novel feature of the approach is a symmetric Dirichlet prior on the mixture coefficients, which enforces sparsity and allows a search over a continuous space for the optimal number of Gaussian components, addressing the common issue of over- or under-fitting. Additionally, we provide code to reconstruct surfaces from the unstructured point sets generated after SSM training.
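The released code is MATLAB; purely as an illustration of how a symmetric Dirichlet prior on the mixture weights prunes superfluous components, here is a minimal Python sketch using scikit-learn's BayesianGaussianMixture (our choice of library, not the authors' implementation):

```python
# Minimal sketch (not the authors' MATLAB code): a symmetric Dirichlet prior on
# the mixture weights shrinks unneeded components toward zero, effectively
# selecting the number of Gaussians from a continuous space.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D point set drawn from 3 true clusters (stand-in for SSM landmark points).
points = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(200, 2))
    for c in ([0, 0], [3, 0], [0, 3])
])

# Deliberately over-specify the number of components; a small symmetric
# Dirichlet concentration (< 1) encourages sparse mixture weights.
gmm = BayesianGaussianMixture(
    n_components=15,
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1e-2,
    max_iter=500,
    random_state=0,
).fit(points)

effective = np.sum(gmm.weights_ > 1e-2)
print("effective number of components:", effective)  # typically close to 3
```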
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
row sparse (Sparse Model B)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
New simulated data used in the supplement for comparing model performance in a sparse setting
MATLAB code + demo to reproduce results for "Sparse Principal Component Analysis with Preserved Sparsity". This code calculates the principal loading vectors for any given high-dimensional data matrix. The advantage of this method over existing sparse-PCA methods is that it can produce principal loading vectors with the same sparsity pattern for any number of principal components. Please see Readme.md for more information.
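The released code is MATLAB (see Readme.md); as a point of reference only, here is a generic sparse-PCA baseline in Python using scikit-learn's SparsePCA, which illustrates the issue the method addresses: each loading vector generally gets its own sparsity pattern rather than a shared one.

```python
# Generic sparse-PCA baseline (scikit-learn), not the released MATLAB method:
# standard sparse PCA gives each loading vector its own sparsity pattern,
# which is precisely what the "preserved sparsity" approach avoids.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))          # stand-in high-dimensional data matrix

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(X)
loadings = spca.components_             # shape (3, 50), sparse rows

# Inspect the (generally different) sparsity pattern of each loading vector.
for i, v in enumerate(loadings):
    print(f"PC{i + 1}: {np.count_nonzero(v)} nonzero loadings")
```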
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The presented data set, inspired by the SophiaBeads Dataset Project for X-ray Computed Tomography, was collected for studies involving sparsity-regularised reconstruction. The aim is to provide tomographic data for various samples in which the sparsity of the image varies.
This dataset is made available as part of the publication
"SparseBeads Data: Benchmarking Sparsity-Regularized Computed Tomography", Jakob S Jørgensen et al, 2017. Meas. Sci. Technol. 28 124005.
Direct link: https://doi.org/10.1088/1361-6501/aa8c29.
This manuscript is published as part of the Special Feature on Advanced X-ray Tomography (open access). We refer users to this publication for extensive detail on the experimental planning and data acquisition.
Each zipped data folder includes:
The metadata for data acquisition and the geometry parameters of the scan (.xtekct and .ctprofile.xml);
A sinogram of the central slice (CentreSlice > Sinograms > .tif), along with metadata for the 2D slice (.xtek2dct and .ct2dprofile.xml);
A list of projection angles (.ang);
and a 2D FDK reconstruction using the CTPro reconstruction suite (RECON2D > .vol) with volume visualisation parameters (.vgi), added as a reference (a minimal loading sketch is given below).
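The sketch below is a hypothetical starting point for working with the central-slice data in Python: the file names and the assumption that the .ang file is plain-text angles in degrees are ours (consult the .xtek2dct metadata and the SparseBeads_code repository for the exact conventions), and it only shows an unregularized FBP baseline, whereas the dataset targets sparsity-regularized reconstruction.

```python
# Hypothetical loading sketch: file names and the .ang format are assumptions.
import numpy as np
import imageio.v3 as iio
from skimage.transform import iradon

sinogram = iio.imread("CentreSlice/Sinograms/slice.tif").astype(float)  # hypothetical path
angles = np.loadtxt("projection_angles.ang")                            # assumed plain-text degrees

# skimage expects the sinogram as (detector pixels, projections).
if sinogram.shape[0] == len(angles):
    sinogram = sinogram.T

recon = iradon(sinogram, theta=angles, filter_name="ramp")  # plain FBP baseline
print("reconstructed slice:", recon.shape)
```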
We also include an extra script for those that wish to use the SophiaBeads Dataset Project Codes, which essentially replaces the main script provided, sophiaBeads.m (visit https://zenodo.org/record/16539). Please note that sparseBeads.m script will have to be placed in the same folder as the project codes. The latest version of this script can be found here: https://github.com/jakobsj/SparseBeads_code
For more information, please contact
jakj [at] dtu.dk
jakob.jorgensen [at] manchester.ac.uk
We introduce phase-diagram analysis, a standard tool in compressed sensing (CS), to the X-ray computed tomography (CT) community as a systematic method for determining how few projections suffice for accurate sparsity-regularized reconstruction. In CS, a phase diagram is a convenient way to study and express certain theoretical relations between sparsity and sufficient sampling. We adapt phase-diagram analysis for empirical use in X-ray CT for which the same theoretical results do not hold. We demonstrate in three case studies the potential of phase-diagram analysis for providing quantitative answers to questions of undersampling. First, we demonstrate that there are cases where X-ray CT empirically performs comparably with a near-optimal CS strategy, namely taking measurements with Gaussian sensing matrices. Second, we show that, in contrast to what might have been anticipated, taking randomized CT measurements does not lead to improved performance compared with standard structured sam...
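As an illustration of the phase-diagram methodology in the Gaussian sensing-matrix case (a toy sketch, not the paper's CT experiments; recovery here uses orthogonal matching pursuit rather than the paper's reconstruction algorithms):

```python
# Rough empirical phase-diagram sketch: sweep undersampling delta = m/n and
# sparsity rho = k/m, and record how often a k-sparse vector is recovered from
# y = A x with a Gaussian sensing matrix A.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, trials = 100, 10

def success_rate(m, k):
    hits = 0
    for _ in range(trials):
        x = np.zeros(n)
        support = rng.choice(n, size=k, replace=False)
        x[support] = rng.normal(size=k)
        A = rng.normal(size=(m, n)) / np.sqrt(m)   # Gaussian sensing matrix
        y = A @ x
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(A, y)
        hits += np.linalg.norm(omp.coef_ - x) < 1e-3 * np.linalg.norm(x)
    return hits / trials

# Each (delta, rho) cell of the phase diagram is one such success rate.
for delta in (0.2, 0.4, 0.6, 0.8):
    m = int(delta * n)
    row = [success_rate(m, max(1, int(rho * m))) for rho in (0.1, 0.3, 0.5)]
    print(f"delta={delta:.1f}:", row)
```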
In this paper we propose an innovative learning algorithm - a variation of the one-class ν-Support Vector Machines (SVMs) learning algorithm - that produces sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machines algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.
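For reference, a minimal example of the benchmark one-class ν-SVM in scikit-learn (not the proposed sparser variant); the support-vector count it reports is the quantity the proposed method reduces.

```python
# Baseline one-class nu-SVM (scikit-learn), i.e. the classical formulation the
# proposed sparser variant is compared against.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                       # "normal" training data
X_test = np.vstack([rng.normal(size=(50, 2)),             # inliers
                    rng.uniform(-6, 6, size=(50, 2))])    # mostly outliers

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
print("support vectors kept:", clf.support_vectors_.shape[0])
print("predicted labels (+1 inlier / -1 outlier):", clf.predict(X_test)[:10])
```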
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many problems in classification involve huge numbers of irrelevant features. Variable selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, variable selection is achieved by l1 penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current article presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function L(β) and adds the penalty (ρ/2) dist(β, S_k)^2, capturing the squared Euclidean distance of the parameter vector β to the sparsity set S_k, where at most k components of β are nonzero. If β_ρ represents the minimum of the objective f_ρ(β) = L(β) + (ρ/2) dist(β, S_k)^2, then β_ρ tends to the constrained minimum of L(β) over S_k as ρ tends to ∞. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve better sparsity without loss of classification power. Supplementary materials for this article are available online.
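A minimal sketch of the proximal distance idea (ours, not the article's two algorithms), using a logistic loss and plain gradient descent with an increasing ρ; the projection onto S_k simply keeps the k largest-magnitude coefficients.

```python
# Illustrative proximal-distance sketch: minimize
#   f_rho(beta) = L(beta) + (rho/2) * dist(beta, S_k)^2
# by gradient descent, slowly increasing rho so beta_rho approaches S_k.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 50, 5
beta_true = np.zeros(p)
beta_true[:k] = 3 * rng.normal(size=k)
X = rng.normal(size=(n, p))
y = np.sign(X @ beta_true + 0.1 * rng.normal(size=n))   # labels in {-1, +1}

def project_sparse(b, k):
    """Euclidean projection onto S_k: zero out all but the k largest |b_i|."""
    out = np.zeros_like(b)
    idx = np.argsort(np.abs(b))[-k:]
    out[idx] = b[idx]
    return out

def logistic_grad(b):
    margins = y * (X @ b)
    return -(X.T @ (y / (1.0 + np.exp(margins)))) / n

beta = np.zeros(p)
step = 0.5
for rho in np.geomspace(1e-2, 1e3, 60):       # gradually tighten the constraint
    for _ in range(50):
        grad = logistic_grad(beta) + rho * (beta - project_sparse(beta, k))
        beta -= step / (1.0 + rho) * grad     # damp the step as rho grows

print("recovered support:", np.flatnonzero(project_sparse(beta, k)))
print("true support:     ", np.flatnonzero(beta_true))
```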
Sparse Basic Linear Algebra Subprograms (BLAS) comprise computational kernels for operations on sparse vectors and matrices.
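For example, the central Sparse BLAS kernel is the sparse matrix-vector product; the sketch below shows the same operation via SciPy's CSR format (an illustration, not a Sparse BLAS implementation).

```python
# Sparse matrix-vector product (SpMV): only the stored nonzeros are touched,
# so the cost is O(nnz) rather than O(n^2).
import numpy as np
from scipy.sparse import random as sparse_random

A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(1000)

y = A @ x
print("stored nonzeros:", A.nnz, "| result shape:", y.shape)
```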
Sparse machine learning has recently emerged as a powerful tool to obtain models of high-dimensional data with a high degree of interpretability, at low computational cost. This paper posits that these methods can be extremely useful for understanding large collections of text documents, without requiring user expertise in machine learning. Our approach relies on three main ingredients: (a) multi-document text summarization and (b) comparative summarization of two corpora, both using sparse regression or classification; (c) sparse principal components and sparse graphical models for unsupervised analysis and visualization of large text corpora. We validate our approach using a corpus of Aviation Safety Reporting System (ASRS) reports and demonstrate that the methods can reveal causal and contributing factors in runway incursions. Furthermore, we show that the methods automatically discover four main tasks that pilots perform during flight, which can aid in further understanding the causal and contributing factors to runway incursions and other drivers of aviation safety incidents. Citation: L. El Ghaoui, G. C. Li, V. Duong, V. Pham, A. N. Srivastava, and K. Bhaduri, “Sparse Machine Learning Methods for Understanding Large Text Corpora,” Proceedings of the Conference on Intelligent Data Understanding, 2011.
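A toy sketch of the comparative-summarization ingredient (our illustration, not the authors' code): an l1-penalized logistic classifier trained to separate two tiny corpora, whose few nonzero weights act as a comparative summary.

```python
# Comparative summarization via sparse classification: the nonzero weights of
# an l1-penalized classifier pick out terms that distinguish corpus A from B.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

corpus_a = ["runway incursion during taxi", "aircraft crossed runway without clearance"]
corpus_b = ["smooth departure and climb", "routine cruise and descent checklist"]
docs = corpus_a + corpus_b
labels = np.array([1, 1, 0, 0])

vectorizer = TfidfVectorizer().fit(docs)
features = vectorizer.transform(docs)

# C chosen large enough that a few terms are selected on this tiny example.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=10.0).fit(features, labels)
terms = np.array(vectorizer.get_feature_names_out())
print("discriminative terms:", terms[np.flatnonzero(clf.coef_[0])])
```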
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The first capture of the area North of the Floreat Surf Life Saving Club: these sand dunes were captured by UAV imagery on 17th Aug 2021 for the Cambridge Coastcare beach dune modelling and monitoring project. It was created as part of an initiative to innovatively monitor coastal dune erosion and visualize these changes over time for future management and mitigation. This data includes Orthomosaic, DSM, DTM, Elevation Contours, 3D Mesh, 3D Point Cloud and LiDAR constructed from over 500 images captured from a UAV (drone) and processed in Pix4D. All datasets can be freely accessed through DataWA. An animated video fly-through of this 3D data model and a Sketchfab visualisation of the 3D textured mesh are also available. The dataset is a Sparse 3D Point Cloud (i.e. a 3D set of points): the X, Y, Z position and colour information is stored for each point of the point cloud. This dataset is of the area North of Floreat SLSC (2021 Flight-2 project area).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Change-point (CP) VAR models face a dimensionality curse due to the proliferation of parameters that arises when new breaks are detected. We introduce the Sparse CP-VAR model, which determines which parameters truly vary when a break is detected. By doing so, the number of new parameters to be estimated at each regime is drastically reduced and the break dynamics become easier to interpret. The Sparse CP-VAR model disentangles the dynamics of the mean parameters and the covariance matrix. The former uses CP dynamics with shrinkage prior distributions, while the latter is driven by an infinite hidden Markov framework. An extensive simulation study is carried out to compare our approach with existing ones. We provide applications to financial and macroeconomic systems. It turns out that many off-diagonal VAR parameters are zero for the entire sample period and that most break activity is in the covariance matrix. We show that this has important consequences for portfolio optimization, in particular when future instabilities are included in the predictive densities. Forecasting-wise, the Sparse CP-VAR model compares favorably to several time-varying parameter models in terms of density and point forecast metrics.
Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finance. Gaussian Process regression is a popular technique for modeling the input-output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, it is challenging to apply Gaussian Process regression to large data sets, since prediction based on the learned model requires inversion of an order-n kernel matrix. Approximate solutions for sparse Gaussian Processes have been proposed for sparse problems. However, in almost all cases, these solution techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these solutions sometimes provide excellent accuracy, the models lack interpretability. Such interpretable sparsity patterns are very important for many applications. We propose a new technique for sparse Gaussian Process regression that allows us to compute a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction gives valuable domain information and then adapt the inverse covariance estimation from Gaussian graphical models to estimate the Gaussian kernel. We solve the optimization problem using the alternating direction method of multipliers, which is amenable to parallel computation. We demonstrate the performance of our method in terms of accuracy, scalability and interpretability on a climate data set.
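A conceptual sketch only: standard GP prediction works through the inverse kernel matrix, and the code below merely thresholds that dense inverse to mimic the shape of a sparse, interpretable precision matrix; the ADMM-based inverse covariance estimation described above is not reproduced here.

```python
# GP prediction via the inverse kernel: f* = k*^T (K + sigma^2 I)^{-1} y.
# The thresholding step is a naive stand-in for a sparse precision estimate.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

def rbf(A, B, length=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

K = rbf(X, X) + 0.01 * np.eye(80)       # kernel plus noise term
K_inv = np.linalg.inv(K)

# Naive sparsification (NOT the paper's estimator): drop small entries.
P = np.where(np.abs(K_inv) > 1e-2 * np.abs(K_inv).max(), K_inv, 0.0)
print("nonzero fraction of sparse precision:", np.count_nonzero(P) / P.size)

x_star = np.array([[5.0]])
pred_dense = rbf(x_star, X) @ K_inv @ y
pred_sparse = rbf(x_star, X) @ P @ y
print("dense vs sparse-precision prediction:", pred_dense[0], pred_sparse[0])
```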
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EMG data for classifier evaluation
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The artifact consists of the necessary data to reproduce the results reported in the SAT-20 Paper titled "On the Sparsity of XORs in Approximate Model Counting".
In particular, the artifact consists of the binaries, the log files generated by our computing cluster, and scripts to generate tables and the plots used in the paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Input and output files for the results presented in Tables 1-5 of the paper 'Exploiting Sparsity in Free Energy Basin-Hopping'. Further information on these files can be found in the README files within the tar file.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Induced sparsity in the factor loading matrix identifies the factor basis, while rotational identification is obtained ex post by clustering methods closely related to machine learning. We extract meaningful economic concepts from a high-dimensional data set, which together with observed variables follow an unrestricted, reduced-form VAR process. Including a comprehensive set of economic concepts allows reliable, fundamental structural analysis, even of the factor augmented VAR itself. We illustrate this by combining two structural identification methods to further analyze the model. To account for the shift in monetary policy instruments triggered by the Great Recession, we follow separate strategies to identify monetary policy shocks. Comparing ours to other parametric and non-parametric factor estimates uncovers advantages of parametric sparse factor estimation in a high dimensional data environment. Besides meaningful factor extraction, we gain precision in the estimation of factor loadings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We aim to provably complete a sparse and highly missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising, where users’ click-through rates (CTR) on ads over various devices form a CTR tensor that has about 96% missing entries and many zeros among the non-missing entries, which makes standalone tensor completion methods unsatisfactory. Besides the CTR tensor, additional ad features or user characteristics are often available. In this paper, we propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information for the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements due to covariates in both the reveal-probability condition and the tensor recovery accuracy. Finally, we apply COSTCO to an advertisement dataset consisting of a CTR tensor and an ad covariate matrix, leading to a 23% accuracy improvement over the baseline. An important by-product is that the ad latent components from COSTCO reveal interesting ad clusters, which are useful for better ad targeting.
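For context, a toy sketch of the standalone (covariate-free) tensor completion baseline that the abstract describes as unsatisfactory, written as masked CP factorization with plain NumPy gradient descent (hyperparameters untuned; COSTCO's covariate coupling is not reproduced):

```python
# Standalone masked CP completion baseline: fit rank-r factors to the observed
# entries only, then measure error on the held-out entries.
import numpy as np

rng = np.random.default_rng(0)
I, J, K, r = 30, 20, 10, 3
U, V, W = (rng.normal(size=(d, r)) for d in (I, J, K))
truth = np.einsum("ir,jr,kr->ijk", U, V, W)

mask = rng.random(truth.shape) < 0.2          # only 20% of entries observed
observed = truth * mask

# Factor matrices to learn, updated by gradient descent on the masked error.
A, B, C = (0.5 * rng.normal(size=(d, r)) for d in (I, J, K))
lr = 2e-3
for _ in range(10000):
    recon = np.einsum("ir,jr,kr->ijk", A, B, C)
    err = mask * (recon - observed)
    A -= lr * np.einsum("ijk,jr,kr->ir", err, B, C)
    B -= lr * np.einsum("ijk,ir,kr->jr", err, A, C)
    C -= lr * np.einsum("ijk,ir,jr->kr", err, A, B)

recon = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm((recon - truth)[~mask]) / np.linalg.norm(truth[~mask])
print("relative error on held-out entries:", round(rel_err, 3))
```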
Rural and Urban Definitions Grid showing settlement classification
Review of Economics and Statistics: Forthcoming. Visit https://dataone.org/datasets/sha256%3Ad00937e0e95caca90195351492ee3df98fa25094069700fa52605c182a3a5a0c for complete metadata about this dataset.