100+ datasets found
  1. Data from: Sparse Machine Learning Methods for Understanding Large Text...

    • data.nasa.gov
    • gimi9.com
    • +4 more
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Sparse Machine Learning Methods for Understanding Large Text Corpora [Dataset]. https://data.nasa.gov/w/3xx3-746c/default?cur=hFRJxcUSf5W
    Explore at:
    Available download formats: xml, csv, application/rdfxml, tsv, json, application/rssxml
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Sparse machine learning has recently emerged as a powerful tool to obtain models of high-dimensional data with a high degree of interpretability, at low computational cost. This paper posits that these methods can be extremely useful for understanding large collections of text documents, without requiring user expertise in machine learning. Our approach relies on three main ingredients: (a) multi-document text summarization and (b) comparative summarization of two corpora, both using sparse regression or classification; (c) sparse principal components and sparse graphical models for unsupervised analysis and visualization of large text corpora. We validate our approach using a corpus of Aviation Safety Reporting System (ASRS) reports and demonstrate that the methods can reveal causal and contributing factors in runway incursions. Furthermore, we show that the methods automatically discover four main tasks that pilots perform during flight, which can aid in further understanding the causal and contributing factors to runway incursions and other drivers for aviation safety incidents.

    Citation: L. El Ghaoui, G. C. Li, V. Duong, V. Pham, A. N. Srivastava, and K. Bhaduri, “Sparse Machine Learning Methods for Understanding Large Text Corpora,” Proceedings of the Conference on Intelligent Data Understanding, 2011.
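    The sparse regression ingredient described above can be illustrated with a minimal L1-regularized (lasso) solver using iterative soft-thresholding. This is a generic sketch of the technique, not the authors' code; the toy "term-document" data is invented for illustration.

    ```python
    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of the L1 norm: shrinks values toward zero,
        producing exact zeros -- the source of sparsity."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_ista(X, y, lam=0.05, iters=500):
        """L1-regularized least squares via ISTA; most coefficients end up
        exactly 0, which is what makes the fitted model interpretable
        (only a few active terms remain)."""
        n, d = X.shape
        lr = n / np.linalg.norm(X, 2) ** 2  # step from the gradient's Lipschitz constant
        w = np.zeros(d)
        for _ in range(iters):
            grad = X.T @ (X @ w - y) / n
            w = soft_threshold(w - lr * grad, lr * lam)
        return w

    # Toy demo: 10 "terms", but only term 0 actually predicts the response.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X[:, 0].copy()
    w = lasso_ista(X, y)
    ```

    The irrelevant coefficients are driven to exactly zero, so the surviving terms can be read off directly as a summary of the predictive vocabulary.
    
    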

  2. SparseBeads Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite
    J. S. Jørgensen; S. B. Coban; W. R. B. Lionheart; S. A. McDonald; P. J. Withers (2020). SparseBeads Dataset [Dataset]. http://doi.org/10.5281/zenodo.290117
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    J. S. Jørgensen; S. B. Coban; W. R. B. Lionheart; S. A. McDonald; P. J. Withers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The presented data set, inspired by the SophiaBeads Dataset Project for X-ray Computed Tomography, is collected for studies involving sparsity-regularised reconstruction. The aim is to provide tomographic data for various samples where the sparsity in the image varies.

    This dataset is made available as part of the publication

    "SparseBeads Data: Benchmarking Sparsity-Regularized Computed Tomography", Jakob S Jørgensen et al, 2017. Meas. Sci. Technol. 28 124005.

    Direct link: https://doi.org/10.1088/1361-6501/aa8c29.

    This manuscript is published as part of the Special Feature on Advanced X-ray Tomography (open access). We refer users to this publication for extensive detail on the experimental planning and data acquisition.

    Each zipped data folder includes

    • Metadata for the data acquisition and geometry parameters of the scan (.xtekct and .ctprofile.xml).

    • A sinogram of the central slice (CentreSlice > Sinograms > .tif) along with metadata for the 2D slice (.xtek2dct and .ct2dprofile.xml).

    • A list of projection angles (.ang).

    • A 2D FDK reconstruction using the CTPro reconstruction suite (RECON2D > .vol) with volume visualisation parameters (.vgi), added as a reference.

    We also include an extra script for those who wish to use the SophiaBeads Dataset Project Codes; it essentially replaces the main script provided, sophiaBeads.m (visit https://zenodo.org/record/16539). Please note that the sparseBeads.m script must be placed in the same folder as the project codes. The latest version of this script can be found here: https://github.com/jakobsj/SparseBeads_code

    For more information, please contact

    • jakj [at] dtu.dk
    • jakob.jorgensen [at] manchester.ac.uk
  3. Data from: SparsePoser: Real-time Full-body Motion Reconstruction from...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 12, 2023
    Cite
    Aristidou, Andreas (2023). SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8427980
    Explore at:
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    Yun, Haoran
    Pelechano, Nuria
    Ponton, Jose Luis
    Andujar, Carlos
    Aristidou, Andreas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data used for the paper SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

    It contains over 1GB of high-quality motion capture data recorded with an Xsens Awinda system while using a variety of VR applications on Meta Quest devices.

    Visit the paper website!

    If you find our data useful, please cite our paper:

    @article{10.1145/3625264,
      author    = {Ponton, Jose Luis and Yun, Haoran and Aristidou, Andreas and Andujar, Carlos and Pelechano, Nuria},
      title     = {SparsePoser: Real-Time Full-Body Motion Reconstruction from Sparse Data},
      year      = {2023},
      publisher = {Association for Computing Machinery},
      address   = {New York, NY, USA},
      issn      = {0730-0301},
      url       = {https://doi.org/10.1145/3625264},
      doi       = {10.1145/3625264},
      journal   = {ACM Trans. Graph.},
      month     = {oct}
    }

  4. Floreat-f2 - Sparse Point Cloud LAS - Aug 2021 - Datasets - data.wa.gov.au

    • catalogue.data.wa.gov.au
    Updated Aug 17, 2021
    + more versions
    Cite
    (2021). Floreat-f2 - Sparse Point Cloud LAS - Aug 2021 - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/floreat-f2-sparse-point-cloud-las-aug-2021
    Explore at:
    Dataset updated
    Aug 17, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Western Australia, Floreat
    Description

    The first capture of the area north of the Floreat Surf Life Saving Club: these sand dunes were captured by UAV imagery on 17th Aug 2021 for the Cambridge Coastcare beach dune modelling and monitoring project. It was created as part of an initiative to innovatively monitor coastal dune erosion and visualize these changes over time for future management and mitigation. This data includes Orthomosaic, DSM, DTM, Elevation Contours, 3D Mesh, 3D Point Cloud and LiDAR constructed from over 500 images captured from UAV (drone) and processed in Pix4D. All datasets can be freely accessed through DataWA. Links are available to an animated video fly-through of this 3D data model and to the Sketchfab visualisation of the 3D textured mesh. The dataset is a Sparse 3D Point Cloud (i.e. a 3D set of points): the X, Y, Z position and colour information is stored for each point of the point cloud. This dataset is of the area north of Floreat SLSC (2021 Flight-2 project area).

  5. Assessment and Improvement of Statistical Tools for Comparative Proteomics...

    • figshare.com
    • acs.figshare.com
    txt
    Updated Jun 3, 2023
    Cite
    Veit Schwämmle; Ileana Rodríguez León; Ole Nørregaard Jensen (2023). Assessment and Improvement of Statistical Tools for Comparative Proteomics Analysis of Sparse Data Sets with Few Experimental Replicates [Dataset]. http://doi.org/10.1021/pr400045u.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Veit Schwämmle; Ileana Rodríguez León; Ole Nørregaard Jensen
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates, due to sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches using simulated and experimental data sets with varying numbers of missing values. We applied three tools, the standard t test, the moderated t test (also known as limma), and rank products, for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
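    The core idea of adapting rank products to missing values can be sketched as follows: rank each feature within every replicate, skip missing entries, and combine ranks by a geometric mean over only the replicates where the feature was observed. This is an illustrative reimplementation, not the authors' R script, and the example values are invented.

    ```python
    import numpy as np

    def rank_products(changes):
        """Rank product score per feature from a (features x replicates) array
        of log fold-changes, with NaN marking missing values. Rank 1 = most
        down-regulated within a replicate; ranks are combined by a geometric
        mean over the replicates where the feature was actually observed."""
        n_feat, n_rep = changes.shape
        ranks = np.full_like(changes, np.nan, dtype=float)
        for j in range(n_rep):
            col = changes[:, j]
            obs = np.flatnonzero(~np.isnan(col))          # observed features only
            order = obs[np.argsort(col[obs])]             # ascending fold-change
            ranks[order, j] = np.arange(1, len(obs) + 1)  # rank within replicate
        # geometric mean of ranks, ignoring the replicates where data is missing
        return np.exp(np.nanmean(np.log(ranks), axis=1))
    ```

    A feature that is consistently the most down-regulated in every replicate where it appears scores exactly 1, the smallest possible rank product.
    
    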

  6. Code: Quality Diversity Algorithms for Calibrating a Supply Chain Simulation...

    • data.4tu.nl
    zip
    Updated Jul 22, 2024
    Cite
    Isabelle van Schilt (2024). Code: Quality Diversity Algorithms for Calibrating a Supply Chain Simulation Model with Sparse Data [Dataset]. http://doi.org/10.4121/766f4e89-fa03-47c6-a9f2-fa41f241984b.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Isabelle van Schilt
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This repository is part of the Ph.D. thesis of Isabelle M. van Schilt, Delft University of Technology.

    This repository is used to calibrate the underlying structure and parameters of a stylized supply chain simulation model of counterfeit Personal Protective Equipment (PPE) using a quality diversity algorithm. For this, we use the pyribs library for the quality diversity algorithm, and pydsol-core and pydsol-model for the discrete event simulation model. The calibration is done with sparse data, which is generated by degrading the ground truth data with noise, bias, and missing values. Each candidate structure of the supply chain simulation model is encoded as an integer key of a dictionary (sorted on graph density) whose values are the possible supply chain models. This integer is thus a decision variable of the calibration, alongside the other parameters of the simulation model.

    To use this repository, we need a simulation model developed in pydsol-core and pydsol-model. Additionally, we need a dictionary with various simulation structures as input, as well as the ground truth data. For this project, we use the repository complex_stylized_supply_chain_model_generator as the simulation model.

  7. Data from: Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach

    • s.cnmilf.com
    • catalog.data.gov
    Updated Dec 6, 2023
    + more versions
    Cite
    Dashlink (2023). Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/sparse-solutions-for-single-class-svms-a-bi-criterion-approach
    Explore at:
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Dashlink
    Description

    In this paper we propose an innovative learning algorithm - a variation of the one-class ν-Support Vector Machine (SVM) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class ν-SVM algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.

  8. Data from: Lagrangian analysis of submesoscale flows from sparse data using...

    • zenodo.org
    zip
    Updated Mar 8, 2024
    Cite
    H. M. Aravind; Tamay Ozgokmen; Michael Allshouse (2024). Lagrangian analysis of submesoscale flows from sparse data using Gaussian Process Regression for field reconstruction [Dataset]. http://doi.org/10.5281/zenodo.10795574
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    H. M. Aravind; Tamay Ozgokmen; Michael Allshouse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data used for the preparation of the manuscript "Lagrangian analysis of submesoscale flows from sparse data using Gaussian Process Regression for field reconstruction".

  9. Data from: spectre: An R package to estimate spatially-explicit community...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Oct 6, 2022
    Cite
    Craig Eric Simpkins; Sebastian Hanß; Matthias Spangenberg; Jan Salecker; Maximilian Hesselbarth; Kerstin Wiegand (2022). spectre: An R package to estimate spatially-explicit community composition using sparse data [Dataset]. http://doi.org/10.5061/dryad.fbg79cnz7
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 6, 2022
    Dataset provided by
    Dryad
    Authors
    Craig Eric Simpkins; Sebastian Hanß; Matthias Spangenberg; Jan Salecker; Maximilian Hesselbarth; Kerstin Wiegand
    Time period covered
    2022
    Description

    The simulated community datasets were built using the virtualspecies V1.5.1 R package (Leroy et al., 2016), which generates spatially-explicit presence/absence matrices from habitat suitability maps. We simulated these suitability maps using Gaussian fields neutral landscapes produced using the NLMR V1.0 R package (Sciaini et al., 2018). To allow for some level of overlap between species suitability maps, we divided the γ-diversity (i.e., the total number of simulated species) by an adjustable correlation value to create several species groups that share suitability maps. Using a full factorial design, we developed 81 presence/absence maps varying across four axes (see Supplemental Table 1 and Supplemental Figure 1): 1) landscape size, representing the number of sites in the simulated landscape; 2) γ-diversity; 3) the level of correlation among species suitability maps, with greater correlations resulting in fewer shared species groups among suitability maps; and 4) the habitat suitabil...

  10. Bayesian estimation of information-theoretic metrics for sparsely sampled...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 30, 2024
    Cite
    Piga, Angelo (2024). Bayesian estimation of information-theoretic metrics for sparsely sampled distributions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10592746
    Explore at:
    Dataset updated
    Jan 30, 2024
    Dataset provided by
    Font-Pomarol, Lluc
    Guimerà, Roger
    Piga, Angelo
    Sales-Pardo, Marta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code, synthetic and empirical data for "Bayesian estimation of information-theoretic metrics for sparsely sampled distributions"

    Abstract:

    Estimating the Shannon entropy of a discrete distribution from which we have only observed a small sample is challenging. Estimating other information-theoretic metrics, such as the Kullback-Leibler divergence between two sparsely sampled discrete distributions, is even harder. Here, we propose a fast, semi-analytical estimator for sparsely sampled distributions. Its derivation is grounded in probabilistic considerations and uses a hierarchical Bayesian approach to extract as much information as possible from the few observations available. Our approach provides estimates of the Shannon entropy with precision at least comparable to the benchmarks we consider, and most often higher; it does so across diverse distributions with very different properties. Our method can also be used to obtain accurate estimates of other information-theoretic metrics, including the notoriously challenging Kullback-Leibler divergence. Here, again, our approach has less bias, overall, than the benchmark estimators we consider.
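    For contrast with the Bayesian approach described above, the naive plug-in (maximum-likelihood) entropy estimator is easy to write and exhibits exactly the bias the paper targets: with few observations it systematically underestimates the entropy. This sketch is illustrative only and is not the authors' estimator; the sample size and alphabet size are arbitrary choices.

    ```python
    import numpy as np

    def plugin_entropy(counts):
        """Maximum-likelihood ("plug-in") Shannon entropy in nats from a vector
        of observed counts. Biased low when the sample is sparse relative to
        the number of possible outcomes."""
        counts = np.asarray(counts, dtype=float)
        p = counts[counts > 0] / counts.sum()
        return -np.sum(p * np.log(p))

    # A uniform distribution over 100 symbols has entropy ln(100) ~ 4.605 nats,
    # but a 30-observation sample can never exhibit more than ln(30) of it.
    rng = np.random.default_rng(1)
    sample = rng.integers(0, 100, size=30)
    est = plugin_entropy(np.bincount(sample, minlength=100))
    true_h = np.log(100)
    ```

    The gap between `est` and `true_h` is the sparse-sampling bias that hierarchical Bayesian estimators are designed to correct.
    
    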

  11. Subjects of Sparse matrix technology

    • workwithdata.com
    Updated Jul 13, 2024
    Cite
    Work With Data (2024). Subjects of Sparse matrix technology [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=book&fop0=%3D&fval0=Sparse+matrix+technology
    Explore at:
    Dataset updated
    Jul 13, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects and is filtered where the book is Sparse matrix technology, featuring 10 columns including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).

  12. Grating Lobes and Spatial Aliasing in Sparse Array Beampatterns

    • catalog.data.gov
    • gimi9.com
    • +2 more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). Grating Lobes and Spatial Aliasing in Sparse Array Beampatterns [Dataset]. https://catalog.data.gov/dataset/grating-lobes-and-spatial-aliasing-in-sparse-array-beampatterns
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Calculated beam pattern in Fourier space of a unitary input given two sparsely sampled synthetic aperture arrays: 1. a regularly spaced array sampled at 2*lambda, where lambda is the wavelength of the 40 GHz signal, and 2. the regularly spaced array with random perturbations (of order ~<lambda) to the (x,y) spatial location of each sample point. This dataset is published in "An Overview of Advances in Signal Processing Techniques for Classical and Quantum Wideband Synthetic Apertures" by Vouras et al. in the IEEE Journal of Selected Topics in Signal Processing special issue on Recent Advances in Wideband Signal Processing for Classical and Quantum Synthetic Apertures.
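    The effect described, grating lobes from regular 2λ sampling and their suppression under sub-wavelength position jitter, can be sketched in one dimension. This is an illustrative reconstruction, not the NIST data or code; the element count and jitter scale are assumptions.

    ```python
    import numpy as np

    C = 299_792_458.0   # speed of light, m/s
    LAM = C / 40e9      # wavelength of the 40 GHz signal

    def beampattern(x, u):
        """|array factor| of element positions x (metres) over direction
        cosines u, assuming a unitary (all-ones) input as in the dataset."""
        return np.abs(np.exp(2j * np.pi * np.outer(u, x) / LAM).sum(axis=1))

    n = 32                                                 # assumed element count
    regular = 2 * LAM * np.arange(n)                       # 2*lambda spacing
    rng = np.random.default_rng(0)
    perturbed = regular + rng.uniform(-0.5, 0.5, n) * LAM  # jitter of order < lambda

    # Regular 2*lambda sampling aliases: the grating lobe at u = 0.5 reaches
    # the same height as the main lobe at u = 0; jitter smears it out.
    lobes_reg = beampattern(regular, np.array([0.0, 0.5]))
    lobes_pert = beampattern(perturbed, np.array([0.0, 0.5]))
    ```

    With regular spacing both values equal the element count n; the perturbed array keeps its main lobe but the former grating lobe drops, which is the spatial-aliasing trade-off the dataset quantifies.
    
    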

  13. Data from: Direction matching for sparse movement data sets: determining...

    • search.dataone.org
    Updated Dec 11, 2024
    + more versions
    Cite
    Bonnell, Tyler R.; Henzi, S. Peter; Barrett, Louise (2024). Data from: Direction matching for sparse movement data sets: determining interaction rules in social groups [Dataset]. https://search.dataone.org/view/sha256%3A55f5360b0607b9a7b17dc62d5e77d5c3dddacf213d2c7f13b5abdeeba295ab41
    Explore at:
    Dataset updated
    Dec 11, 2024
    Dataset provided by
    Borealis
    Authors
    Bonnell, Tyler R.; Henzi, S. Peter; Barrett, Louise
    Description

    Abstract: It is generally assumed that high-resolution movement data are needed to extract meaningful decision-making patterns of animals on the move. Here we propose a modified version of force matching (referred to here as direction matching), whereby sparse movement data (i.e., collected over minutes instead of seconds) can be used to test hypothesized forces acting on a focal animal based on their ability to explain observed movement. We first test the direction matching approach using simulated data from an agent-based model, and then go on to apply it to a sparse movement data set collected on a troop of baboons in the DeHoop Nature Reserve, South Africa. We use the baboon data set to test the hypothesis that an individual’s motion is influenced by the group as a whole or, alternatively, whether it is influenced by the location of specific individuals within the group. Our data provide support for both hypotheses, with stronger support for the latter. The focal animal showed consistent patterns of movement toward particular individuals when distance from these individuals increased beyond 5.6 m. Although the focal animal was also sensitive to the group movement on those occasions when the group as a whole was highly clustered, these conditions of isolation occurred infrequently. We suggest that specific social interactions may thus drive overall group cohesion. The results of the direction matching approach suggest that relatively sparse data, with low technical and economic costs, can be used to test between hypotheses on the factors driving movement decisions.

    Usage notes: XY coordinates of individual baboons. This dataset consists of 74 days of full-day follows of a baboon troop (Papio hamadryas ursinus) at the DeHoop Nature Reserve in South Africa. Individual GPS points of all adult group members (N=14) were collected continuously throughout the day by an observer walking repeatedly from one end of the group to the other. A GPS point was taken on all adults present in the group by holding the GPS receiver above the animal (or as close as possible) to record its position and individual identity. File: baboon_group_dehoop.csv

  14. National Forest and Sparse Woody Vegetation Data (Version 3, 2018 Release)

    • demo.dev.magda.io
    geotiff, pdf, wms +1
    Updated Jul 4, 2022
    + more versions
    Cite
    Australian Government Department of Climate Change, Energy, the Environment and Water (2022). National Forest and Sparse Woody Vegetation Data (Version 3, 2018 Release) [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-d734c65e-0e7b-4190-9aa5-ddbb5844e86d
    Explore at:
    Available download formats: zip, wms, geotiff, pdf
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    Australian Government (http://www.australia.gov.au/)
    Description

    Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2018. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, potentially reaching 2 metres high and a minimum area of 0.2 hectares. Sparse woody is defined as woody vegetation with a canopy cover between 5-19 per cent. The three-class classification (forest, sparse woody and non-woody) supersedes the two-class classification (forest and non-forest) from 2016. The new classification is produced using the same approach in terms of time series processing (conditional probability networks) as the two-class method, to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.
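    The three-class definition can be written as a tiny rule on canopy cover alone. This is a hedged sketch of the stated thresholds, not the dataset's actual conditional-probability-network classifier, and it ignores the height and minimum-area criteria that also apply to the forest class.

    ```python
    def woody_class(canopy_cover_pct: float) -> str:
        """Classify land cover from canopy cover per the stated thresholds:
        forest >= 20 %, sparse woody 5-19 %, otherwise non-woody. (The real
        product additionally requires forests to potentially reach 2 m height
        over a minimum area of 0.2 ha.)"""
        if canopy_cover_pct >= 20:
            return "forest"
        if canopy_cover_pct >= 5:
            return "sparse woody"
        return "non-woody"
    ```
    
    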

  15. Replication Data for: Sparse Estimation and Uncertainty with Application to...

    • dataverse.harvard.edu
    Updated Oct 31, 2016
    Cite
    Marc Ratkovic; Dustin Tingley (2016). Replication Data for: Sparse Estimation and Uncertainty with Application to Subgroup Analysis [Dataset]. http://doi.org/10.7910/DVN/RNMB1Q
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 31, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Marc Ratkovic; Dustin Tingley
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Replication materials for Ratkovic and Tingley (2016), “Sparse Estimation and Uncertainty with Application to Subgroup Analysis.” All files, data, and scripts needed to generate the figures and results in the paper are in this archive. The zip file contains two sets of files: files for the Bechtel and Scheve (2013) replication and files for replicating the simulation study.

  16. SPM script for sparse fMRI data processing single subject analysis

    • osf.io
    Updated Jan 31, 2023
    Cite
    Pradeep D (2023). SPM script for sparse fMRI data processing single subject analysis [Dataset]. http://doi.org/10.17605/OSF.IO/P825K
    Explore at:
    Dataset updated
    Jan 31, 2023
    Dataset provided by
    Center for Open Science (https://cos.io/)
    Authors
    Pradeep D
    License

    GNU General Public License v3.0: http://www.gnu.org/licenses/gpl-3.0.txt

    Description

    MATLAB script using SPM12 batch for performing single subject analysis on sparse fMRI data

  17. Performance measurements for "Bringing Order to Sparsity: A Sparse Matrix...

    • zenodo.org
    zip
    Updated Apr 17, 2023
    Cite
    James D. Trotter (2023). Performance measurements for "Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs" [Dataset]. http://doi.org/10.5281/zenodo.7821491
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    James D. Trotter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The paper "Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs" compares various strategies for reordering sparse matrices. The purpose of reordering is to improve performance of sparse matrix operations, for example, by reducing fill-in resulting from sparse Cholesky factorisation or improving data locality in sparse matrix-vector multiplication (SpMV). Many reordering strategies have been proposed in the literature and the current paper provides a thorough comparison of several of the most popular methods.

    This comparison is based on performance measurements collected on the eX3 cluster, a Norwegian experimental research infrastructure for the exploration of exascale computing. The performance measurements gathered in the data set provided here relate in particular to the performance of two SpMV kernels with respect to 490 sparse matrices, 6 matrix orderings and 8 multicore CPUs.

    Experimental results are provided in a human-readable, tabular format using plain-text ASCII. This format may be readily consumed by gnuplot to create plots or imported into commonly used spreadsheet tools for further analysis.

    Performance measurements are provided based on an SpMV kernel using the compressed sparse row (CSR) storage format with 7 matrix orderings. One file is provided for each of 8 multicore CPU systems considered in the paper:

    1. Skylake: csr_all_xeongold16q_032_threads_ss490.txt
    2. Ice Lake: csr_all_habanaq_072_threads_ss490.txt
    3. Naples: csr_all_defq_064_threads_ss490.txt
    4. Rome: csr_all_rome16q_016_threads_ss490.txt
    5. Milan A: csr_all_fpgaq_048_threads_ss490.txt
    6. Milan B: csr_all_milanq_128_threads_ss490.txt
    7. TX2: csr_all_armq_064_threads_ss490.txt
    8. Hi1620: csr_all_huaq_128_threads_ss490.txt

    A corresponding set of files and performance measurements are provided for a second SpMV kernel that is also studied in the paper.

    Each file consists of 490 rows and 54 columns. Each row corresponds to a different matrix from the SuiteSparse Matrix Collection (https://sparse.tamu.edu/). The first 5 columns specify some general information about the matrix, such as its group and name, as well as the number of rows, columns and nonzeros. Column 6 specifies the number of threads used for the experiment (which depends on the CPU). The remaining columns are grouped according to the 7 different matrix orderings that were studied, in the following order: original, Reverse Cuthill-McKee (RCM), Nested Dissection (ND), Approximate Minimum Degree (AMD), Graph Partitioning (GP), Hypergraph Partitioning (HP), and Gray ordering. For each ordering, the following 7 columns are given:


    1. Minimum number of nonzeros processed by any thread by the SpMV kernel
    2. Maximum number of nonzeros processed by any thread by the SpMV kernel
    3. Mean number of nonzeros processed per thread by the SpMV kernel
    4. Imbalance factor, which is the ratio of the maximum to the mean number of nonzeros processed per thread by the SpMV kernel
    5. Time (in seconds) to perform a single SpMV iteration; this was measured by taking the minimum out of 100 SpMV iterations performed
    6. Maximum performance (in Gflop/s) for a single SpMV iteration; this was measured by taking twice the number of matrix nonzeros and dividing by the minimum time out of 100 SpMV iterations performed.
    7. Mean performance (in Gflop/s) for a single SpMV iteration; this was measured by taking twice the number of matrix nonzeros and dividing by the mean time of the 97 last SpMV iterations performed (i.e., the first 3 SpMV iterations are ignored).
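
As a sketch of how these columns might be consumed programmatically (assuming whitespace-delimited fields laid out as described above; the helper names are illustrative and not part of the data set):

```python
def parse_row(line):
    """Split one data row into the 5 general fields, the thread count,
    and 7 groups of 7 per-ordering statistics (0-based indexing)."""
    fields = line.split()
    general = fields[:5]                      # group, name, rows, columns, nonzeros
    threads = int(fields[5])                  # column 6: thread count
    groups = [fields[6 + 7*k : 6 + 7*(k + 1)] for k in range(7)]
    return general, threads, groups

def imbalance(max_nnz, mean_nnz):
    # Item 4: ratio of the maximum to the mean nonzeros per thread.
    return max_nnz / mean_nnz

def gflops(nnz, seconds):
    # Items 6-7: 2 * nonzeros flops per SpMV iteration, reported in Gflop/s.
    return 2.0 * nnz / seconds / 1e9
```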

    The results in Fig. 1 of the paper show speedup (or slowdown) resulting from reordering with respect to 3 reorderings and 3 selected matrices. These results can be reproduced by inspecting the performance results that were collected on the Milan B and Ice Lake systems for the three matrices Freescale/Freescale2, SNAP/com-Amazon and GenBank/kmer_V1r. Specifically, the numbers displayed in the figure are obtained by dividing the maximum performance measured for the respective orderings (i.e., RCM, ND and GP) by the maximum performance measured for the original ordering.
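
The speedup computation described above can be sketched as follows (a hedged illustration assuming the per-ordering column layout described earlier, with the maximum Gflop/s as the sixth value of each seven-column group; the function names are hypothetical):

```python
# Per-ordering groups start at column 6 (0-based) and appear in this order.
ORDERINGS = ["original", "RCM", "ND", "AMD", "GP", "HP", "Gray"]

def max_perf(fields, ordering):
    """Maximum Gflop/s for one ordering, taken from a whitespace-split row."""
    k = ORDERINGS.index(ordering)
    return float(fields[6 + 7*k + 5])

def speedup(fields, ordering):
    # Fig. 1: a reordering's best performance divided by the original's.
    return max_perf(fields, ordering) / max_perf(fields, "original")
```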

    The results presented in Figs. 2 and 3 of the paper show the speedup of SpMV as a result of reordering for the two SpMV kernels considered in the paper. In this case, gnuplot scripts are provided to reproduce the figures from the data files described above.

  18. Fast sparse matrix multiplication

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1992
    Cite
    S.C. Park (1992). Fast sparse matrix multiplication [Dataset]. http://doi.org/10.17632/ydtrxpr4vw.1
    Explore at:
    Dataset updated
    Jan 1, 1992
    Authors
    S.C. Park
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Abstract: A new space-efficient representation for sparse matrices is introduced and a fast sparse matrix multiplication algorithm based on the new representation is presented. The scheme is very efficient when the nonzero elements of a sparse matrix are partially or fully adjacent to one another as in band or triangular matrices. The space complexity of the new representation is better than that of existing algorithms when the number of sets of adjacent nonzero elements, called segments, is less than ...
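
The segment notion above (a maximal run of adjacent nonzero elements) can be illustrated with a minimal sketch; this is not the program's actual data structure, only a way of counting the segments in one matrix row:

```python
def count_segments(row):
    """Count maximal runs of adjacent nonzero entries in a row."""
    segments = 0
    in_segment = False
    for value in row:
        if value != 0 and not in_segment:
            segments += 1       # a new run of adjacent nonzeros begins
            in_segment = True
        elif value == 0:
            in_segment = False  # the current run (if any) has ended
    return segments
```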

    Title of program: SMM Catalogue Id: ACHR_v1_0

    Nature of problem: Sparse matrix multiplication often arises in scientific computations. Since a sparse matrix includes many zero elements, the multiplication should not be handled in the same way as for dense matrices. The standard matrix multiplication algorithm for n x n factor matrices, represented in the usual two-dimensional array form, takes O(n^3) time. This means that when the factor matrices are very large, e.g., 1000's x 1000's, not only will the computation time be excessively long but the demands on ...

    Versions of this program held in the CPC repository in Mendeley Data ACHR_v1_0; SMM; 10.1016/0010-4655(92)90116-G

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  19. National Forest and Sparse Woody Vegetation Data (Version 8.0 - 2023...

    • researchdata.edu.au
    Updated Aug 9, 2024
    + more versions
    Cite
    Australian Government Department of Climate Change, Energy, the Environment and Water (2024). National Forest and Sparse Woody Vegetation Data (Version 8.0 - 2023 Release) [Dataset]. https://researchdata.edu.au/national-forest-sparse-2023-release/3381312
    Explore at:
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    data.gov.au
    Authors
    Australian Government Department of Climate Change, Energy, the Environment and Water
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2023. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, at least 2 metres high and a minimum area of 0.2 hectares. Note that this product is not filtered by the 0.2 ha criterion for forest, to allow for flexibility in different use cases. Filtering to remove areas of less than 0.2 ha is undertaken in downstream processing for the purposes of Australia's National Inventory Reports. Sparse woody is defined as woody vegetation with a canopy cover between 5 and 19 per cent.

    The three-class classification (forest, sparse woody and non-woody) supersedes the two-class classification (forest and non-forest) from 2016. The new classification is produced using the same time-series processing approach (conditional probability networks) as the two-class method to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.

    Unlike previous versions of the National Forest and Sparse Woody Vegetation data releases, in which 35 tiles were released as part of the product, only the 25 southern tiles are supplied in this release. The 10 northern tiles will be released as a separate product later in the financial year, as they are subject to a methodological change associated with the adoption of the Sentinel sensor and will be supplied at a different resolution. Please see the National Forest and Sparse Woody Vegetation data metadata PDF (Version 8.0 - 2023 release) for more information.
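
The class thresholds stated above can be summarised as a simple decision rule. This is only an illustrative sketch of the published definitions (edge cases and the 0.2 ha area filter, which is applied downstream, are out of scope here), not the conditional-probability-network algorithm actually used:

```python
def classify_woody(canopy_cover_pct, height_m):
    """Apply the stated per-pixel thresholds: forest needs >= 20 per cent
    canopy cover and >= 2 m height; sparse woody is 5-19 per cent cover;
    everything else is non-woody."""
    if canopy_cover_pct >= 20 and height_m >= 2:
        return "forest"
    if 5 <= canopy_cover_pct < 20:
        return "sparse woody"
    return "non-woody"
```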

  20. Data from: Generating fast sparse matrix vector multiplication from a high...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jun 2, 2022
    Cite
    Federico Pizzuti; Michel Steuwer; Christophe Dubach (2022). Generating fast sparse matrix vector multiplication from a high level generic functional IR [Dataset]. http://doi.org/10.5061/dryad.wstqjq2gs
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Federico Pizzuti; Michel Steuwer; Christophe Dubach
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Usage of high-level intermediate representations promises the generation of fast code from a high-level description, improving the productivity of developers while achieving the performance traditionally only reached with low-level programming approaches.

    High-level IRs come in two flavors:
    1) domain-specific IRs designed to express computations only for a specific application area; or
    2) generic high-level IRs that can be used to generate high-performance code across many domains.
    Developing generic IRs is more challenging but offers the advantage of reusing a common compiler infrastructure across various applications.

    In this paper, we extend a generic high-level IR to enable efficient computation with sparse data structures.
    Crucially, we encode the sparse representation using reusable dense building blocks already present in the high-level IR.
    We use a form of dependent types to model sparse matrices in CSR format, explicitly expressing the relationship between multiple dense arrays that separately store the lengths of rows, the column indices, and the non-zero values of the matrix.
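
The CSR layout described above, with separate dense arrays for row extents, column indices and nonzero values, can be illustrated by a plain SpMV sketch (a generic textbook formulation, not the paper's IR; row extents are given here as a row-pointer array):

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a CSR matrix: the entries of row i live at
    positions row_ptr[i] .. row_ptr[i+1]-1 of col_idx and values."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x[col_idx[j]]
    return y
```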

    We demonstrate that we achieve high performance compared to sparse low-level library code using our extended generic high-level code generator.
    On an Nvidia GPU, we outperform the highly tuned Nvidia cuSPARSE implementation of SpMV across 28 sparse matrices of varying sparsity, by 1.7× on average.
