Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
row sparse (Sparse Model B)
Regression problems on massive data sets are ubiquitous in many application domains, including the Internet, earth and space sciences, and finance. Gaussian Process regression is a popular technique for modeling the input-output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, it is challenging to apply Gaussian Process regression to large data sets, since prediction based on the learned model requires inversion of an n x n kernel matrix. Approximate sparse Gaussian Process solutions have been proposed for such problems. However, in almost all cases these solution techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these solutions sometimes provide excellent accuracy, the resulting models are not interpretable, and interpretable sparsity patterns are very important for many applications. We propose a new technique for sparse Gaussian Process regression that computes a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction carries valuable domain information and then adapt inverse covariance estimation from Gaussian graphical models to estimate the Gaussian kernel. We solve the resulting optimization problem using the alternating direction method of multipliers (ADMM), which is amenable to parallel computation. We demonstrate the performance of our method in terms of accuracy, scalability and interpretability on a climate data set.
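To illustrate the covariance-selection step described above, the sketch below shows a generic graphical-lasso-style ADMM iteration for estimating a sparse inverse of a kernel matrix in NumPy. This is only an illustration of that standard ADMM scheme, not the authors' released code; the kernel matrix K, the sparsity weight lam, the penalty parameter rho and the iteration count are assumed inputs.

    import numpy as np

    def sparse_inverse_kernel_admm(K, lam=0.1, rho=1.0, n_iter=100):
        """Estimate a sparse inverse of the kernel matrix K with ADMM
        (graphical-lasso-style splitting: Theta is the smooth block,
        Z the l1-penalized copy, U the scaled dual variable)."""
        p = K.shape[0]
        Z = np.eye(p)
        U = np.zeros((p, p))
        for _ in range(n_iter):
            # Theta update: minimize -logdet(Theta) + <K, Theta> + (rho/2)||Theta - Z + U||^2
            # via the eigendecomposition of rho*(Z - U) - K.
            w, Q = np.linalg.eigh(rho * (Z - U) - K)
            theta_eig = (w + np.sqrt(w ** 2 + 4.0 * rho)) / (2.0 * rho)
            Theta = (Q * theta_eig) @ Q.T
            # Z update: element-wise soft-thresholding produces the sparsity pattern.
            A = Theta + U
            Z = np.sign(A) * np.maximum(np.abs(A) - lam / rho, 0.0)
            # Dual update.
            U = U + Theta - Z
        return Z

The soft-thresholding step is what yields the interpretable zero pattern in the estimated inverse kernel, and each update parallelizes naturally, which is the appeal of ADMM noted in the abstract.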
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for the preparation of the manuscript "Lagrangian analysis of submesoscale flows from sparse data using Gaussian Process Regression for field reconstruction".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustering is a widely used unsupervised learning technique that groups data into homogeneous clusters. However, when dealing with real-world data that contain categorical values, existing algorithms can be computationally costly in high dimensions and can struggle with noisy data that have missing values. Furthermore, with one exception, existing algorithms provide no theoretical guarantees of clustering accuracy. In this article, we propose a general categorical data encoding method and a computationally efficient spectral-based algorithm to cluster high-dimensional noisy categorical data (nominal or ordinal). Under a statistical model for data on m attributes from n subjects in r clusters with missing probability ϵ, we show that our algorithm exactly recovers the true clusters with high probability when mn(1−ϵ) ≥ CMr² log³ M, with M = max(n, m) and a fixed constant C. In addition, we show that mn(1−ϵ)² ≥ rδ/2 with 0
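The encode-then-spectral-cluster idea can be sketched generically: one-hot encode each categorical attribute, leave missing entries as all-zero blocks, and run k-means on the leading singular vectors. This is a rough illustration under those assumptions, not the authors' algorithm; the function name, the float-coded inputs and the use of scikit-learn's KMeans are choices made here.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_categorical(X, r):
        """X: (n, m) array of categorical codes as floats, np.nan for missing.
        r: number of clusters. Returns cluster labels for the n subjects."""
        n, m = X.shape
        blocks = []
        for j in range(m):
            col = X[:, j]
            cats = np.unique(col[~np.isnan(col)])
            B = np.zeros((n, len(cats)))
            for c, cat in enumerate(cats):
                B[col == cat, c] = 1.0        # missing entries stay all-zero
            blocks.append(B)
        E = np.hstack(blocks)                  # one-hot encoding of all attributes
        U, s, _ = np.linalg.svd(E, full_matrices=False)
        top = U[:, :r] * s[:r]                 # leading spectral embedding
        return KMeans(n_clusters=r, n_init=10).fit_predict(top)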
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Pre-trained models and associated processed datasets for all analyses in the paper
Species and environmental data: This compiled (zip) file consists of 7 matrices of data: one species data matrix, with abundance observations per visited plot; and 6 environmental data matrices, consisting of land cover classification (Class), simulated EnMAP and Landsat data (April and August), and a 6 time-step Landsat time series (January, March, May, June, July and September). All data are compiled to the 125 m radius plots, as described in the paper. (Leitaoetal_Mapping beta diversity from space_Data.zip)

1. Spatial patterns of community composition turnover (beta diversity) may be mapped through Generalised Dissimilarity Modelling (GDM). While remote sensing data are adequate to describe these patterns, the often high-dimensional nature of these data poses some analytical challenges, potentially resulting in loss of generality. This may hinder the use of such data for mapping and monitoring beta-diversity patterns.

2. This study presents Sparse Generalised Dissimilarity Modelling (SGDM), a methodological framework designed to improve the use of high-dimensional data to predict community turnover with GDM. SGDM consists of a two-stage approach: first transforming the environmental data with a sparse canonical correlation analysis (SCCA), aimed at dealing with high-dimensional datasets, and secondly fitting the transformed data with GDM. The SCCA penalisation parameters are chosen according to a grid search procedure in order to optimise the predictive performance of a GDM fit on the resulting components. The proposed method was illustrated on a case study with a clear environmental gradient of shrub encroachment following cropland abandonment, and subsequent turnover in the bird communities. Bird community data, collected on 115 plots located along the described gradient, were used to fit composition dissimilarity as a function of several remote sensing datasets, including a time series of Landsat data as well as simulated EnMAP hyperspectral data.

3. The proposed approach always outperformed GDM models when fit on high-dimensional datasets. Its usage on low-dimensional data was not consistently advantageous. Models using high-dimensional data, on the other hand, always outperformed those using low-dimensional data, such as single-date multispectral imagery.

4. This approach improved the direct use of high-dimensional remote sensing data, such as time series or hyperspectral imagery, for community dissimilarity modelling, resulting in better performing models. The good performance of models using high-dimensional datasets further highlights the relevance of dense time series and data coming from new and forthcoming satellite sensors for ecological applications such as mapping species beta diversity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many problems in classification involve huge numbers of irrelevant features. Variable selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, variable selection is achieved by ℓ1 penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current article presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function L(β) and adds the penalty (ρ/2) dist(β, S_k)², capturing the squared Euclidean distance of the parameter vector β to the sparsity set S_k where at most k components of β are nonzero. If β_ρ represents the minimum of the objective f_ρ(β) = L(β) + (ρ/2) dist(β, S_k)², then β_ρ tends to the constrained minimum of L(β) over S_k as ρ tends to ∞. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve better sparsity without loss of classification power. Supplementary materials for this article are available online.
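A minimal sketch of the proximal distance idea follows, using a logistic loss as a stand-in for the article's classification loss and plain gradient steps rather than the paper's two algorithms: the projection onto S_k keeps the k largest-magnitude coefficients, and the penalty gradient is ρ(β − P_k(β)). The learning rate, iteration count and loss choice are assumptions made for the illustration.

    import numpy as np

    def project_sparse(beta, k):
        """Project beta onto S_k: keep the k largest-magnitude entries."""
        out = np.zeros_like(beta)
        idx = np.argsort(np.abs(beta))[-k:]
        out[idx] = beta[idx]
        return out

    def proximal_distance_logistic(X, y, k, rho=1.0, lr=0.01, n_iter=500):
        """Minimize logistic loss + (rho/2)*dist(beta, S_k)^2 by gradient descent.
        y is coded as +/-1."""
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(n_iter):
            margin = y * (X @ beta)
            grad_loss = -(X.T @ (y / (1.0 + np.exp(margin)))) / n
            grad_pen = rho * (beta - project_sparse(beta, k))
            beta -= lr * (grad_loss + grad_pen)
        return project_sparse(beta, k)   # final hard projection onto S_k

In practice ρ would be increased along a schedule so that β_ρ approaches the constrained minimum over S_k, as described in the abstract.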
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Factor models have been applied extensively for forecasting when high-dimensional datasets are available. In this case, the number of variables can be very large; for instance, the dynamic factor models commonly used in central banks handle over 100 variables. However, there is a growing body of literature indicating that more variables do not necessarily lead to estimated factors with lower uncertainty or better forecasting results. This paper investigates the usefulness of partial least squares techniques that take into account the variable to be forecast when reducing the dimension of the problem from a large number of variables to a smaller number of factors. We propose different approaches of dynamic sparse partial least squares as a means of improving forecast efficiency by simultaneously taking into account the variable to be forecast while forming an informative subset of predictors, instead of using all the available ones to extract the factors. We use the well-known Stock and Watson database to check the forecasting performance of our approach. The proposed dynamic sparse models show good performance in improving efficiency compared to widely used factor methods in macroeconomic forecasting.
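A one-component sparse PLS step illustrates the core idea of forming a factor only from predictors relevant to the target. This is a generic sketch, not the paper's dynamic formulation; the soft-threshold lam is an assumed tuning parameter and the inputs are assumed to be standardized.

    import numpy as np

    def sparse_pls_one_component(X, y, lam=0.1):
        """X: (T, N) standardized predictors, y: (T,) target.
        Returns the sparse weight vector w, the factor t = X @ w, and the
        coefficient from regressing y on the factor."""
        w = X.T @ y                                           # covariance-based weights
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)     # soft-threshold: drop weak predictors
        if np.all(w == 0):
            raise ValueError("lam too large: all weights shrunk to zero")
        w = w / np.linalg.norm(w)
        t = X @ w                                             # sparse factor
        coef = (t @ y) / (t @ t)                              # regress y on the factor
        return w, t, coef

Because the weights are built from the covariance with the target and then thresholded, only an informative subset of predictors enters the factor, which is the contrast with standard factor extraction described above.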
With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This property is not unique to text data, as it can also be found in non-text (e.g., numeric) data; however, it is most prevalent in text data. It also places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what circumstances. The high and sparse dimensionality of text data has also been considered during classification. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while exploiting the correlation among the multiple labels in the data set. Our text classification technique, called pseudo-LSC (pseudo-Label Based Subspace Clustering), is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
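The pseudo-label idea (treating each combination of existing class labels as one label) can be sketched with a generic label-powerset mapping. This only illustrates the label-combination step, not the pseudo-LSC subspace clustering algorithm itself; the TF-IDF and logistic-regression pipeline is an assumption for the sketch.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def fit_pseudo_label_classifier(texts, label_sets):
        """texts: list of documents; label_sets: list of sets of class labels.
        Each distinct label combination becomes one pseudo label."""
        combos = sorted({frozenset(s) for s in label_sets}, key=sorted)
        combo_id = {c: i for i, c in enumerate(combos)}
        y = [combo_id[frozenset(s)] for s in label_sets]      # pseudo labels
        vec = TfidfVectorizer(min_df=2)                       # high, sparse dimensionality
        Xs = vec.fit_transform(texts)
        clf = LogisticRegression(max_iter=1000).fit(Xs, y)
        return vec, clf, combos

    # Predictions map back to the original label subset:
    # vec, clf, combos = fit_pseudo_label_classifier(train_texts, train_labels)
    # predicted_subset = combos[clf.predict(vec.transform([new_text]))[0]]

Mapping the prediction back through combos recovers a subset of the original class labels, which is how pseudo labels let a single-label learner exploit label correlations.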
This is a scikit-learn compatible Python implementation of Stabl, coupled with useful functions and example notebooks to rerun the analyses on the different use cases located in the Sample data folder of the code library and in the data.zip folder of this repository.
Python version: from 3.7 up to 3.10
Python packages:
Julia package for noise generation (version 1.9.2):
To install Julia, please follow these instructions:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory contains R code and required data to run the full data augmentation described in, "An Imputation-Based Approach for Augmenting Sparse Epidemiological Signals."
"aug_pipeline.R" runs through all component steps and calls individual functions and data files within the directory. "plots_for_pipeline.R" uses data created during the aug_pipeline script to visualize individual steps in the augmentation process.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Docker image containing installed netDx software in Ubuntu to reproduce examples from the published manuscript. The R implementation of netDx is hosted at: https://github.com/BaderLab/netDx
Publication abstract: Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis and treatment response prediction. A clinically useful prediction algorithm should be accurate and generalizable, integrate diverse data types, and handle sparse data. A clinical predictor based on genomic data needs to be easily interpretable to drive hypothesis-driven research into new treatments. We describe netDx, a novel supervised patient classification framework based on patient similarity networks. netDx meets the above criteria and particularly excels at data integration and model interpretability. We compared the classification performance of this method against other machine-learning algorithms, using a cancer survival benchmark with four cancer types, each requiring integration of up to six genomic and clinical data types. In these tests, netDx has significantly higher average performance than most other machine-learning approaches across most cancer types. In comparison to traditional machine learning-based patient classifiers, netDx results are more interpretable, visualizing the decision boundary in the context of patient similarity space. When patient similarity is defined by pathway-level gene expression, netDx identifies biological pathways important for outcome prediction, as demonstrated in diverse data sets of breast cancer and asthma. Thus, netDx can serve both as a patient classifier and as a tool for discovery of biological features characteristic of disease. We provide a freely available software implementation of netDx along with sample files and automation workflows in R.
In situations where the cost/benefit analysis of using physics-based damage propagation algorithms is not favorable and where sufficient test data are available to map out the damage space, one can employ data-driven approaches. In this investigation, we evaluate different algorithms for their suitability in those circumstances. We are interested in assessing the trade-off between the ability to support uncertainty management and the accuracy of the predictions. We compare a Relevance Vector Machine (RVM), Gaussian Process Regression (GPR), and a Neural Network-based approach, and employ them on relatively sparse training sets with very high noise content. Results show that all methods can provide remaining-life estimates, although different damage estimates of the data (diagnostic output) change the outcome considerably. In addition, we found a need for performance metrics that provide a comprehensive and objective assessment of prognostics algorithm performance.
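For the Gaussian Process Regression entry above, a minimal scikit-learn sketch shows how a GPR fit on a sparse, noisy training set yields both a prediction and an uncertainty band, which is the uncertainty-management aspect being traded off. The kernel choice, the synthetic (time, damage) values and the variable names are assumptions, not the study's configuration.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Sparse, noisy training data: a few (time, damage) observations.
    t_train = np.array([[0.0], [5.0], [12.0], [20.0], [31.0]])
    d_train = np.array([0.02, 0.10, 0.22, 0.41, 0.78]) + 0.05 * np.random.randn(5)

    # RBF captures the smooth degradation trend; WhiteKernel absorbs the high noise content.
    kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=0.01)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t_train, d_train)

    t_query = np.linspace(0.0, 40.0, 200).reshape(-1, 1)
    d_mean, d_std = gpr.predict(t_query, return_std=True)   # prediction with uncertainty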
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset accompanies the following publication, please cite this publication if you use this dataset:
Fischer, T. and Milford, M., 2022. How Many Events Do You Need? Event-Based Visual Place Recognition Using Sparse But Varying Pixels. IEEE Robotics and Automation Letters, 7(4), pp.12275-12282.
@article{FischerRAL2022ICRA2023,
title={How Many Events do You Need? Event-based Visual Place Recognition Using Sparse But Varying Pixels},
author={Tobias Fischer and Michael Milford},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={4},
pages={12275--12282},
year={2022},
doi={10.1109/LRA.2022.3216226},
}
The dataset contains seven sequences of recordings. For each recording, the following files are made available:
A rosbag (*.bag) file with the following contents:
/dvs/events (type: dvs_msgs/EventArray) with the event stream, see https://github.com/uzh-rpg/rpg_dvs_ros
/dvs/camera_info (type: sensor_msgs/CameraInfo) with the camera info of the DAVIS frame camera
/dvs/image_raw (type: sensor_msgs/Image) with the DAVIS frame camera images
/dvs/imu (type: sensor_msgs/Imu) with the IMU data of the event camera
A parquet file, converted from the bag file with a denoising algorithm applied, that can be read with pandas (see the loading sketch after this list).
A zip file containing the DAVIS frame camera images. Once extracted, the images have the timestamp as their filename.
Please see the associated code repository (https://github.com/Tobias-Fischer/sparse-event-vpr) for manually annotated ground-truth information.
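Since the parquet files above are pandas-readable, a minimal loading sketch follows; the file name is a placeholder (substitute the parquet file of the sequence you downloaded), and no column layout is assumed beyond what inspection reveals.

    import pandas as pd

    # Placeholder file name: use the parquet file from the chosen sequence.
    events = pd.read_parquet("sequence.parquet")

    # Inspect the converted, denoised event stream before further processing.
    print(events.columns.tolist())
    print(events.head())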
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the present work, a new artificial neural network (ANN) based model for predicting the curing characteristics of rubber blends (RBs) with different contents of carbon black (CB) filler cured at various temperatures has been developed. The variations of the 4 curing characteristics most commonly used in the rubber industry, namely the minimum and maximum elastic torque, scorch time and optimal cure time, with carbon black content in the rubber blend and cure temperature have been obtained from the analysis of 11 experimental isothermal rheological cure curves registered by an oscillating-disk rheometer at 10 cure temperatures. The computer implementation of the ANN model requires a special pre-processing of the raw experimental data, which is described in detail in the paper. The ANN model for predicting the curing characteristics of RBs with different contents of CB filler at various cure temperatures was implemented in the MATLAB® software package, Version 9.0.0.341360 R2016a 64-bit, equipped with the Neural Network Toolbox (MathWorks, Natick, MA, USA), which provides a number of built-in tools for sufficiently powerful and user-friendly work with ANNs of a wide range of types and architectures. A generalized regression neural network (GRNN) was used to solve the given function approximation problem, in particular for its extremely high learning rate and rapid convergence to optimal regression levels even in the case of sparse data. Satisfactory agreement between the experimental and modelled values has been found for all four curing characteristics, with the maximum prediction error for the modelled minimum and maximum elastic torque less than 3%, and for the modelled scorch time and optimal cure time not exceeding 5%, of their experimental values. It can be concluded that the generalized regression neural network is a very powerful tool for intelligent modelling of the curing process of rubber blends even in the case of a small training dataset, and it can find wide practical application in the rubber industry.
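A generalized regression neural network is essentially a Gaussian-kernel weighted average of the training targets, which the minimal NumPy sketch below reproduces. It mirrors the general GRNN idea rather than the paper's MATLAB toolbox configuration; the spread parameter sigma is an assumed tuning value.

    import numpy as np

    def grnn_predict(X_train, y_train, X_query, sigma=0.1):
        """Generalized regression neural network: Gaussian-kernel weighted
        average of training targets (Nadaraya-Watson form).
        X_train: (n, d), y_train: (n,), X_query: (q, d)."""
        d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)  # squared distances
        w = np.exp(-d2 / (2.0 * sigma ** 2))                             # pattern-layer activations
        return (w @ y_train) / w.sum(axis=1)                             # summation / output layer

In the setting above, X_train would hold (filler content, cure temperature) pairs and y_train one curing characteristic at a time, with inputs scaled as part of the pre-processing the paper describes.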
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data submitted to the 11th Society of Petroleum Engineers Comparative Solution Project. Contains the sparse and dense data files of 77 results submitted by 18 participating groups for the three cases SPE11A-C. Each zip file contains one such result, where the name speX_NAMEY.zip indicates Result Y for Case SPE11X of Participant NAME. Unpacking a result file yields one sparse data file speX_time_series.csv, several dense data files speX_spatial_map_TIME.csv, and, optionally, performance data files. A sparse data file contains the evolution of several scalar quantities over time, while a dense data file contains the spatial distribution of several scalar quantities at a particular reporting time step. For more information, see the related publication. The results can be processed by the scripts provided in the repository github.com/Simulation-Benchmarks/11thSPE-CSP. From the repository's website, a Jupyter Hub is accessible that allows the scripts to be run on the full dataset.
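A minimal pandas sketch for inspecting one unpacked result is shown below. The concrete file names are placeholders following the naming pattern described above (here for Case SPE11A), and no column names are assumed; the official processing scripts live in the linked repository.

    import glob
    import pandas as pd

    # Sparse data: evolution of scalar quantities over time.
    sparse = pd.read_csv("speA_time_series.csv")   # placeholder: actual name from the unpacked result
    print(sparse.columns.tolist())

    # Dense data: one spatial map per reporting time step.
    for path in sorted(glob.glob("speA_spatial_map_*.csv")):
        dense = pd.read_csv(path)
        print(path, dense.shape)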
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Because of the “curse of dimensionality,” high-dimensional processes present challenges to traditional multivariate statistical process monitoring (SPM) techniques. The unknown underlying distribution of, and complicated dependency among, variables (such as heteroscedasticity) increase the uncertainty of estimated parameters and decrease the effectiveness of control charts. Moreover, the requirement of sufficient reference samples limits the application of traditional charts in high-dimension, low-sample-size scenarios (small n, large p). Further difficulties appear when detecting and diagnosing abnormal behaviors caused by a small set of variables (i.e., sparse changes). In this article, we propose two change-point-based control charts to detect sparse shifts in the mean vector of high-dimensional heteroscedastic processes. Our proposed methods can start monitoring when the number of observations is much smaller than the dimensionality. The simulation results show that the proposed methods are robust to nonnormality and heteroscedasticity. Two real data examples are used to illustrate the effectiveness of the proposed control charts in high-dimensional applications. The R codes are provided online.
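The general flavour of a change-point scan for sparse mean shifts can be sketched as follows; this is a plain illustration of the idea, not the two control charts proposed in the article. For each candidate change point, the before/after mean difference of every variable is standardized with its own variance estimate, and the largest standardized difference is monitored, which is sensitive when only a few variables shift.

    import numpy as np

    def sparse_shift_scan(X, min_seg=5):
        """X: (T, p) observations. Returns, for each candidate change point,
        the largest standardized per-variable mean difference."""
        T, p = X.shape
        stats = np.full(T, -np.inf)
        for t in range(min_seg, T - min_seg):
            a, b = X[:t], X[t:]
            # per-variable standard error; each variable uses its own variance estimate
            se = np.sqrt(a.var(axis=0, ddof=1) / t + b.var(axis=0, ddof=1) / (T - t))
            z = np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.maximum(se, 1e-12)
            stats[t] = z.max()         # max over variables targets sparse shifts
        return stats

    # The candidate change point is argmax(stats); a control limit would be set by simulation.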
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2018. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, potentially reaching 2 metres high and a minimum area of 0.2 hectares. Sparse woody is defined as woody vegetation with a canopy cover between 5-19 per cent.
The three-class classification (forest, sparse woody and non-woody) supersedes the two-class classification (forest and non-forest) from 2016. The new classification is produced using the same time-series processing approach (conditional probability networks) as the two-class method to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.
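The canopy-cover thresholds given above translate directly into a simple rule. The sketch below just encodes those stated thresholds and is not part of the product's actual time-series algorithm; note that the full forest definition also requires a potential height of 2 metres and a minimum area of 0.2 hectares, which a per-pixel rule like this does not capture.

    def woody_class(canopy_cover_percent):
        """Map percentage canopy cover to the three-class scheme described above."""
        if canopy_cover_percent >= 20:
            return "forest"          # forest also requires height and minimum-area criteria
        if canopy_cover_percent >= 5:
            return "sparse woody"
        return "non-woody"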
Matlab has a reputation for running slowly. Here are some pointers on how to speed computations, to an often unexpected degree. Subjects currently covered: Matrix Coding; Implicit Multithreading on a Multicore Machine; Sparse Matrices; Sub-Block Computation to Avoid Memory Overflow.

Matrix Coding - 1
Matlab documentation notes that efficient computation depends on using the matrix facilities, and that mathematically identical algorithms can have very different runtimes, but it is a bit coy about just what these differences are. A simple but telling example: the following is the core of the GD-CLS algorithm of Berry et al., copied from fig. 1 of Shahnaz et al., 2006, "Document clustering using nonnegative matrix factorization":

    for jj = 1:maxiter
        A = W'*W + lambda*eye(k);
        for ii = 1:n
            b = W'*V(:,ii);
            H(:,ii) = A \ b;
        end
        H = H .* (H>0);
        W = W .* (V*H') ./ (W*(H*H') + 1e-9);
    end

Replacing the column-wise update of H with a matrix update gives:

    for jj = 1:maxiter
        A = W'*W + lambda*eye(k);
        B = W'*V;
        H = A \ B;
        H = H .* (H>0);
        W = W .* (V*H') ./ (W*(H*H') + 1e-9);
    end

These were tested on an 8049 x 8660 sparse bag-of-words matrix V (.0083 non-zeros), with W of size 8049 x 50, H 50 x 8660, maxiter = 50, lambda = 0.1, and identical initial W. They were run consecutively, multithreaded on an 8-processor Sun server, starting at ~7:30 PM, and tic-toc timing was recorded. Runtimes were respectively 6586.2 and 70.5 seconds, a 93:1 difference. The maximum absolute pairwise difference between W matrix values was 6.6e-14. Similar speedups have been consistently observed in other cases. In one algorithm, combining matrix operations with efficient use of the sparse matrix facilities gave a 3600:1 speedup. For speed alone, C-style iterative programming should be avoided wherever possible. In addition, when a couple of lines of matrix code can substitute for an entire C-style function, program clarity is much improved.

Matrix Coding - 2
Applied to integration, the speed gains are not so great, largely due to the time taken to set up and deal with the boundaries; the anonymous function setup time is negligible. I demonstrate on a simple uniform-step linearly interpolated 1-D integration of cos() from 0 to pi, which should yield zero:

    tic;
    step = .00001;
    fun = @cos;
    start = 0;
    endit = pi;
    enda = floor((endit - start)/step)*step + start;
    delta = (endit - enda)/step;
    intF = fun(start)/2;
    intF = intF + fun(endit)*delta/2;
    intF = intF + fun(enda)*(delta+1)/2;
    for ii = start+step:step:enda-step
        intF = intF + fun(ii);
    end
    intF = intF*step
    toc;

    intF = -2.910164109692914e-14
    Elapsed time is 4.091038 seconds.

Replacing the inner summation loop with the matrix equivalent speeds things up a bit:

    tic;
    step = .00001;
    fun = @cos;
    start = 0;
    endit = pi;
    enda = floor((endit - start)/step)*step + start;
    delta = (endit - enda)/step;
    intF = fun(start)/2;
    intF = intF + fun(endit)*delta/2;
    intF = intF + fun(enda)*(delta+1)/2;
    intF = intF + sum(fun(start+step:step:enda-step));
    intF = intF*step
    toc;

    intF = -2.868419946011613e-14
    Elapsed time is 0.141564 seconds.

The core computation take
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
In response to NASA SBIR topic A1.05, "Data Mining for Integrated Vehicle Health Management", Michigan Aerospace Corporation (MAC) asserts that our unique SPADE (Sparse Processing Applied to Data Exploitation) technology meets a significant fraction of the stated criteria and has functionality that enables it to handle many applications within the aircraft lifecycle. SPADE distills input data into highly quantized features and uses MAC's novel techniques for constructing Ensembles of Decision Trees to develop extremely accurate diagnostic/prognostic models for classification, regression, clustering, anomaly detection and semi-supervised learning tasks. These techniques are currently being employed to do Threat Assessment for satellites in conjunction with researchers at the Air Force Research Lab. Significant advantages of this approach include: 1) it is completely data driven; 2) training and evaluation are faster than conventional methods; 3) it operates effectively on huge datasets (> 1 billion samples × > 1 million features); and 4) it has proven to be as accurate as state-of-the-art techniques in many significant real-world applications. The specific goals for Phase 1 will be to work with domain experts at NASA and with our partners Boeing, SpaceX and GMV Space Systems to delineate a subset of problems that are particularly well suited to this approach and to determine requirements for deploying algorithms on platforms of opportunity.
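The "highly quantized features plus ensembles of decision trees" idea can be sketched generically with scikit-learn. This is an illustrative stand-in, not MAC's SPADE implementation; the bin count, binning strategy and the ExtraTrees ensemble are assumptions.

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import KBinsDiscretizer
    from sklearn.ensemble import ExtraTreesClassifier

    # Quantize each input feature into a small number of ordinal bins, then
    # train an ensemble of decision trees on the quantized representation.
    model = make_pipeline(
        KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="quantile"),
        ExtraTreesClassifier(n_estimators=200, n_jobs=-1),
    )
    # model.fit(X_train, y_train); model.predict(X_test)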
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
row sparse (Sparse Model B)