Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison for the user cold-start problem on the BE dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of sample data from the Last.fm and Douban datasets.
Regression problems on massive data sets are ubiquitous in many application domains, including the Internet, earth and space sciences, and finance. Gaussian Process regression is a popular technique for modeling the input-output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, it is challenging to apply Gaussian Process regression to large data sets, since prediction based on the learned model requires inverting an order-n kernel matrix. Approximate solutions for sparse Gaussian Processes have been proposed to address this. However, in almost all cases, these solution techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these solutions sometimes provide excellent accuracy, the resulting models are not interpretable. Such interpretable sparsity patterns are very important for many applications. We propose a new technique for sparse Gaussian Process regression that computes a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction gives valuable domain information and then adapt the inverse covariance estimation from Gaussian graphical models to estimate the Gaussian kernel. We solve the optimization problem using the alternating direction method of multipliers, which is amenable to parallel computation. We demonstrate the performance of our method in terms of accuracy, scalability, and interpretability on a climate data set.
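The inverse covariance estimation step described above can be illustrated with a standard ADMM graphical-lasso loop. The sketch below (Python/NumPy) is a generic illustration rather than the authors' implementation; the empirical covariance/kernel matrix S, the regularization weight lam, and the ADMM penalty rho are assumed placeholders.

import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def sparse_inverse_covariance_admm(S, lam=0.1, rho=1.0, n_iter=200, tol=1e-6):
    """Estimate a sparse precision matrix Theta from a covariance/kernel matrix S
    by solving the graphical-lasso problem
        min_Theta  -logdet(Theta) + tr(S Theta) + lam * ||Theta||_1
    with ADMM (consensus splitting Theta = Z)."""
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Theta-update: closed form via eigendecomposition of rho*(Z - U) - S.
        w, V = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (w + np.sqrt(w ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (V * theta_eig) @ V.T
        # Z-update: elementwise soft-thresholding enforces the sparsity pattern.
        Z_old = Z
        Z = soft_threshold(Theta + U, lam / rho)
        # Dual update.
        U = U + Theta - Z
        if np.linalg.norm(Z - Z_old) < tol:
            break
    return Z  # sparse, interpretable estimate of the inverse kernel/covariance

# Usage sketch: S = np.cov(X, rowvar=False); Theta_hat = sparse_inverse_covariance_admm(S)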
In this paper we propose an innovative learning algorithm, a variation of the one-class ν-Support Vector Machine (SVM) learning algorithm, that produces sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class SVM algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.
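For reference, the benchmark one-class ν-SVM that the proposed sparser variant is compared against can be run with scikit-learn as below. This is only the baseline: the data, kernel choice, and ν value are placeholders, and the bi-criterion sparsification itself is not part of this snippet; the support-vector count it prints is the quantity the proposed algorithm aims to reduce.

import numpy as np
from sklearn.svm import OneClassSVM

# Toy "normal" training data and a test set containing a few outliers (placeholders).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 5))
X_test = np.vstack([rng.normal(size=(95, 5)), rng.normal(5.0, 1.0, size=(5, 5))])

# Standard one-class nu-SVM: nu upper-bounds the fraction of training outliers
# and lower-bounds the fraction of support vectors.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
pred = clf.predict(X_test)  # +1 = inlier, -1 = outlier

print("support vectors:", clf.support_vectors_.shape[0])
print("flagged outliers:", int((pred == -1).sum()))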
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MetaFlux is a global, long-term carbon flux dataset of gross primary production and ecosystem respiration that is generated using meta-learning. The principle of meta-learning stems from the need to solve the problem of learning in the face of sparse data availability. Data sparsity is a prevalent challenge in climate and ecology science. For instance, in-situ observations tend to be spatially and temporally sparse. This issue can arise from sensor malfunctions, limited sensor locations, or non-ideal climate conditions such as persistent cloud cover. The lack of high-quality continuous data can make it difficult to understand many climate processes that are otherwise critical. The machine-learning community has attempted to tackle this problem by developing several learning approaches, including meta-learning, which learns how to learn broad features across tasks in order to better infer other, poorly sampled ones. In this work, we applied meta-learning to solve the problem of upscaling continuous carbon fluxes from sparse observations. Data scarcity in carbon flux applications is particularly problematic in the tropics and semi-arid regions, where only around 8–11% of long-term eddy covariance stations are currently operational. Unfortunately, these regions are important in modulating the global carbon cycle and its interannual variability. In general, we find that meta-trained machine learning models, including multi-layer perceptrons (MLP), long short-term memory (LSTM), and bi-directional LSTM (BiLSTM), have 9–16% lower validation errors on flux estimates when compared to their non-meta-trained counterparts. In addition, meta-trained models are more robust to extreme conditions, with 4–24% lower overall errors. Finally, we use an ensemble of meta-trained deep networks to upscale in-situ observations of ecosystem-scale photosynthesis and respiration fluxes to daily and monthly global products at a 0.25-degree spatial resolution from 2001 to 2023, a product we call "MetaFlux". We also checked the seasonality, interannual variability, and correlation to solar-induced fluorescence of the upscaled product and found that MetaFlux outperformed state-of-the-art machine learning upscaling models, especially in critical semi-arid and tropical regions.
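To make the meta-learning idea concrete, the sketch below applies a first-order, Reptile-style meta-update to a small PyTorch flux regressor, where each "task" is one sparsely sampled site. This is a generic illustration under assumed hyperparameters and input sizes; the exact meta-learning procedure, architectures, and predictors used for MetaFlux may differ.

import copy
import torch
import torch.nn as nn

# A small MLP flux regressor: meteorological drivers in, carbon flux out (placeholder sizes).
meta_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

def adapt_to_task(model, x, y, inner_lr=1e-3, inner_steps=5):
    """Clone the meta-model and take a few gradient steps on one task
    (e.g., one eddy-covariance site)."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        loss_fn(adapted(x), y).backward()
        opt.step()
    return adapted

def reptile_meta_step(meta_model, tasks, meta_lr=0.1):
    """Move the meta-parameters toward the task-adapted parameters (Reptile update)."""
    for x, y in tasks:  # each element: (features, target flux) tensors for one site
        adapted = adapt_to_task(meta_model, x, y)
        with torch.no_grad():
            for p_meta, p_task in zip(meta_model.parameters(), adapted.parameters()):
                p_meta.add_(meta_lr * (p_task - p_meta))

# Usage sketch with made-up data for four sites.
tasks = [(torch.randn(32, 8), torch.randn(32, 1)) for _ in range(4)]
reptile_meta_step(meta_model, tasks)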
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison on the Last.fm dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameters used for the AD methods.
The efficiency of genomic selection methodologies can be increased by sparse testing, where only a subset of the materials is evaluated in each environment. Seven different multi-environment plant breeding datasets were used to evaluate four different methods for allocating lines to environments in a multi-trait genomic prediction problem. The results of the analysis are presented in the accompanying article.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training and testing sets for the e-commerce product images dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Factor models have been applied extensively for forecasting when high-dimensional datasets are available. In this case, the number of variables can be very large. For instance, typical dynamic factor models used in central banks handle over 100 variables. However, there is a growing body of literature indicating that more variables do not necessarily lead to estimated factors with lower uncertainty or better forecasting results. This paper investigates the usefulness of partial least squares techniques that take into account the variable to be forecast when reducing the dimension of the problem from a large number of variables to a smaller number of factors. We propose several dynamic sparse partial least squares approaches as a means of improving forecast efficiency by taking into account the variable to be forecast while forming an informative subset of predictors, instead of using all the available ones to extract the factors. We use the well-known Stock and Watson database to assess the forecasting performance of our approach. The proposed dynamic sparse models show good performance in improving efficiency compared to widely used factor methods in macroeconomic forecasting.
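A minimal sketch of the core idea, assuming a single sparse PLS component obtained by soft-thresholding the covariance-based weight vector; this is a simplified, static stand-in for the dynamic multi-factor approaches proposed in the paper, with placeholder data and threshold.

import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sparse_pls_forecast(X, y, x_new, tau=0.5):
    """One-component sparse PLS: build a factor from a sparse, target-aware
    weight vector and regress the variable to be forecast on that factor.
    X: (T, N) predictors, y: (T,) target, x_new: (N,) latest predictor values."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc                                 # covariance of each predictor with the target
    w = soft_threshold(w, tau * np.abs(w).max())  # drop weakly related predictors
    w /= np.linalg.norm(w)
    t = Xc @ w                                    # sparse PLS factor
    beta = (t @ yc) / (t @ t)                     # regress the target on the factor
    return y_mean + beta * ((x_new - x_mean) @ w)

# Usage sketch: only the first 3 of 50 predictors actually matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = X[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=200)
print(sparse_pls_forecast(X[:-1], y[:-1], X[-1]))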
NOTE: This dataset is now outdated. Please see https://doi.org/10.11583/DTU.17091101 for the updated version with many more problems.
This is a collection of min-cut/max-flow problem instances that can be used for benchmarking min-cut/max-flow algorithms. The collection is released as a companion to the paper:
The problem instances are collected from a wide selection of sources to be as representative as possible. Specifically, this collection contains:
The reason for releasing this collection is to provide a single place to download all datasets used in our paper (and various previous papers) instead of having to scavenge them from multiple sources. Furthermore, several of the problem instances typically used for benchmarking min-cut/max-flow algorithms are no longer available at their original locations and may be difficult to find. By storing the data on Zenodo with a dedicated DOI we hope to avoid this problem. For license information, see below.
Files and formats
We provide all problem instances in two file formats: DIMACS and a custom binary format. Both are described below. Each file has been zipped, and similar files have then been grouped into their own zip file (i.e. it is a zip of zips). DIMACS files have been prefixed with `dimacs_` and binary files have been prefixed with `bin_`.
DIMACS
All problem instances are available in DIMACS format (explained here: http://lpsolve.sourceforge.net/5.5/DIMACS_maxf.htm).
For the larger problem instances, we have also published a partition of the graph nodes into blocks for block-based parallel min-cut/max-flow. The partition matches the one used in the companion review paper (Jensen et al., 2021). For a problem instance with filename `
An example partition listing (here, 32 nodes split into 4 blocks, one block index per node) looks like:
4
0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3
Binary files
While DIMACS has the advantage of being human-readable, storing everything as text requires a lot of space. This makes the files unnecessarily large and slow to parse. To overcome this, we also release all problem instances in a simple binary storage format. We have two formats: one for graphs and one for quadratic pseudo-boolean optimization (QPBO) problems. Code to convert to/from DIMACS is also available at: https://www.doi.org/10.5281/zenodo.4903946 or https://github.com/patmjen/maxflow_algorithms.
Binary BK (`.bbk`) files are for storing normal graphs for min-cut/max-flow. They closely follow the internal storage format used in the original implementation of the Boykov-Kolmogorov algorithm, meaning that terminal arcs are stored in a separate list from normal neighbor arcs. The format is:
Uncompressed:
Header: (3 x uint8) 'BBQ'
Type codes: (2 x uint8) captype, tcaptype
Sizes: (3 x uint64) num_nodes, num_terminal_arcs, num_neighbor_arcs
Terminal arcs: (num_terminal_arcs x BkTermArc)
Neighbor arcs: (num_neighbor_arcs x BkNborArc)
Compressed (using Google's snappy: https://github.com/google/snappy):
Header: (3 x uint8) 'bbq'
Type codes: (2 x uint8) captype, tcaptype
Sizes: (3 x uint64) num_nodes, num_terminal_arcs, num_neighbor_arcs
Terminal arcs: (1 x uint64) compressed_bytes_1
(compressed_bytes_1 x uint8) compressed num_terminal_arcs x BkTermArc
Neighbor arcs: (1 x uint64) compressed_bytes_2
(compressed_bytes_2 x uint8) compressed num_neighbor_arcs x BkNborArc
Where:
/** Enum for switching over POD types. */
enum TypeCode : uint8_t {
TYPE_UINT8,
TYPE_INT8,
TYPE_UINT16,
TYPE_INT16,
TYPE_UINT32,
TYPE_INT32,
TYPE_UINT64,
TYPE_INT64,
TYPE_FLOAT,
TYPE_DOUBLE,
TYPE_INVALID = 0xFF
};
/** Terminal arc with source and sink capacity for given node. */
template <class tcaptype>
struct BkTermArc {        // field layout is indicative; see the linked conversion code
    uint64_t node;        // node the terminal arc belongs to
    tcaptype source_cap;  // capacity of the arc from the source
    tcaptype sink_cap;    // capacity of the arc to the sink
};
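As a quick sanity check of the layout above, the fixed-size header of an uncompressed `.bbk` file can be parsed as in the sketch below. This is a hedged illustration, not the official reader (see the conversion code linked above); it assumes little-endian byte order and stops before the arc records, whose exact on-disk layout is defined in that code.

import struct

TYPE_CODES = {0: "uint8", 1: "int8", 2: "uint16", 3: "int16", 4: "uint32",
              5: "int32", 6: "uint64", 7: "int64", 8: "float", 9: "double"}

def read_bbk_header(path):
    """Read the header of an uncompressed binary BK (.bbk) file."""
    with open(path, "rb") as f:
        magic = f.read(3)
        if magic != b"BBQ":
            raise ValueError("not an uncompressed .bbk file (got %r)" % magic)
        captype, tcaptype = struct.unpack("<2B", f.read(2))
        num_nodes, num_term_arcs, num_nbor_arcs = struct.unpack("<3Q", f.read(24))
    return {"captype": TYPE_CODES.get(captype, "invalid"),
            "tcaptype": TYPE_CODES.get(tcaptype, "invalid"),
            "num_nodes": num_nodes,
            "num_terminal_arcs": num_term_arcs,
            "num_neighbor_arcs": num_nbor_arcs}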
Binary QPBO (`.bq`) files are for storing QPBO problems. Unary and binary terms are stored in separate lists. The format is:
Uncompressed:
Header: (5 x uint8) 'BQPBO'
Type code: (1 x uint8) captype
Sizes: (3 x uint64) num_nodes, num_unary_terms, num_binary_terms
Unary terms: (num_unary_terms x BkUnaryTerm)
Binary terms: (num_binary_terms x BkBinaryTerm)
Compressed (using Google's snappy: https://github.com/google/snappy):
Header: (5 x uint8) 'bqpbo'
Type code: (1 x uint8) captype
Sizes: (3 x uint64) num_nodes, num_unary_terms, num_binary_terms
Unary terms: (1 x uint64) compressed_bytes_1
(compressed_bytes_1 x uint8) compressed num_unary_terms x BkUnaryTerm
Binary terms: (1 x uint64) compressed_bytes_2
(compressed_bytes_2 x uint8) compressed num_binary_terms x BkBinaryTerm
Where:
/** Enum for switching over POD types. */
enum TypeCode : uint8_t {
TYPE_UINT8,
TYPE_INT8,
TYPE_UINT16,
TYPE_INT16,
TYPE_UINT32,
TYPE_INT32,
TYPE_UINT64,
TYPE_INT64,
TYPE_FLOAT,
TYPE_DOUBLE,
TYPE_INVALID = 0xFF
};
/** Unary term */
template <class captype>
struct BkUnaryTerm {      // field layout is indicative; see the linked conversion code
    uint64_t node;        // node the unary term belongs to
    captype e0;           // energy for label 0
    captype e1;           // energy for label 1
};
Block (`.blk`) files are for storing a partition of the graph nodes into disjoint blocks. The format is:
Nodes: uint64_t num_nodes
Blocks: uint16_t num_blocks
Data: (num_nodes x uint16_t) node_blocks
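A `.blk` file can be read in a few lines; the sketch below assumes little-endian byte order (an assumption, since the byte order is not stated above):

import numpy as np

def read_blk(path):
    """Read a block (.blk) file: the number of blocks and one block index per node."""
    with open(path, "rb") as f:
        num_nodes = int(np.fromfile(f, dtype="<u8", count=1)[0])
        num_blocks = int(np.fromfile(f, dtype="<u2", count=1)[0])
        node_blocks = np.fromfile(f, dtype="<u2", count=num_nodes)
    assert node_blocks.size == num_nodes and int(node_blocks.max()) < num_blocks
    return num_blocks, node_blocks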
We do not claim ownership over the problem instances from the University of Waterloo and those from (Verma and Batra, 2012). Please contact the original sources for additional information. We publish the datasets from (Jeppesen et al., 2020) and (Jensen et al., 2020) under the Creative Commons Attribution 4.0 International (CC BY 4.0), see https://creativecommons.org/licenses/by/4.0/. The specific files are (not counting their `dimacs_`/`bin_` prefix):
University of Waterloo (consult original sources)
Verma & Batra, 2012 (consult original sources):
Jeppesen et al., 2020 (CC BY 4.0):
Jensen et al., 2020 (CC BY 4.0):
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Wavefront shaping, which can manipulate light spatially and temporally to counter the effects of scattering, is gaining importance in complex photonics. Important applications include deep-tissue imaging, microendoscopy, optical communications, nanofabrication, and remote sensing. However, high-speed and high-fidelity wavefront shaping is fundamentally hindered by the dimensionality limitation of hardware devices, evinced by the competition between frame rate, pixel count, and modulation depth. To overcome the speed-fidelity tradeoff, we leverage complex media (e.g., diffusers or multimode fibers) as analogue random multiplexers for pattern compression to address the demand for high-dimensional spatiotemporal control. Sparsity-constrained wavefront optimization is designed to solve the problem by seeking a low-dimensional, robust representation of wavefronts with a carefully designed sparsity constraint. This optimization framework can achieve high-fidelity wavefront shaping through complex media using high-speed, yet relatively low-precision, spatial light modulation devices (e.g., digital micromirror devices) without compromising the frame rate.
Methods
The dataset contains an experimentally calibrated complex-field transmission matrix of a graded-index multimode fiber (GIF50C, Thorlabs) and 1000 preprocessed test images extracted from the Fashion-MNIST dataset. The script takes the preprocessed test images as the ground truth and carries out the sparsity-constrained wavefront optimization to solve for the wavefronts that generate those test images through the multimode fiber, given its transmission matrix.
In brief, the transmission matrix was measured by raster scanning the proximal end of the multimode fiber using a digital micromirror device (DMD) and recording the corresponding speckles at the distal end using off-axis holography. The test images were first downsampled and interpolated to match the coordinates of the distal end of the multimode fiber. Then, they were vectorized into one-dimensional vectors to comply with the format of the transmission matrix. The initial guesses were obtained by performing the Gerchberg-Saxton algorithm with 10 iterations. All the implementation details can be found in Section 4.3 of our paper https://arxiv.org/abs/2302.10254.
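The Gerchberg-Saxton initialization mentioned above can be sketched as follows. This is a generic transmission-matrix phase-retrieval loop, assuming a phase-only (unit-amplitude) input and a conjugate-transpose back-projection; the authors' exact implementation (and the subsequent sparsity-constrained optimization) is described in Section 4.3 of the paper.

import numpy as np

def gerchberg_saxton_init(T, target_amplitude, n_iter=10, seed=0):
    """Estimate a phase-only input wavefront x such that |T @ x| approximates the
    target output amplitude, using Gerchberg-Saxton iterations.
    T: (M, N) complex transmission matrix; target_amplitude: (M,) nonnegative."""
    rng = np.random.default_rng(seed)
    x = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, T.shape[1]))  # random phase-only start
    for _ in range(n_iter):
        y = T @ x                                        # forward propagation through the fiber
        y = target_amplitude * np.exp(1j * np.angle(y))  # impose the target amplitude
        x = T.conj().T @ y                               # back-project (approximate inverse)
        x = np.exp(1j * np.angle(x))                     # re-impose the phase-only constraint
    return x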
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
In response to NASA SBIR topic A1.05, "Data Mining for Integrated Vehicle Health Management", Michigan Aerospace Corporation (MAC) asserts that our unique SPADE (Sparse Processing Applied to Data Exploitation) technology meets a significant fraction of the stated criteria and has functionality that enables it to handle many applications within the aircraft lifecycle. SPADE distills input data into highly quantized features and uses MAC's novel techniques for constructing Ensembles of Decision Trees to develop extremely accurate diagnostic/prognostic models for classification, regression, clustering, anomaly detection and semi-supervised learning tasks. These techniques are currently being employed for Threat Assessment of satellites in conjunction with researchers at the Air Force Research Lab. Significant advantages of this approach include: 1) it is completely data driven; 2) training and evaluation are faster than with conventional methods; 3) it operates effectively on huge datasets (> 1 billion samples x > 1 million features); and 4) it has proven to be as accurate as state-of-the-art techniques in many significant real-world applications. The specific goals for Phase 1 will be to work with domain experts at NASA and with our partners Boeing, SpaceX and GMV Space Systems to delineate a subset of problems that are particularly well-suited to this approach and to determine requirements for deploying algorithms on platforms of opportunity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Basic statistics for the PFNF algorithm on the Douban datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of a collection of approximately 50k research articles from the PubMed repository. These documents were originally annotated manually by biomedical experts with MeSH labels, and each article is described by 10-15 MeSH labels. The dataset contains a huge number of labels appearing as MeSH majors, which raises the issues of an extremely large output space and severe label sparsity. To address this, the labels have been processed and mapped to their roots as described below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The application of machine learning to theoretical chemistry has made it possible to combine the accuracy of quantum chemical energetics with the thorough sampling of finite-temperature fluctuations. To reach this goal, a diverse set of methods has been proposed, ranging from simple linear models to kernel regression and highly nonlinear neural networks. Here we apply two widely different approaches to the same, challenging problem - the sampling of the conformational landscape of polypeptides at finite temperature. We develop a Local Kernel Regression (LKR) coupled with a supervised sparsity method and compare it with a more established approach based on Behler-Parrinello type Neural Networks. In the context of the LKR, we discuss how the supervised selection of the reference pool of environments is crucial to achieve accurate potential energy surfaces at a competitive computational cost and leverage the locality of the model to infer which chemical environments are poorly described by the DFTB baseline. We then discuss the relative merits of the two frameworks and perform Hamiltonian-reservoir replica-exchange Monte Carlo sampling and metadynamics simulations, respectively, to demonstrate that both frameworks can achieve converged and transferable sampling of the conformational landscape of complex and flexible biomolecules with comparable accuracy and computational cost.
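To make the idea of a reference pool of environments concrete, the sketch below fits a sparse kernel model that predicts energies from a small set of reference points selected by farthest-point sampling. It is a generic subset-of-regressors sketch with a Gaussian kernel on plain feature vectors, not the LKR with supervised selection developed in the paper.

import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / sigma ** 2)

def farthest_point_sampling(X, n_ref, seed=0):
    """Greedy selection of a diverse reference pool (an unsupervised stand-in
    for the supervised selection used in the paper)."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(n_ref - 1):
        idx.append(int(d.argmax()))
        d = np.minimum(d, np.linalg.norm(X - X[idx[-1]], axis=1))
    return np.array(idx)

def fit_sparse_kernel_regression(X, y, n_ref=50, sigma=1.0, reg=1e-6):
    """Subset-of-regressors sparse kernel ridge regression on reference points."""
    ref = X[farthest_point_sampling(X, n_ref)]
    Knm = gaussian_kernel(X, ref, sigma)
    Kmm = gaussian_kernel(ref, ref, sigma)
    alpha = np.linalg.solve(Knm.T @ Knm + reg * Kmm, Knm.T @ y)
    return lambda Xq: gaussian_kernel(Xq, ref, sigma) @ alpha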
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data supporting manuscript submitted to PNAS: Video-Rate Raman-based Metabolic Imaging by Airy Light-Sheet Illumination and Photon-Sparse Detection. The data set includes: [1] raw data and [2] related images used in the analyses described within the manuscript. Despite its massive potential, Raman imaging represents just a modest fraction of all research and clinical microscopy to date. This is due to the ultralow Raman scattering cross-sections of most biomolecules, which impose low-light or photon-sparse conditions. Bioimaging under such conditions is suboptimal, as it either results in ultralow frame rates or requires increased levels of irradiance. Here, we overcome this tradeoff by introducing Raman imaging that operates at both video rates and 1,000-fold lower irradiance than state-of-the-art methods. To accomplish this, we deployed a judiciously designed Airy light-sheet microscope to efficiently image large specimen regions. Further, we implemented subphoton-per-pixel image acquisition and reconstruction to confront issues arising from photon sparsity at just millisecond integrations. We demonstrate the versatility of our approach by imaging a variety of samples, including the three-dimensional (3D) metabolic activity of single microbial cells and the underlying cell-to-cell variability. To image such small-scale targets, we again harnessed photon sparsity to increase magnification without a field-of-view penalty, thus overcoming another key limitation in modern light-sheet microscopy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the proposed model with baseline algorithms in terms of RMSE as a function of SR.
Elsevier CPC user license: https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/
Abstract A program is presented for determining a few selected eigenvalues and their eigenvectors on either end of the spectrum of a large, real, symmetric matrix. Based on the Davidson method, which is extensively used in quantum chemistry/physics, the current implementation improves the power of the original algorithm by adopting several extensions. The matrix-vector multiplication routine that it requires is to be provided by the user. Different matrix formats and optimizations are thus feasible. E...
Title of program: DVDSON Catalogue Id: ACPZ_v1_0
Nature of problem Finding a few extreme eigenpairs of a real, symmetric matrix is of great importance in scientific computations. Examples abound in structural engineering, quantum chemistry and electronic structure physics [1,2]. The matrices involved are usually too large to be efficiently solved using standard methods. Moreover, their large size often prohibits full storage forcing various sparse representations. Even sparse representations cannot always be stored in main memory [3]. Thus, an iterative method ...
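For orientation, a bare-bones Davidson iteration for the lowest eigenpair of a large symmetric matrix looks roughly like the sketch below. It is a simplified, single-vector NumPy illustration without restarts or blocking, not the DVDSON implementation itself; as in DVDSON, only a user-supplied matrix-vector product is required.

import numpy as np

def davidson_lowest(matvec, diag, n, n_iter=50, tol=1e-8):
    """Simplified Davidson iteration for the lowest eigenpair of a symmetric matrix.
    matvec: function v -> A @ v; diag: diagonal of A (used as preconditioner); n: dimension."""
    V = np.zeros((n, 0))
    t = np.zeros(n)
    t[int(np.argmin(diag))] = 1.0                 # start from the smallest diagonal element
    for _ in range(n_iter):
        t -= V @ (V.T @ t)                        # orthogonalize against the current subspace
        t /= np.linalg.norm(t)
        V = np.column_stack([V, t])
        AV = np.column_stack([matvec(V[:, j]) for j in range(V.shape[1])])
        H = V.T @ AV                              # Rayleigh-Ritz projection
        w, S = np.linalg.eigh(H)
        theta, s = w[0], S[:, 0]
        u = V @ s
        r = AV @ s - theta * u                    # residual of the current Ritz pair
        if np.linalg.norm(r) < tol:
            break
        denom = diag - theta                      # Davidson diagonal preconditioner
        denom[np.abs(denom) < 1e-10] = 1e-10
        t = r / denom
    return theta, u

# Usage sketch: theta, u = davidson_lowest(lambda v: A @ v, np.diag(A), A.shape[0])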
Versions of this program held in the CPC repository in Mendeley Data ACPZ_v1_0; DVDSON; 10.1016/0010-4655(94)90073-6
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, it is most prevalent in text data. This property also places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification and under what kind of circumstances. During classification, the high and sparse dimensionality of text data has also been taken into account. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
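A hedged sketch of the pseudo-label idea: frequently co-occurring label subsets are treated as additional "pseudo labels" that can then be exploited during classification. The thresholds and toy data are placeholders, and the subspace clustering step of pseudo-LSC itself is not shown.

from collections import Counter
from itertools import combinations

def generate_pseudo_labels(label_sets, min_support=2, max_size=3):
    """Return frequently co-occurring label combinations as pseudo labels.
    label_sets: one set of class labels per document."""
    counts = Counter()
    for labels in label_sets:
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(labels), size):
                counts[combo] += 1
    return [combo for combo, c in counts.items() if c >= min_support]

def augment(label_sets, pseudo_labels):
    """Attach a pseudo label to every document whose label set contains it."""
    return [set(labels) | {p for p in pseudo_labels if set(p) <= set(labels)}
            for labels in label_sets]

# Toy example: three documents with overlapping topic labels.
docs = [{"sports", "politics"}, {"sports", "politics", "finance"}, {"finance"}]
pseudo = generate_pseudo_labels(docs)   # -> [("politics", "sports")]
print(augment(docs, pseudo))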