In this research, we propose a variant of the classical Matching Pursuit Decomposition (MPD) algorithm with significantly improved scalability and computational performance. MPD is a powerful iterative algorithm that decomposes a signal into a linear combination of dictionary elements, or “atoms”. A best-fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal, and this procedure is repeated on the residual in subsequent iterations until a stopping criterion is met. A sufficiently large dictionary is required for an accurate reconstruction; this in turn increases the computational burden of the algorithm, limiting its applicability and adoption. Our main contribution lies in improving the computational efficiency of the algorithm to allow faster decomposition while maintaining a similar level of accuracy. We propose two techniques, Correlation Thresholding and Multiple Atom Extraction, to decrease the computational burden: correlation thresholds prune insignificant atoms from the dictionary, and extracting multiple atoms within a single iteration improves the effectiveness and efficiency of each iteration. The proposed algorithm, named MPD++, is demonstrated on a real-world data set.
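The decomposition loop described above (correlate the residual with every atom, pick the best-fit atom, subtract, repeat until a stopping criterion is met) can be summarized with a minimal sketch of the classical baseline. The dictionary, signal, and parameter values below are illustrative assumptions, and the MPD++ extensions themselves (correlation thresholding, multiple atom extraction) are not shown.

```python
# Minimal matching pursuit sketch (assumes a unit-norm dictionary); illustrative only,
# not the MPD++ implementation described above.
import numpy as np

def matching_pursuit(signal, dictionary, max_iter=50, tol=1e-6):
    """Greedily decompose `signal` over the columns of `dictionary`.

    dictionary: (n_samples, n_atoms) array with unit-norm columns.
    Returns the coefficient vector and the final residual.
    """
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(max_iter):
        # Cross-correlate the residual with every atom and pick the best fit.
        correlations = dictionary.T @ residual
        best = np.argmax(np.abs(correlations))
        coeffs[best] += correlations[best]
        # Subtract the selected atom's contribution and iterate on the residual.
        residual -= correlations[best] * dictionary[:, best]
        if np.linalg.norm(residual) < tol:   # stopping criterion
            break
    return coeffs, residual

# Example: decompose a noisy sum of two atoms from a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((128, 512))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] - 1.5 * D[:, 100] + 0.01 * rng.standard_normal(128)
coeffs, res = matching_pursuit(x, D)
print(np.argsort(-np.abs(coeffs))[:2])  # indices of the dominant atoms
```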
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The four datasets represent the results of two sequential processes. The first process consists of automatically matching landmarks from different sources against landmarks in a reference dataset (the French national topographic database, BDTOPO). In the second process, the 1:1 links are manually validated by experts.
The four different datasets and the BDTOPO dataset are archived here.
The data matching algorithm is described in this paper.
Each file contains the matching results for features belonging to one data source (a minimal loading sketch follows this list), with:
- the name of the file depending on the data source
- id_source: the identifier of the landmark in the data source
- types_of_matching_results: the type of matching result
- id_bdtopo: the identifier of the landmark in BDTOPO, present if and only if there is a validated matching link
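A minimal loading sketch, under the assumption that each result file is a CSV with the columns listed above; the file name and format are placeholders to be adapted to the archived files.

```python
# Hypothetical loading sketch: assumes each result file is a CSV with the columns
# id_source, types_of_matching_results and id_bdtopo; adjust to the actual archive format.
import pandas as pd

links = pd.read_csv("source_name.csv")  # the file name depends on the data source
# Keep only landmarks with a validated 1:1 link to BDTOPO.
validated = links[links["id_bdtopo"].notna()]
print(validated[["id_source", "types_of_matching_results", "id_bdtopo"]].head())
```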
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
This dataset was created by knightwayne
Released under CC BY-SA 3.0
Datasets used for evaluating state-of-the-art schema matching methods in the paper "Valentine: Evaluating Matching Techniques for Dataset Discovery", which was accepted for presentation at IEEE ICDE 2021. They come in the form of fabricated dataset pairs respecting a relatedness scenario, as discussed in the paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The goal of the Patient Matching Algorithm Challenge is to bring about greater transparency and data on the performance of existing patient matching algorithms, spur the adoption of performance metrics for patient data matching algorithm vendors, and positively impact other aspects of patient matching such as deduplication and linking to clinical data. Participants will be provided a data set and will have their answers evaluated and scored against a master key. Up to 6 cash prizes will be awarded with a total purse of up to $75,000.00 (https://www.patientmatchingchallenge.com/).
The test dataset used in the ONC Patient Matching Algorithm Challenge is available for download by students, researchers, or anyone else interested in additional analysis and patient matching algorithm development. More information about the Patient Matching Algorithm Challenge can be found at https://www.patientmatchingchallenge.com/.
The dataset containing 1 million patients was split into eight files of alphabetical groupings by the patient's last name, plus an additional file containing test patients with no last name recorded (Null). All files should be downloaded and merged for analysis, as in the sketch below.
https://github.com/onc-healthit/patient-matching
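A minimal sketch of the recommended merge step, assuming the eight alphabetical files plus the Null file have been downloaded as CSVs; the file-name pattern is hypothetical.

```python
# Sketch of recombining the alphabetical splits into one table; the file names are
# hypothetical, and the actual column layout follows the challenge documentation.
import glob
import pandas as pd

parts = [pd.read_csv(path, dtype=str) for path in sorted(glob.glob("patient_split_*.csv"))]
patients = pd.concat(parts, ignore_index=True)   # 1 million patients after merging
print(len(patients))
```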
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
China
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Abstract: We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal — thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Currently, the repository provides codes for two such methods:
- The ABE fully automated approach: This approach is a fully automated method for linking historical datasets (e.g. complete-count Censuses) by first name, last name and age. The approach was first developed by Ferrie (1996) and adapted and scaled for the computer by Abramitzky, Boustan and Eriksson (2012, 2014, 2017). Because names are often misspelled or mistranscribed, our approach suggests testing robustness to alternative name matching (using raw names, NYSIIS standardization, and Jaro-Winkler distance). To reduce the chances of false positives, our approach suggests testing robustness by requiring names to be unique within a five year window and/or requiring the match on age to be exact.
- A fully automated probabilistic approach (EM): This approach (Abramitzky, Mill, and Perez 2019) suggests a fully automated probabilistic method for linking historical datasets. We combine distances in reported names and ages between each two potential records into a single score, roughly corresponding to the probability that both records belong to the same individual. We estimate these probabilities using the Expectation-Maximization (EM) algorithm, a standard technique in the statistical literature. We suggest a number of decision rules that use these estimated probabilities to determine which records to use in the analysis.
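A rough sketch of an ABE-style robustness variant (standardized names, an age band, and a uniqueness requirement), assuming simple dictionary records and using the jellyfish library for NYSIIS and Jaro-Winkler; this is only an illustration, not the repository's released code.

```python
# Illustrative ABE-style link: candidates must share NYSIIS-standardized names, fall
# within a +/-2 year age band, and be unique; the record fields are assumptions.
import jellyfish

def candidates(rec, other_records, age_band=2):
    key = (jellyfish.nysiis(rec["first"]), jellyfish.nysiis(rec["last"]))
    out = []
    for o in other_records:
        if (jellyfish.nysiis(o["first"]), jellyfish.nysiis(o["last"])) != key:
            continue
        if abs(o["age"] - rec["age"]) > age_band:
            continue
        out.append(o)
    return out

def link(rec, other_records):
    cands = candidates(rec, other_records)
    if len(cands) != 1:            # require a unique match to limit false positives
        return None
    cand = cands[0]
    # Robustness check on the raw last names via Jaro-Winkler similarity.
    if jellyfish.jaro_winkler_similarity(rec["last"], cand["last"]) < 0.8:
        return None
    return cand
```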
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In this paper we introduce the Random Recursive Partitioning (RRP) matching method. RRP generates a proximity matrix which might be useful in econometric applications like average treatment effect estimation. RRP is a Monte Carlo method that randomly generates non-empty recursive partitions of the data and evaluates the proximity between two observations as the empirical frequency with which they fall in the same cell of these random partitions over all Monte Carlo replications. From the proximity matrix it is possible to derive both graphical and analytical tools to evaluate the extent of the common support between data sets. The RRP method is honest in that it does not match observations at any cost: if data sets are separated, the method clearly states it. The match obtained with RRP is invariant under monotonic transformation of the data. Average treatment effect estimators derived from the proximity matrix seem to be competitive compared to more commonly used estimators. The RRP method does not require a particular structure of the data, and for this reason it can be applied when distances like Mahalanobis or Euclidean are not suitable, in the presence of missing data, or when the estimated propensity score is too sensitive to model specifications.
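The Monte Carlo idea can be illustrated with a rough sketch: build many random recursive partitions and let proximity be the share of replications in which two observations share a cell. The splitting rule and parameters here are simplifying assumptions (for example, this toy version does not preserve the invariance property mentioned above), so it should not be read as the authors' implementation.

```python
# Rough Monte Carlo sketch of an RRP-style proximity matrix; illustrative only.
import numpy as np

def random_partition(X, rows, min_size, rng, labels, next_label):
    if len(rows) <= min_size:
        labels[rows] = next_label[0]
        next_label[0] += 1
        return
    j = rng.integers(X.shape[1])                            # random splitting variable
    cut = rng.uniform(X[rows, j].min(), X[rows, j].max())   # random split point
    left, right = rows[X[rows, j] <= cut], rows[X[rows, j] > cut]
    if len(left) == 0 or len(right) == 0:                   # degenerate split: stop here
        labels[rows] = next_label[0]
        next_label[0] += 1
        return
    random_partition(X, left, min_size, rng, labels, next_label)
    random_partition(X, right, min_size, rng, labels, next_label)

def rrp_proximity(X, n_rep=200, min_size=10, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    prox = np.zeros((n, n))
    for _ in range(n_rep):
        labels = np.empty(n, dtype=int)
        random_partition(X, np.arange(n), min_size, rng, labels, [0])
        # Proximity accumulates whenever two observations share a cell.
        prox += (labels[:, None] == labels[None, :])
    return prox / n_rep
```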
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Motivation: Entity Matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks has been made available for evaluating entity matching methods. However, the lack of fixed development and test splits as well as correspondence sets including both matching and non-matching record pairs hinders the reproducibility and comparability of benchmark experiments. In an effort to enhance the reproducibility and comparability of the experiments, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.
Dataset Description: An augmented version of the wdc phones dataset for benchmarking entity matching/record linkage methods, found at http://webdatacommons.org/productcorpus/index.html#toc4. The augmented version adds fixed splits for training, validation and testing as well as their corresponding feature vectors. The feature vectors are built using data-type-specific similarity metrics. The dataset contains 447 records describing products deriving from 17 e-shops which are matched against a product catalog of 50 products. The gold standards have manual annotations for 258 matching and 22,092 non-matching pairs. The total number of attributes used to describe the product records is 26, while the attribute density is 0.25. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
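To illustrate what data-type-specific similarity metrics can look like for a record pair, here is a small sketch; the attribute names and metric choices are assumptions, not the benchmark's exact feature definitions.

```python
# Sketch of building a similarity feature vector for one product record pair using
# data-type-specific metrics; attributes and metrics are illustrative assumptions.
from difflib import SequenceMatcher

def string_sim(a, b):
    if not a or not b:
        return -1.0                       # marker for missing values
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def numeric_sim(a, b):
    if a is None or b is None:
        return -1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b), 1e-9)

def pair_features(offer, catalog_item):
    return [
        string_sim(offer.get("title"), catalog_item.get("title")),
        string_sim(offer.get("brand"), catalog_item.get("brand")),
        numeric_sim(offer.get("price"), catalog_item.get("price")),
    ]
```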
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Separation is achieved by intelligence-based matching of the curvelet coefficients.
Many e-shops have started to mark up product data within their HTML pages using the schema.org vocabulary. The Web Data Commons project regularly extracts such data from the Common Crawl, a large public web crawl. The Web Data Commons Training and Test Sets for Large-Scale Product Matching contain product offers from different e-shops in the form of binary product pairs (with corresponding label "match" or "no match") for four product categories: computers, cameras, watches and shoes.
In order to support the evaluation of machine learning-based matching methods, the data is split into training, validation and test sets. For each product category, we provide training sets in four different sizes (2,000-70,000 pairs). Furthermore, there are sets of IDs for each training set for a possible validation split (a stratified random draw, sketched below). The test set for each product category consists of 1,100 product pairs. The labels of the test sets were manually checked, while those of the training sets were derived using shared product identifiers from the Web via weak supervision.
The data stems from the WDC Product Data Corpus for Large-Scale Product Matching - Version 2.0 which consists of 26 million product offers originating from 79 thousand websites.
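A short sketch of a stratified validation draw from one of the training sets; the file path, column names, and the 80/20 proportion are assumptions for illustration.

```python
# Sketch of a stratified validation split over labelled product pairs; the path and
# column names ("pair_id", "label") are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

pairs = pd.read_json("computers_train_medium.json.gz", lines=True)   # hypothetical path
train_ids, val_ids = train_test_split(
    pairs["pair_id"], test_size=0.2, stratify=pairs["label"], random_state=42
)
```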
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
Don't forget to upvote if you enjoy my work :)
Serie A Match Results Dataset (2020–2025) was created in response to community requests following the release of my LaLiga Match Results Dataset.
This dataset contains match-level results and performance stats from the Italian Serie A football league, covering seasons 2020 to 2025.
Source: Data was collected using a custom Python web scraper from FBref.com (https://fbref.com/en/comps/11/Serie-A-Stats).
Uses:
- Match prediction models
- Sports analytics
- Feature engineering experiments
- Educational ML datasets
Licensing: Intended for educational and research use only. All rights remain with original data providers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains the input data and results used in the paper "Comparative study on matching methods for the distinction of building modifications and replacements based on multi-temporal building footprint data".
License information: The LoD1 data used as input in this study are openly available at Transparenzportal Hamburg (https://transparenz.hamburg.de/), from Freie und Hansestadt Hamburg, Landesbetrieb Geoinformation und Vermessung (LGV), in compliance with the licence dl-de/by-2-0 (https://www.govdata.de/dl-de/by-2-0).
Content:
1. Input footprints of non-identical pairs: input_reference_objects.zip
2. Results without additional position deviation: results_without_deviation.zip
3. Results with generated position deviation, including geometries: results_with_deviation.zip
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
We propose a simplified approach to matching for causal inference that simultaneously optimizes both balance (similarity between the treated and control groups) and matched sample size. Existing approaches either fix the matched sample size and maximize balance or fix balance and maximize sample size, leaving analysts to settle for suboptimal solutions or attempt manual optimization by iteratively tweaking their matching method and rechecking balance. To jointly maximize balance and sample size, we introduce the matching frontier, the set of matching solutions with maximum balance for each possible sample size. Rather than iterating, researchers can choose matching solutions from the frontier for analysis in one step. We derive fast algorithms that calculate the matching frontier for several commonly used balance metrics. We demonstrate with analyses of the effect of sex on judging and job training programs that show how the methods we introduce can extract new knowledge from existing data sets.
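As a toy illustration of the frontier idea (not the fast algorithms derived in the paper), one can greedily prune the control unit whose removal most improves a simple balance metric and record balance at every sample size; the balance metric and greedy rule here are assumptions made for brevity.

```python
# Toy sketch of a balance/sample-size frontier: repeatedly drop the control observation
# whose removal yields the best balance and record balance at each resulting size.
import numpy as np

def imbalance(Xt, Xc):
    # Simple L1 difference of covariate means between treated and control groups.
    return np.abs(Xt.mean(axis=0) - Xc.mean(axis=0)).sum()

def matching_frontier(Xt, Xc):
    keep = list(range(len(Xc)))
    frontier = [(len(keep), imbalance(Xt, Xc[keep]))]
    while len(keep) > 1:
        scores = [imbalance(Xt, Xc[[j for j in keep if j != i]]) for i in keep]
        keep.remove(keep[int(np.argmin(scores))])
        frontier.append((len(keep), imbalance(Xt, Xc[keep])))
    return frontier   # list of (matched control sample size, balance) pairs
```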
Recently, artificial intelligence has begun to assist archaeologists in processing images of archaeological artifacts. We report a convolutional neural network (CNN) approach to obtain feature vectors of painted pottery images through a preliminary classification machine learning of the cultural types. The model, trained on a photographic image dataset of Chinese Neolithic color-painted pottery, achieved 92.58% precision in assigning vessel images to the corresponding archaeological types. The feature vectors contain information on vessel shape, color, and ornamentation design, based on which similarity coefficients for the images in the dataset were calculated. The quantitative measurement of similarity allows searching for the closest match to artefacts in the dataset, as well as building a network of vessels in terms of similarity. This work highlights the potential of CNN approaches in the curation of archaeological artifacts, providing a new tool to assist the study of chronology, typology, decoration design, etc.
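The similarity search step can be sketched as cosine similarity over the CNN feature vectors; the array and function names are assumptions, and the actual similarity coefficient used in the study may differ.

```python
# Sketch of closest-match retrieval over CNN feature vectors via cosine similarity.
import numpy as np

def closest_matches(features, query_index, top_k=5):
    F = features / np.linalg.norm(features, axis=1, keepdims=True)   # L2-normalize rows
    sims = F @ F[query_index]                                        # cosine similarities
    order = np.argsort(-sims)
    return [i for i in order if i != query_index][:top_k]
```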
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The dataset is sourced from the event matching part of the 2021 Sohu Campus Text Matching Algorithm Competition (https://www.biendata.xyz/competition/sohu_2021/). The event matching datasets released in the preliminary and final rounds were merged, and the event matching parts for short text vs. short text, short text vs. long text, and long text vs. long text were selected. 20% of the data was used as the test set, another 20% as the validation set, and the rest as the training set.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
The agricultural LiDAR data to evaluate scan matching techniques (AgLiMatch dataset) comprises a set of Velodyne VLP-16 LiDAR captures and the corresponding GNSS-RTK tracks acquired in a Fuji apple orchard using an autonomous platform. This dataset was used in [1] to evaluate scan matching techniques by comparing the platform path calculated using LiDAR scan matching against the ground-truth platform path measured with a GNSS-RTK system. The correspondence between each LiDAR file (inside the /velodyne_data folder) and GNSS track file (inside the /GNSS_data folder) is detailed in the “Velodyne-GNSS_correspondence-data.xlsx” file. The relative position between the LiDAR sensor and the GNSS rover is shown in “experimental_setup.png”. Distance units are in mm.
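A minimal scan-matching sketch with Open3D point-to-point ICP between two consecutive scans; the file names, voxel size, and correspondence threshold are assumptions, and this is not the evaluation code of [1].

```python
# Sketch of aligning two consecutive LiDAR scans with point-to-point ICP (Open3D).
# File names assume scans converted to a point-cloud format Open3D can read.
import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("scan_000001.pcd")    # hypothetical converted scan files
target = o3d.io.read_point_cloud("scan_000002.pcd")
source = source.voxel_down_sample(voxel_size=50.0)     # dataset distances are in mm
target = target.voxel_down_sample(voxel_size=50.0)

result = o3d.pipelines.registration.registration_icp(
    source, target, 500.0, np.eye(4),                  # 500 mm correspondence threshold
    o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)   # estimated relative platform motion between the two scans
```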
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Title: Attributing pedestrian networks with semantic information based on multi-source spatial data
Abstract: The lack of associating pedestrian networks, i.e., the paths and roads used for non-vehicular travel, with information about semantic attribution is a major weakness for many applications, especially those supporting accurate pedestrian routing. Researchers have developed various algorithms to generate pedestrian walkways based on datasets, including high-resolution images, existing map databases, and GPS data; however, the semantic attribution of pedestrian walkways is often ignored. The objective of our study is to automatically extract semantic information including incline values and the different categories of pedestrian paths from multi-source spatial data, such as crowdsourced GPS tracking data, land use data, and motor vehicle road (MVR) networks. Incline values for each pedestrian path were derived from tracking data through elevation filtering using wavelet theory and a similarity-based map-matching method. To automatically categorize pedestrian paths into five classes including sidewalk, crosswalk, entrance walkway, indoor path, and greenway, we developed a hierarchical strategy of spatial analysis using land use data and MVR networks. The effectiveness of our proposed method is demonstrated using real datasets including GPS tracking data collected by volunteers, land use data acquired from OpenStreetMap, and MVR network data downloaded from Gaode Map.