100+ datasets found
  1. Data from: Fast Bayesian Record Linkage With Record-Specific Disagreement...

    • tandf.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Thomas Stringham (2023). Fast Bayesian Record Linkage With Record-Specific Disagreement Parameters [Dataset]. http://doi.org/10.6084/m9.figshare.14687696.v1
    Explore at:
Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Thomas Stringham
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Researchers are often interested in linking individuals between two datasets that lack a common unique identifier. Matching procedures often struggle to match records with common names, birthplaces, or other field values. Computational feasibility is also a challenge, particularly when linking large datasets. We develop a Bayesian method for automated probabilistic record linkage and show it recovers more than 50% more true matches, holding accuracy constant, than comparable methods in a matching of military recruitment data to the 1900 U.S. Census for which expert-labeled matches are available. Our approach, which builds on a recent state-of-the-art Bayesian method, refines the modeling of comparison data, allowing disagreement probability parameters conditional on nonmatch status to be record-specific in the smaller of the two datasets. This flexibility significantly improves matching when many records share common field values. We show that our method is computationally feasible in practice, despite the added complexity, with an R/C++ implementation that achieves a significant improvement in speed over comparable recent methods. We also suggest a lightweight method for treatment of very common names and show how to estimate true positive rate and positive predictive value when true match status is unavailable.
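The record-linkage setting described above can be illustrated with a toy Fellegi–Sunter-style comparison: build field-agreement vectors for each candidate record pair and score them by a match/nonmatch log-likelihood ratio. This is a minimal sketch with made-up records and hand-picked global m/u probabilities, not the paper's record-specific Bayesian model:

```python
import math
from itertools import product

# Toy files lacking a shared unique identifier (illustrative values).
file_a = [{"first": "john", "last": "smith", "year": 1880},
          {"first": "mary", "last": "jones", "year": 1882}]
file_b = [{"first": "john", "last": "smith", "year": 1880},
          {"first": "mary", "last": "janes", "year": 1882}]
fields = ("first", "last", "year")

# m[f]: P(agree on f | match); u[f]: P(agree on f | nonmatch).
# Hand-picked global values; the paper's refinement makes the nonmatch
# parameters record-specific in the smaller file.
m = {"first": 0.95, "last": 0.95, "year": 0.90}
u = {"first": 0.10, "last": 0.05, "year": 0.20}

def compare(rec_a, rec_b):
    """Binary field-agreement vector for one record pair."""
    return tuple(int(rec_a[f] == rec_b[f]) for f in fields)

def match_weight(gamma):
    """Log-likelihood ratio of match vs nonmatch for a comparison vector."""
    return sum(math.log((m[f] if g else 1 - m[f]) / (u[f] if g else 1 - u[f]))
               for f, g in zip(fields, gamma))

scores = {(i, j): match_weight(compare(a, b))
          for (i, a), (j, b) in product(enumerate(file_a), enumerate(file_b))}
best = max(scores, key=scores.get)  # highest-scoring candidate pair
```

Pairs agreeing on common field values are exactly where global u parameters mislead; allowing them to vary by record is the paper's contribution.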

  2. Data Matching Imputation System

    • fisheries.noaa.gov
    • catalog.data.gov
    Updated Jan 1, 2012
    Cite
    Greater Atlantic Regional Fisheries Office (2012). Data Matching Imputation System [Dataset]. https://www.fisheries.noaa.gov/inport/item/17328
    Explore at:
    Dataset updated
    Jan 1, 2012
    Dataset provided by
    Greater Atlantic Regional Fisheries Office
    Time period covered
    2000 - Dec 3, 2125
    Area covered
    northeast, Northeast fishery management area; Maine coast southward to North Carolina.
    Description

The DMIS dataset is a flat-file record of matches across several data collections. It primarily consists of VTRs, dealer records, and observer data, combined with vessel permit information, to support Northeast Regional quota monitoring projects.

3. Assessing the performance of matching algorithms when selection into...

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Boris Augurzky (2025). Assessing the performance of matching algorithms when selection into treatment is strong (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9hc3Nlc3NpbmctdGhlLXBlcmZvcm1hbmNlLW9mLW1hdGNoaW5nLWFsZ29yaXRobXMtd2hlbi1zZWxlY3Rpb24taW50by10cmVhdG1lbnQtaXMtc3Ryb25n
    Explore at:
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Journal of Applied Econometrics
    ZBW Journal Data Archive
    ZBW
    Authors
    Boris Augurzky
    Description

This paper investigates the method of matching with respect to two crucial implementation choices: the distance measure and the type of algorithm. We implement optimal full matching, a fully efficient algorithm, and present a framework for statistical inference. The implementation uses data from the NLSY79 to study the effect of college education on earnings. We find that decisions regarding the matching algorithm depend on the structure of the data: in the case of strong selection into treatment and treatment effect heterogeneity, full matching seems preferable. If heterogeneity is weak, pair matching suffices.
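For contrast with the full matching studied above, simple greedy pair matching on a single covariate can be sketched as follows. The data are toy values chosen to mimic strong selection into treatment, not the NLSY79, and the paper's optimal full matching instead solves a network optimization problem:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy covariate: treated units drawn with systematically higher values,
# mimicking strong selection into treatment (illustrative assumption).
x_treated = rng.normal(1.0, 1.0, 20)
x_control = rng.normal(0.0, 1.0, 200)

def pair_match(treated, control):
    """Greedy nearest-neighbour pair matching without replacement."""
    available = list(range(len(control)))
    pairs = []
    for i, xt in enumerate(treated):
        j = min(available, key=lambda k: abs(control[k] - xt))
        pairs.append((i, j))
        available.remove(j)
    return pairs

pairs = pair_match(x_treated, x_control)
matched = np.array([x_control[j] for _, j in pairs])
# Balance check: matching should shrink the treated-control mean gap.
gap_before = abs(x_treated.mean() - x_control.mean())
gap_after = abs(x_treated.mean() - matched.mean())
```

With a deep control reservoir, even this greedy pairing closes most of the covariate gap; full matching generalizes the idea by allowing variable-sized matched sets.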

4. Data from: TACO: a benchmark for connectivity-invariance in shape...

    • zenodo.org
    zip
    Updated Nov 11, 2024
    Cite
    Simone Pedico; Simone Pedico; Simone Melzi; Simone Melzi; Filippo Maggioli; Filippo Maggioli (2024). TACO: a benchmark for connectivity-invariance in shape correspondence [Dataset]. http://doi.org/10.5281/zenodo.14066437
    Explore at:
Available download formats: zip
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Smart Tools and Applications in Graphics 2024
    Authors
    Simone Pedico; Simone Pedico; Simone Melzi; Simone Melzi; Filippo Maggioli; Filippo Maggioli
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TACO: a benchmark for connectivity-invariance in shape correspondence

    In real-world scenarios, a major limitation for shape-matching datasets is represented by having all the meshes of the same subject share their connectivity across different poses. Specifically, similar connectivities could provide a significant bias for shape-matching algorithms, simplifying the matching process and potentially leading to correspondences based on recurring triangle patterns rather than geometric correspondences between mesh parts. As a consequence, the resulting correspondence may be meaningless, and the evaluation of the algorithm may be misled.
    To overcome this limitation, we introduce TACO, a new dataset where meshes representing the same subject in different poses do not share the same connectivity, and we compute new ground truth correspondences between shapes. We extensively evaluate our dataset to ensure that ground truth isometries are properly preserved. We also use our dataset to validate state-of-the-art shape-matching algorithms, verifying a degradation in performance when the connectivity gets altered.

    Dataset structure

    • offs: a directory containing all the triangular meshes in the dataset in OFF file format
    • pairs.txt: a list of all the 420 possible pairs of shapes in the dataset
    • gt_matches: a directory containing all the ground truth correspondences listed in `pairs.txt` and stored in MAT file format
5. Replication Data for: The Balance-Sample Size Frontier in Matching Methods...

    • dataverse.harvard.edu
    Updated Jul 1, 2017
    Cite
    Gary King; Christopher Lucas; Richard Nielsen (2017). Replication Data for: The Balance-Sample Size Frontier in Matching Methods for Causal Inference [Dataset]. http://doi.org/10.7910/DVN/SURSEO
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 1, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Gary King; Christopher Lucas; Richard Nielsen
    License

https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/SURSEO

    Description

    We propose a simplified approach to matching for causal inference that simultaneously optimizes both balance (similarity between the treated and control groups) and matched sample size. Existing approaches either fix the matched sample size and maximize balance or fix balance and maximize sample size, leaving analysts to settle for suboptimal solutions or attempt manual optimization by iteratively tweaking their matching method and rechecking balance. To jointly maximize balance and sample size, we introduce the matching frontier, the set of matching solutions with maximum balance for each possible sample size. Rather than iterating, researchers can choose matching solutions from the frontier for analysis in one step. We derive fast algorithms that calculate the matching frontier for several commonly used balance metrics. We demonstrate with analyses of the effect of sex on judging and job training programs that show how the methods we introduce can extract new knowledge from existing data sets.
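The frontier idea above can be mimicked with a crude greedy sketch on toy data: starting from all controls, repeatedly drop the control whose removal most reduces mean imbalance, recording one (sample size, imbalance) point per step. This greedy pruning is only an approximation; the paper derives fast exact algorithms for several balance metrics:

```python
import numpy as np

rng = np.random.default_rng(1)
x_t = rng.normal(0.5, 1.0, 30)  # treated covariate (toy data)
x_c = rng.normal(0.0, 1.0, 30)  # control covariate (toy data)

def greedy_frontier(treated, control):
    """For each control sample size, record the (greedily approximated)
    best achievable imbalance: absolute difference in covariate means."""
    t = np.asarray(treated)
    c = list(control)
    points = [(len(c), abs(t.mean() - np.mean(c)))]
    while len(c) > 1:
        # Drop the control whose removal most reduces imbalance.
        k = min(range(len(c)),
                key=lambda i: abs(t.mean() - np.mean(c[:i] + c[i + 1:])))
        del c[k]
        points.append((len(c), abs(t.mean() - np.mean(c))))
    return points

pts = greedy_frontier(x_t, x_c)  # one (size, imbalance) point per sample size
```

An analyst would then pick a point from this size-imbalance curve in one step, rather than iteratively tweaking a matching method and rechecking balance.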

6. Data from: Graded Matching for Large Observational Studies

    • datasetcatalog.nlm.nih.gov
    • tandf.figshare.com
    Updated Mar 28, 2022
    Cite
    Yu, Ruoqi; Rosenbaum, Paul R. (2022). Graded Matching for Large Observational Studies [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000420071
    Explore at:
    Dataset updated
    Mar 28, 2022
    Authors
    Yu, Ruoqi; Rosenbaum, Paul R.
    Description

    Observational studies of causal effects often use multivariate matching to control imbalances in measured covariates. For instance, using network optimization, one may seek the closest possible pairing for key covariates among all matches that balance a propensity score and finely balance a nominal covariate, perhaps one with many categories. This is all straightforward when matching thousands of individuals, but requires some adjustments when matching tens or hundreds of thousands of individuals. In various senses, a sparser network—one with fewer edges—permits optimization in larger samples. The question is: What is the best way to make the network sparse for matching? A network that is too sparse will eliminate from consideration possible pairings that it should consider. A network that is not sparse enough will waste computation considering pairings that do not deserve serious consideration. We propose a new graded strategy in which potential pairings are graded, with a preference for higher grade pairings. We try to match with pairs of the best grade, incorporating progressively lower grade pairs only to the degree they are needed. In effect, only sparse networks are built, stored and optimized. Two examples are discussed, a small example with 1567 matched pairs from clinical medicine, and a slightly larger example with 22,111 matched pairs from economics. The method is implemented in an R package RBestMatch available at https://github.com/ruoqiyu/RBestMatch. Supplementary materials for this article are available online.

7. Replication Data for: Adjusting for Confounding with Text Matching

    • dataverse.harvard.edu
    • datasetcatalog.nlm.nih.gov
    Updated Dec 6, 2021
    Cite
    Margaret E. Roberts; Brandon M. Stewart; Richard A. Nielsen (2021). Replication Data for: Adjusting for Confounding with Text Matching [Dataset]. http://doi.org/10.7910/DVN/HTMX3K
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 6, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Margaret E. Roberts; Brandon M. Stewart; Richard A. Nielsen
    License

https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/HTMX3K

    Area covered
    China
    Description

    We identify situations in which conditioning on text can address confounding in observational studies. We argue that a matching approach is particularly well-suited to this task, but existing matching methods are ill-equipped to handle high-dimensional text data. Our proposed solution is to estimate a low-dimensional summary of the text and condition on this summary via matching. We propose a method of text matching, topical inverse regression matching, that allows the analyst to match both on the topical content of confounding documents and the probability that each of these documents is treated. We validate our approach and illustrate the importance of conditioning on text to address confounding with two applications: the effect of perceptions of author gender on citation counts in the international relations literature and the effects of censorship on Chinese social media users.
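The core idea, matching on a low-dimensional summary of text, can be sketched with plain bag-of-words counts and an SVD projection. The corpus and the SVD summary are illustrative assumptions; the paper's topical inverse regression matching also conditions on each document's treatment probability:

```python
import numpy as np

# Toy corpus: doc 0 is "treated"; docs 2-4 are candidate controls.
docs = [
    "censorship of social media posts in china",
    "social media users discuss censorship",
    "gender and citation counts in international relations",
    "citation counts in the relations literature",
    "censorship and social media policy debates",
]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

# Low-dimensional summary: project term counts onto the top-2 singular
# directions (a stand-in for the paper's topical summary).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = U[:, :2] * S[:2]

def nearest_control(i, controls):
    """Match document i to the closest control in summary space (cosine)."""
    zi = Z[i] / np.linalg.norm(Z[i])
    sims = {j: float(Z[j] @ zi / np.linalg.norm(Z[j])) for j in controls}
    return max(sims, key=sims.get)

match0 = nearest_control(0, [2, 3, 4])
```

Conditioning happens in the 2-dimensional summary space rather than on the raw high-dimensional term counts, which is what makes matching tractable.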

8. Leadbook B2B Contact Custom Datasets - Global Coverage, 200 Million Business...

    • datarade.ai
    .csv
    Updated Aug 5, 2020
    Cite
    Leadbook (2020). Leadbook B2B Contact Custom Datasets - Global Coverage, 200 Million Business Contacts with Advanced Targeting and Quarterly Refresh [Dataset]. https://datarade.ai/data-products/custom-datasets-with-advanced-targeting-and-quarterly-refresh
    Explore at:
Available download formats: .csv
    Dataset updated
    Aug 5, 2020
    Dataset authored and provided by
    Leadbook
    Area covered
    Réunion, Mexico, Martinique, Qatar, Malta, Saint Kitts and Nevis, Guyana, Poland, Togo, Vietnam
    Description

Build highly targeted, custom datasets from a database of 200 million global contacts to match your target audience profile, and receive quarterly refreshes powered by Leadbook's proprietary AI data technology.

Build your dataset with custom attributes and conditions such as:

• Usage of a specific technology
• Minimum number of records per organisation
• Data matching against a list of area codes
• Data matching against a list of business registration numbers
• Specific headquarter and branch location combinations

    Complimentary de-duplication is provided to ensure that you only pay for contacts that you don't already own.

All records include:

• Contact name
• Job title
• Contact email address
• Contact phone number
• Contact location
• Organisation name
• Organisation type
• Organisation headcount
• Primary industry

    Additional information like social media handles, secondary industries, and organisation websites may be provided where available.

    Pricing includes a one-time data processing fee and additional fees per data refresh.

  9. 💌 Predict Online Dating Matches Dataset

    • kaggle.com
    zip
    Updated Jun 21, 2024
    Cite
    Rabie El Kharoua (2024). 💌 Predict Online Dating Matches Dataset [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/predict-online-dating-matches-dataset/code
    Explore at:
Available download formats: zip (7223 bytes)
    Dataset updated
    Jun 21, 2024
    Authors
    Rabie El Kharoua
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data:

    The Dataset provides a comprehensive view into the dynamics of online matchmaking interactions. It captures essential variables that influence the likelihood of successful matches across different genders. This dataset allows researchers and analysts to explore how factors such as VIP subscription status, income levels, parental status, age, and self-perceived attractiveness contribute to the outcomes of online dating endeavors.

    Variables:

    • Gender: 0 (Male), 1 (Female)
    • PurchasedVIP: 0 (No), 1 (Yes)
    • Income: Annual income in USD
    • Children: Number of children
    • Age: Age of the user
    • Attractiveness: Subjective rating of attractiveness (1-10)
    • Matches: Number of matches obtained based on criteria

    Target Variable:

    • Matches: Number of matches received, indicative of success rate in online dating

    Usage:

    • Analyze gender-specific dating preferences and behaviors.
    • Predict match success.

    Explanation of Zero Matches for Some Users:

    The occurrence of zero matches for certain users within the dataset can be attributed to the presence of "ghost users." These are users who create an account but subsequently abandon the app without engaging further. Consequently, their profiles do not participate in any matching activities, leading to a recorded match count of zero. This phenomenon should be taken into account when analyzing user activity and match data, as it impacts the overall interpretation of user engagement and match success rates.

    Disclaimer:

    This dataset contains 1000 records, which is considered relatively low within this category of datasets. Additionally, the dataset may not accurately reflect reality as it was captured intermittently over different periods of time.

    Furthermore, certain match categories are missing due to confidentiality constraints, and several other crucial variables are also absent for the same reason. Consequently, the machine learning models employed may not achieve high accuracy in predicting the number of matches.

    It is important to acknowledge these limitations when interpreting the results derived from this dataset. Careful consideration of these factors is advised when drawing conclusions or making decisions based on the findings of any analyses conducted using this data.

    Warning:

    Due to confidentiality constraints, only a small amount of data was collected. Additionally, only users with variables showing high correlation with the matching variable were included in the dataset.

    As a result, the high performance of machine learning models on this dataset is primarily due to the data collection method (i.e., only high-correlation data was included).

    Therefore, the findings you may derive from manipulating this dataset are not representative of the real dating world.

    Data Source:

    The source of this dataset is confidential, and it may be released in the future. For the present, this dataset can be utilized under the terms of the license visible on the dataset's card.

    Users are advised to review and adhere to the terms specified in the dataset's license when using the data for any purpose.

    Conclusion:

    This dataset provides insights into the dynamics of online dating interactions, allowing for predictive modeling and analysis of factors influencing matchmaking success.

    Dataset Usage and Attribution Notice

    This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.

    Exclusive Synthetic Dataset

    This dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects. It is an original dataset, owned by Mr. Rabie El Kharoua, and has not been previously shared. You are free to use it under the license outlined on the data card. The dataset is offered without any guarantees. Details about the data provider will be shared soon.

10. Data from: Highly Scalable Matching Pursuit Signal Decomposition Algorithm

    • catalog.data.gov
    • datasets.ai
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Highly Scalable Matching Pursuit Signal Decomposition Algorithm [Dataset]. https://catalog.data.gov/dataset/highly-scalable-matching-pursuit-signal-decomposition-algorithm
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

In this research, we propose a variant of the classical Matching Pursuit Decomposition (MPD) algorithm with significantly improved scalability and computational performance. MPD is a powerful iterative algorithm that decomposes a signal into linear combinations of its dictionary elements or “atoms”. A best-fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal, and this procedure is repeated on the residual in subsequent iterations until a stopping criterion is met. A sufficiently large dictionary is required for an accurate reconstruction; this in turn increases the computational burden of the algorithm, thus limiting its applicability and level of adoption. Our main contribution lies in improving the computational efficiency of the algorithm to allow faster decomposition while maintaining a similar level of accuracy. The Correlation Thresholding and Multiple Atom Extractions techniques were proposed to decrease the computational burden of the algorithm. Correlation thresholds prune insignificant atoms from the dictionary. The ability to extract multiple atoms within a single iteration enhances the effectiveness and efficiency of each iteration. The proposed algorithm, entitled MPD++, was demonstrated using a real-world data set.
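A minimal numpy sketch of classical matching pursuit with a correlation threshold follows; it illustrates the idea, not the MPD++ implementation, and the dictionary and signal are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random overcomplete dictionary with unit-norm atoms (columns).
n, n_atoms = 64, 256
D = rng.normal(size=(n, n_atoms))
D /= np.linalg.norm(D, axis=0)

# Signal built from three dictionary atoms.
signal = D[:, [5, 40, 100]] @ np.array([2.0, -1.5, 1.0])

def matching_pursuit(x, D, n_iter=20, corr_threshold=0.0):
    """Classical MPD: greedily subtract the best-correlated atom.
    corr_threshold zeroes weakly correlated atoms, sketching the
    Correlation Thresholding idea."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual            # cross-correlation with all atoms
        corr[np.abs(corr) < corr_threshold] = 0.0
        k = int(np.argmax(np.abs(corr)))
        if corr[k] == 0.0:               # stopping criterion: nothing left
            break
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]    # subtract the selected atom
    return coeffs, residual

coeffs, residual = matching_pursuit(signal, D)
err = np.linalg.norm(signal - D @ coeffs) / np.linalg.norm(signal)
```

Each iteration costs one dictionary-residual product, which is why pruning atoms and extracting several per iteration pay off at scale.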

  11. Figure 8

    • figshare.com
    tiff
    Updated Jan 3, 2025
    Cite
    Anonymous Readmore (2025). Figure 8 [Dataset]. http://doi.org/10.6084/m9.figshare.28129472.v1
    Explore at:
Available download formats: tiff
    Dataset updated
    Jan 3, 2025
    Dataset provided by
Figshare (http://figshare.com/)
    Authors
    Anonymous Readmore
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Figure 8: The dynamic sorting and sample point exclusion process of the proposed method

  12. Dataset: Problem-centred interviews results for Matching Data Life Cycle and...

    • meta4ds.fokus.fraunhofer.de
    unknown
    Cite
    Zenodo, Dataset: Problem-centred interviews results for Matching Data Life Cycle and Research Processes in Engineering Sciences [Dataset]. https://meta4ds.fokus.fraunhofer.de/datasets/oai-zenodo-org-11198842?locale=en
    Explore at:
Available download formats: unknown (11449)
    Dataset authored and provided by
Zenodo (http://zenodo.org/)
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The authors would like to thank the Federal Government and the Heads of Government of the Länder, as well as the Joint Science Conference (GWK), for their funding and support within the framework of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) - project number 442146713.

13. Mix-and-Match Dataset

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 12, 2020
    Cite
    Verstraaten, Merijn (2020). Mix-and-Match Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4317448
    Explore at:
    Dataset updated
    Dec 12, 2020
    Dataset provided by
    University of Amsterdam
    Authors
    Verstraaten, Merijn
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Benchmark results for "Mix-and-Match: A Model-driven Runtime Optimisation Strategy for BFS on GPUs" paper.

    Performance data for Breadth-First Search on NVidia TitanX. Including trained Binary Decision Tree model for predicting the best implementation on an input graph.

14. Data and Code for: Matching and Network Effects in Ride-Hailing

    • openicpsr.org
    Updated Mar 20, 2023
    Cite
    Juan Camilo Castillo; Shreya Mathur (2023). Data and Code for: Matching and Network Effects in Ride-Hailing [Dataset]. http://doi.org/10.3886/E186903V1
    Explore at:
    Dataset updated
    Mar 20, 2023
    Dataset provided by
    American Economic Association
    Authors
    Juan Camilo Castillo; Shreya Mathur
    License

GPL-3.0: https://opensource.org/licenses/GPL-3.0

    Time period covered
    Mar 16, 2017 - Apr 8, 2017
    Area covered
    Texas, USA, Houston
    Description

    A recent empirical literature models search and matching frictions by means of a reduced-form matching function. An alternative approach is to simulate the matching process directly. In this paper, we follow the latter approach to model matching in ride-hailing. We compute the matching function implied by the matching process. It exhibits increasing returns to scale, and it does not resemble the commonly used Cobb-Douglas functional form. We then use this matching function to quantify network externalities. A subsidy on the order of $2 per trip is needed to correct for these externalities and induce the market to operate efficiently. This repository contains the code and a subset of the data used for the paper.
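Simulating a matching process directly, as the paper does, can be sketched with a toy spatial model: each rider is matched to the nearest idle driver within a pickup radius, and doubling both sides of the market more than doubles matches, i.e. increasing returns to scale. All parameters below are illustrative assumptions, not the paper's Houston data:

```python
import numpy as np

def avg_matches(n_riders, n_drivers, radius=0.05, trials=20, seed=0):
    """Average matches from a toy spatial process: each rider is matched
    to the nearest still-idle driver if that driver is within `radius`
    (unit square). Parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(trials):
        riders = rng.random((n_riders, 2))
        drivers = rng.random((n_drivers, 2))
        idle = np.ones(n_drivers, dtype=bool)
        matched = 0
        for r in riders:
            idx = np.flatnonzero(idle)
            if idx.size == 0:
                break
            d = np.linalg.norm(drivers[idx] - r, axis=1)
            k = int(np.argmin(d))
            if d[k] <= radius:
                idle[idx[k]] = False
                matched += 1
        totals.append(matched)
    return float(np.mean(totals))

m1 = avg_matches(100, 100)
m2 = avg_matches(200, 200)
# Doubling both sides more than doubles matches: increasing returns to scale.
```

Tabulating matches over a grid of (rider, driver) counts would trace out the implied matching function, which need not look Cobb-Douglas.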

  15. Claim Detection and Matching for Indian Languages

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jun 6, 2021
    Cite
    Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale; Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale (2021). Claim Detection and Matching for Indian Languages [Dataset]. http://doi.org/10.5281/zenodo.4890950
    Explore at:
Available download formats: csv
    Dataset updated
    Jun 6, 2021
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
    Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale; Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Two datasets are included in this repository: claim matching and claim detection datasets. The collections contain data in 5 languages: Bengali, English, Hindi, Malayalam and Tamil.

    The "claim detection" dataset contains textual claims from social media and fact-checking websites annotated for the "fact-check worthiness" of the claims in each message. Data points have one of the three labels of "Yes" (text contains one or more check-worthy claims), "No" and "Probably".

    The "claim matching" dataset is a curated collection of pairs of textual claims from social media and fact-checking websites for the purpose of automatic and multilingual claim matching. Pairs of data have one of the four labels of "Very Similar", "Somewhat Similar", "Somewhat Dissimilar" and "Very Dissimilar".

All personally identifiable information (PII), including phone numbers, email addresses, license plate numbers and addresses, has been replaced with general tags to protect user anonymity. A detailed explanation of the curation and annotation process is provided in our ACL 2021 paper:
    Kazemi, A.; Garimella, K.; Gaffney, D.; and Hale, S. A. 2021. Claim Matching Beyond English to Scale Global Fact-Checking. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, ACL 2021.

16. Matching as a stochastic process (replication data)

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Friedel Bolle; Philipp E. Otto (2025). Matching as a stochastic process (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9tYXRjaGluZy1hcy1hLXN0b2NoYXN0aWMtcHJvY2VzLXJlcGxpY2F0aW9uLWRhdGFz
    Explore at:
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Journal of Economics and Statistics
    ZBW Journal Data Archive
    ZBW
    Authors
    Friedel Bolle; Philipp E. Otto
    Description

Results of multi-party bargaining are usually described by concepts from cooperative game theory, in particular by the core. In one-on-one matching, core allocations are stable in the sense that no pair of unmatched or otherwise matched players can improve their incomes by forming a match. Because of incomplete information and bounded rationality, it is difficult to adopt a core allocation immediately. Theoretical investigations cope with the problem of whether core allocations can be adopted in a stochastic process with repeated re-matching. In this paper, we investigate sequences of matching with data from an experimental 2×2 labor market with wage negotiations. This market has seven possible matching structures (states) and is additionally characterized by the negotiated wages and profits. First, we describe the stochastic process of transitions from one state to another, including the average transition times. Second, we identify different influences on the process parameters, for example, the difference of incomes in a match. Third, allocations in the core should be completely durable, or at least more durable than comparable out-of-core allocations, but they are not. Final bargaining results (induced by a time limit) appear as snapshots of a stochastic process without absorbing states and with only weak systematic influences.

    Data and R code of the analysis are provided.
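Estimating a transition matrix and average holding times from a state sequence, as in the first step described above, can be sketched on simulated data. The seven states and the sticky transition probabilities below are illustrative assumptions, not the experimental data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a state sequence over 7 matching structures (labels 0-6) with
# a sticky transition matrix; illustrative, not the experimental data.
P_true = np.full((7, 7), 0.05)
np.fill_diagonal(P_true, 0.70)
states = [0]
for _ in range(5000):
    states.append(int(rng.choice(7, p=P_true[states[-1]])))

# Empirical transition matrix from observed transitions.
counts = np.zeros((7, 7))
for a, b in zip(states, states[1:]):
    counts[a, b] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)

# Average holding time in state i for a geometric sojourn: 1 / (1 - P[i, i]).
hold_times = 1.0 / (1.0 - np.diag(P_hat))
```

An absorbing state would show up as a diagonal entry of 1 (infinite holding time); the paper finds no such state in the experimental market.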

17. Data from: Explaining human mobility predictions through a pattern matching...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 18, 2021
    Cite
    Smolak, Kamil; Rohm, Witold; Siła-Nowicka, Katarzyna (2021). Explaining human mobility predictions through a pattern matching algorithm [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5788700
    Explore at:
    Dataset updated
    Dec 18, 2021
    Dataset provided by
    University of Auckland
    Wrocław University of Environmental and Life Sciences
    Authors
    Smolak, Kamil; Rohm, Witold; Siła-Nowicka, Katarzyna
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The file names encode the following information: {type of sequence}_{type of measure}_{sequence properties}_{additional information}.csv

• {type of sequence} - 'synth' for synthetic data or 'london' for real mobility data from London, UK.
• {type of measure} - 'r2' for the R-squared measure or 'corr' for Spearman's correlation.
• {sequence properties} - for synthetic data, one of the three sequence types described in the research article (random, markovian, nonstationary). For real mobility data, this part encodes the data processing parameters: (...)_london_{type of mobility sequence}_{DBSCAN epsilon value}_{DBSCAN min_pts value}, where {type of mobility sequence} is 'seq' for next-place sequences, and '30min' or '1H' for next time-bin sequences (indicating the size of the time-bin).

Files ending in 'predictability' contain R-squared and Spearman's correlation values for measures calculated in relation to the predictability measure.

    R2 files include values of R-squared for all types of modelled regression functions:
    'line' indicates {y = a*x + b} for a single variable and {y = a*x + b*y + c} for two variables.
    'expo' indicates {y = a*x^b + c} for a single variable and {y = a*x^b + c*y^d + e} for two variables.
    'log' indicates {y = a*log(x*b) + c} for a single variable and {y = a*x + c*log(y) + e + d*x*log(y)} for two variables.
    'logf' indicates {y = a*log(x) + c*log(y) + e + b*log(x)*log(y)} for two variables.
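The regression families listed above can be written as plain Python callables, shown here as a sketch using scalar math.log (a vectorized numpy version would be needed to fit them with, e.g., scipy.optimize.curve_fit). The description uses "y" both for the response and for the second predictor; we rename the predictors x1, x2 to avoid the clash:

```python
import math

# Single-variable families (x is the predictor).
def line1(x, a, b):
    return a * x + b

def expo1(x, a, b, c):
    return a * x**b + c

def log1(x, a, b, c):
    return a * math.log(x * b) + c

# Two-variable families (x1, x2 are the predictors).
def line2(x1, x2, a, b, c):
    return a * x1 + b * x2 + c

def expo2(x1, x2, a, b, c, d, e):
    return a * x1**b + c * x2**d + e

def log2(x1, x2, a, c, d, e):
    return a * x1 + c * math.log(x2) + e + d * x1 * math.log(x2)

def logf(x1, x2, a, b, c, e):
    return a * math.log(x1) + c * math.log(x2) + e + b * math.log(x1) * math.log(x2)
```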

  18. Ground-roll separation using intelligence based-matching method

    • narcis.nl
    • data.mendeley.com
    Updated Feb 27, 2020
    Cite
    Li, J (via Mendeley Data) (2020). Ground-roll separation using intelligence based-matching method [Dataset]. http://doi.org/10.17632/xg237bzyxb.1
    Explore at:
    Dataset updated
    Feb 27, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Li, J (via Mendeley Data)
    Description

    Separation is achieved by intelligence-based matching of the curvelet coefficients.

  19. Machine Learning Assisted History Matching Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Machine Learning Assisted History Matching Market Research Report 2033 [Dataset]. https://dataintelo.com/report/machine-learning-assisted-history-matching-market
    Explore at:
    pptx, pdf, csvAvailable download formats
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Machine Learning Assisted History Matching Market Outlook



    According to our latest research, the global machine learning assisted history matching market size reached USD 1.83 billion in 2024, reflecting a robust surge in adoption across the oil & gas, mining, and geothermal sectors. The market is experiencing a strong growth trajectory, with a recorded CAGR of 13.7% from 2025 to 2033. By the end of 2033, the market is forecasted to reach USD 5.46 billion, driven by the increasing need for efficient reservoir management, enhanced production optimization, and the integration of advanced data analytics in subsurface modeling. The primary growth factor for this market is the escalating demand for digital transformation in upstream energy operations, where machine learning technologies are revolutionizing traditional history matching processes.
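The projection above is standard compound-growth arithmetic. A quick sketch (ours, not the report's) shows how the growth rate implied by the two endpoint figures can be checked:

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

def project(start_value, cagr, years):
    """Forward projection at a constant compound rate."""
    return start_value * (1.0 + cagr) ** years

# Report endpoints: USD 1.83 billion in 2024, USD 5.46 billion in 2033.
r = implied_cagr(1.83, 5.46, 9)  # roughly 13% per year over the 9-year span
```

Note the implied rate is close to, though not exactly equal to, the report's quoted 13.7% CAGR; such rounding gaps are common in market summaries.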




    The rapid adoption of machine learning assisted history matching is largely attributed to the growing complexities of subsurface reservoirs and the ever-increasing volume of data generated by modern exploration and production activities. As energy companies strive to maximize reservoir recovery and minimize operational risks, machine learning algorithms offer unprecedented capabilities in automating the history matching process, reducing manual intervention, and providing more accurate reservoir models. This shift is further propelled by the oil & gas industry's ongoing transition towards digitalization, with operators seeking to leverage artificial intelligence and machine learning for predictive analytics, real-time decision making, and cost optimization. The ability of machine learning solutions to handle multi-dimensional datasets and deliver faster, more reliable results is a key driver behind the market’s impressive CAGR.




    Another significant growth factor is the increasing focus on maximizing resource extraction while adhering to stringent environmental and regulatory standards. Machine learning assisted history matching allows operators to simulate numerous reservoir scenarios swiftly, enabling them to identify optimal production strategies and mitigate potential environmental impacts. The integration of cloud computing and advanced analytics platforms has further democratized access to these technologies, enabling small and medium enterprises (SMEs) to adopt sophisticated history matching solutions without the need for heavy upfront investments in IT infrastructure. Moreover, the rising demand for enhanced oil recovery (EOR) techniques, coupled with the depletion of conventional reserves, is compelling operators to invest in advanced machine learning solutions that can unlock new value from mature fields.




    From a regional perspective, North America continues to dominate the machine learning assisted history matching market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The presence of major oil & gas companies, a mature digital ecosystem, and a strong focus on innovation are key factors underpinning North America’s leadership. Meanwhile, Asia Pacific is emerging as the fastest-growing regional market, bolstered by rising energy demand, significant investments in exploration activities, and the increasing adoption of digital technologies in countries such as China, India, and Australia. The Middle East & Africa region also presents substantial growth opportunities, driven by ongoing investments in upstream projects and the adoption of advanced reservoir management practices.



    Solution Type Analysis



    The machine learning assisted history matching market is segmented by solution type into software and services. The software segment currently holds the largest share, primarily due to the proliferation of advanced analytics platforms and specialized machine learning tools designed for reservoir engineers and geoscientists. These software solutions are continually evolving, incorporating new algorithms and user-friendly interfaces that streamline the history matching process. The availability of customizable software packages enables operators to tailor solutions to their specific reservoir characteristics, leading to improved model accuracy and reduced cycle times. Furthermore, the integration of cloud-based software has significantly enhanced scalability and collaboration, allowing geographically dispersed teams to work seamlessly on complex projects.




    Within the software segment, the adoption of artificial intelligence (AI)

  20. Effects of process ambidexterity on coordination and outcomes in software...

    • da-ra.de
    Updated Jan 23, 2014
    + more versions
    Cite
    Ye Li (2014). Effects of process ambidexterity on coordination and outcomes in software project teams - survey data [Dataset]. http://doi.org/10.7801/62
    Explore at:
    Dataset updated
    Jan 23, 2014
    Dataset provided by
    da|ra
    Mannheim University Library
    Authors
    Ye Li
    Description

    This data set was collected in a survey study on the effects of process alignment and process agility on coordination and outcomes in software project teams.

