100+ datasets found
  1. Data from: Fast Bayesian Record Linkage With Record-Specific Disagreement...

    • tandf.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Thomas Stringham (2023). Fast Bayesian Record Linkage With Record-Specific Disagreement Parameters [Dataset]. http://doi.org/10.6084/m9.figshare.14687696.v1
    Explore at:
Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Thomas Stringham
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Researchers are often interested in linking individuals between two datasets that lack a common unique identifier. Matching procedures often struggle to match records with common names, birthplaces, or other field values. Computational feasibility is also a challenge, particularly when linking large datasets. We develop a Bayesian method for automated probabilistic record linkage and show it recovers more than 50% more true matches, holding accuracy constant, than comparable methods in a matching of military recruitment data to the 1900 U.S. Census for which expert-labeled matches are available. Our approach, which builds on a recent state-of-the-art Bayesian method, refines the modeling of comparison data, allowing disagreement probability parameters conditional on nonmatch status to be record-specific in the smaller of the two datasets. This flexibility significantly improves matching when many records share common field values. We show that our method is computationally feasible in practice, despite the added complexity, with an R/C++ implementation that achieves a significant improvement in speed over comparable recent methods. We also suggest a lightweight method for treatment of very common names and show how to estimate true positive rate and positive predictive value when true match status is unavailable.
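The record-linkage setting described above can be illustrated with a toy Fellegi–Sunter-style comparison: build field-agreement vectors for each candidate record pair and score them by a match/nonmatch log-likelihood ratio. This is a minimal sketch with made-up records and hand-picked global m/u probabilities, not the paper's record-specific Bayesian model:

```python
import math
from itertools import product

# Toy files lacking a shared unique identifier (illustrative values).
file_a = [{"first": "john", "last": "smith", "year": 1880},
          {"first": "mary", "last": "jones", "year": 1882}]
file_b = [{"first": "john", "last": "smith", "year": 1880},
          {"first": "mary", "last": "janes", "year": 1882}]
fields = ("first", "last", "year")

# m[f]: P(agree on f | match); u[f]: P(agree on f | nonmatch).
# Hand-picked global values; the paper's refinement makes the nonmatch
# parameters record-specific in the smaller file.
m = {"first": 0.95, "last": 0.95, "year": 0.90}
u = {"first": 0.10, "last": 0.05, "year": 0.20}

def compare(rec_a, rec_b):
    """Binary field-agreement vector for one record pair."""
    return tuple(int(rec_a[f] == rec_b[f]) for f in fields)

def match_weight(gamma):
    """Log-likelihood ratio of match vs nonmatch for a comparison vector."""
    return sum(math.log((m[f] if g else 1 - m[f]) / (u[f] if g else 1 - u[f]))
               for f, g in zip(fields, gamma))

scores = {(i, j): match_weight(compare(a, b))
          for (i, a), (j, b) in product(enumerate(file_a), enumerate(file_b))}
best = max(scores, key=scores.get)  # highest-scoring candidate pair
```

Pairs agreeing on common field values are exactly where global u parameters mislead; allowing them to vary by record is the paper's contribution.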

  2. Data Matching Imputation System

    • fisheries.noaa.gov
    • catalog.data.gov
    Updated Jan 1, 2012
    Cite
    Greater Atlantic Regional Fisheries Office (2012). Data Matching Imputation System [Dataset]. https://www.fisheries.noaa.gov/inport/item/17328
    Explore at:
    Dataset updated
    Jan 1, 2012
    Dataset provided by
    Greater Atlantic Regional Fisheries Office
    Time period covered
    2000 - Dec 3, 2125
    Area covered
    northeast, Northeast fishery management area; Maine coast southward to North Carolina.
    Description

The DMIS dataset is a flat-file record of matches across several data collections. It primarily consists of VTRs, dealer records, and observer data, combined with vessel permit information, to support Northeast Regional quota monitoring projects.

3. Assessing the performance of matching algorithms when selection into...

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Boris Augurzky (2025). Assessing the performance of matching algorithms when selection into treatment is strong (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9hc3Nlc3NpbmctdGhlLXBlcmZvcm1hbmNlLW9mLW1hdGNoaW5nLWFsZ29yaXRobXMtd2hlbi1zZWxlY3Rpb24taW50by10cmVhdG1lbnQtaXMtc3Ryb25n
    Explore at:
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Journal of Applied Econometrics
    ZBW Journal Data Archive
    ZBW
    Authors
    Boris Augurzky
    Description

This paper investigates the method of matching with respect to two crucial implementation choices: the distance measure and the type of algorithm. We implement optimal full matching, a fully efficient algorithm, and present a framework for statistical inference. The implementation uses data from the NLSY79 to study the effect of college education on earnings. We find that decisions regarding the matching algorithm depend on the structure of the data: in the case of strong selection into treatment and treatment effect heterogeneity, full matching seems preferable. If heterogeneity is weak, pair matching suffices.
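For contrast with the full matching studied above, simple greedy pair matching on a single covariate can be sketched as follows. The data are toy values chosen to mimic strong selection into treatment, not the NLSY79, and the paper's optimal full matching instead solves a network optimization problem:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy covariate: treated units drawn with systematically higher values,
# mimicking strong selection into treatment (illustrative assumption).
x_treated = rng.normal(1.0, 1.0, 20)
x_control = rng.normal(0.0, 1.0, 200)

def pair_match(treated, control):
    """Greedy nearest-neighbour pair matching without replacement."""
    available = list(range(len(control)))
    pairs = []
    for i, xt in enumerate(treated):
        j = min(available, key=lambda k: abs(control[k] - xt))
        pairs.append((i, j))
        available.remove(j)
    return pairs

pairs = pair_match(x_treated, x_control)
matched = np.array([x_control[j] for _, j in pairs])
# Balance check: matching should shrink the treated-control mean gap.
gap_before = abs(x_treated.mean() - x_control.mean())
gap_after = abs(x_treated.mean() - matched.mean())
```

With a deep control reservoir, even this greedy pairing closes most of the covariate gap; full matching generalizes the idea by allowing variable-sized matched sets.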

4. Data from: TACO: a benchmark for connectivity-invariance in shape...

    • zenodo.org
    zip
    Updated Nov 11, 2024
    Cite
    Simone Pedico; Simone Pedico; Simone Melzi; Simone Melzi; Filippo Maggioli; Filippo Maggioli (2024). TACO: a benchmark for connectivity-invariance in shape correspondence [Dataset]. http://doi.org/10.5281/zenodo.14066437
    Explore at:
Available download formats: zip
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Smart Tools and Applications in Graphics 2024
    Authors
    Simone Pedico; Simone Pedico; Simone Melzi; Simone Melzi; Filippo Maggioli; Filippo Maggioli
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TACO: a benchmark for connectivity-invariance in shape correspondence

    In real-world scenarios, a major limitation for shape-matching datasets is represented by having all the meshes of the same subject share their connectivity across different poses. Specifically, similar connectivities could provide a significant bias for shape-matching algorithms, simplifying the matching process and potentially leading to correspondences based on recurring triangle patterns rather than geometric correspondences between mesh parts. As a consequence, the resulting correspondence may be meaningless, and the evaluation of the algorithm may be misled.
    To overcome this limitation, we introduce TACO, a new dataset where meshes representing the same subject in different poses do not share the same connectivity, and we compute new ground truth correspondences between shapes. We extensively evaluate our dataset to ensure that ground truth isometries are properly preserved. We also use our dataset to validate state-of-the-art shape-matching algorithms, verifying a degradation in performance when the connectivity gets altered.

    Dataset structure

    • offs: a directory containing all the triangular meshes in the dataset in OFF file format
    • pairs.txt: a list of all the 420 possible pairs of shapes in the dataset
    • gt_matches: a directory containing all the ground truth correspondences listed in `pairs.txt` and stored in MAT file format
5. Replication Data for: The Balance-Sample Size Frontier in Matching Methods...

    • dataverse.harvard.edu
    Updated Jul 1, 2017
    Cite
    Gary King; Christopher Lucas; Richard Nielsen (2017). Replication Data for: The Balance-Sample Size Frontier in Matching Methods for Causal Inference [Dataset]. http://doi.org/10.7910/DVN/SURSEO
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 1, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Gary King; Christopher Lucas; Richard Nielsen
    License

https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/SURSEO

    Description

    We propose a simplified approach to matching for causal inference that simultaneously optimizes both balance (similarity between the treated and control groups) and matched sample size. Existing approaches either fix the matched sample size and maximize balance or fix balance and maximize sample size, leaving analysts to settle for suboptimal solutions or attempt manual optimization by iteratively tweaking their matching method and rechecking balance. To jointly maximize balance and sample size, we introduce the matching frontier, the set of matching solutions with maximum balance for each possible sample size. Rather than iterating, researchers can choose matching solutions from the frontier for analysis in one step. We derive fast algorithms that calculate the matching frontier for several commonly used balance metrics. We demonstrate with analyses of the effect of sex on judging and job training programs that show how the methods we introduce can extract new knowledge from existing data sets.
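The frontier idea above can be mimicked with a crude greedy sketch on toy data: starting from all controls, repeatedly drop the control whose removal most reduces mean imbalance, recording one (sample size, imbalance) point per step. This greedy pruning is only an approximation; the paper derives fast exact algorithms for several balance metrics:

```python
import numpy as np

rng = np.random.default_rng(1)
x_t = rng.normal(0.5, 1.0, 30)  # treated covariate (toy data)
x_c = rng.normal(0.0, 1.0, 30)  # control covariate (toy data)

def greedy_frontier(treated, control):
    """For each control sample size, record the (greedily approximated)
    best achievable imbalance: absolute difference in covariate means."""
    t = np.asarray(treated)
    c = list(control)
    points = [(len(c), abs(t.mean() - np.mean(c)))]
    while len(c) > 1:
        # Drop the control whose removal most reduces imbalance.
        k = min(range(len(c)),
                key=lambda i: abs(t.mean() - np.mean(c[:i] + c[i + 1:])))
        del c[k]
        points.append((len(c), abs(t.mean() - np.mean(c))))
    return points

pts = greedy_frontier(x_t, x_c)  # one (size, imbalance) point per sample size
```

An analyst would then pick a point from this size-imbalance curve in one step, rather than iteratively tweaking a matching method and rechecking balance.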

6. Data from: Graded Matching for Large Observational Studies

    • datasetcatalog.nlm.nih.gov
    • tandf.figshare.com
    Updated Mar 28, 2022
    Cite
    Yu, Ruoqi; Rosenbaum, Paul R. (2022). Graded Matching for Large Observational Studies [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000420071
    Explore at:
    Dataset updated
    Mar 28, 2022
    Authors
    Yu, Ruoqi; Rosenbaum, Paul R.
    Description

    Observational studies of causal effects often use multivariate matching to control imbalances in measured covariates. For instance, using network optimization, one may seek the closest possible pairing for key covariates among all matches that balance a propensity score and finely balance a nominal covariate, perhaps one with many categories. This is all straightforward when matching thousands of individuals, but requires some adjustments when matching tens or hundreds of thousands of individuals. In various senses, a sparser network—one with fewer edges—permits optimization in larger samples. The question is: What is the best way to make the network sparse for matching? A network that is too sparse will eliminate from consideration possible pairings that it should consider. A network that is not sparse enough will waste computation considering pairings that do not deserve serious consideration. We propose a new graded strategy in which potential pairings are graded, with a preference for higher grade pairings. We try to match with pairs of the best grade, incorporating progressively lower grade pairs only to the degree they are needed. In effect, only sparse networks are built, stored and optimized. Two examples are discussed, a small example with 1567 matched pairs from clinical medicine, and a slightly larger example with 22,111 matched pairs from economics. The method is implemented in an R package RBestMatch available at https://github.com/ruoqiyu/RBestMatch. Supplementary materials for this article are available online.

7. Replication Data for: Adjusting for Confounding with Text Matching

    • dataverse.harvard.edu
    • datasetcatalog.nlm.nih.gov
    Updated Dec 6, 2021
    Cite
    Margaret E. Roberts; Brandon M. Stewart; Richard A. Nielsen (2021). Replication Data for: Adjusting for Confounding with Text Matching [Dataset]. http://doi.org/10.7910/DVN/HTMX3K
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 6, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Margaret E. Roberts; Brandon M. Stewart; Richard A. Nielsen
    License

https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/HTMX3K

    Area covered
    China
    Description

    We identify situations in which conditioning on text can address confounding in observational studies. We argue that a matching approach is particularly well-suited to this task, but existing matching methods are ill-equipped to handle high-dimensional text data. Our proposed solution is to estimate a low-dimensional summary of the text and condition on this summary via matching. We propose a method of text matching, topical inverse regression matching, that allows the analyst to match both on the topical content of confounding documents and the probability that each of these documents is treated. We validate our approach and illustrate the importance of conditioning on text to address confounding with two applications: the effect of perceptions of author gender on citation counts in the international relations literature and the effects of censorship on Chinese social media users.
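The core idea, matching on a low-dimensional summary of text, can be sketched with plain bag-of-words counts and an SVD projection. The corpus and the SVD summary are illustrative assumptions; the paper's topical inverse regression matching also conditions on each document's treatment probability:

```python
import numpy as np

# Toy corpus: doc 0 is "treated"; docs 2-4 are candidate controls.
docs = [
    "censorship of social media posts in china",
    "social media users discuss censorship",
    "gender and citation counts in international relations",
    "citation counts in the relations literature",
    "censorship and social media policy debates",
]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

# Low-dimensional summary: project term counts onto the top-2 singular
# directions (a stand-in for the paper's topical summary).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = U[:, :2] * S[:2]

def nearest_control(i, controls):
    """Match document i to the closest control in summary space (cosine)."""
    zi = Z[i] / np.linalg.norm(Z[i])
    sims = {j: float(Z[j] @ zi / np.linalg.norm(Z[j])) for j in controls}
    return max(sims, key=sims.get)

match0 = nearest_control(0, [2, 3, 4])
```

Conditioning happens in the 2-dimensional summary space rather than on the raw high-dimensional term counts, which is what makes matching tractable.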

8. Leadbook B2B Contact Custom Datasets - Global Coverage, 200 Million Business...

    • datarade.ai
    .csv
    Updated Aug 5, 2020
    Cite
    Leadbook (2020). Leadbook B2B Contact Custom Datasets - Global Coverage, 200 Million Business Contacts with Advanced Targeting and Quarterly Refresh [Dataset]. https://datarade.ai/data-products/custom-datasets-with-advanced-targeting-and-quarterly-refresh
    Explore at:
Available download formats: .csv
    Dataset updated
    Aug 5, 2020
    Dataset authored and provided by
    Leadbook
    Area covered
    Réunion, Mexico, Martinique, Qatar, Malta, Saint Kitts and Nevis, Guyana, Poland, Togo, Vietnam
    Description

Build highly targeted, custom datasets from a database of 200 million global contacts to match your target audience profile, and receive quarterly refreshes powered by Leadbook's proprietary AI data technology.

Build your dataset with custom attributes and conditions such as:

• Usage of a specific technology
• Minimum number of records per organisation
• Data matching against a list of area codes
• Data matching against a list of business registration numbers
• Specific headquarter and branch location combinations

    Complimentary de-duplication is provided to ensure that you only pay for contacts that you don't already own.

All records include:

• Contact name
• Job title
• Contact email address
• Contact phone number
• Contact location
• Organisation name
• Organisation type
• Organisation headcount
• Primary industry

    Additional information like social media handles, secondary industries, and organisation websites may be provided where available.

    Pricing includes a one-time data processing fee and additional fees per data refresh.

  9. 💌 Predict Online Dating Matches Dataset

    • kaggle.com
    zip
    Updated Jun 21, 2024
    Cite
    Rabie El Kharoua (2024). 💌 Predict Online Dating Matches Dataset [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/predict-online-dating-matches-dataset/code
    Explore at:
Available download formats: zip (7223 bytes)
    Dataset updated
    Jun 21, 2024
    Authors
    Rabie El Kharoua
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data:

    The Dataset provides a comprehensive view into the dynamics of online matchmaking interactions. It captures essential variables that influence the likelihood of successful matches across different genders. This dataset allows researchers and analysts to explore how factors such as VIP subscription status, income levels, parental status, age, and self-perceived attractiveness contribute to the outcomes of online dating endeavors.

    Variables:

    • Gender: 0 (Male), 1 (Female)
    • PurchasedVIP: 0 (No), 1 (Yes)
    • Income: Annual income in USD
    • Children: Number of children
    • Age: Age of the user
    • Attractiveness: Subjective rating of attractiveness (1-10)
    • Matches: Number of matches obtained based on criteria

    Target Variable:

    • Matches: Number of matches received, indicative of success rate in online dating

    Usage:

    • Analyze gender-specific dating preferences and behaviors.
    • Predict match success.

    Explanation of Zero Matches for Some Users:

    The occurrence of zero matches for certain users within the dataset can be attributed to the presence of "ghost users." These are users who create an account but subsequently abandon the app without engaging further. Consequently, their profiles do not participate in any matching activities, leading to a recorded match count of zero. This phenomenon should be taken into account when analyzing user activity and match data, as it impacts the overall interpretation of user engagement and match success rates.

    Disclaimer:

    This dataset contains 1000 records, which is considered relatively low within this category of datasets. Additionally, the dataset may not accurately reflect reality as it was captured intermittently over different periods of time.

    Furthermore, certain match categories are missing due to confidentiality constraints, and several other crucial variables are also absent for the same reason. Consequently, the machine learning models employed may not achieve high accuracy in predicting the number of matches.

    It is important to acknowledge these limitations when interpreting the results derived from this dataset. Careful consideration of these factors is advised when drawing conclusions or making decisions based on the findings of any analyses conducted using this data.

    Warning:

    Due to confidentiality constraints, only a small amount of data was collected. Additionally, only users with variables showing high correlation with the matching variable were included in the dataset.

    As a result, the high performance of machine learning models on this dataset is primarily due to the data collection method (i.e., only high-correlation data was included).

    Therefore, the findings you may derive from manipulating this dataset are not representative of the real dating world.

    Data Source:

    The source of this dataset is confidential, and it may be released in the future. For the present, this dataset can be utilized under the terms of the license visible on the dataset's card.

    Users are advised to review and adhere to the terms specified in the dataset's license when using the data for any purpose.

    Conclusion:

    This dataset provides insights into the dynamics of online dating interactions, allowing for predictive modeling and analysis of factors influencing matchmaking success.

    Dataset Usage and Attribution Notice

    This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.

    Exclusive Synthetic Dataset

    This dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects. It is an original dataset, owned by Mr. Rabie El Kharoua, and has not been previously shared. You are free to use it under the license outlined on the data card. The dataset is offered without any guarantees. Details about the data provider will be shared soon.

10. Data from: Highly Scalable Matching Pursuit Signal Decomposition Algorithm

    • catalog.data.gov
    • datasets.ai
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Highly Scalable Matching Pursuit Signal Decomposition Algorithm [Dataset]. https://catalog.data.gov/dataset/highly-scalable-matching-pursuit-signal-decomposition-algorithm
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

In this research, we propose a variant of the classical Matching Pursuit Decomposition (MPD) algorithm with significantly improved scalability and computational performance. MPD is a powerful iterative algorithm that decomposes a signal into linear combinations of its dictionary elements or “atoms”. A best-fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal, and this procedure is repeated on the residual in subsequent iterations until a stopping criterion is met. A sufficiently large dictionary is required for an accurate reconstruction; this in turn increases the computational burden of the algorithm, thus limiting its applicability and level of adoption. Our main contribution lies in improving the computational efficiency of the algorithm to allow faster decomposition while maintaining a similar level of accuracy. The Correlation Thresholding and Multiple Atom Extractions techniques were proposed to decrease the computational burden of the algorithm. Correlation thresholds prune insignificant atoms from the dictionary. The ability to extract multiple atoms within a single iteration enhances the effectiveness and efficiency of each iteration. The proposed algorithm, entitled MPD++, was demonstrated using a real-world data set.
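A minimal numpy sketch of classical matching pursuit with a correlation threshold follows; it illustrates the idea, not the MPD++ implementation, and the dictionary and signal are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random overcomplete dictionary with unit-norm atoms (columns).
n, n_atoms = 64, 256
D = rng.normal(size=(n, n_atoms))
D /= np.linalg.norm(D, axis=0)

# Signal built from three dictionary atoms.
signal = D[:, [5, 40, 100]] @ np.array([2.0, -1.5, 1.0])

def matching_pursuit(x, D, n_iter=20, corr_threshold=0.0):
    """Classical MPD: greedily subtract the best-correlated atom.
    corr_threshold zeroes weakly correlated atoms, sketching the
    Correlation Thresholding idea."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual            # cross-correlation with all atoms
        corr[np.abs(corr) < corr_threshold] = 0.0
        k = int(np.argmax(np.abs(corr)))
        if corr[k] == 0.0:               # stopping criterion: nothing left
            break
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]    # subtract the selected atom
    return coeffs, residual

coeffs, residual = matching_pursuit(signal, D)
err = np.linalg.norm(signal - D @ coeffs) / np.linalg.norm(signal)
```

Each iteration costs one dictionary-residual product, which is why pruning atoms and extracting several per iteration pay off at scale.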

  11. Figure 8

    • figshare.com
    tiff
    Updated Jan 3, 2025
    Cite
    Anonymous Readmore (2025). Figure 8 [Dataset]. http://doi.org/10.6084/m9.figshare.28129472.v1
    Explore at:
Available download formats: tiff
    Dataset updated
    Jan 3, 2025
    Dataset provided by
Figshare (http://figshare.com/)
    Authors
    Anonymous Readmore
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Figure 8: The dynamic sorting and sample point exclusion process of the proposed method

  12. Dataset: Problem-centred interviews results for Matching Data Life Cycle and...

    • meta4ds.fokus.fraunhofer.de
    unknown
    Cite
    Zenodo, Dataset: Problem-centred interviews results for Matching Data Life Cycle and Research Processes in Engineering Sciences [Dataset]. https://meta4ds.fokus.fraunhofer.de/datasets/oai-zenodo-org-11198842?locale=en
    Explore at:
Available download formats: unknown (11449)
    Dataset authored and provided by
Zenodo (http://zenodo.org/)
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The authors would like to thank the Federal Government and the Heads of Government of the Länder, as well as the Joint Science Conference (GWK), for their funding and support within the framework of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) - project number 442146713.

13. Mix-and-Match Dataset

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 12, 2020
    Cite
    Verstraaten, Merijn (2020). Mix-and-Match Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4317448
    Explore at:
    Dataset updated
    Dec 12, 2020
    Dataset provided by
    University of Amsterdam
    Authors
    Verstraaten, Merijn
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Benchmark results for "Mix-and-Match: A Model-driven Runtime Optimisation Strategy for BFS on GPUs" paper.

    Performance data for Breadth-First Search on NVidia TitanX. Including trained Binary Decision Tree model for predicting the best implementation on an input graph.

14. Data and Code for: Matching and Network Effects in Ride-Hailing

    • openicpsr.org
    Updated Mar 20, 2023
    Cite
    Juan Camilo Castillo; Shreya Mathur (2023). Data and Code for: Matching and Network Effects in Ride-Hailing [Dataset]. http://doi.org/10.3886/E186903V1
    Explore at:
    Dataset updated
    Mar 20, 2023
    Dataset provided by
    American Economic Association
    Authors
    Juan Camilo Castillo; Shreya Mathur
    License

GPL-3.0: https://opensource.org/licenses/GPL-3.0

    Time period covered
    Mar 16, 2017 - Apr 8, 2017
    Area covered
    Texas, USA, Houston
    Description

    A recent empirical literature models search and matching frictions by means of a reduced-form matching function. An alternative approach is to simulate the matching process directly. In this paper, we follow the latter approach to model matching in ride-hailing. We compute the matching function implied by the matching process. It exhibits increasing returns to scale, and it does not resemble the commonly used Cobb-Douglas functional form. We then use this matching function to quantify network externalities. A subsidy on the order of $2 per trip is needed to correct for these externalities and induce the market to operate efficiently. This repository contains the code and a subset of the data used for the paper.
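Simulating a matching process directly, as the paper does, can be sketched with a toy spatial model: each rider is matched to the nearest idle driver within a pickup radius, and doubling both sides of the market more than doubles matches, i.e. increasing returns to scale. All parameters below are illustrative assumptions, not the paper's Houston data:

```python
import numpy as np

def avg_matches(n_riders, n_drivers, radius=0.05, trials=20, seed=0):
    """Average matches from a toy spatial process: each rider is matched
    to the nearest still-idle driver if that driver is within `radius`
    (unit square). Parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(trials):
        riders = rng.random((n_riders, 2))
        drivers = rng.random((n_drivers, 2))
        idle = np.ones(n_drivers, dtype=bool)
        matched = 0
        for r in riders:
            idx = np.flatnonzero(idle)
            if idx.size == 0:
                break
            d = np.linalg.norm(drivers[idx] - r, axis=1)
            k = int(np.argmin(d))
            if d[k] <= radius:
                idle[idx[k]] = False
                matched += 1
        totals.append(matched)
    return float(np.mean(totals))

m1 = avg_matches(100, 100)
m2 = avg_matches(200, 200)
# Doubling both sides more than doubles matches: increasing returns to scale.
```

Tabulating matches over a grid of (rider, driver) counts would trace out the implied matching function, which need not look Cobb-Douglas.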

  15. Claim Detection and Matching for Indian Languages

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jun 6, 2021
    Cite
    Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale; Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale (2021). Claim Detection and Matching for Indian Languages [Dataset]. http://doi.org/10.5281/zenodo.4890950
    Explore at:
Available download formats: csv
    Dataset updated
    Jun 6, 2021
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
    Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale; Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Two datasets are included in this repository: claim matching and claim detection datasets. The collections contain data in 5 languages: Bengali, English, Hindi, Malayalam and Tamil.

    The "claim detection" dataset contains textual claims from social media and fact-checking websites annotated for the "fact-check worthiness" of the claims in each message. Data points have one of the three labels of "Yes" (text contains one or more check-worthy claims), "No" and "Probably".

    The "claim matching" dataset is a curated collection of pairs of textual claims from social media and fact-checking websites for the purpose of automatic and multilingual claim matching. Pairs of data have one of the four labels of "Very Similar", "Somewhat Similar", "Somewhat Dissimilar" and "Very Dissimilar".

All personally identifiable information (PII), including phone numbers, email addresses, license plate numbers and addresses, has been replaced with general tags to protect user anonymity. A detailed explanation of the curation and annotation process is provided in our ACL 2021 paper:
    Kazemi, A.; Garimella, K.; Gaffney, D.; and Hale, S. A. 2021. Claim Matching Beyond English to Scale Global Fact-Checking. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, ACL 2021.

16. Matching as a stochastic process (replication data)

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Friedel Bolle; Philipp E. Otto (2025). Matching as a stochastic process (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9tYXRjaGluZy1hcy1hLXN0b2NoYXN0aWMtcHJvY2VzLXJlcGxpY2F0aW9uLWRhdGFz
    Explore at:
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Journal of Economics and Statistics
    ZBW Journal Data Archive
    ZBW
    Authors
    Friedel Bolle; Philipp E. Otto
    Description

Results of multi-party bargaining are usually described by concepts from cooperative game theory, in particular by the core. In one-on-one matching, core allocations are stable in the sense that no pair of unmatched or otherwise matched players can improve their incomes by forming a match. Because of incomplete information and bounded rationality, it is difficult to adopt a core allocation immediately. Theoretical investigations cope with the problem of whether core allocations can be adopted in a stochastic process with repeated re-matching. In this paper, we investigate sequences of matching with data from an experimental 2×2 labor market with wage negotiations. This market has seven possible matching structures (states) and is additionally characterized by the negotiated wages and profits. First, we describe the stochastic process of transitions from one state to another, including the average transition times. Second, we identify different influences on the process parameters, for example, the difference of incomes in a match. Third, allocations in the core should be completely durable, or at least more durable than comparable out-of-core allocations, but they are not. Final bargaining results (induced by a time limit) appear as snapshots of a stochastic process without absorbing states and with only weak systematic influences.

    Data and R code of the analysis are provided.
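Estimating a transition matrix and average holding times from a state sequence, as in the first step described above, can be sketched on simulated data. The seven states and the sticky transition probabilities below are illustrative assumptions, not the experimental data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a state sequence over 7 matching structures (labels 0-6) with
# a sticky transition matrix; illustrative, not the experimental data.
P_true = np.full((7, 7), 0.05)
np.fill_diagonal(P_true, 0.70)
states = [0]
for _ in range(5000):
    states.append(int(rng.choice(7, p=P_true[states[-1]])))

# Empirical transition matrix from observed transitions.
counts = np.zeros((7, 7))
for a, b in zip(states, states[1:]):
    counts[a, b] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)

# Average holding time in state i for a geometric sojourn: 1 / (1 - P[i, i]).
hold_times = 1.0 / (1.0 - np.diag(P_hat))
```

An absorbing state would show up as a diagonal entry of 1 (infinite holding time); the paper finds no such state in the experimental market.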

17. Data from: Explaining human mobility predictions through a pattern matching...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 18, 2021
    Cite
    Smolak, Kamil; Rohm, Witold; Siła-Nowicka, Katarzyna (2021). Explaining human mobility predictions through a pattern matching algorithm [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5788700
    Explore at:
    Dataset updated
    Dec 18, 2021
    Dataset provided by
    University of Auckland
    Wrocław University of Environmental and Life Sciences
    Authors
    Smolak, Kamil; Rohm, Witold; Siła-Nowicka, Katarzyna
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The file names encode the following information: {type of sequence}_{type of measure}_{sequence properties}_{additional information}.csv

• {type of sequence} - 'synth' for synthetic data or 'london' for real mobility data from London, UK.
• {type of measure} - 'r2' for the R-squared measure or 'corr' for Spearman's correlation.
• {sequence properties} - for synthetic data, one of the three sequence types described in the research article (random, markovian, nonstationary). For real mobility data, this part encodes the data processing parameters: (...)_london_{type of mobility sequence}_{DBSCAN epsilon value}_{DBSCAN min_pts value}, where {type of mobility sequence} is 'seq' for next-place sequences, and '30min' or '1H' for next time-bin sequences (indicating the size of the time-bin).

Files ending in 'predictability' contain R-squared and Spearman's correlation values for measures calculated in relation to the predictability measure.

    R2 files include values of R-squared for all types of modelled regression functions:
    'line' indicates {y = a*x + b} for a single variable and {y = a*x + b*y + c} for two variables.
    'expo' indicates {y = a*x^b + c} for a single variable and {y = a*x^b + c*y^d + e} for two variables.
    'log' indicates {y = a*log(x*b) + c} for a single variable and {y = a*x + c*log(y) + e + d*x*log(y)} for two variables.
    'logf' indicates {y = a*log(x) + c*log(y) + e + b*log(x)*log(y)} for two variables.
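The regression families listed above can be written as plain Python callables, shown here as a sketch using scalar math.log (a vectorized numpy version would be needed to fit them with, e.g., scipy.optimize.curve_fit). The description uses "y" both for the response and for the second predictor; we rename the predictors x1, x2 to avoid the clash:

```python
import math

# Single-variable families (x is the predictor).
def line1(x, a, b):
    return a * x + b

def expo1(x, a, b, c):
    return a * x**b + c

def log1(x, a, b, c):
    return a * math.log(x * b) + c

# Two-variable families (x1, x2 are the predictors).
def line2(x1, x2, a, b, c):
    return a * x1 + b * x2 + c

def expo2(x1, x2, a, b, c, d, e):
    return a * x1**b + c * x2**d + e

def log2(x1, x2, a, c, d, e):
    return a * x1 + c * math.log(x2) + e + d * x1 * math.log(x2)

def logf(x1, x2, a, b, c, e):
    return a * math.log(x1) + c * math.log(x2) + e + b * math.log(x1) * math.log(x2)
```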

  18. Ground-roll separation using intelligence based-matching method

    • narcis.nl
    • data.mendeley.com
    Updated Feb 27, 2020
    Cite
    Li, J (via Mendeley Data) (2020). Ground-roll separation using intelligence based-matching method [Dataset]. http://doi.org/10.17632/xg237bzyxb.1
    Explore at:
    Dataset updated
    Feb 27, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Li, J (via Mendeley Data)
    Description

    Separation is achieved by intelligence-based matching of the curvelet coefficients.

  19. Machine Learning Assisted History Matching Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Machine Learning Assisted History Matching Market Research Report 2033 [Dataset]. https://dataintelo.com/report/machine-learning-assisted-history-matching-market
    Explore at:
    pptx, pdf, csvAvailable download formats
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Machine Learning Assisted History Matching Market Outlook



    According to our latest research, the global machine learning assisted history matching market size reached USD 1.83 billion in 2024, reflecting a robust surge in adoption across the oil & gas, mining, and geothermal sectors. The market is experiencing a strong growth trajectory, with a recorded CAGR of 13.7% from 2025 to 2033. By the end of 2033, the market is forecasted to reach USD 5.46 billion, driven by the increasing need for efficient reservoir management, enhanced production optimization, and the integration of advanced data analytics in subsurface modeling. The primary growth factor for this market is the escalating demand for digital transformation in upstream energy operations, where machine learning technologies are revolutionizing traditional history matching processes.
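The projection above is standard compound-growth arithmetic. A quick sketch (ours, not the report's) shows how the growth rate implied by the two endpoint figures can be checked:

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

def project(start_value, cagr, years):
    """Forward projection at a constant compound rate."""
    return start_value * (1.0 + cagr) ** years

# Report endpoints: USD 1.83 billion in 2024, USD 5.46 billion in 2033.
r = implied_cagr(1.83, 5.46, 9)  # roughly 13% per year over the 9-year span
```

Note the implied rate is close to, though not exactly equal to, the report's quoted 13.7% CAGR; such rounding gaps are common in market summaries.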




    The rapid adoption of machine learning assisted history matching is largely attributed to the growing complexities of subsurface reservoirs and the ever-increasing volume of data generated by modern exploration and production activities. As energy companies strive to maximize reservoir recovery and minimize operational risks, machine learning algorithms offer unprecedented capabilities in automating the history matching process, reducing manual intervention, and providing more accurate reservoir models. This shift is further propelled by the oil & gas industry's ongoing transition towards digitalization, with operators seeking to leverage artificial intelligence and machine learning for predictive analytics, real-time decision making, and cost optimization. The ability of machine learning solutions to handle multi-dimensional datasets and deliver faster, more reliable results is a key driver behind the market’s impressive CAGR.




    Another significant growth factor is the increasing focus on maximizing resource extraction while adhering to stringent environmental and regulatory standards. Machine learning assisted history matching allows operators to simulate numerous reservoir scenarios swiftly, enabling them to identify optimal production strategies and mitigate potential environmental impacts. The integration of cloud computing and advanced analytics platforms has further democratized access to these technologies, enabling small and medium enterprises (SMEs) to adopt sophisticated history matching solutions without the need for heavy upfront investments in IT infrastructure. Moreover, the rising demand for enhanced oil recovery (EOR) techniques, coupled with the depletion of conventional reserves, is compelling operators to invest in advanced machine learning solutions that can unlock new value from mature fields.




    From a regional perspective, North America continues to dominate the machine learning assisted history matching market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The presence of major oil & gas companies, a mature digital ecosystem, and a strong focus on innovation are key factors underpinning North America’s leadership. Meanwhile, Asia Pacific is emerging as the fastest-growing regional market, bolstered by rising energy demand, significant investments in exploration activities, and the increasing adoption of digital technologies in countries such as China, India, and Australia. The Middle East & Africa region also presents substantial growth opportunities, driven by ongoing investments in upstream projects and the adoption of advanced reservoir management practices.



    Solution Type Analysis



    The machine learning assisted history matching market is segmented by solution type into software and services. The software segment currently holds the largest share, primarily due to the proliferation of advanced analytics platforms and specialized machine learning tools designed for reservoir engineers and geoscientists. These software solutions are continually evolving, incorporating new algorithms and user-friendly interfaces that streamline the history matching process. The availability of customizable software packages enables operators to tailor solutions to their specific reservoir characteristics, leading to improved model accuracy and reduced cycle times. Furthermore, the integration of cloud-based software has significantly enhanced scalability and collaboration, allowing geographically dispersed teams to work seamlessly on complex projects.




    Within the software segment, the adoption of artificial intelligence (AI)

  20. Effects of process ambidexterity on coordination and outcomes in software...

    • da-ra.de
    Updated Jan 23, 2014
    + more versions
    Cite
    Ye Li (2014). Effects of process ambidexterity on coordination and outcomes in software project teams - survey data [Dataset]. http://doi.org/10.7801/62
    Explore at:
    Dataset updated
    Jan 23, 2014
    Dataset provided by
    da|ra
    Mannheim University Library
    Authors
    Ye Li
    Description

    This data set was collected in a survey study on the effects of process alignment and process agility on coordination and outcomes in software project teams.

