100+ datasets found
  1. Data from: Advanced Topics in Differentially Private Statistical Learning

    • curate.nd.edu
    pdf
    Updated Jul 14, 2025
    Cite
    Spencer Tate Giddens (2025). Advanced Topics in Differentially Private Statistical Learning [Dataset]. http://doi.org/10.7274/29498438.v1
    Available download formats: pdf
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    University of Notre Dame
    Authors
    Spencer Tate Giddens
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Collecting and utilizing data to understand population trends, make predictions, and guide decisions is becoming increasingly common in today's world. In particular, statistical learning allows users to infer relationships between variables, learn patterns, and predict outcomes for previously unseen data via concepts and techniques from statistics and machine learning. Although many of the results of this practice have been beneficial, the data used often contain sensitive information, such as medical records or financial information, so maintaining privacy is of paramount importance when releasing statistics, parameter estimates, and other results. Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy when releasing aggregate information and statistics from a dataset. It provides a provable bound on the incurred privacy loss via the injection of random noise, at the cost of a reduction in utility. While many works have been devoted to establishing DP guarantees for various analysis tools in the past two decades since DP's introduction, many popular statistical learning approaches still lack a DP counterpart. This dissertation addresses this issue in three original research topics, as listed below.

    First, the dissertation presents the first differentially private algorithm for general weighted empirical risk minimization (wERM), along with theoretical DP guarantees. It evaluates the performance of the DP-wERM framework applied to outcome weighted learning (OWL), a method for learning individualized treatment rules, in both simulation studies and in a real clinical trial. The results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance.

    Second, the dissertation presents several original approaches with proven DP guarantees for linear mixed-effects (LME) models. LME models are popular, especially among statisticians, but little work has been done on integrating DP into them. The work leverages recent advancements in the DP literature, particularly in DP stochastic gradient descent (SGD), to estimate LME model parameters with DP guarantees and better privacy-utility trade-offs. Theoretical results are provided for an upper bound on the mean squared error between the private parameter estimates and the true parameters for DP-SGD-based approaches, and a simulation study and a real-world case study provide further empirical evidence for the feasibility of the approaches at practically reasonable privacy budgets.

    Third, this dissertation introduces SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. Alongside privacy, the fairness of decisions made by a statistical learning model is also crucial to address, though the vast majority of existing literature treats the two concerns independently. For methods that do consider privacy and fairness simultaneously, they often only apply to a specific machine learning task, limiting their generalizability. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. SAFES is illustrated by combining a graphical model-based DP data synthesizer with a popular fairness-aware data pre-processing transformation, and empirical evaluations on two popular benchmark datasets demonstrate that for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
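
    The abstract above describes DP as bounding privacy loss by injecting random noise at some cost in utility. As a rough illustration only, the textbook Laplace mechanism applied to a bounded mean can be sketched as follows. This is not one of the dissertation's algorithms; the function name `dp_mean`, the clipping bounds, and the assumption that the sample size is public are all choices made for this sketch.

```python
import random


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release a noisy mean of `values` using the Laplace mechanism.

    Assumptions for this sketch: every record is clipped to [lower, upper],
    the sample size n is treated as public, and neighbouring datasets differ
    by replacing one record.
    """
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one record changes the clipped mean by at most this amount.
    sensitivity = (upper - lower) / n
    # Laplace(0, b) noise with b = sensitivity / epsilon, drawn as the
    # difference of two exponential variables with rate 1 / b.
    rate = epsilon / sensitivity
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return sum(clipped) / n + noise
```

    A smaller `epsilon` gives a stronger privacy guarantee but a larger noise scale, which is the privacy-utility trade-off the abstract refers to.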

  2. Statistics on the number of scholarships for masters and doctoral...

    • data.gov.tw
    csv
    Updated Jun 1, 2025
    Cite
    Department of Student Affairs and Special Education (2025). Statistics on the number of scholarships for masters and doctoral dissertations and journal papers in gender equality education [Dataset]. https://data.gov.tw/en/datasets/159100
    Available download formats: csv
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Department of Student Affairs and Special Education
    License

    https://data.gov.tw/license

    Description

    To encourage academic and related research on gender equality education and to raise academic standards in this field, the Ministry of Education has formulated the "Key Points for the Ministry of Education to Award Master's and Doctoral Thesis and Journal Papers on Gender Equality Education", under which the awards are granted.

  3. Metadata for the dissertation: Improving Commercial Property Price...

    • data.4tu.nl
    Updated Nov 25, 2024
    Cite
    Farley Ishaak (2024). Metadata for the dissertation: Improving Commercial Property Price Statistics [Dataset]. http://doi.org/10.4121/cab0cf0e-668f-46db-82bb-94abe78faeb0.v1
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Farley Ishaak
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2008 - 2023
    Area covered
    Netherlands
    Description

    This metadata document provides details of the data used for the dissertation: “Improving Commercial Property Price Statistics”. The study explores data related and methodological challenges in the construction of price statistics for commercial real estate.


    Short abstract of the dissertation

    Since the financial crisis of 2008, National Statistical Institutes (NSIs) have worked to develop commercial real estate (CRE) indicators for official statistics. These indicators are considered essential in financial stability monitoring and may help contain the consequences of future crises or even prevent future crises. However, progress at NSIs to develop these indicators has been slow due to challenges like low observation numbers and high heterogeneity. This dissertation addresses these challenges by exploring data issues and suggesting methodological improvements.


    The first three studies focus on data challenges regarding share deals and portfolio sales. Both are real estate trading constructions that are specific to CRE. The results show that share deals and portfolio sales significantly differ from the rest of the market. Therefore, under specific circumstances, CRE indicators could benefit from including these trading types. The final two studies focus on methodological challenges regarding index construction methods and the role of sustainability in real estate pricing. The results show that, by combining established techniques, it is possible to construct price indices that meet official statistics’ standards. Furthermore, the results uncover a complex relationship between sustainability and prices: while energy efficiency generally involves price premiums, other aspects, such as health and environment, show a discount for properties with low sustainability.


    Overall, this dissertation contributes to the legislative framework that is currently being developed for EU countries to publish official statistics for commercial real estate and adds to the academic discussion by presenting innovative techniques for data analyses and index construction.


    Data sources

    The following data sources were used:

    1. Business Register (Statistics Netherlands)
    2. Transactions linked to the Register of Addresses and Buildings (BAG)
    3. Linking table buildings and companies (Dutch Land Registry Office)
    4. Property Transfer Tax data (Dutch Tax Authorities)
    5. Building sustainability scores (W/E advisors)
    6. Commercial real estate transactions (Dutch Land Registry Office)


    Processing methodology

    1. The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_2_ABR_Bedrijfsinfo. The data is used for deriving company transfers by comparing ownership states across periods. The first period in which the ownership of a given company differs from the preceding period indicates an ownership transfer.
    2. The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_6_ABR_CompleetMicro. The data is used for calculating the size of real estate share deals and estimating price developments by applying appropriate filters and counting the output.
    3. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is SPE_KADASTER. The data is used for finding real estate information that corresponds to company transfers by linking the company register (ABR) to the real estate register (BAG).
    4. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_SPE_3_OVB_Bedrijfsinfo. The data is used for deriving real estate share deals by linking this table (Kadaster) to the real estate register (BAG).
    5. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is duurzaamheid_input_regressie2. The data is used for finding the relationship between sustainability measures and real estate transaction prices by linking sustainability scores from a consultancy (WE) to transaction prices (Cadastre) and running regression analyses.
    6. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_OV20_pand. The data is used for 4 purposes (separate studies).
    • (1) Chapter 3: Determining the price effect of portfolio sale by running regression analyses
    • (2) Chapter 4: Developing methods to include portfolio sales in CPPI calculations by using auxiliary data of the real estate properties.
    • (3) Chapter 5: Developing a price index method for small domains by using these data to test the outcomes
    • (4) Chapter 6: Determining the relationship between sustainability and real estate transaction prices by running regression analyses
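
    Step 1 of the processing methodology derives company transfers by comparing ownership states across periods. The original processing is done in SQL and R and is not published, so the following Python snippet is only a hypothetical sketch of that comparison. The record layout `(company_id, period, owner_id)` and the function name are assumptions made for illustration.

```python
def first_ownership_transfers(records):
    """Return {company_id: period} for the first period in which a company's
    owner differs from the owner in the preceding observed period.

    `records` is an iterable of (company_id, period, owner_id) tuples
    (a hypothetical layout); periods must sort chronologically.
    """
    transfers = {}
    last_owner = {}
    for company, period, owner in sorted(records, key=lambda r: (r[0], r[1])):
        changed = company in last_owner and owner != last_owner[company]
        if changed and company not in transfers:
            # Only the first change per company is recorded in this sketch.
            transfers[company] = period
        last_owner[company] = owner
    return transfers


example = [
    ("A", 2019, "x"), ("A", 2020, "x"), ("A", 2021, "y"),
    ("B", 2019, "z"), ("B", 2020, "z"),
]
print(first_ownership_transfers(example))  # {'A': 2021}
```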


    Data restrictions

    As part of the CBS law, sharing micro-data outside of the CBS-environment is prohibited. Furthermore, CBS manages the data, but in some cases other parties are still formal owners of the data. The 2 other parties are The Land Registry Office and WE consultancy. Ownership and intellectual property rights are managed in contracts with both owners. It was agreed upon that the data can only be used for the purpose of the PhD study and that the microdata will never be externally disseminated. The data is still owned by them and the intellectual property rights of the analyses belong to me. An intended use of the microdata should be approved by both Statistics Netherlands and the formal data owner. Because of the above, no data can be publicly shared.


    If one intends to do research on these data, an application for data use can be requested at CBS. CBS will charge costs for anonymising the data and providing a closed environment to work with the data. More information on this can be found at: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research


    Contact information

    Author: Farley Ishaak

    Statistics Netherlands | Henri Faasdreef 312 | P.O. Box 24500 | 2490 HA The Hague

    TU Delft | Delft University of Technology | Faculty of Architecture and the Built Environment

    Department of Management in the Built Environment | P.O. Box 5043 | 2600 GA Delft

    M +31 6 46307974 | ff.ishaak@cbs.nl | f.f.ishaak@tudelft.nl

  4. Data from: AckSent: Human Annotated Dataset of Support and Sentiments in...

    • zenodo.org
    Updated Nov 5, 2024
    Cite
    Author; Author (2024). AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments [Dataset]. http://doi.org/10.5281/zenodo.13283331
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Author; Author
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Time period covered
    Aug 10, 2024
    Description

    This data is supplementary to the paper "AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments".

  5. 2025 Green Card Report for Psychometrics and Statistics Equiv To Us Phd In...

    • myvisajobs.com
    Updated Jan 16, 2025
    Cite
    MyVisaJobs (2025). 2025 Green Card Report for Psychometrics and Statistics Equiv To Us Phd In Statistics [Dataset]. https://www.myvisajobs.com/reports/green-card/major/psychometrics-and-statistics-equiv-to-us-phd-in-statistics
    Dataset updated
    Jan 16, 2025
    Dataset authored and provided by
    MyVisaJobs
    License

    https://www.myvisajobs.com/terms-of-service/

    Variables measured
    Major, Salary, Petitions Filed
    Description

    A dataset that explores Green Card sponsorship trends, salary data, and employer insights in the U.S. for the major "Psychometrics and Statistics Equiv To Us Phd In Statistics".

  6. Thesis Data Repository

    • figshare.unimelb.edu.au
    zip
    Updated Oct 11, 2023
    Cite
    Gregory White (2023). Thesis Data Repository [Dataset]. http://doi.org/10.26188/24295243.v1
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    The University of Melbourne
    Authors
    Gregory White
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Availability of data, code, and plot creation for various figures throughout my PhD thesis. Rough organisation currently. Pertains to Figures 5.4, 5.8, 6.11, 6.18, 7.3, 7.12, and Table 6.1.

  7. Data for the PhD thesis "Modeling Lexical Fields for Translation: a...

    • heidata.uni-heidelberg.de
    • b2find.eudat.eu
    • +1 more
    zip
    Updated Aug 4, 2025
    Cite
    Meri Dallakyan; Meri Dallakyan (2025). Data for the PhD thesis "Modeling Lexical Fields for Translation: a Corpus-Based Study of Armenian, German, and English Culinary Verbs" [Dataset]. http://doi.org/10.11588/DATA/3MPL7E
    Available download formats: zip(166634), zip(1130199), zip(617108), zip(167898), zip(4471905), zip(5882160), zip(1203076), zip(334871), zip(3353340), zip(2699455), zip(436611), zip(412972), zip(125927), zip(22647800)
    Dataset updated
    Aug 4, 2025
    Dataset provided by
    heiDATA
    Authors
    Meri Dallakyan; Meri Dallakyan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains, in high resolution, all graphical visualizations of the data analysis provided in my doctoral dissertation. The graphs are organized according to chapters and subchapters and titled accordingly. Additionally, this dataset provides all dataframes (German, English, and Armenian) in XLSX format of the manual semantic annotation on which the graphs are based. The graphical visualizations presented include (Multiple) Correspondence Analysis (MCA vs. CA), Mosaic Plots, Conditional Inference Trees (CIT), and Context-Conditional Correlations Graphs (CCCG).

  8. Dissertations and Data

    • explore.openaire.eu
    • ssh.datastations.nl
    Updated Jan 1, 2016
    Cite
    J. Schöpfel (2016). Dissertations and Data [Dataset]. http://doi.org/10.17026/dans-xg6-xnj4
    Dataset updated
    Jan 1, 2016
    Authors
    J. Schöpfel
    Description

    We present the results of a quantitative assessment of research data produced and submitted with dissertations. Special attention is paid to the size of the research data in appendices, to their presentation and link to the text, to their sources and typology, and to their potential for further research. The discussion puts the focus on legal aspects (database protection, intellectual property, privacy, third-party rights) and other barriers to data sharing, reuse and dissemination through open access. Another part adds insight into the potential handling of these data, in the framework of the French and Slovenian dissertation infrastructures. What could be done to valorize these data in a centralized system for electronic theses and dissertations (ETDs)? The topics are formats, metadata (including attribution of unique identifiers), submission/deposit, long-term preservation and dissemination. This part will also draw on experiences from other campuses and make use of results from surveys on data management at the Universities of Berlin and Lille.

  9. Supplementary data files for the PhD thesis "Design for Interpersonal Mood...

    • data.4tu.nl
    zip
    Updated Jun 14, 2024
    Cite
    Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein (2024). Supplementary data files for the PhD thesis "Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters" [Dataset]. http://doi.org/10.4121/8a9b21b2-6411-42ed-a0e4-05be50fc5a69.v1
    Available download formats: zip
    Dataset updated
    Jun 14, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Dataset funded by
    The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioural Sciences
    Description

    This dataset comprises five sets of data collected throughout the PhD Thesis project of Pelin Esnaf-Uslu.

    Esnaf-Uslu, P. (2024). Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.

    The research in this thesis is based on the premise that service providers can enhance their effectiveness in client interactions by acquiring a detailed understanding of IMR strategies and effectively applying this knowledge. To achieve this overall aim, the current research explored (1) the current role of mood in service encounters, (2) the IMR strategies used by service providers during service encounters in response to clients' moods, (3) how IMR strategies can be facilitated by means of tools for service providers, and (4) the strengths and limitations of the developed materials.

    This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.

    The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.

    Chapter_2: This study investigates the role of mood in service encounters. Samples are collected from service providers' experiences during service encounters and in-depth interviews are conducted. The dataset includes the blank diary and the interview protocol.

    Chapter_3: This study investigates the clarity of the images developed representing Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 27 and 29 participants, showing the associations between images representing nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. Additionally, the dataset contains a screenshot of the workshop material used in the implementation study.

    Chapter_4: This study examines the clarity of developed videos depicting IMR strategies. The dataset includes anonymized scores from 32 participants, showing the associations between videos depicting nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. In addition, the dataset contains the workshop guideline developed for the implementation study.

    Chapter_5: This study evaluates the clarity of character animations depicting Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 39 participants, demonstrating the associations between videos illustrating nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants.

    Chapter_6: This dataset comprises correspondence analysis files for each material, created for the purpose of comparison.

    All the data is anonymized by removing the names of individuals and institutions.

  10. Coding Set: Social Network Analysis Data for the PhD Thesis "More than...

    • pubdata.leuphana.de
    xlsx
    Updated 2024
    Cite
    Roman Isaac; Berta Martín-López (2024). Coding Set: Social Network Analysis Data for the PhD Thesis "More than trees" [Dataset]. http://doi.org/10.48548/pubdata-217
    Available download formats: xlsx(18596)
    Dataset updated
    2024
    Authors
    Roman Isaac; Berta Martín-López
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Dataset funded by
    Deutsche Forschungsgemeinschaft (DFG)
    Description

    To identify relevant actors for the governance of co-produced forest nature's contributions to people (NCP), the researchers conducted a social-network analysis based on 39 semi-structured interviews with foresters and conservation managers. These interviews were conducted across three case study sites in Germany: Schorfheide-Chorin in the Northeast, Hainich-Dün in the Centre, and Schwäbische Alb in the Southwest. All three case study sites belong to the large-scale and long-term research platform Biodiversity Exploratories. The researchers employed a predefined coding set to analyse the interviews and grasp the relationships between different actors based on the anthropogenic capitals they used to co-produce forest NCP. To protect the interviewees' anonymity, the coded interview material cannot be published. Therefore, this data set is limited to the coding set itself.

  11. Dataset in support of the thesis 'Automating the detection of marine sound...

    • eprints.soton.ac.uk
    Updated Dec 6, 2023
    Cite
    White, Ellen Louise (2023). Dataset in support of the thesis 'Automating the detection of marine sound sources' [Dataset]. http://doi.org/10.5281/zenodo.10276722
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Zenodo
    Authors
    White, Ellen Louise
    Description

    The dataset associated with the PhD Thesis 'Automating the detection of marine sound sources', by Ellen L White. This dataset does not include any raw passive acoustic data; please contact me if you are interested in access to any of this data. The data repository contains the training and testing data used within the thesis to develop a CNN for multi-sound source detection, including all required scripts to replicate this work.

  12. PhD thesis

    • figshare.com
    pdf
    Updated Jun 20, 2023
    Cite
    Anders Eklund (2023). PhD thesis [Dataset]. http://doi.org/10.6084/m9.figshare.704865.v1
    Available download formats: pdf
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anders Eklund
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    My PhD thesis

    Computational medical image analysis - With a focus on real-time fMRI and non-parametric statistics

  13. Research data supporting chapter 'A Hybrid Neural Model Approach for Health...

    • data.4tu.nl
    Updated Jul 22, 2024
    Cite
    Wassamon Phusakulkajorn; Siwarak Unsiwilai; Ling Chang; Alfredo Núñez; Zili Li (2024). Research data supporting chapter 'A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data' of dissertation 'AI Solutions for Maintenance Decision Support in Railway Infrastructure' [Dataset]. http://doi.org/10.4121/43b96757-fd3f-4e89-b9ac-e0caad30f0f0.v1
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Wassamon Phusakulkajorn; Siwarak Unsiwilai; Ling Chang; Alfredo Núñez; Zili Li
    License

    https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Dataset funded by
    Europe’s Rail Flagship Project
    ProRail
    Description

    The data and code were prepared and uploaded to 4TU.ResearchData by Wassamon Phusakulkajorn to support the results in Chapter 5 (A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data) of her dissertation. This chapter has been submitted for publication as Phusakulkajorn, W., Unsiwilai, S., Chang, L., Núñez, A., Li, Z., A Hybrid Neural Model Approach for Health Assessment of Railway Transition Zones with Multiple Data Sources. In this research, we develop a framework that enables a more frequent evaluation of transition zone health by integrating multiple monitoring technologies, including track geometry measurements, interferometric synthetic aperture radar (InSAR), and axle box acceleration (ABA). The aim is to improve the early detection of track irregularities. The data used in this research contain ABA, track geometry, and InSAR measurements at transition zones, collected from a railway bridge between the Dordrecht and Lage Zwaluwe stations in the Netherlands. All implementations are done in MATLAB, where (.mat) files are analytical solutions and (.eps) and (.jpg) files are figures used in the main manuscript.

  14. Data from: Knowledge-centric Machine Learning on Graphs

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Cite
    Yijun Tian (2024). Knowledge-centric Machine Learning on Graphs [Dataset]. http://doi.org/10.7274/25607826.v1
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Yijun Tian
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Graph Machine Learning (GML) has gained considerable attention in modeling complex graph-structured data, but many existing approaches focus on collecting high-quality data (i.e., data-centric) or developing complex model architectures (i.e., model-centric). These two paradigms come with inherent limitations and challenges: data-centric approaches often demand intensive labor for tasks like data annotation and cleaning, while model-centric approaches usually require specialized expertise for model refinements. There remains a significant reservoir of unexplored potential in harnessing useful information that already exists in the data and is learned by models, i.e., knowledge, as a directive force for learning.

    In this dissertation, I introduce a new paradigm of machine learning on graphs: knowledge-centric. This paradigm seeks to leverage all available knowledge, which may come from data, models, or external sources, to facilitate an effective learning process. My research focuses on three different facets to obtain and leverage knowledge in GML, including learning knowledge from data, distilling knowledge from models, and encoding knowledge from external sources. By anchoring on the knowledge, there is a reduced reliance on massive data and intricate model architectures. In addition, knowledge can enhance GML models' performance, trustworthiness, and efficiency.

  15. Data from: Graph-Based Approaches for Prediction and Similarity Analysis

    • curate.nd.edu
    Updated Nov 11, 2024
    Cite
    Lin Xing (2024). Graph-Based Approaches for Prediction and Similarity Analysis [Dataset]. http://doi.org/10.7274/25575060.v1
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Lin Xing
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    This thesis explores graph-based approaches for prediction and similarity analysis problems within networks and hypergraphs. While existing algorithms for link prediction in networks predominantly target the existence or weights of edges, our study expands the scope by delving into the prediction of both vertex and edge weights using metric geometry and machine learning approaches. Additionally, our investigation extends into weight prediction in higher-order networks, often referred to as hypergraphs. We propose a novel notion of neighborhood for hyperedges, utilizing the topological structures of hypergraphs and weights of hyperedges from a given training set. We construct metric spaces on the set of hyperedges based on the neighborhood information. Furthermore, we explore the practical application of graph similarity algorithms in DNA sequence analysis, introducing an accurate and computationally efficient approach to analyze the similarities among DNA sequences. Our proposed methods were tested on diverse real-world datasets and yielded promising results. The main implication of our research is offering a more comprehensive framework for prediction tasks in networks and hypergraphs, providing alternative avenues to gain a deeper understanding of the intricate relationships within complex networks.

  16. PhD Thesis: Development of Equitable Algorithms for Road Funds Allocation...

    • figshare.com
    application/cdfv2
    Updated Jan 19, 2016
    Cite
    Andrew Naimanye (2016). PhD Thesis: Development of Equitable Algorithms for Road Funds Allocation and Road Scheme Priritization in Developing Countries: A Case Study of Sub-Saharan Africa [Dataset]. http://doi.org/10.6084/m9.figshare.1396244.v1
    Explore at:
    Available download formats: application/cdfv2
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andrew Naimanye
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sub-Saharan Africa
    Description

    Uganda Road Fund Allocation Formula application 2014 and 2015

  17. Jordan Number of Enrolled PHD Students

    • ceicdata.com
    Updated Jan 15, 2025
    Cite
    CEICdata.com (2025). Jordan Number of Enrolled PHD Students [Dataset]. https://www.ceicdata.com/en/jordan/education-statistics/number-of-enrolled-phd-students
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 1, 2004 - Jun 1, 2016
    Area covered
    Jordan
    Variables measured
    Education Statistics
    Description

    Jordan Number of Enrolled PHD Students data was reported at 3,362.000 Person in 2017. This records an increase from the previous number of 3,276.000 Person for 2016. Jordan Number of Enrolled PHD Students data is updated yearly, averaging 1,892.000 Person from Jun 2002 (Median) to 2017, with 15 observations. The data reached an all-time high of 3,362.000 Person in 2017 and a record low of 682.000 Person in 2002. Jordan Number of Enrolled PHD Students data remains in active status in CEIC and is reported by the Ministry of Higher Education and Scientific Research. The data is categorized under Global Database’s Jordan – Table JO.G007: Education Statistics.

  18. Data and scripts underlying the PhD thesis: Irreducible antifluorite...

    • data.4tu.nl
    zip
    Updated Nov 21, 2024
    Cite
    Victor Landgraf (2024). Data and scripts underlying the PhD thesis: Irreducible antifluorite electrolytes [Dataset]. http://doi.org/10.4121/2068edec-1ec9-40c8-91c2-78857ce90743.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Victor Landgraf
    License

    https://www.apache.org/licenses/LICENSE-2.0.html

    Dataset funded by
    NWO
    Description

    This data set contains the raw data and analysis scripts to help reproduce the data presented in the thesis "Irreducible antifluorite electrolytes". The data set contains the data for Chapters 2, 3, 4, and 5, which are the chapters in which acquired data is presented. (Chapter 1 of the thesis is the Introduction, so no acquired data is shown in that chapter.)

  19. New Thesis Data Sets Dataset

    • universe.roboflow.com
    zip
    Updated Feb 10, 2024
    Cite
    Conveyor (2024). New Thesis Data Sets Dataset [Dataset]. https://universe.roboflow.com/conveyor/new-thesis-data-sets
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 10, 2024
    Dataset authored and provided by
    Conveyor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Fruits Pineapple Mango Papaya Bounding Boxes
    Description

    New Thesis Data Sets

    ## Overview
    
    New Thesis Data Sets is a dataset for object detection tasks - it contains Fruits Pineapple Mango Papaya annotations for 4,346 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. Parameters and KPIs from the Real Data Use Case of Dissertation - Dataset -...

    • b2find.eudat.eu
    Updated Apr 5, 2024
    Cite
    (2024). Parameters and KPIs from the Real Data Use Case of Dissertation - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/27c32216-21c9-5185-a734-bef7f2d5a682
    Explore at:
    Dataset updated
    Apr 5, 2024
    Description

    Column names indicate parameters and the resulting KPI values; row indices indicate unique combinations of part number and location.

Cite
Spencer Tate Giddens (2025). Advanced Topics in Differentially Private Statistical Learning [Dataset]. http://doi.org/10.7274/29498438.v1

Data from: Advanced Topics in Differentially Private Statistical Learning

Explore at:
Available download formats: pdf
Dataset updated
Jul 14, 2025
Dataset provided by
University of Notre Dame
Authors
Spencer Tate Giddens
License

https://www.law.cornell.edu/uscode/text/17/106

Description

Collecting and utilizing data to understand population trends, make predictions, and guide decisions is becoming increasingly common in today's world. In particular, statistical learning allows users to infer relationships between variables, learn patterns, and predict outcomes for previously unseen data via concepts and techniques from statistics and machine learning. Although many of the results of this practice have been beneficial, the data used often contain sensitive information, such as medical records or financial information, so maintaining privacy is of paramount importance when releasing statistics, parameter estimates, and other results. Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy when releasing aggregate information and statistics from a dataset. It provides a provable bound on the incurred privacy loss via the injection of random noise, at the cost of a reduction in utility. While many works have been devoted to establishing DP guarantees for various analysis tools in the past two decades since DP's introduction, many popular statistical learning approaches still lack a DP counterpart. This dissertation addresses this issue in three original research topics, as listed below.

First, the dissertation presents the first differentially private algorithm for general weighted empirical risk minimization (wERM), along with theoretical DP guarantees. It evaluates the performance of the DP-wERM framework applied to outcome weighted learning (OWL), a method for learning individualized treatment rules, in both simulation studies and in a real clinical trial. The results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance.

Second, the dissertation presents several original approaches with proven DP guarantees for linear mixed-effects (LME) models. LME models are popular, especially among statisticians, but little work has integrated DP into them. The work leverages recent advancements in the DP literature, particularly in DP stochastic gradient descent (SGD), to estimate LME model parameters with DP guarantees and with better privacy-utility trade-offs. It provides a theoretical upper bound on the mean squared error between the private parameter estimates and the true parameters for DP-SGD-based approaches, and a simulation study and a real-world case study provide further empirical evidence for the feasibility of the approaches at practically reasonable privacy budgets.

Third, this dissertation introduces SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. Alongside privacy, the fairness of decisions made by a statistical learning model is also crucial to address, though the vast majority of existing literature treats the two concerns independently. Methods that do consider privacy and fairness simultaneously often apply only to a specific machine learning task, which limits their generalizability. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. SAFES is illustrated by combining a graphical model-based DP data synthesizer with a popular fairness-aware data pre-processing transformation, and empirical evaluations on two popular benchmark datasets demonstrate that for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
