https://www.law.cornell.edu/uscode/text/17/106
Collecting and utilizing data to understand population trends, make predictions, and guide decisions is becoming increasingly common in today's world. In particular, statistical learning allows users to infer relationships between variables, learn patterns, and predict outcomes for previously unseen data via concepts and techniques from statistics and machine learning. Although many of the results of this practice have been beneficial, the data used often contain sensitive information, such as medical records or financial information, so maintaining privacy is of paramount importance when releasing statistics, parameter estimates, and other results. Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy when releasing aggregate information and statistics from a dataset. It provides a provable bound on the incurred privacy loss via the injection of random noise, at the cost of a reduction in utility. While many works have been devoted to establishing DP guarantees for various analysis tools in the past two decades since DP's introduction, many popular statistical learning approaches still lack a DP counterpart. This dissertation addresses this issue in three original research topics, as listed below.
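To make the idea of a provable privacy bound obtained by injecting random noise concrete, here is a minimal Python sketch of the textbook Laplace mechanism for releasing a mean. It is a generic illustration, not one of the dissertation's algorithms. The function name `laplace_mean`, the clipping bounds, and the assumptions that the record count n is public and that neighbouring datasets differ by replacing one record are all hypothetical choices made for this sketch.

```python
import numpy as np


def laplace_mean(values, lower, upper, epsilon, rng=None):
    """Hypothetical helper: release an epsilon-DP estimate of a mean.

    Assumes n is public and neighbouring datasets differ by replacing one
    record. Clipping each value to [lower, upper] bounds how much a single
    record can move the mean: the sensitivity is (upper - lower) / n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    x = np.clip(np.asarray(values, dtype=float), lower, upper)
    if x.size == 0:
        raise ValueError("values must be non-empty")
    rng = np.random.default_rng() if rng is None else rng
    sensitivity = (upper - lower) / x.size
    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(x.mean() + noise)


# Example: a smaller epsilon means stronger privacy and a noisier answer.
print(laplace_mean([10, 20, 30, 40], lower=0, upper=100, epsilon=1.0,
                   rng=np.random.default_rng(0)))
```

The noise scale grows as `epsilon` shrinks, which is exactly the reduction in utility traded for a tighter privacy guarantee.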
First, the dissertation presents the first differentially private algorithm for general weighted empirical risk minimization (wERM), along with theoretical DP guarantees. It evaluates the performance of the DP-wERM framework applied to outcome weighted learning (OWL), a method for learning individualized treatment rules, in both simulation studies and a real clinical trial. The results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance.
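For orientation, the non-private weighted ERM objective and its OWL special case can be written as below. This follows the formulation commonly stated in the OWL literature rather than the dissertation's own notation, which may differ. The symbols (nonnegative weights $w_i$, regularizer $\Omega$, rewards $R_i \ge 0$, treatments $A_i \in \{-1, 1\}$, propensity $\pi$, decision function $f$, penalty $\lambda$) are assumptions of this sketch, and the DP mechanism itself is not shown.

```latex
% General weighted ERM with nonnegative weights w_i and regularizer \Omega
\hat{\theta} = \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} w_i\,\ell(\theta; x_i, y_i) + \lambda\,\Omega(\theta)

% OWL special case: w_i = R_i / \pi(A_i \mid X_i), label A_i \in \{-1, 1\}, hinge loss
\hat{f} = \arg\min_{f}\; \frac{1}{n}\sum_{i=1}^{n} \frac{R_i}{\pi(A_i \mid X_i)}\,\bigl(1 - A_i f(X_i)\bigr)_{+} + \lambda\,\lVert f\rVert^{2},
\qquad \hat{d}(x) = \operatorname{sign}\bigl(\hat{f}(x)\bigr)
```

The weights $w_i = R_i / \pi(A_i \mid X_i)$ are what distinguish OWL from ordinary ERM: each subject's hinge loss is scaled by their outcome and inverse treatment propensity, and the learned rule $\hat{d}$ recommends the treatment indicated by the sign of $\hat{f}$.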
Second, the dissertation presents several original approaches with proven DP guarantees for linear mixed-effects (LME) models. LME models are popular, especially among statisticians, but little work has integrated DP into them. The work leverages recent advances in the DP literature, particularly DP stochastic gradient descent (SGD), to estimate LME model parameters with DP guarantees and better privacy-utility trade-offs. Theoretical upper bounds on the mean squared error between the private parameter estimates and the true parameters are provided for the DP-SGD-based approaches, and a simulation study and a real-world case study provide further empirical evidence for the feasibility of the approaches at practically reasonable privacy budgets.
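The core DP-SGD update that this line of work builds on, per-example gradient clipping followed by Gaussian noise, can be sketched in Python as below. This is a minimal illustration on ordinary least-squares regression, not an LME estimator and not the dissertation's method. The function name and parameters (`clip_norm`, `noise_multiplier`, and so on) are hypothetical, batches are drawn by uniform sampling without replacement, and no privacy accountant is included, so the sketch does not compute the resulting (ε, δ).

```python
import numpy as np


def dp_sgd_linear_regression(X, y, epochs, lr, clip_norm, noise_multiplier,
                             batch_size, rng=None):
    """Hypothetical DP-SGD sketch for least squares (no privacy accounting)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    theta = np.zeros(d)
    steps_per_epoch = max(1, n // batch_size)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            idx = rng.choice(n, size=batch_size, replace=False)
            # Per-example gradients of the loss 0.5 * (x @ theta - y) ** 2.
            residuals = X[idx] @ theta - y[idx]          # shape (B,)
            grads = residuals[:, None] * X[idx]          # shape (B, d)
            # Clip each example's gradient to L2 norm at most clip_norm,
            # which bounds any single record's influence on the update.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Add Gaussian noise scaled to the clipping bound, then average.
            noisy_sum = grads.sum(axis=0) + rng.normal(
                0.0, noise_multiplier * clip_norm, size=d)
            theta -= lr * noisy_sum / batch_size
    return theta
```

Setting `noise_multiplier=0` recovers plain clipped mini-batch SGD; larger values give stronger privacy at the cost of noisier estimates, which is the privacy-utility trade-off that MSE bounds of the kind described above are concerned with.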
Third, this dissertation introduces SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. Alongside privacy, the fairness of decisions made by a statistical learning model is also crucial to address, though the vast majority of existing literature treats the two concerns independently. Methods that do consider privacy and fairness simultaneously often apply only to a specific machine learning task, limiting their generalizability. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. SAFES is illustrated by combining a graphical model-based DP data synthesizer with a popular fairness-aware data pre-processing transformation, and empirical evaluations on two popular benchmark datasets demonstrate that for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
https://data.gov.tw/license
To encourage academic and related research on gender equality education and to raise academic standards in this field, the Ministry of Education has formulated the "Key Points for the Ministry of Education to Award Master's and Doctoral Thesis and Journal Papers on Gender Equality Education", under which it grants awards.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This metadata document provides details of the data used for the dissertation: “Improving Commercial Property Price Statistics”. The study explores data related and methodological challenges in the construction of price statistics for commercial real estate.
Short abstract of the dissertation
Since the financial crisis of 2008, National Statistical Institutes (NSIs) have worked to develop commercial real estate (CRE) indicators for official statistics. These indicators are considered essential in financial stability monitoring and may help contain the consequences of future crises or even prevent them. However, NSIs have been slow to develop these indicators due to challenges such as low numbers of observations and high heterogeneity. This dissertation addresses these challenges by exploring data issues and suggesting methodological improvements.
The first three studies focus on data challenges regarding share deals and portfolio sales, both of which are real estate trading constructions specific to CRE. The results show that share deals and portfolio sales differ significantly from the rest of the market; therefore, under specific circumstances, CRE indicators could benefit from including these trading types. The final two studies focus on methodological challenges regarding index construction methods and the role of sustainability in real estate pricing. The results show that, by combining established techniques, it is possible to construct price indices that meet official statistics' standards. Furthermore, the results uncover a complex relationship between sustainability and prices: while energy efficiency generally carries a price premium, other aspects, such as health and environment, show a price discount for properties with low sustainability.
Overall, this dissertation contributes to the legislative framework that is currently being developed for EU countries to publish official statistics for commercial real estate and adds to the academic discussion by presenting innovative techniques for data analyses and index construction.
Data sources
The following data sources were used:
Processing methodology
Data restrictions
Under the CBS law, sharing microdata outside the CBS environment is prohibited. Furthermore, although CBS manages the data, in some cases other parties are still the formal owners. These two other parties are the Land Registry Office and WE consultancy. Ownership and intellectual property rights are governed by contracts with both owners. It was agreed that the data can be used only for the purpose of the PhD study and that the microdata will never be disseminated externally. The data remain owned by these parties, while the intellectual property rights of the analyses belong to me. Any intended use of the microdata should be approved by both Statistics Netherlands and the formal data owner. For these reasons, no data can be shared publicly.
If one intends to do research on these data, an application for data use can be submitted to CBS. CBS charges fees for anonymising the data and providing a closed environment in which to work with the data. More information on this can be found at: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research
Contact information
Author: Farley Ishaak
Statistics Netherlands | Henri Faasdreef 312 | P.O. Box 24500 | 2490 HA The Hague
TU Delft | Delft University of Technology | Faculty of Architecture and the Built Environment
Department of Management in the Built Environment | P.O. Box 5043 | 2600 GA Delft
M +31 6 46307974 | ff.ishaak@cbs.nl | f.f.ishaak@tudelft.nl
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data is supplementary to the paper "AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments".
https://www.myvisajobs.com/terms-of-service/
A dataset that explores Green Card sponsorship trends, salary data, and employer insights in the U.S. for psychometrics and statistics degrees equivalent to a U.S. PhD in statistics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data, code, and plot-creation scripts for various figures throughout my PhD thesis. The organisation is currently rough. The material pertains to Figures 5.4, 5.8, 6.11, 6.18, 7.3, 7.12, and Table 6.1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains high-resolution versions of all graphical visualizations of the data analysis presented in my doctoral dissertation. The graphs are organized by chapter and subchapter and titled accordingly. Additionally, this dataset provides all dataframes (German, English, and Armenian) of the manual semantic annotation from which the graphs are generated, in XLSX format. The graphical visualizations include (Multiple) Correspondence Analysis (MCA vs. CA), mosaic plots, Conditional Inference Trees (CIT), and Context-Conditional Correlations Graphs (CCCG).
We present the results of a quantitative assessment of research data produced and submitted with dissertations. Special attention is paid to the size of the research data in appendices, to their presentation and link to the text, to their sources and typology, and to their potential for further research. The discussion focuses on legal aspects (database protection, intellectual property, privacy, third-party rights) and other barriers to data sharing, reuse, and dissemination through open access. Another part offers insight into the potential handling of these data within the French and Slovenian dissertation infrastructures. What could be done to valorize these data in a centralized system for electronic theses and dissertations (ETDs)? The topics are formats, metadata (including attribution of unique identifiers), submission/deposit, long-term preservation, and dissemination. This part also draws on experiences from other campuses and uses results from surveys on data management at the Universities of Berlin and Lille.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset comprises five sets of data collected throughout the PhD Thesis project of Pelin Esnaf-Uslu.
Esnaf-Uslu, P. (2024). Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.
The research in this thesis is based on the premise that service providers can enhance their effectiveness in client interactions by acquiring a detailed understanding of interpersonal mood regulation (IMR) strategies and applying this knowledge effectively. To achieve this overall aim, the research explored (1) the current role of mood in service encounters, (2) the IMR strategies that service providers use during service encounters in response to clients' moods, (3) how tools for service providers can facilitate IMR strategies, and (4) the strengths and limitations of the developed materials.
This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.
The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.
Chapter_2: This study investigates the role of mood in service encounters. Samples of service providers' experiences during service encounters are collected, and in-depth interviews are conducted. The dataset includes the blank diary and the interview protocol.
Chapter_3: This study investigates the clarity of the images developed to represent Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 27 and 29 participants, showing the associations between images representing nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. Additionally, the dataset contains a screenshot of the workshop material used in the implementation study.
Chapter_4: This study examines the clarity of developed videos depicting IMR strategies. The dataset includes anonymized scores from 32 participants, showing the associations between videos depicting nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. In addition, the dataset contains the workshop guideline developed for the implementation study.
Chapter_5: This study evaluates the clarity of character animations depicting Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 39 participants, demonstrating the associations between videos illustrating nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants.
Chapter_6: This dataset comprises correspondence analysis files for each material, created for the purpose of comparison.
All the data is anonymized by removing the names of individuals and institutions.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To identify relevant actors for the governance of co-produced forest nature's contributions to people (NCP), the researchers conducted a social-network analysis based on 39 semi-structured interviews with foresters and conservation managers. These interviews were conducted across three case study sites in Germany: Schorfheide-Chorin in the Northeast, Hainich-Dün in the Centre, and Schwäbische Alb in the Southwest. All three case study sites belong to the large-scale and long-term research platform Biodiversity Exploratories. The researchers employed a predefined coding set to analyse the interviews and grasp the relationships between different actors based on the anthropogenic capitals they used to co-produce forest NCP. To protect the interviewees' anonymity, the coded interviews cannot be published. Therefore, this dataset is limited to the coding set.
The dataset associated with the PhD Thesis '', by Ellen L White. This dataset does not include any raw passive acoustic data; please contact me if you are interested in access to any of these data. The data repository contains the training and testing data used within the thesis to develop a CNN for multi-sound-source detection, including all scripts required to replicate this work.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
My PhD thesis
Computational medical image analysis - With a focus on real-time fMRI and non-parametric statistics
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
The data and code were prepared and uploaded to 4TU.ResearchData by Wassamon Phusakulkajorn to support the results in Chapter 5 (A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data) of her dissertation. This chapter has been submitted for publication as Phusakulkajorn, W., Unsiwilai, S., Chang, L., Núñez, A., Li, Z., A Hybrid Neural Model Approach for Health Assessment of Railway Transition Zones with Multiple Data Sources. In this research, we develop a framework that enables more frequent evaluation of transition zone health by integrating multiple monitoring technologies, including track geometry measurements, interferometric synthetic aperture radar (InSAR), and axle box acceleration (ABA). This aims to improve the early detection of track irregularities. The data used in this research comprise ABA, track geometry, and InSAR measurements at transition zones, collected from a railway bridge between Dordrecht and Lage Zwaluwe stations in the Netherlands. All implementations are done in MATLAB; the .mat files contain the analytical solutions, and the .eps and .jpg files are the figures used in the main manuscript.
https://www.law.cornell.edu/uscode/text/17/106
Graph Machine Learning (GML) has gained considerable attention in modeling complex graph-structured data, but most existing work focuses on either collecting high-quality data (i.e., data-centric) or developing complex model architectures (i.e., model-centric). However, these two paradigms come with inherent limitations and challenges: data-centric approaches often demand intensive labor for tasks like data annotation and cleaning, while model-centric approaches usually require specialized expertise for model refinements. There remains significant unexplored potential in harnessing useful information that already exists in the data or has been learned by models, i.e., knowledge, as a directive force for learning.
In this dissertation, I introduce a new paradigm of machine learning on graphs: knowledge-centric. This paradigm seeks to leverage all available knowledge, which may come from data, models, or external sources, to facilitate an effective learning process. My research focuses on three different facets to obtain and leverage knowledge in GML, including learning knowledge from data, distilling knowledge from models, and encoding knowledge from external sources. By anchoring on the knowledge, there is a reduced reliance on massive data and intricate model architectures. In addition, knowledge can enhance GML models' performance, trustworthiness, and efficiency.
https://www.law.cornell.edu/uscode/text/17/106
This thesis explores graph-based approaches for prediction and similarity analysis problems within networks and hypergraphs. While existing algorithms for link prediction in networks predominantly target the existence or weights of edges, our study expands the scope by delving into the prediction of both vertex and edge weights using metric geometry and machine learning approaches. Additionally, our investigation extends into weight prediction in higher-order networks, often referred to as hypergraphs. We propose a novel notion of neighborhood for hyperedges, utilizing the topological structures of hypergraphs and weights of hyperedges from a given training set. We construct metric spaces on the set of hyperedges based on the neighborhood information. Furthermore, we explore the practical application of graph similarity algorithms in DNA sequence analysis, introducing an accurate and computationally efficient approach to analyze the similarities among DNA sequences. Our proposed methods were tested on diverse real-world datasets and yielded promising results. The main implication of our research is offering a more comprehensive framework for prediction tasks in networks and hypergraphs, providing alternative avenues to gain a deeper understanding of the intricate relationships within complex networks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Uganda Road Fund Allocation Formula application 2014 and 2015
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Jordan Number of Enrolled PHD Students data was reported at 3,362.000 Person in 2017. This records an increase from the previous number of 3,276.000 Person for 2016. Jordan Number of Enrolled PHD Students data is updated yearly, averaging 1,892.000 Person from Jun 2002 (Median) to 2017, with 15 observations. The data reached an all-time high of 3,362.000 Person in 2017 and a record low of 682.000 Person in 2002. Jordan Number of Enrolled PHD Students data remains in active status in CEIC and is reported by the Ministry of Higher Education and Scientific Research. The data is categorized under Global Database's Jordan – Table JO.G007: Education Statistics.
https://www.apache.org/licenses/LICENSE-2.0.html
This data set contains the raw data and analysis scripts needed to reproduce the data presented in the thesis "Irreducible antifluorite electrolytes". It covers Chapters 2, 3, 4, and 5, which are the chapters in which acquired data are presented. (Chapter 1 of the thesis is the Introduction, so no acquired data are shown in that chapter.)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
New Thesis Data Sets is a dataset for object detection tasks; it contains Fruits, Pineapple, Mango, and Papaya annotations for 4,346 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Column names indicate parameters and the resulting KPI values; row indices indicate unique part-number and location combinations.