Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data is supplementary to the paper "AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments".
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This metadata document provides details of the data used for the dissertation "Improving Commercial Property Price Statistics". The study explores data-related and methodological challenges in constructing price statistics for commercial real estate.
Short abstract of the dissertation
Since the financial crisis of 2008, National Statistical Institutes (NSIs) have worked to develop commercial real estate (CRE) indicators for official statistics. These indicators are considered essential for financial stability monitoring and may help contain the consequences of future crises, or even prevent them. However, NSIs have been slow to develop these indicators because of challenges such as low numbers of observations and high heterogeneity. This dissertation addresses these challenges by exploring data issues and suggesting methodological improvements.
The first three studies focus on data challenges regarding share deals and portfolio sales, both of which are real estate trading constructions specific to CRE. The results show that share deals and portfolio sales differ significantly from the rest of the market; therefore, under specific circumstances, CRE indicators could benefit from including these trading types. The final two studies focus on methodological challenges regarding index construction methods and the role of sustainability in real estate pricing. The results show that, by combining established techniques, it is possible to construct price indices that meet official statistics' standards. Furthermore, the results uncover a complex relationship between sustainability and prices: while energy efficiency generally carries a price premium, other aspects, such as health and environment, show a discount for properties with low sustainability.
Overall, this dissertation contributes to the legislative framework that is currently being developed for EU countries to publish official statistics for commercial real estate and adds to the academic discussion by presenting innovative techniques for data analyses and index construction.
Data sources
The following data sources were used:
Processing methodology
Data restrictions
Under the CBS law, sharing microdata outside the CBS environment is prohibited. Furthermore, CBS manages the data, but in some cases other parties are still the formal owners of the data. These two other parties are the Land Registry Office and WE consultancy. Ownership and intellectual property rights are governed by contracts with both owners. It was agreed that the data can be used only for the purpose of the PhD study and that the microdata will never be disseminated externally. The data remain owned by these parties, while the intellectual property rights of the analyses belong to me. Any intended use of the microdata should be approved by both Statistics Netherlands and the formal data owner. For these reasons, no data can be shared publicly.
Researchers who intend to work with these data can submit an application for data use to CBS. CBS charges for anonymising the data and for providing a closed environment in which to work with the data. More information is available at: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research
Contact information
Author: Farley Ishaak
Statistics Netherlands | Henri Faasdreef 312 | P.O. Box 24500 | 2490 HA The Hague
TU Delft | Delft University of Technology | Faculty of Architecture and the Built Environment
Department of Management in the Built Environment | P.O. Box 5043 | 2600 GA Delft
M +31 6 46307974 | ff.ishaak@cbs.nl | f.f.ishaak@tudelft.nl
https://www.law.cornell.edu/uscode/text/17/106
Collecting and using data to understand population trends, make predictions, and guide decisions is increasingly common. In particular, statistical learning lets users infer relationships between variables, learn patterns, and predict outcomes for previously unseen data using concepts and techniques from statistics and machine learning. Although many results of this practice have been beneficial, the data used often contain sensitive information, such as medical records or financial information, so maintaining privacy is of paramount importance when releasing statistics, parameter estimates, and other results. Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy when releasing aggregate information and statistics from a dataset. By injecting random noise, it provides a provable bound on the incurred privacy loss, at the cost of reduced utility. While many works in the two decades since DP's introduction have established DP guarantees for various analysis tools, many popular statistical learning approaches still lack a DP counterpart. This dissertation addresses this gap in three original research topics, listed below.
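As a minimal illustration of how DP bounds privacy loss by injecting noise, the sketch below applies the standard Laplace mechanism to a clipped mean. It is not taken from the dissertation; the function names, the clipping bounds `lower`/`upper`, and the assumption that the dataset size n is public are choices made only for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-DP mean of values clipped to [lower, upper].

    Assumption: the dataset size n is public. Replacing one record then
    changes the clipped mean by at most (upper - lower) / n, which is the
    sensitivity used to scale the Laplace noise.
    """
    if upper <= lower or epsilon <= 0:
        raise ValueError("need upper > lower and epsilon > 0")
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / n
    sensitivity = (upper - lower) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Usage with hypothetical data: a smaller epsilon means more noise.
rng = random.Random(0)
print(private_mean([23, 35, 41, 58, 62], lower=18, upper=90, epsilon=1.0, rng=rng))
```

The noise scale is sensitivity divided by epsilon, which makes the privacy-utility trade-off explicit: halving epsilon doubles the expected absolute error of the released mean.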
First, the dissertation presents the first differentially private algorithm for general weighted empirical risk minimization (wERM), along with theoretical DP guarantees. It evaluates the DP-wERM framework applied to outcome weighted learning (OWL), a method for learning individualized treatment rules, in both simulation studies and a real clinical trial. The results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance.
Second, the dissertation presents several original approaches with proven DP guarantees for linear mixed-effects (LME) models. LME models are popular, especially among statisticians, but little work has integrated DP into them. The work leverages recent advances in the DP literature, particularly DP stochastic gradient descent (SGD), to estimate LME model parameters with DP guarantees and better privacy-utility trade-offs. Theoretical upper bounds on the mean squared error between the private parameter estimates and the true parameters are provided for the DP-SGD-based approaches, and a simulation study and a real-world case study provide further empirical evidence that the approaches are feasible at practically reasonable privacy budgets.
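DP-SGD privatizes gradient descent by clipping each per-example gradient to a maximum L2 norm and adding Gaussian noise to their sum before the update. The sketch below shows one such step for a plain least-squares linear model, not for an LME model; the model, the function names, and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions, and the privacy accounting that converts the noise level into an (epsilon, delta) budget is omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(w, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD step for least-squares linear regression (illustrative only).

    batch: list of (x, y) pairs, where x is a list of floats.
    """
    dim = len(w)
    summed = [0.0] * dim
    for x, y in batch:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        per_example = [residual * xi for xi in x]  # gradient of 0.5 * (w.x - y)^2
        for j, g in enumerate(clip(per_example, max_norm)):
            summed[j] += g
    # Gaussian noise calibrated to the clipping norm bounds each example's influence.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [wi - lr * g / len(batch) for wi, g in zip(w, noisy)]


rng = random.Random(0)
w = [0.0, 0.0]
batch = [([1.0, 2.0], 3.0), ([2.0, 1.0], 0.5)]  # hypothetical data
print(dp_sgd_step(w, batch, lr=0.1, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can move the update, which is what lets the added noise translate into a formal privacy guarantee.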
Third, this dissertation introduces SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. Alongside privacy, the fairness of decisions made by a statistical learning model is also crucial, though the vast majority of existing literature treats the two concerns independently. Methods that do consider privacy and fairness simultaneously often apply only to a specific machine learning task, which limits their generalizability. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. SAFES is illustrated by combining a graphical model-based DP data synthesizer with a popular fairness-aware data pre-processing transformation, and empirical evaluations on two popular benchmark datasets demonstrate that, for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered to the book "Successful dissertations: the complete guide for education, childhood and early childhood studies students". It features 7 columns, including author, publication date, language, and book publisher.
https://data.gov.tw/license
To encourage academic and related research on gender equality education and to raise academic standards on these topics, the Ministry of Education has formulated the "Key Points for the Ministry of Education to Award Master's and Doctoral Thesis and Journal Papers on Gender Equality Education" to govern these awards.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This folder contains (references to) research data related to the dissertation entitled "Automated Layout Generation and Design Rationale Capture to Support Early-Stage Complex Ship Design" by Joan le Poole. Specifically, the data underlying case study 4 (layout generation) and case study 7 (design rationale) are provided. In addition, references are given to the previously published research data underlying case studies 1-3 and 5-6. Together, the previously published datasets and this file comprise all data underlying the dissertation.
This research investigates community exposure to coastal climate hazards in Alaska's North Slope and incorporates community assessment of the potential effects of losing land, infrastructure, and other assets. This analysis will inform response strategies and planning by developing new methods of hazard assessment that can support community resilience in the North Slope and potentially serve as a model for advancing assessment and planning in other rural and urban communities. This research will expand traditional assessments of financial exposure to include non-material factors, such as the values and priorities of the diverse social groups and stakeholders within a community, ranging from multinational oil companies to individual subsistence hunters. This study surveys community views of asset importance and integrates the results with a geophysical hazard data model to create a coproduced community exposure map of the North Slope coast. This research will contribute to understanding the human and social dimensions of climate change impacts, including how social, economic, political, and cultural factors shape vulnerabilities and condition response strategies. The methods and findings could enhance nationwide efforts in the United States to map community exposure to coastal climate hazards by demonstrating methods for, and the importance of, systematically incorporating non-market values in exposure analysis.
The objectives of the proposed research include adapting the U.S. Geological Survey's (USGS) coastal vulnerability index (CVI) to the Arctic context and integrating the results with formal asset databases and a spatial community landscape value model, while working with affected communities throughout the process to coproduce exposure maps. Specifically, working with North Slope Alaskan communities, the study will incorporate wind fetch (i.e., the open-water distance over which wind can generate nearshore waves, determined by sea ice extent) into the CVI and gather community feedback on the results. In addition to community input on the CVI maps, coproducing the exposure maps involves the community assigning values to traditional land use places using existing spatial datasets, and mapping and investigating specific sites threatened by coastal hazards to learn why these exposed assets matter to the community.
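USGS CVI studies commonly rank each physical variable from 1 (very low) to 5 (very high vulnerability) and combine the ranks as the square root of their product divided by the number of variables; the classic version uses six variables (geomorphology, shoreline change rate, coastal slope, relative sea-level rise, mean wave height, and mean tidal range). The sketch below implements that formula generically. Treating wind fetch as a seventh ranked variable is only a guess at how such an Arctic adaptation could look, and all example ranks are hypothetical.

```python
import math


def coastal_vulnerability_index(ranks):
    """CVI = sqrt(product of ranked variables / number of variables).

    ranks: vulnerability ranks on a 1-5 scale, one per physical variable.
    """
    if not ranks:
        raise ValueError("need at least one ranked variable")
    if any(not 1 <= r <= 5 for r in ranks):
        raise ValueError("ranks are expected on a 1-5 scale")
    return math.sqrt(math.prod(ranks) / len(ranks))


# Hypothetical ranks for one shoreline segment: the six classic variables ...
classic = [4, 5, 3, 2, 1, 1]
# ... plus a hypothetical seventh rank for wind fetch (an assumed adaptation).
with_fetch = classic + [4]
print(coastal_vulnerability_index(classic), coastal_vulnerability_index(with_fetch))
```

Because the index is multiplicative, a single high rank, such as a long open-water fetch during low sea ice extent, can raise a segment's vulnerability noticeably even when the other variables are unchanged.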
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Text files:
readme_general.txt contains a brief description of files included.
readme_modeldata.txt contains a metadata description of model_data.csv.
readme_languageslandNorthAmerica.txt contains a metadata description of Languages_land_NorthAmerica.csv.
readme_languagerevitalizationdatabase.txt contains a metadata description of Language_revitalization_database.csv.
CSV files:
Languages_land_NorthAmerica.csv is a version of the Languages of Government-Recognized Native Land Areas in the Continental United States database. It includes data from the US Census 2017 TIGER/Line AIANNH shapefile with one row per Native land area and additional columns for associated information that was coded and calculated for this dissertation as discussed in Section 3.3.1.
Language_revitalization_database.csv is the Language Revitalization Database. It contains the master language list used for this dissertation and columns created while coding data for the language revitalization variable, as discussed in Section 3.3.2.
model_data.csv contains data for all variables used in the analysis and is the .csv file needed to run LanguageVitalityModels.R.
R scripts:
LanguageVitalityModels.R is the R script for the main part of the dissertation analysis.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset comprises five sets of data collected throughout the PhD Thesis project of Pelin Esnaf-Uslu.
Esnaf-Uslu, P. (2024). Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.
The research in this thesis is based on the premise that service providers can enhance their effectiveness in client interactions by acquiring a detailed understanding of interpersonal mood regulation (IMR) strategies and applying this knowledge effectively. To achieve this aim, the research explored (1) the current role of mood in service encounters, (2) the IMR strategies used by service providers during service encounters in response to clients' moods, (3) how IMR strategies can be facilitated by means of tools for service providers, and (4) the strengths and limitations of the developed materials.
This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.
The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.
Chapter_2: This study investigates the role of mood in service encounters. Samples of service providers' experiences during service encounters were collected, and in-depth interviews were conducted. The dataset includes the blank diary and the interview protocol.
Chapter_3: This study investigates the clarity of the images developed to represent Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 27 and 29 participants, showing the associations between images representing nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. Additionally, the dataset contains a screenshot of the workshop material used in the implementation study.
Chapter_4: This study examines the clarity of developed videos depicting IMR strategies. The dataset includes anonymized scores from 32 participants, showing the associations between videos depicting nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. In addition, the dataset contains the workshop guideline developed for the implementation study.
Chapter_5: This study evaluates the clarity of character animations depicting Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 39 participants, demonstrating the associations between videos illustrating nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants.
Chapter_6: This dataset comprises correspondence analysis files for each material, created for the purpose of comparison.
All the data is anonymized by removing the names of individuals and institutions.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data is supplementary to the paper:
Manika Lamba, You Peng, Sophie Nikolov, and J. Stephen Downie. 2024. AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments. In The 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’24), December 2024, Hong Kong, China. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3677389.3702594
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data used in the publication "Identifying publications of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task". It includes bibliographic data for German PhD theses (dissertations) and the associated publications of cumulative dissertations. Content from Elsevier's Scopus database used in the study is not included, except for item identifiers; users with access to Scopus can use these identifiers for matching.
File diss_data.csv contains bibliographic data on dissertation theses obtained from the German National Library, cleaned and postprocessed. The columns are:
REQUIZ_NORM_ID: Identifier for the thesis
TITLE: Cleaned thesis title
HEADING: Descriptor terms (German)
AUTO_LANG: Language, either from the original record or automatically derived from the title
File ground_truth_pub_metadata.csv contains bibliographic data for the identified constituent publications of theses. If columns 2 to 7 are empty, the thesis did not include any publications ("stand-alone" or monograph thesis).
The columns are:
REQUIZ_NORM_ID: Identifier for the thesis, for matching with the data in file
SCOPUS_ID: Scopus ID for the identified publication
AUTORS: Author names of the publication as in the original thesis citation
YEAR: Publication year of the publication as in the original thesis citation
TITLE: Publication title as in the original thesis citation
SOURCETITLE: Source title as in the original thesis citation
PAGES: Page information of the publication as in the original thesis citation
Scopus identifiers are published with permission by Elsevier.
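Because empty columns 2 to 7 mark a stand-alone thesis, cumulative and monograph theses in ground_truth_pub_metadata.csv can be separated mechanically. The sketch below assumes a comma-separated file with a header row using the column names listed above (the delimiter and encoding are not stated in the description); the function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Columns 2-7 of ground_truth_pub_metadata.csv, as listed in the description.
PUB_COLUMNS = ["SCOPUS_ID", "AUTORS", "YEAR", "TITLE", "SOURCETITLE", "PAGES"]


def classify_theses(csv_text):
    """Map each thesis ID to the list of its constituent publications.

    A thesis whose rows leave columns 2-7 empty maps to an empty list
    (stand-alone or monograph thesis); any other thesis is cumulative.
    """
    publications = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        thesis_id = row["REQUIZ_NORM_ID"]
        pub = {col: (row.get(col) or "").strip() for col in PUB_COLUMNS}
        if any(pub.values()):
            publications[thesis_id].append(pub)
        else:
            publications.setdefault(thesis_id, [])
    return dict(publications)


# Usage (assumed comma-separated, UTF-8 file):
# with open("ground_truth_pub_metadata.csv", encoding="utf-8", newline="") as f:
#     theses = classify_theses(f.read())
# cumulative = [t for t, pubs in theses.items() if pubs]
```

Keeping empty lists for monograph theses preserves them in the result, so both groups can be counted from the same mapping.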
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These data stem from my dissertation project, Framing the Law: Judges and Jury Instructions. This is an original dataset of federal criminal jury trials from January 1, 2015, to December 31, 2018. The data come from 23 federal districts and code 51 different variables.
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
The data and code were prepared and uploaded to 4TU.ResearchData by Wassamon Phusakulkajorn to support the results in Chapter 5 (A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data) of her dissertation. This chapter has been submitted for publication as Phusakulkajorn, W., Unsiwilai, S., Chang, L., Núñez, A., Li, Z., A Hybrid Neural Model Approach for Health Assessment of Railway Transition Zones with Multiple Data Sources. In this research, we develop a framework that enables more frequent evaluation of transition zone health by integrating multiple monitoring technologies, including track geometry measurements, interferometric synthetic aperture radar (InSAR), and axle box acceleration (ABA). The aim is to improve the early detection of track irregularities. The data used in this research comprise ABA, track geometry, and InSAR measurements at the transition zones of a railway bridge between Dordrecht and Lage Zwaluwe stations in the Netherlands. All implementations are done in MATLAB; the .mat files contain the analytical solutions, and the .eps and .jpg files are the figures used in the main manuscript.
https://www.law.cornell.edu/uscode/text/17/106
Numerous types of real-world data can be naturally represented as graphs, such as social networks, trading networks, and biological molecules. This highlights the need for effective graph representations to support various tasks. In recent years, graph neural networks (GNNs) have demonstrated remarkable success in extracting information from graphs and enabling graph-related tasks. However, they still face a series of challenges in solving real-world problems, including scarcity of labeled data, scalability issues, potential bias, etc. These challenges stem from both domain-specific issues and inherent limitations of GNNs. This thesis introduces various strategies to tackle these challenges and empower GNNs on real-world tasks.
For the domain-specific challenges, this thesis focuses on the chemistry domain, which plays a pivotal role in the drug discovery process. Because labeling requires resource-intensive wet lab experiments, AI for chemistry struggles with a scarcity of labeled datasets. To address this, we present a comprehensive set of strategies spanning model-based and data-based approaches, alongside a hybrid method. These methods exploit the diversity of data, models, and molecular representations to compensate for the lack of labels in individual datasets. For the inherent challenges, this thesis introduces strategies to overcome two main problems, scalability and degree bias, especially in the context of link prediction tasks. Both challenges originate from the mechanism of GNNs, which iteratively aggregate neighboring nodes' information to update each central node. For scalability, our work not only preserves GNNs' prediction performance but also significantly boosts inference speed. For degree bias, our work substantially improves the effectiveness of GNNs for underrepresented nodes at very low additional computational cost. These contributions address critical gaps in applying GNNs to specific domains and lay the groundwork for future exploration of graph-based real-world tasks.
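The iterative aggregation described above can be illustrated with one parameter-free message-passing round, in which each node's new feature is the mean of its own feature and its neighbours' features. Real GNN layers add learned weight matrices and nonlinearities; this simplification, the function name, and the toy path graph are assumptions for illustration only.

```python
def mean_aggregation_layer(features, adjacency):
    """One round of message passing with mean aggregation.

    features: dict mapping node -> list[float]
    adjacency: dict mapping node -> list of neighbouring nodes
    Each node's new feature is the element-wise mean of its own feature
    and its neighbours' features (no learned weights in this sketch).
    """
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in adjacency.get(node, [])]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Toy path graph a - b - c with one scalar feature per node.
feats = {"a": [0.0], "b": [3.0], "c": [6.0]}
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
one = mean_aggregation_layer(feats, adj)  # {"a": [1.5], "b": [3.0], "c": [4.5]}
two = mean_aggregation_layer(one, adj)    # {"a": [2.25], "b": [3.0], "c": [3.75]}
```

Stacking k rounds lets information travel k hops, so the set of nodes each prediction depends on grows with depth, while a low-degree node such as `a` averages over few neighbours; loosely, these are the scalability and degree-bias issues named above.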
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of data used in a dissertation submitted to the London School of Economics. The data concern the occurrences of three disarmament-related words across approximately four years in seven news publications.
Data for replicating the qualitative and quantitative methods in the dissertation of Margarete Schweizerhof.
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
This dissertation presents multiple novel methodological advances in machine learning (ML) for spatio-temporal data applications. Traditional ML approaches typically have difficulty producing both accurate point predictions and adequate uncertainty quantification for these data, especially when the data are sampled at a fine temporal scale. This is because inference on these complex ML models is notably difficult and can impose a significant computational burden. Forecasting spatio-temporal data is further complicated when the forecasts must obey known physical laws that dictate or influence the underlying data structure.
We explore the current challenges in properly quantifying the uncertainty of forecasts from contemporary ML models for spatio-temporal data applications. Methods are introduced not only to calibrate the uncertainty estimates so that proper coverage is achieved but also to ensure a realistic expansion of the uncertainty through time. These contemporary ML models are also adapted so that the physical processes present in the data inform the learning procedures, making the forecasts more physically compliant. We demonstrate the benefit of combining ML models in an ensemble to improve accuracy when predicting nonstationary, complex temporal data. Finally, a general comparison explores the benefits and drawbacks of ML approaches to time-series forecasting versus popular standard statistical approaches, and serves as a guide explaining why these advanced ML modelling techniques are not necessarily the best approach for every prediction and forecasting problem.
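One standard way to calibrate forecast intervals so that they reach a target coverage is split conformal prediction: widen each point forecast by a quantile of absolute residuals measured on held-out calibration data. This is offered as a generic illustration, not as the method developed in the dissertation; it assumes exchangeable residuals, which temporally dependent data can violate, and the function names are hypothetical.

```python
import math


def conformal_half_width(cal_pred, cal_obs, alpha):
    """Split-conformal half-width from absolute calibration residuals.

    Intervals pred +/- half_width then have marginal coverage of at least
    1 - alpha, provided the residuals are exchangeable.
    """
    scores = sorted(abs(o - p) for p, o in zip(cal_pred, cal_obs))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def empirical_coverage(pred, obs, half_width):
    """Fraction of observations that fall inside pred +/- half_width."""
    hits = sum(abs(o - p) <= half_width for p, o in zip(pred, obs))
    return hits / len(pred)
```

A time-aware variant could calibrate a separate half-width for each forecast horizon so that intervals widen with lead time; that extension is not shown here.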
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an archive of the data used for my MSc dissertation, in which I assessed the accuracy of the Ohtsuki-Nowak approximation for games on graphs. The datasets whose names do not begin with "NEWEST" are the ones used in the analysis undertaken for the project.
Datasets without "conjoined" in their names contain fixation probabilities from Moran processes on graphs and in well-mixed populations, as well as steady states of the replicator equation on graphs. Datasets with "conjoined" in their names contain fixation probabilities from Moran processes on a number of different types of graphs, which were used to test an extension of Ohtsuki and Nowak's work to graphs formed by conjoining two regular graphs.
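For a well-mixed population of size N, the birth-death Moran process has a closed-form fixation probability for a single mutant of relative fitness r, namely rho = (1 - 1/r) / (1 - 1/r^N), with rho = 1/N in the neutral case r = 1. The sketch below computes that formula and checks it against a Monte Carlo simulation of the same process. It does not implement the Ohtsuki-Nowak approximation itself, and the function names are hypothetical; it only shows the well-mixed baseline that graph-structured results are typically compared with.

```python
import random


def fixation_probability(r, n):
    """Exact fixation probability of one mutant (fitness r) among n - 1 residents."""
    if r == 1:
        return 1 / n  # neutral drift
    return (1 - 1 / r) / (1 - 1 / r ** n)


def simulate_fixation(r, n, trials, rng):
    """Monte Carlo estimate for the well-mixed birth-death Moran process."""
    fixed = 0
    for _ in range(trials):
        i = 1  # current number of mutants
        while 0 < i < n:
            total_fitness = r * i + (n - i)
            # Birth proportional to fitness, death uniformly at random.
            p_up = (r * i / total_fitness) * ((n - i) / n)    # mutant replaces a resident
            p_down = ((n - i) / total_fitness) * (i / n)      # resident replaces a mutant
            u = rng.random()
            if u < p_up:
                i += 1
            elif u < p_up + p_down:
                i -= 1
        fixed += (i == n)
    return fixed / trials


rng = random.Random(1)
print(fixation_probability(2.0, 10), simulate_fixation(2.0, 10, 4000, rng))
```

The simulation tracks only the mutant count, which is sufficient in a well-mixed population; on a graph, the state would have to record which nodes are mutants.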
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Undergraduate dissertation project on understanding the habitat preferences of a reintroduced population of European water voles
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We present the results of a quantitative assessment of research data produced and submitted with dissertations. Special attention is paid to the size of the research data in appendices, their presentation and link to the text, their sources and typology, and their potential for further research. The discussion focuses on legal aspects (database protection, intellectual property, privacy, third-party rights) and other barriers to data sharing, reuse and dissemination through open access. Another part adds insight into the potential handling of these data within the French and Slovenian dissertation infrastructures. What could be done to valorize these data in a centralized system for electronic theses and dissertations (ETDs)? The topics are formats, metadata (including the attribution of unique identifiers), submission/deposit, long-term preservation and dissemination. This part will also draw on experiences from other campuses and make use of results from surveys on data management at the Universities of Berlin and Lille.