100+ datasets found

Data from: AckSent: Human Annotated Dataset of Support and Sentiments in...
zenodo.org
Updated Nov 5, 2024
Cite
Author; Author (2024). AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments [Dataset]. http://doi.org/10.5281/zenodo.13283331
Unique identifier
https://doi.org/10.5281/zenodo.13283331
Dataset updated
Nov 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Author; Author
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
Aug 10, 2024
Description
This data is supplementary to the paper "AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments".
Metadata for the dissertation: Improving Commercial Property Price...
data.4tu.nl
Updated Nov 25, 2024
Cite
Farley Ishaak (2024). Metadata for the dissertation: Improving Commercial Property Price Statistics [Dataset]. http://doi.org/10.4121/cab0cf0e-668f-46db-82bb-94abe78faeb0.v1
Unique identifier
https://doi.org/10.4121/cab0cf0e-668f-46db-82bb-94abe78faeb0.v1
Dataset updated
Nov 25, 2024
Dataset provided by
4TU.ResearchData
Authors
Farley Ishaak
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2008 - 2023
Area covered
Netherlands
Description
This metadata document provides details of the data used for the dissertation "Improving Commercial Property Price Statistics". The study explores data-related and methodological challenges in the construction of price statistics for commercial real estate.

Short abstract of the dissertation
Since the financial crisis of 2008, National Statistical Institutes (NSIs) have worked to develop commercial real estate (CRE) indicators for official statistics. These indicators are considered essential in financial stability monitoring and may help contain the consequences of future crises or even prevent future crises. However, progress at NSIs to develop these indicators has been slow due to challenges like low observation numbers and high heterogeneity. This dissertation addresses these challenges by exploring data issues and suggesting methodological improvements.

The first three studies focus on data challenges regarding share deals and portfolio sales, two real estate trading constructions that are specific to CRE. The results show that share deals and portfolio sales differ significantly from the rest of the market. Therefore, under specific circumstances, CRE indicators could benefit from including these trading types. The final two studies focus on methodological challenges regarding index construction methods and the role of sustainability in real estate pricing. The results show that, by combining established techniques, it is possible to construct price indices that meet official statistics' standards. Furthermore, the results uncover a complex relationship between sustainability and prices: while energy efficiency is generally associated with price premiums, other aspects such as health and environment display a discount for less sustainable properties.

Overall, this dissertation contributes to the legislative framework that is currently being developed for EU countries to publish official statistics for commercial real estate and adds to the academic discussion by presenting innovative techniques for data analyses and index construction.

Data sources
The following data sources were used:
Business Register (Statistics Netherlands)
Transactions linked to the Register of Addresses and Buildings (BAG)
Linking table of buildings and companies (Dutch Land Registry Office)
Property Transfer Tax data (Dutch Tax Authorities)
Building sustainability scores (W/E advisors)
Commercial real estate transactions (Dutch Land Registry Office)

Processing methodology
The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_2_ABR_Bedrijfsinfo. The data is used for deriving company transfers by comparing ownership states across periods. The first period in which the ownership of a company differs from the previous period indicates an ownership transfer.
The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_6_ABR_CompleetMicro. The data is used for calculating the size of real estate share deals and estimating price developments by applying appropriate filters and counting the output.
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is SPE_KADASTER. The data is used for finding real estate information that corresponds to company transfers by linking the company register (ABR) to the real estate register (BAG).
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_SPE_3_OVB_Bedrijfsinfo. The data is used for deriving real estate share deals by linking this table (Kadaster) to the real estate register (BAG).
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is duurzaamheid_input_regressie2. The data is used for finding the relationship between sustainability measures and real estate transaction prices by linking sustainability scores from a consultancy (WE) to transaction prices (Cadastre) and running regression analyses.
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_OV20_pand. The data is used for 4 purposes (separate studies).
(1) Chapter 3: Determining the price effect of portfolio sales by running regression analyses
(2) Chapter 4: Developing methods to include portfolio sales in CPPI calculations by using auxiliary data on the real estate properties
(3) Chapter 5: Developing a price index method for small domains by using these data to test the outcomes
(4) Chapter 6: Determining the relationship between sustainability and transaction prices by running regression analyses
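
The transfer-detection rule described for tbl_SPE_2_ABR_Bedrijfsinfo (compare each company's ownership state across periods; the first period with a different owner marks a transfer) can be illustrated with a minimal Python sketch. It is not the dissertation's SQL/R code, and the microdata itself is not public: the record layout `(company_id, period, owner_id)` and the function name are hypothetical, chosen only to show the logic.

```python
from collections import defaultdict
from typing import Iterable


def detect_ownership_transfers(
    records: Iterable[tuple[str, int, str]],
) -> list[tuple[str, int, str, str]]:
    """Return (company_id, period, old_owner, new_owner) for every period in
    which a company's owner differs from its owner in the previous observed
    period. The (company_id, period, owner_id) layout is a hypothetical input."""
    by_company: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for company_id, period, owner_id in records:
        by_company[company_id].append((period, owner_id))

    transfers = []
    for company_id, states in by_company.items():
        states.sort()  # order the ownership states chronologically
        for (_, prev_owner), (period, owner) in zip(states, states[1:]):
            if owner != prev_owner:
                # First period with a different owner = ownership transfer
                transfers.append((company_id, period, prev_owner, owner))
    return transfers


# Usage with made-up records
records = [
    ("C1", 2019, "A"), ("C1", 2020, "A"), ("C1", 2021, "B"),
    ("C2", 2019, "X"), ("C2", 2020, "X"),
]
print(detect_ownership_transfers(records))  # [('C1', 2021, 'A', 'B')]
```
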

Data restrictions
Under the CBS law, sharing microdata outside of the CBS environment is prohibited. Furthermore, CBS manages the data, but in some cases other parties are still the formal owners of the data. The two other parties are the Land Registry Office and WE consultancy. Ownership and intellectual property rights are managed in contracts with both owners. It was agreed that the data can only be used for the purpose of the PhD study and that the microdata will never be externally disseminated. The data is still owned by them, and the intellectual property rights of the analyses belong to me. Any intended use of the microdata should be approved by both Statistics Netherlands and the formal data owner. Because of the above, no data can be publicly shared.

If one intends to do research on these data, an application for data use can be submitted to CBS. CBS will charge costs for anonymising the data and providing a closed environment to work with the data. More information on this can be found at: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research

Contact information
Author: Farley Ishaak
Statistics Netherlands | Henri Faasdreef 312 | P.O. Box 24500 | 2490 HA The Hague
TU Delft | Delft University of Technology | Faculty of Architecture and the Built Environment
Department of Management in the Built Environment | P.O. Box 5043 | 2600 GA Delft
M +31 6 46307974 | ff.ishaak@cbs.nl | f.f.ishaak@tudelft.nl
Advanced Topics in Differentially Private Statistical Learning
curate.nd.edu
pdf
Updated Jul 14, 2025
Cite
Spencer Tate Giddens (2025). Advanced Topics in Differentially Private Statistical Learning [Dataset]. http://doi.org/10.7274/29498438.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.7274/29498438.v1
Dataset updated
Jul 14, 2025
Dataset provided by
University of Notre Dame
Authors
Spencer Tate Giddens
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Collecting and utilizing data to understand population trends, make predictions, and guide decisions is becoming increasingly common in today's world. In particular, statistical learning allows users to infer relationships between variables, learn patterns, and predict outcomes for previously unseen data via concepts and techniques from statistics and machine learning. Although many of the results of this practice have been beneficial, the data used often contain sensitive information, such as medical records or financial information, so maintaining privacy is of paramount importance when releasing statistics, parameter estimates, and other results. Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy when releasing aggregate information and statistics from a dataset. It provides a provable bound on the incurred privacy loss via the injection of random noise, at the cost of a reduction in utility. While many works have been devoted to establishing DP guarantees for various analysis tools in the past two decades since DP's introduction, many popular statistical learning approaches still lack a DP counterpart. This dissertation addresses this issue in three original research topics, as listed below.
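
To make the idea of "injecting random noise" concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook way to release a bounded mean under ε-differential privacy. It is a generic illustration, not an algorithm from the dissertation; the clipping bounds `lower`/`upper`, the function name, and the assumption that the dataset size n is public (neighbouring datasets differ in one record's value) are choices made for this example.

```python
import numpy as np


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release the mean of `values` under epsilon-DP via the Laplace mechanism.

    Assumes n is public and neighbouring datasets differ in one record's
    value. After clipping to [lower, upper], one record can move the mean by
    at most (upper - lower) / n, which is the L1 sensitivity."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = x.size
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    return x.mean() + rng.laplace(loc=0.0, scale=scale)


rng = np.random.default_rng(0)
ages = [34, 51, 29, 62, 45]
print(dp_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng))
```

A smaller ε gives a larger noise scale, which is the privacy-utility trade-off the paragraph refers to.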

First, the dissertation presents the first differentially private algorithm for general weighted empirical risk minimization (wERM), along with theoretical DP guarantees. It evaluates the performance of the DP-wERM framework applied to outcome weighted learning (OWL), a method for learning individualized treatment rules, in both simulation studies and in a real clinical trial. The results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance.
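
For orientation, the weighted empirical risk that OWL minimises can be written as below. This follows the standard OWL formulation (Zhao et al., 2012) and is not necessarily the exact objective used in the dissertation. Here $R_i \ge 0$ is the observed outcome, $A_i \in \{-1, 1\}$ the assigned treatment, $X_i$ the covariates, $\pi(A_i \mid X_i)$ the known randomisation probability, $\phi$ a convex surrogate such as the hinge loss, and $\lambda$ a regularisation weight; the learned rule is $d(x) = \operatorname{sign}(f(x))$.

```latex
\hat f = \arg\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} w_i \, \phi\big(A_i f(X_i)\big) + \lambda \lVert f \rVert^2,
\qquad
w_i = \frac{R_i}{\pi(A_i \mid X_i)},
\qquad
\phi(u) = \max(0,\, 1 - u).
```

Because each weight $w_i$ scales one record's contribution to the loss, a single record's influence on $\hat f$ grows with its weight, which is why privacy analyses of weighted ERM generally need the weights to be bounded.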

Second, the dissertation presents several original approaches with proven DP guarantees for linear mixed-effects (LME) models. LME models are popular, especially among statisticians, but little work has integrated DP into them. The work leverages recent advances in the DP literature, particularly DP stochastic gradient descent (SGD), to estimate LME model parameters with DP guarantees and better privacy-utility trade-offs. Theoretical upper bounds on the mean squared error between the private parameter estimates and the true parameters are provided for the DP-SGD-based approaches, and a simulation study and a real-world case study provide further empirical evidence for the feasibility of the approaches at practically reasonable privacy budgets.
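
The DP-SGD building block mentioned above can be sketched for plain linear regression: each per-example gradient is clipped to L2 norm at most `clip`, Gaussian noise with standard deviation `noise_multiplier * clip` is added to the sum, and the result is averaged over the batch. This is a generic DP-SGD illustration in the style of Abadi et al. (2016), not the dissertation's LME estimator; the model, hyperparameter names and default values are assumptions, and turning `noise_multiplier`, batch size and step count into an (ε, δ) guarantee requires a privacy accountant that is not shown.

```python
import numpy as np


def dp_sgd_linear(X, y, epochs=20, lr=0.1, batch_size=32, clip=1.0,
                  noise_multiplier=1.0, seed=0):
    """Fit y ~ X @ theta (squared-error loss) with DP-SGD-style updates.

    Shuffled mini-batches are used for simplicity; formal privacy accounting
    usually assumes Poisson subsampling instead."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]          # shape (b,)
            grads = residual[:, None] * X[idx]          # per-example gradients, (b, d)
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip)  # each row now has norm <= clip
            noise = rng.normal(0.0, noise_multiplier * clip, size=d)
            theta -= lr * (grads.sum(axis=0) + noise) / len(idx)
    return theta
```

Setting `noise_multiplier=0` and a large `clip` recovers ordinary mini-batch SGD, which is a convenient sanity check.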

Third, this dissertation introduces SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. Alongside privacy, the fairness of decisions made by a statistical learning model is also crucial to address, though the vast majority of existing literature treats the two concerns independently. Methods that do consider privacy and fairness simultaneously often apply only to a specific machine learning task, limiting their generalizability. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. SAFES is illustrated by combining a graphical model-based DP data synthesizer with a popular fairness-aware data pre-processing transformation, and empirical evaluations on two popular benchmark datasets demonstrate that, for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
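
The second SAFES stage operates on already-synthesised data, and because differential privacy is preserved under post-processing, transforming the DP synthetic data does not weaken its privacy guarantee. The specific transformation is not named here, so the sketch below assumes, purely for illustration, a quantile-based partial "repair" in the spirit of the disparate impact remover (Feldman et al., 2015). The `level` argument plays the role of a tunable fairness parameter; the function name and inputs are hypothetical.

```python
import numpy as np


def repair_feature(x, group, level):
    """Partially repair numeric feature x so its distribution becomes similar
    across groups. level=0 leaves x unchanged; level=1 maps each value to the
    median, over groups, of the group quantile functions at the value's
    within-group rank."""
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    groups = np.unique(group)
    # Within-group quantile position of each value, scaled to [0, 1]
    q = np.empty_like(x)
    for g in groups:
        mask = group == g
        ranks = x[mask].argsort().argsort()
        q[mask] = ranks / max(mask.sum() - 1, 1)
    # Target: median across groups of each group's quantile at position q
    per_group = np.stack([np.quantile(x[group == g], q) for g in groups])
    target = np.median(per_group, axis=0)
    # Interpolate between the original and the fully repaired value
    return (1.0 - level) * x + level * target
```

Intermediate values of `level` trade fairness (similarity of the group distributions) against fidelity to the original data.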
Dataset of books called Successful dissertations : the complete guide for...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Successful dissertations : the complete guide for education, childhood and early childhood studies students [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Successful+dissertations+%3A+the+complete+guide+for+education%2C+childhood+and+early+childhood+studies+students
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is Successful dissertations : the complete guide for education, childhood and early childhood studies students. It features 7 columns including author, publication date, language, and book publisher.
Statistics on the number of scholarships for masters and doctoral...
data.gov.tw
csv
Updated Jun 1, 2025
Cite
Department of Student Affairs and Special Education (2025). Statistics on the number of scholarships for masters and doctoral dissertations and journal papers in gender equality education [Dataset]. https://data.gov.tw/en/datasets/159100
Available download formats: csv
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Department of Student Affairs and Special Education
License
https://data.gov.tw/license
Description
To encourage academic and related research on gender equality education and improve the academic standards of research on these topics, the Ministry of Education has formulated the "Key Points for the Ministry of Education to Award Master's and Doctoral Thesis and Journal Papers on Gender Equality Education".
Data underlying the PhD dissertation: Automated Layout Generation and Design...
data.4tu.nl
zip
Updated Feb 5, 2024
Cite
Joan le Poole (2024). Data underlying the PhD dissertation: Automated Layout Generation and Design Rationale Capture to Support Early-Stage Complex Ship Design [Dataset]. http://doi.org/10.4121/e3eb2bab-8e28-4477-a34f-ba7c94d0d80b.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/e3eb2bab-8e28-4477-a34f-ba7c94d0d80b.v1
Dataset updated
Feb 5, 2024
Dataset provided by
4TU.ResearchData
Authors
Joan le Poole
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This folder contains (references to) research data related to the dissertation entitled "Automated Layout Generation and Design Rationale Capture to Support Early-Stage Complex Ship Design" by Joan le Poole. Specifically, the data underlying layout generation case study 4 and the data underlying case study 7 on design rationale are provided. In addition, references are given to the previously published research data underlying case studies 1-3 and 5-6. Together, the previously published data sets and this file comprise all data underlying the dissertation.
Data from: Doctoral Dissertation Research: Mapping Community Exposure to...
arcticdata.io
search.dataone.org
Updated Apr 11, 2022
Cite
Michael Brady (2022). Doctoral Dissertation Research: Mapping Community Exposure to Coastal Climate Hazards in the Arctic: A Case Study in Alaska's North Slope [Dataset]. http://doi.org/10.18739/A28G8FJ8X
Unique identifier
https://doi.org/10.18739/A28G8FJ8X
Dataset updated
Apr 11, 2022
Dataset provided by
Arctic Data Center
Authors
Michael Brady
Time period covered
Oct 1, 2015 - Sep 30, 2016
Description
This research investigates community exposure to coastal climate hazards in Alaska's North Slope and incorporates community assessment of the potential effects on loss of land, infrastructure, and other assets. This analysis will inform response strategies and planning by developing new methods of hazard assessment that can support community resilience in the North Slope and potentially serve as a model for advancing assessment and planning in other rural and urban communities. This research will expand traditional assessments of financial exposure to also include non-material factors such as values and priorities of diverse social groups within a community including a diverse set of stakeholders, ranging from multinational oil companies to individual subsistence hunters. This study surveys community views of asset importance and integrates results with a geophysical hazard data model for a coproduced community exposure map of the North Slope coast. This research will contribute to understanding the human and social dimensions of climate change impacts, including how social, economic, political, and cultural factors shape vulnerabilities and condition response strategies. Methods and findings could enhance nation-wide efforts in the United States to map community exposure to coastal climate hazards by demonstrating methods for, and the importance of systematically incorporating non-market values in exposure analysis.

The objectives of the proposed research include adapting the U.S. Geological Survey's (USGS) coastal vulnerability index (CVI) to the Arctic context, and integrating results with formal asset databases and a spatial community landscape value model while working with affected communities during the process to coproduce exposure maps. Specifically, working with North Slope Alaskan communities the study will incorporate wind fetch (i.e., the open water distance over which wind can generate near shore waves, determined by sea ice extent) into the CVI and get community feedback on the results. In addition to community input on the CVI maps, coproducing the exposure maps includes the community assigning values to traditional land use places using existing spatial datasets and mapping and investigating specific sites threatened by coastal hazards with the aim to learn why exposed assets threaten the community.
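
The USGS coastal vulnerability index combines coastal risk variables, each ranked from 1 (very low) to 5 (very high), as the square root of the product of the ranks divided by the number of variables; the standard index uses six (geomorphology, shoreline change rate, coastal slope, relative sea-level change, mean wave height and mean tide range). A minimal sketch, assuming wind fetch is simply added as a seventh ranked variable; how the dissertation actually incorporates fetch is not specified here, and the variable names and example ranks are hypothetical.

```python
import math


def coastal_vulnerability_index(ranks: dict[str, int]) -> float:
    """CVI = sqrt(product of variable ranks / number of variables)."""
    if not ranks:
        raise ValueError("at least one ranked variable is required")
    for name, rank in ranks.items():
        if not 1 <= rank <= 5:
            raise ValueError(f"rank for {name!r} must be between 1 and 5")
    return math.sqrt(math.prod(ranks.values()) / len(ranks))


# Hypothetical ranks for one shoreline segment, with wind fetch added
segment = {
    "geomorphology": 4, "shoreline_change": 5, "coastal_slope": 3,
    "sea_level_change": 3, "wave_height": 2, "tide_range": 1,
    "wind_fetch": 4,  # assumed Arctic-specific addition (sea ice extent)
}
print(round(coastal_vulnerability_index(segment), 2))
```

Because the ranks multiply, a high rank for a single variable (such as a long open-water fetch in a low-ice year) can raise the index substantially.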
Data to accompany dissertation: Geographic, Cultural, and Ecological...
data.niaid.nih.gov
Updated Sep 1, 2022
Cite
Helgeson, Kirsten (2022). Data to accompany dissertation: Geographic, Cultural, and Ecological Correlations with Indigenous Language Vitality in North America [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6982147
Dataset updated
Sep 1, 2022
Dataset authored and provided by
Helgeson, Kirsten
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
North America
Description
Text files:

readme_general.txt contains a brief description of files included.

readme_modeldata.txt contains a metadata description of model_data.csv.

readme_languageslandNorthAmerica.txt contains a metadata description of Languages_land_NorthAmerica.csv.

readme_languagerevitalizationdatabase.txt contains a metadata description of Language_revitalization_database.csv.

CSV files:

Languages_land_NorthAmerica.csv is a version of the Languages of Government-Recognized Native Land Areas in the Continental United States database. It includes data from the US Census 2017 TIGER/Line AIANNH shapefile with one row per Native land area and additional columns for associated information that was coded and calculated for this dissertation as discussed in Section 3.3.1.

Language_revitalization_database.csv is the Language Revitalization Database. It contains the master language list used for this dissertation and columns created while coding data for the language revitalization variable, as discussed in Section 3.3.2.

model_data.csv contains data for all variables used in the analysis and is the .csv file needed to run LanguageVitalityModels.R.

R scripts:

LanguageVitalityModels.R is the R script for the main part of the dissertation analysis.
Supplementary data files for the PhD thesis "Design for Interpersonal Mood...
data.4tu.nl
zip
Updated Jun 14, 2024
Cite
Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein (2024). Supplementary data files for the PhD thesis "Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters" [Dataset]. http://doi.org/10.4121/8a9b21b2-6411-42ed-a0e4-05be50fc5a69.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/8a9b21b2-6411-42ed-a0e4-05be50fc5a69.v1
Dataset updated
Jun 14, 2024
Dataset provided by
4TU.ResearchData
Authors
Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Dataset funded by
The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioural Sciences
Description
This dataset comprises five sets of data collected throughout the PhD Thesis project of Pelin Esnaf-Uslu.

Esnaf-Uslu, P. (2024). Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.

The research in this thesis is based on the premise that service providers can enhance their effectiveness in client interactions by acquiring a detailed understanding of interpersonal mood regulation (IMR) strategies and applying this knowledge effectively. To achieve this overall aim, the research explored (1) the current role of mood in service encounters, (2) the IMR strategies used by service providers during service encounters in response to clients' moods, (3) how tools for service providers can facilitate IMR strategies, and (4) the strengths and limitations of the developed materials.

This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.

The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.

Chapter_2: This study investigates the role of mood in service encounters. Samples of service providers' experiences during service encounters are collected, and in-depth interviews are conducted. The dataset includes the blank diary and the interview protocol.

Chapter_3: This study investigates the clarity of the images developed to represent Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 27 and 29 participants, showing the associations between images representing nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. Additionally, the dataset contains a screenshot of the workshop material used in the implementation study.

Chapter_4: This study examines the clarity of developed videos depicting IMR strategies. The dataset includes anonymized scores from 32 participants, showing the associations between videos depicting nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. In addition, the dataset contains the workshop guideline developed for the implementation study.

Chapter_5: This study evaluates the clarity of character animations depicting Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 39 participants, demonstrating the associations between videos illustrating nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants.

Chapter_6: This dataset comprises correspondence analysis files for each material, created for the purpose of comparison.

All the data is anonymized by removing the names of individuals and institutions.
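
The Chapter_6 files contain correspondence analyses, which summarise a contingency table in a low-dimensional map; here the table is presumably something like counts of how often participants matched each image, video or animation to each IMR strategy label (an assumption about the layout, not stated in the dataset description). A minimal NumPy sketch of classical correspondence analysis via the singular value decomposition of standardised residuals follows; the example counts are invented.

```python
import numpy as np


def correspondence_analysis(counts):
    """Classical CA of a two-way contingency table.

    Returns row principal coordinates, column principal coordinates and the
    principal inertias (squared singular values)."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()          # correspondence matrix
    r = P.sum(axis=1)        # row masses
    c = P.sum(axis=0)        # column masses
    # Standardised residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    return rows, cols, sv ** 2


# Invented counts: rows = materials, columns = strategy labels chosen
counts = np.array([[20, 5, 2], [4, 18, 6], [3, 7, 15]], dtype=float)
rows, cols, inertia = correspondence_analysis(counts)
print(inertia / inertia.sum())  # share of total inertia per dimension
```

The total inertia equals the table's chi-squared statistic divided by its grand total, so comparing inertias across materials gives one way to compare how strongly each set of materials is associated with the intended labels.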
Data from: AckSent: Human Annotated Dataset of Support and Sentiments in...
zenodo.org
csv, pdf
Updated Dec 17, 2024
Cite
Manika Lamba; You Peng; Sophie Nikolov; John Stephen Downie (2024). AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments [Dataset]. http://doi.org/10.5281/zenodo.14509104
Available download formats: csv, pdf
Unique identifier
https://doi.org/10.5281/zenodo.14509104
Dataset updated
Dec 17, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Manika Lamba; You Peng; Sophie Nikolov; John Stephen Downie
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
2024
Description
This data is supplementary to the paper:

Manika Lamba, You Peng, Sophie Nikolov, and J. Stephen Downie. 2024. AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments. In The 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’24), December 2024, Hong Kong, China. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3677389.3702594
Ground truth data for "Identifying publications of cumulative dissertation...
data.niaid.nih.gov
Updated May 3, 2021
Cite
Donner, Paul (2021). Ground truth data for "Identifying publications of cumulative dissertation theses by bilingual text similarity" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4733849
Dataset updated
May 3, 2021
Dataset authored and provided by
Donner, Paul
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data used in the publication "Identifying publications of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task". It includes bibliographic data for German PhD theses (dissertations) and the associated publications of cumulative dissertations. Content from Elsevier's Scopus database used in the study is not included, except for item identifiers, which users with access to Scopus can use for matching.

File diss_data.csv contains bibliographic data on dissertation theses obtained from the German National Library, cleaned and postprocessed. The columns are:
REQUIZ_NORM_ID: Identifier for the thesis
TITLE: Cleaned thesis title
HEADING: Descriptor terms (German)
AUTO_LANG: Language, either from the original record or automatically derived from the title

File ground_truth_pub_metadata.csv contains bibliographic data for the identified constituent publications of theses. If columns 2 to 7 are empty, the thesis did not include any publications ("stand-alone" or monograph thesis).

The columns are:
REQUIZ_NORM_ID: Identifier for the thesis, for matching with the data in file
SCOPUS_ID: Scopus ID for the identified publication
AUTORS: Author names of the publication as in the original thesis citation
YEAR: Publication year of the publication as in the original thesis citation
TITLE: Publication title as in the original thesis citation
SOURCETITLE: Source title as in the original thesis citation
PAGES: Page information of the publication as in the original thesis citation

Scopus identifiers are published with permission by Elsevier.
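
As a sketch of how the ground-truth file could be consumed, the snippet below groups Scopus IDs by thesis and flags stand-alone theses (rows whose columns 2 to 7 are empty), using only the column names listed above. It assumes a comma-separated file with a header row and UTF-8 encoding, which the description does not state; the function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable

# Columns 2-7 of ground_truth_pub_metadata.csv, as listed in the description
PUBLICATION_COLUMNS = ["SCOPUS_ID", "AUTORS", "YEAR", "TITLE", "SOURCETITLE", "PAGES"]


def summarise_ground_truth(lines: Iterable[str]):
    """Return (Scopus IDs per thesis, set of stand-alone thesis IDs)."""
    publications = defaultdict(list)
    standalone = set()
    for row in csv.DictReader(lines):
        thesis_id = row["REQUIZ_NORM_ID"]
        if all(not (row.get(col) or "").strip() for col in PUBLICATION_COLUMNS):
            standalone.add(thesis_id)  # no constituent publications: monograph
        else:
            publications[thesis_id].append(row["SCOPUS_ID"])
    return dict(publications), standalone


# Usage (file name from the dataset; delimiter and encoding are assumptions):
# with open("ground_truth_pub_metadata.csv", newline="", encoding="utf-8") as f:
#     pubs, monographs = summarise_ground_truth(f)
```

Accepting any iterable of lines lets the same function read an open file or an in-memory list.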
Dissertation Project data for Framing the Law: Judges and Jury Instructions
dataverse.harvard.edu
Updated Mar 14, 2025
Cite
Matthew Baker (2025). Dissertation Project data for Framing the Law: Judges and Jury Instructions [Dataset]. http://doi.org/10.7910/DVN/EPSDSA
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/EPSDSA
Dataset updated
Mar 14, 2025
Dataset provided by
Harvard Dataverse
Authors
Matthew Baker
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
These data stem from my dissertation project, Framing the Law: Judges and Jury Instructions. This is an original dataset of federal criminal jury trials from January 1, 2015 to December 31, 2018. The data come from 23 federal districts and code 51 different variables.
Research data supporting chapter 'A Hybrid Neural Model Approach for Health...
data.4tu.nl
Updated Jul 22, 2024
Cite
Wassamon Phusakulkajorn; Siwarak Unsiwilai; Ling Chang; Alfredo Núñez; Zili Li (2024). Research data supporting chapter 'A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data' of dissertation 'AI Solutions for Maintenance Decision Support in Railway Infrastructure' [Dataset]. http://doi.org/10.4121/43b96757-fd3f-4e89-b9ac-e0caad30f0f0.v1
Unique identifier
https://doi.org/10.4121/43b96757-fd3f-4e89-b9ac-e0caad30f0f0.v1
Dataset updated
Jul 22, 2024
Dataset provided by
4TU.ResearchData
Authors
Wassamon Phusakulkajorn; Siwarak Unsiwilai; Ling Chang; Alfredo Núñez; Zili Li
License
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
Dataset funded by
Europe’s Rail Flagship Project
ProRail
Description
The data and code were prepared and uploaded to 4TU.ResearchData by Wassamon Phusakulkajorn to support the results in Chapter 5 (A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data) of her dissertation. This chapter has been submitted for publication as Phusakulkajorn, W., Unsiwilai, S., Chang, L., Núñez, A., Li, Z., A Hybrid Neural Model Approach for Health Assessment of Railway Transition Zones with Multiple Data Sources. In this research, we develop a framework that enables more frequent evaluation of transition zone health by integrating multiple monitoring technologies, including track geometry measurements, interferometric synthetic aperture radar (InSAR), and axle box acceleration (ABA). The aim is to improve the early detection of track irregularities. The data used in this research contain ABA, track geometry and InSAR measurements at transition zones, collected from a railway bridge between Dordrecht and Lage Zwaluwe stations in the Netherlands. All implementations were done in MATLAB; the .mat files contain analytical solutions, and the .eps and .jpg files are figures used in the main manuscript.
Data from: Empowering Graph Neural Networks for Real-World Tasks
curate.nd.edu
pdf
Updated Nov 11, 2024
Cite
Zhichun Guo (2024). Empowering Graph Neural Networks for Real-World Tasks [Dataset]. http://doi.org/10.7274/25608504.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.7274/25608504.v1
Dataset updated
Nov 11, 2024
Dataset provided by
University of Notre Dame
Authors
Zhichun Guo
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Numerous types of real-world data can be naturally represented as graphs, such as social networks, trading networks, and biological molecules. This highlights the need for effective graph representations to support various tasks. In recent years, graph neural networks (GNNs) have demonstrated remarkable success in extracting information from graphs and enabling graph-related tasks. However, they still face a series of challenges in solving real-world problems, including scarcity of labeled data, scalability issues, potential bias, etc. These challenges stem from both domain-specific issues and inherent limitations of GNNs. This thesis introduces various strategies to tackle these challenges and empower GNNs on real-world tasks.

Among the domain-specific challenges, this thesis focuses in particular on the chemistry domain, which plays a pivotal role in drug discovery. Because labeling through wet-lab experiments requires significant resources, AI for chemistry struggles with a scarcity of labeled datasets. To address this, we present a comprehensive set of strategies spanning model-based and data-based approaches, alongside a hybrid method. These methods exploit the diversity of data, models, and molecular representations to compensate for the lack of labels in individual datasets. For the inherent challenges, this thesis introduces strategies to overcome two main problems, scalability and degree bias, especially in the context of link prediction. Both challenges originate from the core mechanism of GNNs, which iteratively aggregate neighboring nodes' information to update each central node. For scalability, our work preserves GNNs' prediction performance while significantly boosting inference speed. For degree bias, our work substantially improves the effectiveness of GNNs for underrepresented nodes at very little additional computational cost. These contributions address critical gaps in applying GNNs to specific domains and lay the groundwork for future exploration of graph-based real-world tasks.
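The aggregation mechanism described above (each central node repeatedly updated from its neighbors' information) can be sketched in a few lines. This is a generic, hypothetical illustration assuming mean aggregation and a fixed 50/50 mix of a node's own features with its neighbors' mean; real GNN layers use learned weight matrices and nonlinearities, and this is not the thesis's model. The names `aggregate_step` and `propagate` are illustrative only.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]       # adjacency list: node -> list of neighbor nodes
Features = Dict[int, List[float]]  # node -> feature vector


def aggregate_step(graph: Graph, feats: Features) -> Features:
    """One round of message passing: update every node from the mean of its neighbors.

    Assumption: a fixed 50/50 combination replaces the learned update function.
    """
    updated: Features = {}
    for node, neighbors in graph.items():
        own = feats[node]
        dim = len(own)
        if neighbors:
            mean_nbr = [sum(feats[n][i] for n in neighbors) / len(neighbors) for i in range(dim)]
        else:
            mean_nbr = [0.0] * dim  # isolated node receives no messages
        updated[node] = [0.5 * a + 0.5 * b for a, b in zip(own, mean_nbr)]
    return updated


def propagate(graph: Graph, feats: Features, num_layers: int) -> Features:
    """Apply aggregate_step num_layers times (one call per GNN layer)."""
    for _ in range(num_layers):
        feats = aggregate_step(graph, feats)
    return feats
```

On the path graph 0-1-2 with only node 0 carrying a signal (`{0: [1.0], 1: [0.0], 2: [0.0]}`), one layer leaves node 2 at 0.0 and a second layer raises it to 0.125: each extra layer widens the neighborhood every node depends on, which is one intuition for why inference cost grows with depth.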
Dissertation Data - Raw Data and Tests
figshare.com
Updated May 7, 2025
[CANDIDATE 48758] (2025). Dissertation Data - Raw Data and Tests [Dataset]. http://doi.org/10.6084/m9.figshare.28789946.v3
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28789946.v3
Dataset updated
May 7, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
[CANDIDATE 48758]
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A collection of data used in a dissertation submitted to the London School of Economics. The data concern the occurrences of three disarmament-related words across approximately four years in seven news publications.
Replication Data Dissertation Margarete Schweizerhof
dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Schweizerhof, Margarete (2023). Replication Data Dissertation Margarete Schweizerhof [Dataset]. http://doi.org/10.7910/DVN/O6OROH
Unique identifier
https://doi.org/10.7910/DVN/O6OROH
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Schweizerhof, Margarete
Description
Data for replicating the qualitative and quantitative methods used in the dissertation of Margarete Schweizerhof.
Data from: Probabilistic Machine Learning Methods for Spatio-Temporal Data
curate.nd.edu
Updated Nov 11, 2024
Matthew Bonas (2024). Probabilistic Machine Learning Methods for Spatio-Temporal Data [Dataset]. http://doi.org/10.7274/25595235.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.7274/25595235.v1
Dataset updated
Nov 11, 2024
Dataset provided by
University of Notre Dame
Authors
Matthew Bonas
License
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Description
This dissertation presents multiple novel methodological advances in machine learning (ML) for spatio-temporal data applications. Traditional ML approaches typically have difficulty producing both accurate point predictions and adequate uncertainty quantification for these data, especially when the data are sampled at a fine temporal scale. This is because inference on these complex ML models is notably difficult and can impose a significant computational burden. The challenge of forecasting spatio-temporal data is heightened further when the forecasts must obey known physical laws that dictate or influence the underlying data structure.

We explore the current challenges in properly quantifying the uncertainty of forecasts that contemporary ML models produce for spatio-temporal data. Methods are introduced that calibrate the uncertainty estimates so that proper coverage is achieved and the uncertainty expands realistically through time. These contemporary ML models are also adapted so that the physical processes present in the data inform the learning procedure, making the forecasts more physically compliant. We demonstrate the benefit of combining ML models in an ensemble to improve accuracy in predicting nonstationary, complex temporal data. Finally, ML approaches to time-series forecasting are compared with popular, standard statistical approaches to explore their respective benefits and drawbacks; the comparison also serves as a guide explaining why these advanced ML modelling techniques are not necessarily a universally best approach to prediction and forecasting.
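Coverage, as used above, is the fraction of observed values that fall inside the forecast intervals. The sketch below assumes forecasts given as a mean and a standard deviation per point and uses split conformal prediction with normalized residuals as one generic way to rescale intervals toward a target coverage. It is a swapped-in illustrative technique, not the dissertation's calibration method, and `empirical_coverage`, `conformal_scale`, and `calibrated_intervals` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def empirical_coverage(y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of observations that fall inside their [lower, upper] interval."""
    if not (len(y_true) == len(lower) == len(upper)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def conformal_scale(y_cal: Sequence[float], mu_cal: Sequence[float],
                    sd_cal: Sequence[float], alpha: float) -> float:
    """Split-conformal multiplier q so that mu +/- q * sd targets 1 - alpha coverage.

    Scores are normalized absolute residuals |y - mu| / sd on a held-out
    calibration set; q is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(abs(y - m) / s for y, m, s in zip(y_cal, mu_cal, sd_cal))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else scores[k - 1]  # too few points: unbounded interval


def calibrated_intervals(mu: Sequence[float], sd: Sequence[float],
                         q: float) -> Tuple[List[float], List[float]]:
    """Rescale each forecast's standard deviation by the calibrated multiplier q."""
    lower = [m - q * s for m, s in zip(mu, sd)]
    upper = [m + q * s for m, s in zip(mu, sd)]
    return lower, upper
```

Normalizing residuals by each forecast's own standard deviation keeps the intervals adaptive (wider where the model is less certain), while the conformal quantile corrects the overall width.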
Ben Black MSc Dissertation Data
data.niaid.nih.gov
Updated Jan 24, 2020
Black, Ben (2020). Ben Black MSc Dissertation Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1423177
Dataset updated
Jan 24, 2020
Dataset provided by
Knight, Vincent
Glynatsi, Nikoleta
Black, Ben
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is an archive of the data used for my MSc dissertation, in which I assessed the accuracy of the Ohtsuki-Nowak approximation for games on graphs. The datasets whose names do not begin with "NEWEST" are the ones used in the analysis undertaken for the project.

The datasets without "conjoined" in their names contain fixation probabilities from Moran processes on graphs and in well-mixed populations, as well as steady states of the replicator equation on graphs. The datasets with "conjoined" in their names contain fixation probabilities from Moran processes on several types of graphs; these were used to test an extension of Ohtsuki and Nowak's work to graphs formed by conjoining two regular graphs.
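For context on the fixation probabilities in these files: in a well-mixed population of size N, a single mutant with relative fitness r under the birth-death Moran process fixes with the standard probability rho = (1 - 1/r) / (1 - 1/r^N), or 1/N when r = 1. The sketch below computes this closed form and checks it with a small Monte Carlo simulation. It covers only the well-mixed case, not the graph-structured Ohtsuki-Nowak approximation the dissertation assesses, and the function names are hypothetical.

```python
import random


def moran_fixation_exact(r: float, n: int) -> float:
    """Fixation probability of one mutant (relative fitness r) in a well-mixed
    birth-death Moran process with population size n."""
    if r == 1:
        return 1.0 / n  # neutral drift
    return (1 - 1 / r) / (1 - 1 / r ** n)


def moran_fixation_simulated(r: float, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate using the embedded jump chain of the mutant count.

    In the birth-death Moran process the ratio of up- to down-transition
    probabilities is exactly r, so, conditional on the count changing,
    it increases with probability r / (1 + r).
    """
    rng = random.Random(seed)
    p_up = r / (1 + r)
    fixed = 0
    for _ in range(trials):
        i = 1  # start from a single mutant
        while 0 < i < n:
            i += 1 if rng.random() < p_up else -1
        fixed += (i == n)
    return fixed / trials
```

For example, `moran_fixation_exact(2.0, 10)` equals 512/1023 (about 0.5005), and the simulated estimate with a few tens of thousands of trials should land close to it.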
data available for dissertation
figshare.com
Updated Feb 23, 2025
Rhiannon Golledge (2025). data available for dissertation [Dataset]. http://doi.org/10.6084/m9.figshare.28459385.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.28459385.v1
Dataset updated
Feb 23, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rhiannon Golledge
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Undergraduate dissertation project on understanding the habitat preferences of a reintroduced population of European water voles.
Dissertations and Data
ssh.datastations.nl
datacatalogue.cessda.eu
Updated Dec 2, 2015
J. Schöpfel (2015). Dissertations and Data [Dataset]. http://doi.org/10.17026/DANS-XG6-XNJ4
Available download formats: pdf (2207884), zip (20103), text/comma-separated-values (412002), xls (622080)
Unique identifier
https://doi.org/10.17026/DANS-XG6-XNJ4
Dataset updated
Dec 2, 2015
Dataset provided by
DANS Data Station Social Sciences and Humanities
Authors
J. Schöpfel
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We present the results of a quantitative assessment of research data produced and submitted with dissertations. Special attention is paid to the size of the research data in appendices, to their presentation and link to the text, to their sources and typology, and to their potential for further research. The discussion focuses on legal aspects (database protection, intellectual property, privacy, third-party rights) and other barriers to data sharing, reuse, and dissemination through open access. Another part adds insight into the potential handling of these data within the French and Slovenian dissertation infrastructures: what could be done to valorize these data in a centralized system for electronic theses and dissertations (ETDs)? The topics are formats, metadata (including the attribution of unique identifiers), submission/deposit, long-term preservation, and dissemination. This part will also draw on experiences from other campuses and make use of results from surveys on data management at the Universities of Berlin and Lille.


Data from: AckSent: Human Annotated Dataset of Support and Sentiments in...

Metadata for the dissertation: Improving Commercial Property Price...

Advanced Topics in Differentially Private Statistical Learning

Dataset of books called Successful dissertations : the complete guide for...

Statistics on the number of scholarships for masters and doctoral...

Data underlying the PhD dissertation: Automated Layout Generation and Design...

Data from: Doctoral Dissertation Research: Mapping Community Exposure to...

Data to accompany dissertation: Geographic, Cultural, and Ecological...

Supplementary data files for the PhD thesis "Design for Interpersonal Mood...


Ground truth data for "Identifying publications of cumulative dissertation...

Dissertation Project data for Framing the Law: Judges and Jury Instructions

Research data supporting chapter 'A Hybrid Neural Model Approach for Health...

Data from: Empowering Graph Neural Networks for Real-World Tasks

Dissertation Data - Raw Data and Tests

Replication Data Dissertation Margarete Schweizerhof

Data from: Probabilistic Machine Learning Methods for Spatio-Temporal Data

Ben Black MSc Dissertation Data

data available for dissertation

Dissertations and Data

Data from: AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments