100+ datasets found

Data from: Advanced Topics in Differentially Private Statistical Learning
curate.nd.edu
pdf
Updated Jul 14, 2025
Cite
Spencer Tate Giddens (2025). Advanced Topics in Differentially Private Statistical Learning [Dataset]. http://doi.org/10.7274/29498438.v1
Available download formats
pdf
Unique identifier
https://doi.org/10.7274/29498438.v1
Dataset updated
Jul 14, 2025
Dataset provided by
University of Notre Dame
Authors
Spencer Tate Giddens
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Collecting and utilizing data to understand population trends, make predictions, and guide decisions is becoming increasingly common in today's world. In particular, statistical learning allows users to infer relationships between variables, learn patterns, and predict outcomes for previously unseen data via concepts and techniques from statistics and machine learning. Although many of the results of this practice have been beneficial, the data used often contain sensitive information, such as medical records or financial information, so maintaining privacy is of paramount importance when releasing statistics, parameter estimates, and other results. Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy when releasing aggregate information and statistics from a dataset. It provides a provable bound on the incurred privacy loss via the injection of random noise, at the cost of a reduction in utility. While many works have been devoted to establishing DP guarantees for various analysis tools in the past two decades since DP's introduction, many popular statistical learning approaches still lack a DP counterpart. This dissertation addresses this issue in three original research topics, as listed below.

First, the dissertation presents the first differentially private algorithm for general weighted empirical risk minimization (wERM), along with theoretical DP guarantees. It evaluates the performance of the DP-wERM framework applied to outcome weighted learning (OWL), a method for learning individualized treatment rules, in both simulation studies and in a real clinical trial. The results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance.

Second, the dissertation presents several original approaches with proven DP guarantees for linear mixed-effects (LME) models. LME models are popular, especially among statisticians, but little work has integrated DP into them. The work leverages recent advances in the DP literature, particularly DP stochastic gradient descent (DP-SGD), to estimate LME model parameters with DP guarantees and better privacy-utility trade-offs. A theoretical upper bound on the mean squared error between the private parameter estimates and the true parameters is provided for the DP-SGD-based approaches, and a simulation study and a real-world case study provide further empirical evidence for the feasibility of the approaches at practically reasonable privacy budgets.

Third, this dissertation introduces SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. Alongside privacy, the fairness of decisions made by a statistical learning model is also crucial to address, though the vast majority of existing literature treats the two concerns independently. Methods that do consider privacy and fairness simultaneously often apply only to a specific machine learning task, which limits their generalizability. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. SAFES is illustrated by combining a graphical model-based DP data synthesizer with a popular fairness-aware data pre-processing transformation, and empirical evaluations on two popular benchmark datasets demonstrate that for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
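
As a rough illustration of the noise-injection idea in the description above, the following Python sketch releases a mean with the classical Laplace mechanism. It is not taken from the dissertation: the function names, the clipping bounds, and the assumption that the record count is public (with neighbouring datasets differing by replacing one record) are all illustrative choices.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Release an epsilon-DP estimate of the mean (hypothetical helper).

    Assumes the number of records is public and that neighbouring
    datasets differ by replacing a single record.
    """
    rng = rng or random.Random()
    # Clip every record so one individual can shift the mean only a bounded amount.
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n   # largest change from replacing one record
    scale = sensitivity / epsilon       # smaller epsilon means more noise
    return sum(clipped) / n + laplace_noise(scale, rng)


if __name__ == "__main__":
    ages = [34, 41, 29, 57, 63, 38]
    for eps in (0.1, 1.0, 10.0):
        print(eps, private_mean(ages, lower=18, upper=90, epsilon=eps))
```

Smaller values of `epsilon` give a stronger privacy guarantee but a noisier estimate, which is the privacy-utility trade-off the description refers to.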
Statistics on the number of scholarships for masters and doctoral...
data.gov.tw
csv
Updated Jun 1, 2025
Cite
Department of Student Affairs and Special Education (2025). Statistics on the number of scholarships for masters and doctoral dissertations and journal papers in gender equality education [Dataset]. https://data.gov.tw/en/datasets/159100
Available download formats
csv
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Department of Student Affairs and Special Education
License
https://data.gov.tw/license
Description
To encourage academic and related research on gender equality education and raise academic standards in this field, the Ministry of Education has formulated the "Key Points for the Ministry of Education to Award Master's and Doctoral Thesis and Journal Papers on Gender Equality Education" as the basis for these awards.
Metadata for the dissertation: Improving Commercial Property Price...
data.4tu.nl
Updated Nov 25, 2024
Cite
Farley Ishaak (2024). Metadata for the dissertation: Improving Commercial Property Price Statistics [Dataset]. http://doi.org/10.4121/cab0cf0e-668f-46db-82bb-94abe78faeb0.v1
Unique identifier
https://doi.org/10.4121/cab0cf0e-668f-46db-82bb-94abe78faeb0.v1
Dataset updated
Nov 25, 2024
Dataset provided by
4TU.ResearchData
Authors
Farley Ishaak
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2008 - 2023
Area covered
Netherlands
Description
This metadata document provides details of the data used for the dissertation: “Improving Commercial Property Price Statistics”. The study explores data related and methodological challenges in the construction of price statistics for commercial real estate.

Short abstract of the dissertation
Since the financial crisis of 2008, National Statistical Institutes (NSIs) have worked to develop commercial real estate (CRE) indicators for official statistics. These indicators are considered essential in financial stability monitoring and may help contain the consequences of future crises or even prevent future crises. However, progress at NSIs to develop these indicators has been slow due to challenges like low observation numbers and high heterogeneity. This dissertation addresses these challenges by exploring data issues and suggesting methodological improvements.

The first three studies focus on data challenges regarding share deals and portfolio sales. Both are real estate trading constructions that are specific to CRE. The results show that share deals and portfolio sales significantly differ from the rest of the market. Therefore, under specific circumstances, CRE indicators could benefit from including these trading types. The final two studies focus on methodological challenges regarding index construction methods and the role of sustainability in real estate pricing. The results show that, by combining established techniques, it is possible to construct price indices that meet official statistics’ standards. Furthermore, the results uncover a complex relationship between sustainability and prices: while energy efficiency generally involves price premiums, other aspects like health and environment display a discount for low-sustainability properties.

Overall, this dissertation contributes to the legislative framework that is currently being developed for EU countries to publish official statistics for commercial real estate and adds to the academic discussion by presenting innovative techniques for data analyses and index construction.

Data sources
The following data sources were used:
Business Register (Statistics Netherlands)
Transactions linked to the Register of Addresses and Buildings (BAG)
Linking table buildings and companies (Dutch Land Registry Office)
Property Transfer Tax data (Dutch Tax Authorities)
Building sustainability scores (W/E advisors)
Commercial real estate transactions (Dutch Land Registry Office)

Processing methodology
The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_2_ABR_Bedrijfsinfo. The data is used for deriving company transfers by comparing ownership states of various periods. The first period in which the ownership of the same company differs indicates an ownership transfer.
The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_6_ABR_CompleetMicro. The data is used for calculating the size of real estate share deals and estimating price developments by applying appropriate filters and counting the output.
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is SPE_KADASTER. The data is used for finding real estate information that corresponds to company transfers by linking the company register (ABR) to the real estate register (BAG).
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_SPE_3_OVB_Bedrijfsinfo. The data is used for deriving real estate share deals by linking this table (Kadaster) to the real estate register (BAG).
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is duurzaamheid_input_regressie2. The data is used for finding the relationship between sustainability measures and real estate transaction prices by linking sustainability scores from a consultancy (WE) to transaction prices (Cadastre) and running regression analyses.
The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_OV20_pand. The data is used for 4 purposes (separate studies).
(1) Chapter 3: Determining the price effect of portfolio sale by running regression analyses
(2) Chapter 4: Developing methods to include portfolio sales in CPPI calculations by using auxiliary data of the real estate properties.
(3) Chapter 5: Developing a price index method for small domains by using these data to test the outcomes
(4) Chapter 6: Determining the relationship between sustainability and prices by running regression analyses
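
The ownership-comparison step described above for the Business Register data can be illustrated with a short sketch. The original processing was done in SQL and R; the Python below is only an illustration, and the record layout `(company_id, period, owner)` and the function name are hypothetical, not taken from the dissertation code.

```python
def first_ownership_transfers(records):
    """Return {company_id: period} for the first period in which a
    company's owner differs from its owner in the preceding period.

    `records` is an iterable of (company_id, period, owner) tuples;
    this layout is an assumption made for illustration.
    """
    # Group ownership states per company.
    by_company = {}
    for company_id, period, owner in records:
        by_company.setdefault(company_id, []).append((period, owner))

    transfers = {}
    for company_id, states in by_company.items():
        states.sort()  # chronological order by period
        # Compare each period's owner with the previous period's owner.
        for (_, prev_owner), (period, owner) in zip(states, states[1:]):
            if owner != prev_owner:
                transfers[company_id] = period
                break  # only the first change is recorded
    return transfers


if __name__ == "__main__":
    example = [
        ("A", 2019, "X"), ("A", 2020, "X"), ("A", 2021, "Y"),
        ("B", 2019, "Z"), ("B", 2020, "Z"),
    ]
    print(first_ownership_transfers(example))
```

Companies whose owner never changes across the observed periods are simply absent from the result.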

Data restrictions
As part of the CBS law, sharing micro-data outside of the CBS-environment is prohibited. Furthermore, CBS manages the data, but in some cases other parties are still formal owners of the data. The 2 other parties are The Land Registry Office and WE consultancy. Ownership and intellectual property rights are managed in contracts with both owners. It was agreed upon that the data can only be used for the purpose of the PhD study and that the microdata will never be externally disseminated. The data is still owned by them and the intellectual property rights of the analyses belong to me. An intended use of the microdata should be approved by both Statistics Netherlands and the formal data owner. Because of the above, no data can be publicly shared.

If one intends to do research on these data, an application for data use can be requested at CBS. CBS will charge costs for anonymising the data and providing a closed environment to work with the data. More information on this can be found at: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research

Contact information
Author: Farley Ishaak
Statistics Netherlands | Henri Faasdreef 312 | P.O. Box 24500 | 2490 HA The Hague
TU Delft | Delft University of Technology | Faculty of Architecture and the Built Environment
Department of Management in the Built Environment | P.O. Box 5043 | 2600 GA Delft
M +31 6 46307974 | ff.ishaak@cbs.nl | f.f.ishaak@tudelft.nl
Data from: AckSent: Human Annotated Dataset of Support and Sentiments in...
zenodo.org
Updated Nov 5, 2024
Cite
Author; Author (2024). AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments [Dataset]. http://doi.org/10.5281/zenodo.13283331
Unique identifier
https://doi.org/10.5281/zenodo.13283331
Dataset updated
Nov 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Author; Author
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Time period covered
Aug 10, 2024
Description
This data is supplementary to the paper "AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments".
2025 Green Card Report for Psychometrics and Statistics Equiv To Us Phd In...
myvisajobs.com
Updated Jan 16, 2025
Cite
MyVisaJobs (2025). 2025 Green Card Report for Psychometrics and Statistics Equiv To Us Phd In Statistics [Dataset]. https://www.myvisajobs.com/reports/green-card/major/psychometrics-and-statistics-equiv-to-us-phd-in-statistics
Dataset updated
Jan 16, 2025
Dataset authored and provided by
MyVisaJobs
License
https://www.myvisajobs.com/terms-of-service/
Variables measured
Major, Salary, Petitions Filed
Description
A dataset that explores Green Card sponsorship trends, salary data, and employer insights for psychometrics and statistics equiv to us phd in statistics in the U.S.
Thesis Data Repository
figshare.unimelb.edu.au
zip
Updated Oct 11, 2023
Cite
Gregory White (2023). Thesis Data Repository [Dataset]. http://doi.org/10.26188/24295243.v1
Available download formats
zip
Unique identifier
https://doi.org/10.26188/24295243.v1
Dataset updated
Oct 11, 2023
Dataset provided by
The University of Melbourne
Authors
Gregory White
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Availability of data, code, and plot creation for various figures throughout my PhD thesis. Rough organisation currently. Pertains to Figures 5.4, 5.8, 6.11, 6.18, 7.3, 7.12, and Table 6.1.
Data for the PhD thesis "Modeling Lexical Fields for Translation: a...
heidata.uni-heidelberg.de
b2find.eudat.eu
zip
Updated Aug 4, 2025
Cite
Meri Dallakyan; Meri Dallakyan (2025). Data for the PhD thesis "Modeling Lexical Fields for Translation: a Corpus-Based Study of Armenian, German, and English Culinary Verbs" [Dataset]. http://doi.org/10.11588/DATA/3MPL7E
Available download formats
zip(166634), zip(1130199), zip(617108), zip(167898), zip(4471905), zip(5882160), zip(1203076), zip(334871), zip(3353340), zip(2699455), zip(436611), zip(412972), zip(125927), zip(22647800)
Unique identifier
https://doi.org/10.11588/DATA/3MPL7E
Dataset updated
Aug 4, 2025
Dataset provided by
heiDATA
Authors
Meri Dallakyan; Meri Dallakyan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains high-resolution versions of all graphical visualizations of the data analysis provided in my doctoral dissertation. The graphs are organized by chapter and subchapter and titled accordingly. Additionally, this dataset provides all dataframes (German, English, and Armenian), in XLSX format, of the manual semantic annotation on which the graphs are based. The graphical visualizations presented include (Multiple) Correspondence Analysis (MCA vs. CA), mosaic plots, Conditional Inference Trees (CIT), and Context-Conditional Correlations Graphs (CCCG).
Dissertations and Data
explore.openaire.eu
ssh.datastations.nl
Updated Jan 1, 2016
Cite
J. Schöpfel (2016). Dissertations and Data [Dataset]. http://doi.org/10.17026/dans-xg6-xnj4
Unique identifier
https://doi.org/10.17026/dans-xg6-xnj4
Dataset updated
Jan 1, 2016
Authors
J. Schöpfel
Description
We present the results of a quantitative assessment of research data produced and submitted with dissertations. Special attention is paid to the size of the research data in appendices, to their presentation and link to the text, to their sources and typology, and to their potential for further research. The discussion puts the focus on legal aspects (database protection, intellectual property, privacy, third-party rights) and other barriers to data sharing, reuse and dissemination through open access. Another part adds insight into the potential handling of these data, in the framework of the French and Slovenian dissertation infrastructures. What could be done to valorize these data in a centralized system for electronic theses and dissertations (ETDs)? The topics are formats, metadata (including attribution of unique identifiers), submission/deposit, long-term preservation and dissemination. This part will also draw on experiences from other campuses and make use of results from surveys on data management at the Universities of Berlin and Lille.
Supplementary data files for the PhD thesis "Design for Interpersonal Mood...
data.4tu.nl
zip
Updated Jun 14, 2024
Cite
Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein (2024). Supplementary data files for the PhD thesis "Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters" [Dataset]. http://doi.org/10.4121/8a9b21b2-6411-42ed-a0e4-05be50fc5a69.v1
Available download formats
zip
Unique identifier
https://doi.org/10.4121/8a9b21b2-6411-42ed-a0e4-05be50fc5a69.v1
Dataset updated
Jun 14, 2024
Dataset provided by
4TU.ResearchData
Authors
Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
License information was derived automatically
Dataset funded by
The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioural Sciences
Description
This dataset comprises five sets of data collected throughout the PhD Thesis project of Pelin Esnaf-Uslu.

Esnaf-Uslu, P. (2024). Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.

The research in this thesis is based on the premise that service providers can enhance their effectiveness in client interactions by acquiring a detailed understanding of IMR strategies and effectively applying this knowledge. To achieve this overall aim, the research explored (1) the current role of mood in service encounters, (2) the IMR strategies used by service providers during service encounters in response to clients’ moods, (3) how IMR strategies can be facilitated by means of tools for service providers, and (4) the strengths and limitations of the developed materials.

This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.

The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.

Chapter_2: This study investigates the role of mood in service encounters. Samples are collected from service providers' experiences during service encounters, and in-depth interviews are conducted. The dataset includes the blank diary and the interview protocol.

Chapter_3: This study investigates the clarity of the images developed representing Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 27 and 29 participants, showing the associations between images representing nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. Additionally, the dataset contains a screenshot of the workshop material used in the implementation study.

Chapter_4: This study examines the clarity of developed videos depicting IMR strategies. The dataset includes anonymized scores from 32 participants, showing the associations between videos depicting nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. In addition, the dataset contains the workshop guideline developed for the implementation study.

Chapter_5: This study evaluates the clarity of character animations depicting Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 39 participants, demonstrating the associations between videos illustrating nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants.

Chapter_6: This dataset comprises correspondence analysis files for each material, created for the purpose of comparison.

All the data is anonymized by removing the names of individuals and institutions.
Coding Set: Social Network Analysis Data for the PhD Thesis "More than...
pubdata.leuphana.de
xlsx
Updated 2024
Cite
Roman Isaac; Berta Martín-López (2024). Coding Set: Social Network Analysis Data for the PhD Thesis "More than trees" [Dataset]. http://doi.org/10.48548/pubdata-217
Available download formats
xlsx(18596)
Unique identifier
https://doi.org/10.48548/pubdata-217
Dataset updated
2024
Authors
Roman Isaac; Berta Martín-López
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Dataset funded by
Deutsche Forschungsgemeinschaft (DFG)
Description
To identify relevant actors for the governance of co-produced forest nature's contributions to people (NCP), the researchers conducted a social-network analysis based on 39 semi-structured interviews with foresters and conservation managers. These interviews were conducted across three case study sites in Germany: Schorfheide-Chorin in the Northeast, Hainich-Dün in the Centre, and Schwäbische Alb in the Southwest. All three case study sites belong to the large-scale and long-term research platform Biodiversity Exploratories. The researchers employed a predefined coding set to analyse the interviews and grasp the relationships between different actors based on the anthropogenic capitals they used to co-produce forest NCP. To secure the interviewees' anonymity, this coding cannot be published. Therefore, this data set is limited to the coding set.
Dataset in support of the thesis 'Automating the detection of marine sound...
eprints.soton.ac.uk
Updated Dec 6, 2023
Cite
White, Ellen Louise (2023). Dataset in support of the thesis 'Automating the detection of marine sound sources' [Dataset]. http://doi.org/10.5281/zenodo.10276722
Unique identifier
https://doi.org/10.5281/zenodo.10276722
Dataset updated
Dec 6, 2023
Dataset provided by
Zenodo
Authors
White, Ellen Louise
Description
The dataset associated with the PhD thesis 'Automating the detection of marine sound sources', by Ellen L White. This dataset does not include any raw passive acoustic data; please contact me if you are interested in access to any of this data. The data repository contains the training and testing data used within the thesis to develop a CNN for multi-sound source detection, including all required scripts to replicate this work.
PhD thesis
figshare.com
pdf
Updated Jun 20, 2023
Cite
Anders Eklund (2023). PhD thesis [Dataset]. http://doi.org/10.6084/m9.figshare.704865.v1
Available download formats
pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.704865.v1
Dataset updated
Jun 20, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Anders Eklund
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
My PhD thesis

Computational medical image analysis - With a focus on real-time fMRI and non-parametric statistics
Research data supporting chapter 'A Hybrid Neural Model Approach for Health...
data.4tu.nl
Updated Jul 22, 2024
Cite
Wassamon Phusakulkajorn; Siwarak Unsiwilai; Ling Chang; Alfredo Núñez; Zili Li (2024). Research data supporting chapter 'A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data' of dissertation 'AI Solutions for Maintenance Decision Support in Railway Infrastructure' [Dataset]. http://doi.org/10.4121/43b96757-fd3f-4e89-b9ac-e0caad30f0f0.v1
Unique identifier
https://doi.org/10.4121/43b96757-fd3f-4e89-b9ac-e0caad30f0f0.v1
Dataset updated
Jul 22, 2024
Dataset provided by
4TU.ResearchData
Authors
Wassamon Phusakulkajorn; Siwarak Unsiwilai; Ling Chang; Alfredo Núñez; Zili Li
License
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
Dataset funded by
Europe’s Rail Flagship Project
ProRail
Description
The data and code were prepared and uploaded to 4TU.ResearchData by Wassamon Phusakulkajorn to support the results in Chapter 5 (A Hybrid Neural Model Approach for Health Assessment of Transition Zones with Multiple Data) of her dissertation. This chapter has been submitted for publication as Phusakulkajorn, W., Unsiwilai, S., Chang, L., Núñez, A., Li, Z., A Hybrid Neural Model Approach for Health Assessment of Railway Transition Zones with Multiple Data Sources. In this research, we develop a framework that enables more frequent evaluation of transition zone health by integrating multiple monitoring technologies, including track geometry measurements, interferometric synthetic aperture radar (InSAR), and axle box acceleration (ABA). The aim is to improve early detection of track irregularities. The data used in this research contain ABA, track geometry, and InSAR measurements at transition zones, collected from a railway bridge between Dordrecht and Lage Zwaluwe station in the Netherlands. All implementations are done in MATLAB, where (.mat) files are analytical solutions and (.eps) and (.jpg) files are figures used in the main manuscript.
Data from: Knowledge-centric Machine Learning on Graphs
curate.nd.edu
pdf
Updated Nov 11, 2024
Cite
Yijun Tian (2024). Knowledge-centric Machine Learning on Graphs [Dataset]. http://doi.org/10.7274/25607826.v1
Available download formats
pdf
Unique identifier
https://doi.org/10.7274/25607826.v1
Dataset updated
Nov 11, 2024
Dataset provided by
University of Notre Dame
Authors
Yijun Tian
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Graph Machine Learning (GML) has gained considerable attention in modeling complex graph-structured data, but most approaches focus on collecting high-quality data (i.e., data-centric) or developing complex model architectures (i.e., model-centric). These two paradigms come with inherent limitations and challenges: data-centric approaches often demand intensive labor for tasks like data annotation and cleaning, while model-centric approaches usually require specialized expertise for model refinements. There remains a significant reservoir of unexplored potential in harnessing useful information that already exists in the data and has been learned by models, i.e., knowledge, as a directive force for learning.

In this dissertation, I introduce a new paradigm of machine learning on graphs: knowledge-centric. This paradigm seeks to leverage all available knowledge, which may come from data, models, or external sources, to facilitate an effective learning process. My research focuses on three different facets to obtain and leverage knowledge in GML, including learning knowledge from data, distilling knowledge from models, and encoding knowledge from external sources. By anchoring on the knowledge, there is a reduced reliance on massive data and intricate model architectures. In addition, knowledge can enhance GML models' performance, trustworthiness, and efficiency.
Data from: Graph-Based Approaches for Prediction and Similarity Analysis
curate.nd.edu
Updated Nov 11, 2024
Cite
Lin Xing (2024). Graph-Based Approaches for Prediction and Similarity Analysis [Dataset]. http://doi.org/10.7274/25575060.v1
Unique identifier
https://doi.org/10.7274/25575060.v1
Dataset updated
Nov 11, 2024
Dataset provided by
University of Notre Dame
Authors
Lin Xing
License
https://www.law.cornell.edu/uscode/text/17/106
Description
This thesis explores graph-based approaches for prediction and similarity analysis problems within networks and hypergraphs. While existing algorithms for link prediction in networks predominantly target the existence or weights of edges, our study expands the scope by delving into the prediction of both vertex and edge weights using metric geometry and machine learning approaches. Additionally, our investigation extends into weight prediction in higher-order networks, often referred to as hypergraphs. We propose a novel notion of neighborhood for hyperedges, utilizing the topological structures of hypergraphs and weights of hyperedges from a given training set. We construct metric spaces on the set of hyperedges based on the neighborhood information. Furthermore, we explore the practical application of graph similarity algorithms in DNA sequence analysis, introducing an accurate and computationally efficient approach to analyze the similarities among DNA sequences. Our proposed methods were tested on diverse real-world datasets and yielded promising results. The main implication of our research is offering a more comprehensive framework for prediction tasks in networks and hypergraphs, providing alternative avenues to gain a deeper understanding of the intricate relationships within complex networks.
PhD Thesis: Development of Equitable Algorithms for Road Funds Allocation...
figshare.com
application/cdfv2
Updated Jan 19, 2016
Cite
Andrew Naimanye (2016). PhD Thesis: Development of Equitable Algorithms for Road Funds Allocation and Road Scheme Priritization in Developing Countries: A Case Study of Sub-Saharan Africa [Dataset]. http://doi.org/10.6084/m9.figshare.1396244.v1
Available download formats
application/cdfv2
Unique identifier
https://doi.org/10.6084/m9.figshare.1396244.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Andrew Naimanye
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Sub-Saharan Africa
Description
Uganda Road Fund Allocation Formula application 2014 and 2015
Jordan Number of Enrolled PHD Students
ceicdata.com
Updated Jan 15, 2025
Cite
CEICdata.com (2025). Jordan Number of Enrolled PHD Students [Dataset]. https://www.ceicdata.com/en/jordan/education-statistics/number-of-enrolled-phd-students
Dataset updated
Jan 15, 2025
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 1, 2004 - Jun 1, 2016
Area covered
Jordan
Variables measured
Education Statistics
Description
Jordan Number of Enrolled PHD Students data was reported at 3,362.000 Person in 2017. This records an increase from the previous number of 3,276.000 Person for 2016. Jordan Number of Enrolled PHD Students data is updated yearly, averaging 1,892.000 Person from Jun 2002 (Median) to 2017, with 15 observations. The data reached an all-time high of 3,362.000 Person in 2017 and a record low of 682.000 Person in 2002. Jordan Number of Enrolled PHD Students data remains in active status in CEIC and is reported by the Ministry of Higher Education and Scientific Research. The data is categorized under Global Database’s Jordan – Table JO.G007: Education Statistics.
Data and scripts underlying the PhD thesis: Irreducible antifluorite...
data.4tu.nl
zip
Updated Nov 21, 2024
Cite
Victor Landgraf (2024). Data and scripts underlying the PhD thesis: Irreducible antifluorite electrolytes [Dataset]. http://doi.org/10.4121/2068edec-1ec9-40c8-91c2-78857ce90743.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/2068edec-1ec9-40c8-91c2-78857ce90743.v1
Dataset updated
Nov 21, 2024
Dataset provided by
4TU.ResearchData
Authors
Victor Landgraf
License
https://www.apache.org/licenses/LICENSE-2.0.html
Dataset funded by
NWO
Description
This data set contains the raw data and analysis scripts to help reproduce the data presented in the thesis "Irreducible antifluorite electrolytes". It covers Chapters 2, 3, 4, and 5, which are the chapters in which acquired data is presented. (Chapter 1 of the thesis is the Introduction, so no acquired data is shown in that chapter.)
New Thesis Data Sets Dataset
universe.roboflow.com
zip
Updated Feb 10, 2024
Cite
Conveyor (2024). New Thesis Data Sets Dataset [Dataset]. https://universe.roboflow.com/conveyor/new-thesis-data-sets
Available download formats: zip
Dataset updated
Feb 10, 2024
Dataset authored and provided by
Conveyor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Fruits Pineapple Mango Papaya Bounding Boxes
Description
New Thesis Data Sets

## Overview

New Thesis Data Sets is a dataset for object detection tasks. It contains Fruits Pineapple Mango Papaya annotations for 4,346 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Parameters and KPIs from the Real Data Use Case of Dissertation - Dataset -...
b2find.eudat.eu
Updated Apr 5, 2024
Cite
(2024). Parameters and KPIs from the Real Data Use Case of Dissertation - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/27c32216-21c9-5185-a734-bef7f2d5a682
Dataset updated
Apr 5, 2024
Description
Column names indicate the parameters and the resulting KPI values; row indices indicate unique combinations of part number and location.



