100+ datasets found
  1. Metadata for the dissertation: Improving Commercial Property Price...

    • data.4tu.nl
    Updated Nov 25, 2024
    Cite
    Farley Ishaak (2024). Metadata for the dissertation: Improving Commercial Property Price Statistics [Dataset]. http://doi.org/10.4121/cab0cf0e-668f-46db-82bb-94abe78faeb0.v1
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Farley Ishaak
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2008 - 2023
    Area covered
    Netherlands
    Description

    This metadata document provides details of the data used for the dissertation “Improving Commercial Property Price Statistics”. The study explores data-related and methodological challenges in the construction of price statistics for commercial real estate.


    Short abstract of the dissertation

    Since the financial crisis of 2008, National Statistical Institutes (NSIs) have worked to develop commercial real estate (CRE) indicators for official statistics. These indicators are considered essential in financial stability monitoring and may help contain the consequences of future crises or even prevent future crises. However, progress at NSIs to develop these indicators has been slow due to challenges like low observation numbers and high heterogeneity. This dissertation addresses these challenges by exploring data issues and suggesting methodological improvements.


    The first three studies focus on data challenges regarding share deals and portfolio sales. Both are real estate trading constructions that are specific to CRE. The results show that share deals and portfolio sales significantly differ from the rest of the market. Therefore, under specific circumstances, CRE indicators could benefit from including these trading types. The final two studies focus on methodological challenges regarding index construction methods and the role of sustainability in real estate pricing. The results show that, by combining established techniques, it is possible to construct price indices that meet official statistics’ standards. Furthermore, the results uncover a complex relationship between sustainability and prices: while energy efficiency generally involves price premiums, other aspects like health and environment display a discount for properties with low sustainability.


    Overall, this dissertation contributes to the legislative framework that is currently being developed for EU countries to publish official statistics for commercial real estate and adds to the academic discussion by presenting innovative techniques for data analyses and index construction.


    Data sources

    The following data sources were used:

    1. Business Register (Statistics Netherlands)
    2. Transactions linked to the Register of Addresses and Buildings (BAG)
    3. Linking table buildings and companies (Dutch Land Registry Office)
    4. Property Transfer Tax data (Dutch Tax Authorities)
    5. Building sustainability scores (W/E advisors)
    6. Commercial real estate transactions (Dutch Land Registry Office)


    Processing methodology

    1. The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_2_ABR_Bedrijfsinfo. The data is used for deriving company transfers by comparing ownership states across periods. The first period in which the ownership of a given company differs from the preceding period indicates an ownership transfer.
    2. The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_6_ABR_CompleetMicro. The data is used for calculating the size of real estate share deals and estimating price developments by applying appropriate filters and counting the output.
    3. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is SPE_KADASTER. The data is used for finding real estate information that corresponds to company transfers by linking the company register (ABR) to the real estate register (BAG).
    4. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_SPE_3_OVB_Bedrijfsinfo. The data is used for deriving real estate share deals by linking this table (Kadaster) to the real estate register (BAG).
    5. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is duurzaamheid_input_regressie2. The data is used for finding the relationship between sustainability measures and real estate transaction prices by linking sustainability scores from a consultancy (WE) to transaction prices (Cadastre) and running regression analyses.
    6. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_OV20_pand. The data is used for 4 purposes (separate studies).
    • (1) Chapter 3: Determining the price effect of portfolio sales by running regression analyses
    • (2) Chapter 4: Developing methods to include portfolio sales in CPPI calculations by using auxiliary data of the real estate properties
    • (3) Chapter 5: Developing a price index method for small domains by using these data to test the outcomes
    • (4) Chapter 6: Determining the relationship between sustainability and transaction prices by running regression analyses
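
    The ownership comparison described in step 1 can be sketched as follows. This is a minimal illustration in Python, not the author's SQL/R code: the row layout (company, period, owner) and all identifiers are hypothetical, and the real table structure is not documented here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: (company_id, period, owner_id).
OwnershipRow = Tuple[str, str, str]


def find_first_transfers(rows: List[OwnershipRow]) -> Dict[str, Optional[str]]:
    """Per company, return the first period whose owner differs from the previous period."""
    by_company: Dict[str, List[Tuple[str, str]]] = {}
    for company, period, owner in rows:
        by_company.setdefault(company, []).append((period, owner))

    transfers: Dict[str, Optional[str]] = {}
    for company, states in by_company.items():
        # Assumes period labels sort chronologically as strings (e.g. "2019", "2020").
        states.sort()
        transfers[company] = None  # None means no transfer was observed
        for (_, previous_owner), (period, owner) in zip(states, states[1:]):
            if owner != previous_owner:
                transfers[company] = period
                break
    return transfers


example_rows = [
    ("C1", "2019", "A"), ("C1", "2020", "A"), ("C1", "2021", "B"),
    ("C2", "2019", "X"), ("C2", "2020", "X"),
]
print(find_first_transfers(example_rows))
```

    In this invented example, company C1 changes owner in 2021, while C2 shows no transfer.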


    Data restrictions

    As part of the CBS law, sharing micro-data outside of the CBS-environment is prohibited. Furthermore, CBS manages the data, but in some cases other parties are still formal owners of the data. The 2 other parties are The Land Registry Office and WE consultancy. Ownership and intellectual property rights are managed in contracts with both owners. It was agreed upon that the data can only be used for the purpose of the PhD study and that the microdata will never be externally disseminated. The data is still owned by them and the intellectual property rights of the analyses belong to me. Any intended use of the microdata should be approved by both Statistics Netherlands and the formal data owner. Because of the above, no data can be publicly shared.


    If one intends to do research on these data, an application for data use can be requested at CBS. CBS will charge costs for anonymising the data and providing a closed environment to work with the data. More information on this can be found at: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research


    Contact information

    Author: Farley Ishaak

    Statistics Netherlands | Henri Faasdreef 312 | P.O. Box 24500 | 2490 HA The Hague

    TU Delft | Delft University of Technology | Faculty of Architecture and the Built Environment

    Department of Management in the Built Environment | P.O. Box 5043 | 2600 GA Delft

    M +31 6 46307974 | ff.ishaak@cbs.nl | f.f.ishaak@tudelft.nl

  2. Data from: AckSent: Human Annotated Dataset of Support and Sentiments in...

    • zenodo.org
    Updated Nov 5, 2024
    Cite
    Author; Author (2024). AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments [Dataset]. http://doi.org/10.5281/zenodo.13283331
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Author; Author
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Time period covered
    Aug 10, 2024
    Description

    This data is supplementary to the paper "AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments".

  3. Advanced Topics in Differentially Private Statistical Learning

    • curate.nd.edu
    pdf
    Updated Jul 14, 2025
    Cite
    Spencer Tate Giddens (2025). Advanced Topics in Differentially Private Statistical Learning [Dataset]. http://doi.org/10.7274/29498438.v1
    Available download formats: pdf
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    University of Notre Dame
    Authors
    Spencer Tate Giddens
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Collecting and utilizing data to understand population trends, make predictions, and guide decisions is becoming increasingly common in today's world. In particular, statistical learning allows users to infer relationships between variables, learn patterns, and predict outcomes for previously unseen data via concepts and techniques from statistics and machine learning. Although many of the results of this practice have been beneficial, the data used often contain sensitive information, such as medical records or financial information, so maintaining privacy is of paramount importance when releasing statistics, parameter estimates, and other results. Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy when releasing aggregate information and statistics from a dataset. It provides a provable bound on the incurred privacy loss via the injection of random noise, at the cost of a reduction in utility. While many works have been devoted to establishing DP guarantees for various analysis tools in the past two decades since DP's introduction, many popular statistical learning approaches still lack a DP counterpart. This dissertation addresses this issue in three original research topics, as listed below.

    First, the dissertation presents the first differentially private algorithm for general weighted empirical risk minimization (wERM), along with theoretical DP guarantees. It evaluates the performance of the DP-wERM framework applied to outcome weighted learning (OWL), a method for learning individualized treatment rules, in both simulation studies and in a real clinical trial. The results demonstrate the feasibility of training OWL models via wERM with DP guarantees while maintaining sufficiently robust model performance.

    Second, the dissertation presents several original approaches with proven DP guarantees for linear mixed-effects (LME) models. LME models are popular, especially among statisticians, but lack sufficient work on integrating DP. The work leverages some recent advancements in the DP literature, particularly in DP stochastic gradient descent (SGD), to estimate LME model parameters with DP guarantees with better privacy-utility trade-offs. Theoretical results for an upper bound for the mean squared error between private parameter estimates vs the true parameters for DP-SGD-based approaches are provided, and a simulation study and a real-world case study provide further empirical evidence for the feasibility of the approaches at practically reasonable privacy budgets.

    Third, this dissertation introduces SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that sequentially combines DP data synthesis with a fairness-aware data transformation. Alongside privacy, the fairness of decisions made by a statistical learning model is also crucial to address, though the vast majority of existing literature treats the two concerns independently. For methods that do consider privacy and fairness simultaneously, they often only apply to a specific machine learning task, limiting their generalizability. SAFES allows full control over the privacy-fairness-utility trade-off via tunable privacy and fairness parameters. SAFES is illustrated by combining a graphical model-based DP data synthesizer with a popular fairness-aware data pre-processing transformation, and empirical evaluations on two popular benchmark datasets demonstrate that for reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
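
    As a generic illustration of the noise injection mentioned in the description above, the following sketch applies the textbook Laplace mechanism to a bounded mean. It is not one of the dissertation's algorithms (DP-wERM, the DP LME approaches, or SAFES); the function names, the clipping bounds, and the assumption that the sample size is public are all illustrative.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-DP estimate of the mean of values clipped to [lower, upper].

    Assumes the number of records is public and neighbouring datasets differ in
    one record, so the sensitivity of the clipped mean is (upper - lower) / n.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)


# Illustration only: Python's `random` module is not a cryptographically secure
# noise source, so this sketch is unsuitable for a real data release.
rng = random.Random(0)
data = [0.2, 0.4, 0.9, 0.5, 0.7]
print(private_mean(data, lower=0.0, upper=1.0, epsilon=1.0, rng=rng))
```

    A smaller epsilon yields a larger noise scale, which is the privacy-utility trade-off the description refers to.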

  4. Dataset of books called Successful dissertations : the complete guide for...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Successful dissertations : the complete guide for education, childhood and early childhood studies students [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Successful+dissertations+%3A+the+complete+guide+for+education%2C+childhood+and+early+childhood+studies+students
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Successful dissertations : the complete guide for education, childhood and early childhood studies students. It features 7 columns including author, publication date, language, and book publisher.

  5. Ground truth data for "Identifying publications of cumulative dissertation...

    • data.niaid.nih.gov
    Updated May 3, 2021
    Cite
    Donner, Paul (2021). Ground truth data for "Identifying publications of cumulative dissertation theses by bilingual text similarity" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4733849
    Dataset updated
    May 3, 2021
    Dataset authored and provided by
    Donner, Paul
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains data used in the publication "Identifying publications of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task". It includes bibliographical data for German PhD theses (dissertations) and associated publications for cumulative dissertations. Not included is content from Elsevier's Scopus database used in the study, except item identifiers. Users with access to the data can use these for matching.

    File diss_data.csv contains bibliographic data of dissertation theses obtained from the German National Library, cleaned and postprocessed. The columns are:
    • REQUIZ_NORM_ID: Identifier for the thesis
    • TITLE: Cleaned thesis title
    • HEADING: Descriptor terms (German)
    • AUTO_LANG: Language, either from original record or automatically derived from title

    File ground_truth_pub_metadata.csv contains bibliographic data for identified constitutive publications of theses. If columns 2 to 7 are empty, the thesis did not include any publications ("stand-alone" or monograph thesis).

    The columns are:
    • REQUIZ_NORM_ID: Identifier for the thesis, for matching with the data in file
    • SCOPUS_ID: Scopus ID for the identified publication
    • AUTORS: Author names of the publication as in the original thesis citation
    • YEAR: Publication year of the publication as in the original thesis citation
    • TITLE: Publication title as in the original thesis citation
    • SOURCETITLE: Source title as in the original thesis citation
    • PAGES: Page information of the publication as in the original thesis citation

    Scopus identifiers are published with permission by Elsevier.
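
    A minimal sketch of how the two files might be matched on REQUIZ_NORM_ID, using Python's standard csv module. The rows below are invented placeholders, and the comma delimiter and header row are assumptions about the file layout rather than documented facts.

```python
import csv
import io

# Invented in-memory stand-ins for diss_data.csv and ground_truth_pub_metadata.csv.
DISS_DATA = """REQUIZ_NORM_ID,TITLE,HEADING,AUTO_LANG
T1,Example thesis one,Physik,ger
T2,Example thesis two,Chemie,eng
"""
GROUND_TRUTH = """REQUIZ_NORM_ID,SCOPUS_ID,AUTORS,YEAR,TITLE,SOURCETITLE,PAGES
T1,111,"Doe, J.",2015,Paper A,Journal A,1-10
T1,222,"Doe, J.",2016,Paper B,Journal B,11-20
T2,,,,,,
"""


def publications_per_thesis(diss_csv: str, truth_csv: str) -> dict:
    """Map each thesis identifier to the Scopus IDs of its identified publications."""
    theses = {row["REQUIZ_NORM_ID"]: [] for row in csv.DictReader(io.StringIO(diss_csv))}
    for row in csv.DictReader(io.StringIO(truth_csv)):
        thesis_id = row["REQUIZ_NORM_ID"]
        # Columns 2 to 7 all empty marks a stand-alone (monograph) thesis.
        has_publication = any(
            (value or "").strip() for key, value in row.items() if key != "REQUIZ_NORM_ID"
        )
        if thesis_id in theses and has_publication:
            theses[thesis_id].append(row["SCOPUS_ID"])
    return theses


print(publications_per_thesis(DISS_DATA, GROUND_TRUTH))
```

    In this invented example, thesis T1 is cumulative with two publications and T2 is a monograph with none.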

  6. Data from: AckSent: Human Annotated Dataset of Support and Sentiments in...

    • zenodo.org
    csv, pdf
    Updated Dec 17, 2024
    Cite
    Manika Lamba; You Peng; Sophie Nikolov; John Stephen Downie (2024). AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments [Dataset]. http://doi.org/10.5281/zenodo.14509104
    Available download formats: csv, pdf
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Manika Lamba; You Peng; Sophie Nikolov; John Stephen Downie
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Time period covered
    2024
    Description

    This data is supplementary to the paper:

    Manika Lamba, You Peng, Sophie Nikolov, and J. Stephen Downie. 2024. AckSent: Human Annotated Dataset of Support and Sentiments in Dissertation Acknowledgments. In The 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’24), December 2024, Hong Kong, China. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3677389.3702594

  7. Statistics on the number of scholarships for masters and doctoral...

    • data.gov.tw
    csv
    Updated Jun 1, 2025
    Cite
    Department of Student Affairs and Special Education (2025). Statistics on the number of scholarships for masters and doctoral dissertations and journal papers in gender equality education [Dataset]. https://data.gov.tw/en/datasets/159100
    Available download formats: csv
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Department of Student Affairs and Special Education
    License

    https://data.gov.tw/license

    Description

    In order to encourage academic and related research on gender equality education and improve the academic standards of the above-mentioned topics, the Ministry of Education has formulated the "Key Points for the Ministry of Education to Award Master's and Doctoral Thesis and Journal Papers on Gender Equality Education" for awards.

  8. Supplementary data files for the PhD thesis "Design for Interpersonal Mood...

    • data.4tu.nl
    zip
    Updated Jun 14, 2024
    Cite
    Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein (2024). Supplementary data files for the PhD thesis "Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters" [Dataset]. http://doi.org/10.4121/8a9b21b2-6411-42ed-a0e4-05be50fc5a69.v1
    Available download formats: zip
    Dataset updated
    Jun 14, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Pelin Esnaf-Uslu; Pieter M. A. Desmet; Rick Schifferstein
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Dataset funded by
    The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioural Sciences
    Description

    This dataset comprises five sets of data collected throughout the PhD Thesis project of Pelin Esnaf-Uslu.

    Esnaf-Uslu, P. (2024). Design for Interpersonal Mood Regulation: Introducing a Framework and Three Tools to Support Mood-Sensitive Service Encounters. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.

    The research in this thesis is based on the premise that service providers can enhance their effectiveness in client interactions by acquiring a detailed understanding of IMR strategies and effectively applying this knowledge. To achieve this overall aim, the current research aimed to explore (1) the current role of mood in service encounters, (2) the IMR strategies used by service providers during service encounters in response to clients’ moods, (3) how IMR strategies can be facilitated by means of tools for service providers, and (4) the strengths and limitations of the developed materials.

    This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.

    The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.

    Chapter_2: This study investigates the role of mood in service encounters. Samples are collected from service providers’ experiences during service encounters, and in-depth interviews are conducted. The dataset includes the blank diary and the interview protocol.

    Chapter_3: This study investigates the clarity of the images developed representing Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 27 and 29 participants, showing the associations between images representing nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. Additionally, the dataset contains a screenshot of the workshop material used in the implementation study.

    Chapter_4: This study examines the clarity of developed videos depicting IMR strategies. The dataset includes anonymized scores from 32 participants, showing the associations between videos depicting nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants. In addition, the dataset contains the workshop guideline developed for the implementation study.

    Chapter_5: This study evaluates the clarity of character animations depicting Interpersonal Mood Regulation (IMR) strategies. The dataset includes anonymized scores from 39 participants, demonstrating the associations between videos illustrating nine IMR strategies and their corresponding labels and descriptions, along with the free descriptions provided by the participants.

    Chapter_6: This dataset comprises correspondence analysis files for each material, created for the purpose of comparison.

    All the data is anonymized by removing the names of individuals and institutions.

  9. Data from: Advances in Differential Privacy Concepts and Methods

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Cite
    Xingyuan Zhao (2024). Advances in Differential Privacy Concepts and Methods [Dataset]. http://doi.org/10.7274/25565250.v1
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Xingyuan Zhao
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Differential privacy (DP) formalizes privacy guarantees in a rigorous mathematical framework and is a state-of-the-art concept in data privacy research. The DP mechanisms ensure the privacy of each individual in a sensitive dataset while releasing useful information about the whole population in that dataset. Since its debut in 2006, significant advancements in DP theory, methodologies, and applications have been made; new research topics and questions have been proposed and studied. This dissertation aims to contribute to the advancement of DP concepts and methods in the robustness of DP mechanisms to privacy attacks, privacy amplification through subsampling, and DP guarantees of procedures with their intrinsic randomness. Specifically, this dissertation consists of three research projects on DP.

    The first project explores the protection potency of DP mechanisms against homogeneity attacks (HA) by providing analytical relations between measures of disclosure risk from HA and privacy loss parameters, which will assist practitioners in understanding the abstract concepts of DP by putting them in a concrete privacy attack model and offer a perspective for choosing privacy loss parameters.

    The second project proposes a class of subsampling methods, "MUltistage Sampling Technique (MUST)", for privacy amplification. It provides the privacy composition analysis over repeated applications of MUST via the Fourier accountant algorithm. The utility experiments show that MUST demonstrates comparable utility and stability in privacy-preserving outputs compared to one-stage subsampling methods at similar privacy loss while improving the computational efficiency of algorithms requiring complex function calculations on distinct data points. MUST can be seamlessly integrated into stochastic optimization algorithms or procedures involving parallel or simultaneous subsampling when DP guarantees are necessary.

    The third project investigates the inherent DP guarantees in Bayesian posterior sampling. It provides a new privacy loss bound in releasing a single posterior sample with any prior given a bounded log ratio of the likelihood kernels based on two neighboring data sets. The new bound is tighter than the existing bounds and consistent with the likelihood principle. Experiments show that the privacy-preserving synthetic data released from Bayesian models leveraging the inherently private posterior samples are of improved utility compared to those generated by sanitizing the original information through explicit DP mechanisms.

  10. Data from: Doctoral Dissertation Research: Mapping Community Exposure to...

    • arcticdata.io
    • search.dataone.org
    Updated Apr 11, 2022
    Cite
    Michael Brady (2022). Doctoral Dissertation Research: Mapping Community Exposure to Coastal Climate Hazards in the Arctic: A Case Study in Alaska's North Slope [Dataset]. http://doi.org/10.18739/A28G8FJ8X
    Dataset updated
    Apr 11, 2022
    Dataset provided by
    Arctic Data Center
    Authors
    Michael Brady
    Time period covered
    Oct 1, 2015 - Sep 30, 2016
    Description

    This research investigates community exposure to coastal climate hazards in Alaska's North Slope and incorporates community assessment of the potential effects on loss of land, infrastructure, and other assets. This analysis will inform response strategies and planning by developing new methods of hazard assessment that can support community resilience in the North Slope and potentially serve as a model for advancing assessment and planning in other rural and urban communities. This research will expand traditional assessments of financial exposure to also include non-material factors such as values and priorities of diverse social groups within a community including a diverse set of stakeholders, ranging from multinational oil companies to individual subsistence hunters. This study surveys community views of asset importance and integrates results with a geophysical hazard data model for a coproduced community exposure map of the North Slope coast. This research will contribute to understanding the human and social dimensions of climate change impacts, including how social, economic, political, and cultural factors shape vulnerabilities and condition response strategies. Methods and findings could enhance nation-wide efforts in the United States to map community exposure to coastal climate hazards by demonstrating methods for, and the importance of systematically incorporating non-market values in exposure analysis.

    The objectives of the proposed research include adapting the U.S. Geological Survey's (USGS) coastal vulnerability index (CVI) to the Arctic context, and integrating results with formal asset databases and a spatial community landscape value model while working with affected communities during the process to coproduce exposure maps. Specifically, working with North Slope Alaskan communities the study will incorporate wind fetch (i.e., the open water distance over which wind can generate near shore waves, determined by sea ice extent) into the CVI and get community feedback on the results. In addition to community input on the CVI maps, coproducing the exposure maps includes the community assigning values to traditional land use places using existing spatial datasets and mapping and investigating specific sites threatened by coastal hazards with the aim to learn why exposed assets threaten the community.

  11. Dissertations and Data

    • datacatalogue.cessda.eu
    • ssh.datastations.nl
    Updated Apr 11, 2023
    Cite
    J. Schöpfel (2023). Dissertations and Data [Dataset]. http://doi.org/10.17026/dans-xg6-xnj4
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    University of Lille 3
    Authors
    J. Schöpfel
    Description

    We present the results of a quantitative assessment of research data produced and submitted with dissertations. Special attention is paid to the size of the research data in appendices, to their presentation and link to the text, to their sources and typology, and to their potential for further research. The discussion puts the focus on legal aspects (database protection, intellectual property, privacy, third-party rights) and other barriers to data sharing, reuse and dissemination through open access.
    Another part adds insight into the potential handling of these data, in the framework of the French and Slovenian dissertation infrastructures. What could be done to valorize these data in a centralized system for electronic theses and dissertations (ETDs)? The topics are formats, metadata (including attribution of unique identifiers), submission/deposit, long-term preservation and dissemination. This part will also draw on experiences from other campuses and make use of results from surveys on data management at the Universities of Berlin and Lille.

  12. Data from: Empowering Graph Neural Networks for Real-World Tasks

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Cite
    Zhichun Guo (2024). Empowering Graph Neural Networks for Real-World Tasks [Dataset]. http://doi.org/10.7274/25608504.v1
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Zhichun Guo
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Numerous types of real-world data can be naturally represented as graphs, such as social networks, trading networks, and biological molecules. This highlights the need for effective graph representations to support various tasks. In recent years, graph neural networks (GNNs) have demonstrated remarkable success in extracting information from graphs and enabling graph-related tasks. However, they still face a series of challenges in solving real-world problems, including scarcity of labeled data, scalability issues, potential bias, etc. These challenges stem from both domain-specific issues and inherent limitations of GNNs. This thesis introduces various strategies to tackle these challenges and empower GNNs on real-world tasks.

    For the domain-specific challenges, this thesis focuses on the chemistry domain, which plays a pivotal role in the drug discovery process. Because labeling through wet lab experiments requires significant resources, the AI-for-chemistry domain struggles with a scarcity of labeled datasets. To address this, we present a set of model-based and data-based strategies alongside a hybrid method. These methods use the diversity of data, models, and molecular representations to compensate for the lack of labels in individual datasets. For the inherent challenges, this thesis introduces strategies to overcome two main challenges: scalability and degree-based issues, especially in the context of link prediction tasks. Both challenges originate from the mechanism of GNNs, which involves the iterative aggregation of neighboring nodes' information to update each central node. For the scalability issue, our work not only preserves GNNs' prediction performance but also significantly boosts inference speed. Regarding degree bias, our work substantially improves the effectiveness of GNNs for underrepresented nodes at very light additional computational cost. These contributions not only address critical gaps in applying GNNs to specific domains but also lay the groundwork for future exploration in the broader field of graph-based real-world tasks.

  13. Replication Data Dissertation Margarete Schweizerhof

    • dataverse.harvard.edu
    • dataone.org
    Updated Sep 28, 2023
    Cite
    Margarete Schweizerhof (2023). Replication Data Dissertation Margarete Schweizerhof [Dataset]. http://doi.org/10.7910/DVN/O6OROH
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Margarete Schweizerhof
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Data for replicating the qualitative and quantitative methods in the dissertation of Margarete Schweizerhof.

  14. Dissertation Project data for Framing the Law: Judges and Jury Instructions

    • dataverse.harvard.edu
    Updated Mar 14, 2025
    Cite
    Matthew Baker (2025). Dissertation Project data for Framing the Law: Judges and Jury Instructions [Dataset]. http://doi.org/10.7910/DVN/EPSDSA
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Matthew Baker
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    These data stem from my dissertation project, Framing the Law: Judges and Jury Instructions. This is an original dataset of federal criminal jury trials from January 1, 2015 to December 31, 2018. The data come from 23 federal districts and code 51 different variables.

  15. Data to accompany dissertation: Geographic, Cultural, and Ecological...

    • data.niaid.nih.gov
    Updated Sep 1, 2022
    Cite
    Helgeson, Kirsten (2022). Data to accompany dissertation: Geographic, Cultural, and Ecological Correlations with Indigenous Language Vitality in North America [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6982147
    Explore at:
    Dataset updated
    Sep 1, 2022
    Dataset authored and provided by
    Helgeson, Kirsten
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    North America
    Description

    Text files:

    readme_general.txt contains a brief description of files included.

    readme_modeldata.txt contains a metadata description of model_data.csv.

    readme_languageslandNorthAmerica.txt contains a metadata description of Languages_land_NorthAmerica.csv.

    readme_languagerevitalizationdatabase.txt contains a metadata description of Language_revitalization_database.csv.

    CSV files:

    Languages_land_NorthAmerica.csv is a version of the Languages of Government-Recognized Native Land Areas in the Continental United States database. It includes data from the US Census 2017 TIGER/Line AIANNH shapefile with one row per Native land area and additional columns for associated information that was coded and calculated for this dissertation as discussed in Section 3.3.1.

    Language_revitalization_database.csv is the Language Revitalization Database. It contains the master language list used for this dissertation and columns created while coding data for the language revitalization variable, as discussed in Section 3.3.2.

    model_data.csv contains data for all variables used in the analysis and is the .csv file needed to run LanguageVitalityModels.R.

    R scripts:

    LanguageVitalityModels.R is the R script for the main part of the dissertation analysis.

  16. Dissertation Supplementary Files

    • figshare.com
    html
    Updated May 31, 2023
    Cite
    Augustine Dunn (2023). Dissertation Supplementary Files [Dataset]. http://doi.org/10.6084/m9.figshare.810442.v7
    Explore at:
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Augustine Dunn
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Purpose: This is a collection of supplementary files to be included in my dissertation. They include, but are not limited to, small IPython notebooks, extra figures, and data sets that are too large to publish in the main document, such as full ortholog lists and other primary data.

    Viewing IPython notebooks (ipynb files): To view an IPython notebook, right-click its download link and select "Copy link address". Then navigate to the free notebook viewer by following this link: http://nbviewer.ipython.org/. Finally, paste the link to the ipynb file that you copied into the URL form on the nbviewer page and click "Go".

  17. Data from: Probabilistic Machine Learning Methods for Spatio-Temporal Data

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Cite
    Matthew Bonas (2024). Probabilistic Machine Learning Methods for Spatio-Temporal Data [Dataset]. http://doi.org/10.7274/25595235.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Matthew Bonas
    License

    Public Domain Mark 1.0 (https://creativecommons.org/publicdomain/mark/1.0/)
    License information was derived automatically

    Description

    This dissertation presents multiple novel methodological advancements in the realm of machine learning (ML) for spatio-temporal data applications. Traditional machine learning approaches typically have difficulty producing both accurate point predictions and adequate uncertainty quantification for these data, especially in instances where the data themselves are sampled at a fine temporal scale. This is because inference on these complex ML models is notably difficult and can impose a significant computational burden. The challenge of forecasting spatio-temporal data is further heightened when attempting to ensure the forecasts themselves obey any known physical laws which dictate or influence the underlying data structure.

    We explore the current challenges in properly quantifying the uncertainty of forecasts for spatio-temporal data applications stemming from contemporary ML models. Methods are introduced not only to calibrate the uncertainty estimates such that proper coverage is achieved but also to ensure a realistic expansion of the uncertainty through time. These contemporary ML models are also adapted such that the physical processes present throughout the data are used to inform the learning procedures, so that the forecasts themselves are influenced to be more physically compliant. We demonstrate the power of combining ML models in an ensemble to improve model accuracy in predicting nonstationary, complex temporal data. Finally, a general comparison is made to explore the benefits and drawbacks of ML approaches to time-series forecasting versus the popular and standard statistical approaches, and to explain why these advanced ML modelling techniques are not necessarily meant to act as a universal best approach for prediction and forecasting.

  18. Data from: Knowledge-centric Machine Learning on Graphs

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Cite
    Yijun Tian (2024). Knowledge-centric Machine Learning on Graphs [Dataset]. http://doi.org/10.7274/25607826.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Yijun Tian
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Graph Machine Learning (GML) has gained considerable attention in modeling complex graph-structured data, but many existing approaches focus on collecting high-quality data (i.e., data-centric) and developing complex model architectures (i.e., model-centric). However, these two paradigms come with inherent limitations and challenges: data-centric approaches often demand intensive labor for tasks like data annotation and cleaning, while model-centric approaches usually require specialized expertise for model refinements. There remains a significant reservoir of unexplored potential in harnessing useful information that already exists in the data and is learned by models, i.e., knowledge, as a directive force for learning.

    In this dissertation, I introduce a new paradigm of machine learning on graphs: knowledge-centric. This paradigm seeks to leverage all available knowledge, which may come from data, models, or external sources, to facilitate an effective learning process. My research focuses on three different facets to obtain and leverage knowledge in GML, including learning knowledge from data, distilling knowledge from models, and encoding knowledge from external sources. By anchoring on the knowledge, there is a reduced reliance on massive data and intricate model architectures. In addition, knowledge can enhance GML models' performance, trustworthiness, and efficiency.

  19. Bach, Jakob (2024). Dataset: Experimental data for the dissertation...

    • service.tib.eu
    Updated Nov 28, 2024
    Cite
    (2024). Bach, Jakob (2024). Dataset: Experimental data for the dissertation "leveraging constraints for user-centric feature selection". https://doi.org/10.35097/4kjyeg0z2bxmr6eh [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-4kjyeg0z2bxmr6eh
    Explore at:
    Dataset updated
    Nov 28, 2024
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Abstract: These are the experimental data for the dissertation Bach, Jakob. "Leveraging Constraints for User-Centric Feature Selection" at the Department of Informatics of the Karlsruhe Institute of Technology. See the README for details. Many input datasets (which we also provide here) either originate from OpenML and are CC-BY-licensed or originate from PMLB and are MIT-licensed. Please see the LICENSE files in the corresponding datasets/ subfolders for details.

    TechnicalRemarks: Experimental Data for the Dissertation "Leveraging Constraints for User-Centric Feature Selection"

    These are the experimental data for the dissertation Bach, Jakob. "Leveraging Constraints for User-Centric Feature Selection" at the Department of Informatics of the Karlsruhe Institute of Technology. The subfolders correspond to individual chapters of the dissertation:

    • chap4-syn: Chapter 4 - "Evaluating the Impact of Constraints on Feature-Selection Results"
    • chap5-ms: Chapter 5 - "Formulating Scientific Hypotheses as Constraints - A Case Study"
    • chap6-afs: Chapter 6 - "Finding Alternative Feature Sets"
    • chap7-csd: Chapter 7 - "Discovering Sparse and Alternative Subgroup Descriptions"

    See the corresponding README files in the subfolders for more information.

    We already published prior versions of the experimental data, as the dissertation is based on prior papers:

    • Chapters 4 and 5: Data for the paper "An Empirical Evaluation of Constrained Feature Selection"
    • Chapter 6: Data for the paper "Finding Optimal Diverse Feature Sets with Alternative Feature Selection" (Version 2)
    • Chapter 7: Data for the paper "Using Constraints to Discover Sparse and Alternative Subgroup Descriptions" (Version 1)

    For Chapters 4, 5, and 7, we mainly consolidate the existing data. In particular, all *.csv files (datasets and results) remain unchanged compared to the data linked above. For Chapter 6, we reran the experimental pipeline to integrate a change for the feature-selection method "Greedy Wrapper". The other feature-selection methods have not changed, but experimental data may slightly differ regarding runtimes and for results affected by solver timeouts.

    For all four chapters, the following files (in each subfolder) differ from prior versions:

    • Evaluation_console_output.txt: The dissertation's evaluation partly differs from the papers' evaluations (e.g., some analyses added, adapted, or removed).

  20. Data underlying the PhD dissertation: Automated Layout Generation and Design...

    • data.4tu.nl
    zip
    Updated Feb 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Joan le Poole (2024). Data underlying the PhD dissertation: Automated Layout Generation and Design Rationale Capture to Support Early-Stage Complex Ship Design [Dataset]. http://doi.org/10.4121/e3eb2bab-8e28-4477-a34f-ba7c94d0d80b.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 5, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Joan le Poole
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    This folder contains (references to) research data related to the dissertation entitled "Automated Layout Generation and Design Rationale Capture to Support Early-Stage Complex Ship Design" by Joan le Poole. Specifically, the data underlying the layout generation case study 4, as well as the data underlying case study 7 on design rationale is provided. In addition, the references to the earlier published research data underlying case studies 1-3 and 5-6 are given. The earlier published data sets as well as this file comprise all data underlying the dissertation mentioned above.

Cite
Farley Ishaak (2024). Metadata for the dissertation: Improving Commercial Property Price Statistics [Dataset]. http://doi.org/10.4121/cab0cf0e-668f-46db-82bb-94abe78faeb0.v1

Metadata for the dissertation: Improving Commercial Property Price Statistics

Explore at:
Dataset updated
Nov 25, 2024
Dataset provided by
4TU.ResearchData
Authors
Farley Ishaak
License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Time period covered
2008 - 2023
Area covered
Netherlands
Description

This metadata document provides details of the data used for the dissertation: “Improving Commercial Property Price Statistics”. The study explores data-related and methodological challenges in the construction of price statistics for commercial real estate.


Short abstract of the dissertation

Since the financial crisis of 2008, National Statistical Institutes (NSIs) have worked to develop commercial real estate (CRE) indicators for official statistics. These indicators are considered essential in financial stability monitoring and may help contain the consequences of future crises or even prevent future crises. However, progress at NSIs to develop these indicators has been slow due to challenges like low observation numbers and high heterogeneity. This dissertation addresses these challenges by exploring data issues and suggesting methodological improvements.


The first three studies focus on data challenges regarding share deals and portfolio sales. Both are real estate trading constructions that are specific to CRE. The results show that share deals and portfolio sales significantly differ from the rest of the market. Therefore, under specific circumstances, CRE indicators could benefit from including these trading types. The final two studies focus on methodological challenges regarding index construction methods and the role of sustainability in real estate pricing. The results show that, by combining established techniques, it is possible to construct price indices that meet official statistics’ standards. Furthermore, the results uncover a complex relationship between sustainability and prices: while energy efficiency generally involves price premiums, other aspects like health and environment display a discount for properties with low sustainability.


Overall, this dissertation contributes to the legislative framework that is currently being developed for EU countries to publish official statistics for commercial real estate and adds to the academic discussion by presenting innovative techniques for data analyses and index construction.


Data sources

The following data sources were used:

  1. Business Register (Statistics Netherlands)
  2. Transactions linked to the Register of Addresses and Buildings (BAG)
  3. Linking table buildings and companies (Dutch Land Registry Office)
  4. Property Transfer Tax data (Dutch Tax Authorities)
  5. Building sustainability scores (W/E advisors)
  6. Commercial real estate transactions (Dutch Land Registry Office)


Processing methodology

  1. The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_2_ABR_Bedrijfsinfo. The data is used for deriving company transfers by comparing ownership states of various periods. The first period in which the ownership of the same company differs indicates an ownership transfer.
  2. The data is originally stored in an SQL database and is processed with SQL and R code (version 4.2). In the code, the name of the table is tbl_SPE_6_ABR_CompleetMicro. The data is used for calculating the size of real estate share deals and estimating price developments by applying appropriate filters and counting the output.
  3. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is SPE_KADASTER. The data is used for finding real estate information that corresponds to company transfers by linking the company register (ABR) to the real estate register (BAG).
  4. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_SPE_3_OVB_Bedrijfsinfo. The data is used for deriving real estate share deals by linking this table (Kadaster) to the real estate register (BAG).
  5. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is duurzaamheid_input_regressie2. The data is used for finding the relationship between sustainability measures and real estate transaction prices by linking sustainability scores from a consultancy (WE) to transaction prices (Cadastre) and running regression analyses.
  6. The data is originally stored in an SQL database and is processed with R code (version 4.2). In the code, the name of the table is tbl_OV20_pand. The data is used for 4 purposes (separate studies).
  • (1) Chapter 3: Determining the price effect of portfolio sales by running regression analyses
  • (2) Chapter 4: Developing methods to include portfolio sales in CPPI calculations by using auxiliary data of the real estate properties
  • (3) Chapter 5: Developing a price index method for small domains by using these data to test the outcomes
  • (4) Chapter 6: Determining the relationship between sustainability and prices by running regression analyses
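
The first processing step above, deriving ownership transfers by comparing ownership states across periods, could be sketched roughly as follows. This is only an illustration in Python, not the SQL and R code used for the dissertation; the record layout (company_id, period, owner_id), the function name and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def find_ownership_transfers(records):
    """Return (company_id, period, old_owner, new_owner) for each ownership change.

    `records` is an iterable of (company_id, period, owner_id) tuples.
    This layout is hypothetical and not taken from the original tables.
    """
    # Group the observed ownership states by company.
    by_company = defaultdict(list)
    for company_id, period, owner_id in records:
        by_company[company_id].append((period, owner_id))

    transfers = []
    for company_id, states in sorted(by_company.items()):
        states.sort()  # chronological order by period
        # Compare each period with the one before it; the first period in
        # which the owner differs marks an ownership transfer.
        for (_, prev_owner), (period, owner) in zip(states, states[1:]):
            if owner != prev_owner:
                transfers.append((company_id, period, prev_owner, owner))
    return transfers


# Made-up sample data: company C1 changes owner in 2021, C2 never does.
records = [
    ("C1", 2019, "A"),
    ("C1", 2020, "A"),
    ("C1", 2021, "B"),
    ("C2", 2019, "X"),
    ("C2", 2020, "X"),
]
print(find_ownership_transfers(records))
```

A company whose owner never changes yields no rows, so the output contains only the candidate transfers that would then be linked to real estate information in the later steps.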


Data restrictions

Under the CBS law, sharing micro-data outside of the CBS-environment is prohibited. Furthermore, CBS manages the data, but in some cases other parties are still formal owners of the data. The two other parties are the Land Registry Office and WE consultancy. Ownership and intellectual property rights are managed in contracts with both owners. It was agreed that the data can only be used for the purpose of the PhD study and that the microdata will never be externally disseminated. The data is still owned by them and the intellectual property rights of the analyses belong to me. Any intended use of the microdata should be approved by both Statistics Netherlands and the formal data owner. Because of the above, no data can be publicly shared.


If one intends to do research on these data, an application for data use can be requested at CBS. CBS will charge costs for anonymising the data and providing a closed environment to work with the data. More information on this can be found at: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research


Contact information

Author: Farley Ishaak

Statistics Netherlands | Henri Faasdreef 312 | P.O. Box 24500 | 2490 HA The Hague

TU Delft | Delft University of Technology | Faculty of Architecture and the Built Environment

Department of Management in the Built Environment | P.O. Box 5043 | 2600 GA Delft

M +31 6 46307974 | ff.ishaak@cbs.nl | f.f.ishaak@tudelft.nl
