CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". These data were generated by instrumenting both synthetic and real-life software systems, and are formatted according to the IEEE XES format. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for further explanation.
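As an illustration of the XES structure (not a tool shipped with the dataset), a minimal Python sketch for listing the events of one log is given below; the file name and the attribute keys printed are assumptions based on the XES standard.

import xml.etree.ElementTree as ET

# Hypothetical file name; the dataset's actual log files may be named differently.
LOG_FILE = "software_event_log.xes"

def local(tag):
    """Strip an XML namespace, if present, from a tag name."""
    return tag.rsplit("}", 1)[-1]

tree = ET.parse(LOG_FILE)
for trace in tree.getroot():
    if local(trace.tag) != "trace":
        continue
    for event in trace:
        if local(event.tag) != "event":
            continue
        # XES events carry typed key/value attributes, e.g. concept:name.
        attrs = {child.get("key"): child.get("value") for child in event}
        print(attrs.get("concept:name"), attrs.get("time:timestamp"))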
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This repository contains the data generated during the PhD project: Structure-Based Prediction of Protein Behavior in Preparative Chromatography
By Tim Neijenhuis at Delft University of Technology
Supervisors: Marcel Ottens and Marieke Klijn
Department of Biotechnology, Section of Bioprocess Engineering.
When using the data, please cite:
Neijenhuis, T., Le Bussy, O., Geldhof, G., Klijn, M. E., & Ottens, M. (2024). Predicting protein retention in ion‐exchange chromatography using an open source QSPR workflow. Biotechnology Journal, 19(3), 2300708.
Keulen, D., Neijenhuis, T., Lazopoulou, A., Disela, R., Geldhof, G., Le Bussy, O., ... & Ottens, M. (2025). From protein structure to an optimized chromatographic capture step using multiscale modeling. Biotechnology Progress, 41(1), e3505.
Disela, R., Neijenhuis, T., Le Bussy, O., Geldhof, G., Klijn, M., Pabst, M., & Ottens, M. (2024). Experimental characterization and prediction of Escherichia coli host cell proteome retention during preparative chromatography. Biotechnology and Bioengineering, 121(12), 3848-3859.
The objective of the PhD project was to predict the behavior of proteins for different chromatographic columns using molecular modeling methods.
This repository contains all predicted and measured values for each protein for the different columns investigated during the research in a CSV format.
The name of each file is identical to the corresponding figure in the dissertation, stating first the chapter and then the figure (as {chapter}_data_{figure}.csv).
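Assuming the naming scheme above, a short Python sketch for collecting the files of one chapter into data frames could look like this (the chapter number used in the glob pattern is an assumption):

import glob
import pandas as pd

# Collect all CSV files for, e.g., chapter 3, following the
# {chapter}_data_{figure}.csv convention described above.
frames = {}
for path in sorted(glob.glob("3_data_*.csv")):
    frames[path] = pd.read_csv(path)
    print(path, frames[path].shape)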
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NOTE FOR WMT PARTICIPANTS: There is an easier version for MT available in Moses format (one sentence per line). The files start with moses_like.
If you use this dataset, please cite the following work:
@inproceedings{soares2018parallel,
  title={A Parallel Corpus of Theses and Dissertations Abstracts},
  author={Soares, Felipe and Yamashita, Gabrielli Harumi and Anzanello, Michel Jose},
  booktitle={International Conference on Computational Processing of the Portuguese Language},
  pages={345--352},
  year={2018},
  organization={Springer}
}
In Brazil, the governmental body responsible for overseeing and coordinating post-graduate programs, CAPES, keeps records of all theses and dissertations presented in the country. Information regarding such documents can be accessed online in the Theses and Dissertations Catalog (TDC), which contains abstracts in Portuguese and English, and additional data regarding such documents. Thus, this database can be a potential source of parallel corpora for the Portuguese and English languages. In this article, we present the development of a parallel corpus from TDC, which is made available by CAPES under the open data initiative. Approximately 240,000 documents were collected and aligned using the Hunalign algorithm. We demonstrate the capability of our developed corpus by training Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) models for both language directions, followed by a comparison with Google Translate (GT). Both of our translation models presented better BLEU scores than GT, with the NMT system being the most accurate one. Sentence alignment was also manually evaluated, presenting an average of XX% correctly aligned sentences. Our parallel corpus is freely available in TMX format, with complementary information regarding document metadata.
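Since the corpus is distributed in TMX (an XML-based translation memory format), a minimal, hedged sketch of iterating over translation units with Python's standard library is shown below; the file name is an assumption.

import xml.etree.ElementTree as ET

# Hypothetical file name; the actual TMX file in the corpus may be named differently.
TMX_FILE = "theses_abstracts.tmx"

def iter_translation_units(path):
    """Yield one {language code: segment text} dictionary per <tu> element."""
    tree = ET.parse(path)
    for tu in tree.getroot().iter("tu"):
        pair = {}
        for tuv in tu.iter("tuv"):
            # TMX stores the language code in the xml:lang attribute.
            lang = tuv.get("{http://www.w3.org/XML/1998/namespace}lang") or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                pair[lang] = "".join(seg.itertext()).strip()
        yield pair

for i, pair in enumerate(iter_translation_units(TMX_FILE)):
    print(pair)
    if i >= 4:  # show only the first few sentence pairs
        break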
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the supplementary material to the master's thesis:
"NUTS-3 Regionalization of Industrial Load Shifting Potential in Germany using a Time-Resolved Model"
LICENSE
All output data provided are under the Creative Commons Attribution 4.0 International Public License. All Python scripts provided are under the Apache License, Version 2.0. For the source and data license information of the input data, refer to the ‘Input data documentation’ file. For the full license texts, refer to the LICENSE files.
DATASET DESCRIPTION
The supplementary material is organized into four different subdirectories:
For more information refer to the README file.
For a detailed description of the approach developed by the author, the input data used and the generated results, refer to the master's thesis "NUTS-3 Regionalization of Industrial Load Shifting Potential in Germany using a Time-Resolved Model", available here: https://elib.dlr.de/134116/
In case of questions, please contact: bruno.schyska@dlr.de or wilko.heitkoetter@dlr.de
https://www.law.cornell.edu/uscode/text/17/106
Bottom-up proteomics (BUP) is a powerful analytical technique that involves digesting complex protein mixtures into peptides and analyzing them with liquid chromatography and tandem mass spectrometry to identify and quantify many proteins simultaneously. This produces massive multidimensional datasets that require informatics tools to analyze. The landscape of software tools for BUP analysis is vast and complex, and custom programs and scripts are often required to answer the biological questions of interest in any given experiment.
This dissertation introduces novel methods and tools for analyzing BUP experiments and applies those methods to new samples. First, PrIntMap-R, a custom application for intraprotein intensity mapping, is developed and validated. This application is the first open-source tool to allow for statistical comparisons of peptides within a protein sequence along with quantitative sequence coverage visualization. Next, innovative sample preparation techniques and informatics methods are applied to characterize MUC16, a key ovarian cancer biomarker. This includes the proteomic validation of a novel model of MUC16 differing from the dominant isoform reported in the literature. Shifting to bacterial studies, custom differential expression workflows are employed to investigate the role of virulence lipids in mycobacterial protein secretion by analyzing mutant strains of mycobacteria. This work links lipid presence and virulence factor secretion for the first time. Building on these efforts, OnePotN??TA, a labeling technique enabling quantification of N-terminal acetylation in mycobacterial samples, is introduced. This method is the first technique to simultaneously quantify protein and N-terminal acetylation abundance using bottom-up proteomics, advancing the field of post-translational modification quantification. This project resulted in the identification of 37 new putative substrates for an N-acetyltransferase, three of which have since been validated biochemically. These tools and methodologies are further applied to various biological research areas, including breast cancer drug characterization and insect saliva analysis, to perform the first proteomic studies of their kind with these respective treatments and samples. Additionally, a project focused on teaching programming skills relevant to analytical chemistry is presented. Collectively, this work enhances the analytical capabilities of bottom-up proteomics, providing novel tools and methodologies that advance protein characterization, post-translational modification analysis, and biological discovery across diverse research areas.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A look at the data provided by the Open Access search engine BASE (http://base-search.net) shows that Open Science compliance among doctoral theses is stagnating. BASE distinguishes three categories of accessibility: Open Access, Unknown, and Non-Open Access. In the following tables and graphs, figures reported as "Open Access" have been categorised by BASE as Open Access. The tables and graphics show data from BASE (as of 06.03.2018) as follows:
a) Indexed theses, books and journal articles
b) Indexed theses, books and journal articles published by Open Access
c) Indexed theses, books and journal articles under Creative Commons licenses.
d) Indexed theses, books and journal articles published under open licenses, i.e. with terms of use in the spirit of open source.
Although doctoral theses already had a high share of open access by 2013 (43%), by 2017 it had risen by only 5% (2017: 48%). At the same time, the proportion of books published in open access rose by 14% (from 20% to 34%) and of articles by 17%, from 44% (2013) to 61% (2017). The same effect can be seen in the proportion of CC-licensed items: their share rose by 4% (from 9% to 13%) for doctoral theses, by 9% for books (from 4% to 13%) and by 8% for articles (from 10% to 18%) between 2013 and 2017. The effect is most pronounced for openly licensed items: their share did not increase for doctoral theses, remaining at 2% between 2013 and 2017; in the same period it increased by 5% (from 1% to 6%) for books and by 5% (from 5% to 10%) for articles.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Doctoral thesis. Open source probabilistic models for human functional genomics. Includes: the press release, the thesis, and LaTeX sources.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The files contain the dataset for the thesis "Development and Validation of Explainable Machine-Learning Prediction Systems: A Study of Biomedical and Clinical Data".
Chapter 3 includes a patient dataset with CDI (Clostridioides difficile infection) admissions from 2009-2014 in Hong Kong.
Chapter 4 includes a list of protein structure data derived from UniProt (www.uniprot.org) (release 2021_03) and their corresponding enzyme functions. The protein structure files can be downloaded from the open-access database Protein Data Bank (www.rcsb.org). A list of AlphaFold 2 predicted structures is also included; the corresponding structural data can be downloaded from www.alphafold.com.
Chapter 5 contains a list of PDB structures derived from UniProt (release 2023_01).
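For the structures referenced in chapters 4 and 5, coordinate files can be fetched from the Protein Data Bank by identifier; a minimal sketch follows (the example PDB ID is an assumption, not an entry from the dataset):

import urllib.request

# Download a single structure from the RCSB PDB; "1TIM" is just an example ID.
pdb_id = "1TIM"
url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
with urllib.request.urlopen(url) as response, open(f"{pdb_id}.pdb", "wb") as out:
    out.write(response.read())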
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset corresponds to the raw data and experimental measurements of the PhD thesis "Experimental investigation of the effects of particle shape and friction on the mechanics of granular media" of Gustavo Pinzón (2023, Université Grenoble Alpes), available at: https://hal.science/tel-04202827v1.
The experiments correspond to drained triaxial compression tests of cylindrical granular specimens, a common testing procedure used in soil mechanics to characterise the mechanical response of a specimen under deviatoric loading. Each specimen is 140 mm in height and 70 mm in diameter, and is composed of more than 20000 ellipsoidal particles of a given aspect ratio and interparticle friction. The dataset comprises the tests of six specimens, resulting from the combination of 3 particle shapes (Flat, Medium, and Rounded) and 2 values of interparticle friction (Rough and Smooth). A naming system for the specimens is adopted to reflect the morphology of the composing particles (e.g., the test EFR corresponds to the specimen with Flat and Rough particles). Further details on the experimental methods are found in Ch. 2 of the thesis.
The compression tests are performed inside the x-ray scanner of Laboratoire 3SR in Grenoble (France), where the specimens are scanned every 0.5% of axial shortening, at an isotropic voxel size of 100 micrometres. The obtained radiographies are reconstructed using a Filtered Back Projection algorithm, using the software provided by the x-ray cabin manufacturer (RX Solutions, France). The series of obtained 16-bit greyscale 3D images are processed with the open-source software spam, version 0.6.2. The coordinate system of all the images is ZYX, where Z corresponds to the compression direction. Further details on the image analysis techniques are found in Ch. 3 of the thesis.
Additional greyscale images, raw projections, and x-ray tomography files are available upon request. For visualisation purposes, the 3D images in .tif format can be opened using Fiji.
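For programmatic inspection of the greyscale volumes (an alternative to Fiji; the thesis pipeline itself used spam), one possible sketch using the tifffile package reads a 3D .tif and reports its ZYX shape. The file name is an assumption:

import tifffile

# Hypothetical file name; the dataset's image files may be named differently.
volume = tifffile.imread("EFR_step00.tif")  # 16-bit greyscale, axes ordered Z, Y, X
print("shape (Z, Y, X):", volume.shape, "dtype:", volume.dtype)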
The evolution of a software system can be studied in terms of how various properties, as reflected by software metrics, change over time. Current models of software evolution have allowed for inferences to be drawn about certain attributes of the software system, for instance, regarding the architecture, complexity and its impact on the development effort. However, an inherent limitation of these models is that they do not provide any direct insight into where growth takes place. In particular, we cannot assess the impact of evolution on the underlying distribution of size and complexity among the various classes. Such an analysis is needed in order to answer questions such as 'do developers tend to evenly distribute complexity as systems get bigger?', and 'do large and complex classes get bigger over time?'. These are questions of more than passing interest since by understanding what typical and successful software evolution looks like, we can identify anomalous situations and take action earlier than might otherwise be possible. Information gained from an analysis of the distribution of growth will also show if there are consistent boundaries within which a software design structure exists. In our study of metric distributions, we focused on 10 different measures that span a range of size and complexity measures. The raw metric data (4 .txt files and 1 .log file in a .zip file measuring ~0.5 MB in total) are provided as comma-separated values (CSV) files, and the first line of each CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
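A brief sketch of how the comma-separated metric data could be loaded and its distribution summarised (the file name and column contents are assumptions; the header is read from the first line as described above):

import pandas as pd

# Hypothetical file name; the archive contains four .txt files and one .log file.
metrics = pd.read_csv("class_metrics.txt")  # first line is the header
print(metrics.describe())                   # per-metric distribution summary
print(metrics.skew(numeric_only=True))      # skewness hints at how growth is distributed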
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
<> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
Welcome to the FEIS (Fourteen-channel EEG with Imagined Speech) dataset.
<>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
The FEIS dataset comprises Emotiv EPOC+ [1] EEG recordings of:
21 participants listening to, imagining speaking, and then actually speaking 16 English phonemes (see supplementary, below)
2 participants listening to, imagining speaking, and then actually speaking 16 Chinese syllables (see supplementary, below)
For replicability and for the benefit of further research, this dataset includes the complete experiment set-up, including participants' recorded audio and 'flashcard' screens for audio-visual prompts, the Lua script and .mxs scenario for the OpenViBE [2] environment, as well as all Python scripts for the preparation and processing of data as used in the supporting studies (submitted in support of completion of the MSc Speech and Language Processing with the University of Edinburgh):
J. Clayton, "Towards phone classification from imagined speech using a lightweight EEG brain-computer interface," M.Sc. dissertation, University of Edinburgh, Edinburgh, UK, 2019.
S. Wellington, "An investigation into the possibilities and limitations of decoding heard, imagined and spoken phonemes using a low-density, mobile EEG headset," M.Sc. dissertation, University of Edinburgh, Edinburgh, UK, 2019.
Each participant's data comprise 5 .csv files -- these are the 'raw' (unprocessed) EEG recordings for the 'stimuli', 'articulators' (see supplementary, below), 'thinking', 'speaking' and 'resting' phases per epoch for each trial -- alongside a 'full' .csv file with the end-to-end experiment recording (for the benefit of calculating deltas).
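As a hedged illustration (the directory layout and column names are assumptions; consult the included Python scripts for the authoritative format), a per-phase recording can be loaded as follows:

import pandas as pd

# Hypothetical path; each participant's data include 'stimuli', 'articulators',
# 'thinking', 'speaking' and 'resting' CSV files plus a 'full' recording.
thinking = pd.read_csv("01/thinking.csv")
print(thinking.shape)
print(thinking.columns.tolist())  # expected to include the 14 EEG channels listed below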
To guard against software deprecation or inaccessibility, the full repository of open-source software used in the above studies is also included.
We hope the FEIS dataset will be of utility to future researchers, given the scarcity of similar open-access databases. As such, this dataset is made freely available for all academic and research purposes (non-profit).
<> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
REFERENCING
<>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
If you use the FEIS dataset, please reference:
<> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
LEGAL
<>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
The research supporting the distribution of this dataset has been approved by the PPLS Research Ethics Committee, School of Philosophy, Psychology and Language Sciences, University of Edinburgh (reference number: 435-1819/2).
This dataset is made available under the Open Data Commons Attribution License (ODC-BY): http://opendatacommons.org/licenses/by/1.0
<> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
ACKNOWLEDGEMENTS
<>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
The FEIS database was compiled by:
Scott Wellington (MSc Speech and Language Processing, University of Edinburgh)
Jonathan Clayton (MSc Speech and Language Processing, University of Edinburgh)
Principal Investigators:
Oliver Watts (Senior Researcher, CSTR, University of Edinburgh)
Cassia Valentini-Botinhao (Senior Researcher, CSTR, University of Edinburgh)
<>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
METADATA
<> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
For participants, dataset refs 01 to 21:
01 - NNS
02 - NNS
03 - NNS, Left-handed
04 - E
05 - E, Voice heard as part of 'stimuli' portions of trials belongs to participant 04, due to microphone becoming damaged and unusable prior to recording
06 - E
07 - E
08 - E, Ambidextrous
09 - NNS, Left-handed
10 - E
11 - NNS
12 - NNS, Only sessions one and two recorded (out of three total), as participant had to leave the recording session early
13 - E
14 - NNS
15 - NNS
16 - NNS
17 - E
18 - NNS
19 - E
20 - E
21 - E
E = native speaker of English
NNS = non-native speaker of English (>= C1 level)
For participants, dataset refs chinese-1 and chinese-2:
chinese-1 - C
chinese-2 - C, Voice heard as part of 'stimuli' portions of trials belongs to participant chinese-1
C = native speaker of Chinese
<>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
SUPPLEMENTARY
<> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
Under the international 10-20 system, the 14 channels of the Emotiv EPOC+ headset are:
F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4
The 16 English phonemes investigated in dataset refs 01 to 21:
/i/ /u:/ /æ/ /ɔ:/ /m/ /n/ /ŋ/ /f/ /s/ /ʃ/ /v/ /z/ /ʒ/ /p/ /t/ /k/
The 16 Chinese syllables investigated in dataset refs chinese-1 and chinese-2:
mā má mǎ mà mēng méng měng mèng duō duó duǒ duò tuī tuí tuǐ tuì
All references to 'articulators' (e.g. as part of filenames) refer to the 1-second 'fixation point' portion of trials. The name is a holdover from preliminary trials which were modelled on the KARA ONE database (http://www.cs.toronto.edu/~complingweb/data/karaOne/karaOne.html) [3].
<>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
<> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
[1] Emotiv EPOC+. https://emotiv.com/epoc. Accessed online 14/08/2019.
[2] Y. Renard, F. Lotte, G. Gibert, M. Congedo, E. Maby, V. Delannoy, O. Bertrand, A. Lécuyer. “OpenViBE: An Open-Source Software Platform to Design, Test and Use Brain-Computer Interfaces in Real and Virtual Environments”, Presence: teleoperators and virtual environments, vol. 19, no 1, 2010.
[3] S. Zhao, F. Rudzicz. "Classifying phonological categories in imagined and articulated speech." In Proceedings of ICASSP 2015, Brisbane Australia, 2015.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data used for developing an application that mapped Fowler's code smells to Pylint code smells. First, 179 files containing Pylint results were used. Those files came from projects submitted for Babes-Bolyai University's "Fundamentals of programming" course in the 2019-2020 academic year. Since those files did not contain many refactoring code smells, the application was tested on another set of data. The second set also came from projects submitted for Babes-Bolyai University's "Formal Languages and Compilation Techniques" course in the 2019-2020 academic year. For the second set of data, access to the code was also provided, so it was easier to perform an analysis on the code as well. The problem was that, again, not many refactoring code smells were discovered here, and in order to test the application a bigger project was needed. The third set of data came from the open-source project "TensorFlow". This is an end-to-end machine learning platform. It features a large, flexible ecosystem of tools, libraries, and community resources that enables academics to push the boundaries of machine learning and developers to quickly build and deploy ML-powered apps. All of this data can be found in the file.
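For context on how such Pylint result files can be produced and summarised, a small, hedged sketch using Pylint's JSON output is shown below; the module name is an assumption:

import json
import subprocess
from collections import Counter

# Run Pylint on a hypothetical module and capture machine-readable output.
result = subprocess.run(
    ["pylint", "--output-format=json", "my_module.py"],
    capture_output=True, text=True,
)
messages = json.loads(result.stdout or "[]")

# Count occurrences of each message symbol, e.g. 'too-many-branches'.
counts = Counter(msg["symbol"] for msg in messages)
for symbol, count in counts.most_common():
    print(symbol, count)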
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A photogrammetry dataset was collected using an Unmanned Aerial Vehicle (quadcopter) over a river stretch of the Black Volta at Bamboi Bridge, Ghana. Ground Control Points (GCPs), black-and-white markers placed in the landscape, were also collected. These can be used to better geographically constrain the photogrammetric solution. GCPs have been associated with the row and column pixel location in each photo in which they appear.
The raw data was processed into a 3D point cloud using the open-source software platform WebOpenDroneMap (WebODM). The point cloud was analysed for removal of vegetation using spatial filtering techniques, with the intent to make a bare-earth topographical map of the dry part of the riverbed. The unfiltered and filtered point clouds were further processed into a Digital Surface Model (DSM) and a Digital Terrain Model (DTM), respectively. The unfiltered dataset was also processed into an RGB orthophoto. In the thesis work of Hoogendoorn (2023), further research was done on combining the results of these datasets and analyses with wet bathymetry points collected using a fishfinder equipped with Real-Time-Kinematics GNSS, and on using the resulting full bathymetry for hydraulic modelling and for understanding relationships between wetted geometry and river discharge. For more information, we refer to the MSc thesis work of Hoogendoorn (2023).
The data files consist of three (3) .zip files. Unzip these to get access to all underlying files. For a quick overview, a .qgs file can be opened in QGIS; this will display all layers in a simple GIS project. The point cloud is also visualized but may take significant time before being rendered, as points first need to be cached.
References: Hoogendoorn, N. J.: 3D River Discharge Modelling using UAV photogrammetry | TU Delft Repository, Delft University of Technology, Delft, The Netherlands, 2023.
Link: https://repository.tudelft.nl/record/uuid:d4088a50-3590-4675-9600-d715800841a3
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the related dataset for the PhD dissertation by G. A. A. Prana, "Can We Make It Better? Assessing and Improving Quality of GitHub Repositories", available at https://ink.library.smu.edu.sg/etd_coll/373/
The code hosting platform GitHub has gained immense popularity worldwide in recent years, with over 200 million repositories hosted as of June 2021. Due to its popularity, it has great potential to facilitate widespread improvements across many software projects. Naturally, GitHub has attracted much research attention, and the source code in the various repositories it hosts also provides an opportunity to apply techniques and tools developed by software engineering researchers over the years. However, much of the existing body of research applicable to GitHub focuses on the code quality of the software projects and ways to improve it. Fewer works focus on potential ways to improve the quality of GitHub repositories through other aspects, although the quality of a software project on GitHub is also affected by factors outside a project's source code, such as documentation, the project's dependencies, and the pool of contributors. The three works that form this dissertation focus on investigating aspects of GitHub repositories beyond code quality, and identify specific potential improvements that can be applied to improve a wide range of GitHub repositories.
In the first work, we aim to systematically understand the content of README files in GitHub software projects, and develop a tool that can process them automatically. The work begins with a qualitative study involving 4,226 README file sections from 393 randomly-sampled GitHub repositories, which reveals that many README files contain the "What" and "How" of the software project, but often do not contain the purpose and status of the project. This is followed by the development and evaluation of a multi-label classifier that can predict eight different README content categories with an F1 of 0.746. From our subsequent evaluation of the classifier, which involved twenty software professionals, we find that adding labels generated by the classifier to README files eases information discovery.
Our second work focuses on characteristics of vulnerabilities in open-source libraries used by 450 software projects on GitHub that are written in Java, Python, and Ruby. Using an industrial software composition analysis tool, we scanned every version of the projects after each commit made between November 1, 2017 and October 31, 2018. Our subsequent analyses of the discovered library names, versions, and associated vulnerabilities reveal, among others, that "Denial of Service" and "Information Disclosure" vulnerability types are common. In addition, we also find that most of the vulnerabilities persist throughout the observation period, and that attributes such as project size, project popularity, and experience level of commit authors do not translate to better or worse handling of vulnerabilities in dependent libraries. Based on the findings in the second work, we list a number of implications for library users, library developers, as well as researchers, and provide several concrete recommendations. This includes recommendations to simplify projects' dependency sets, as well as to encourage research into ways to automatically recommend libraries known to be secure to developers.
In our third work, we conduct a multi-region geographical analysis of gender inclusion on GitHub. We use a mixed-methods approach involving a quantitative analysis of the commit authors of 21,456 project repositories, followed by a survey strategically targeted at developers in various regions worldwide and a qualitative analysis of the survey responses. Among other findings, we discover differences in diversity levels between regions, with Asia and the Americas being highest. We also find no strong correlation between the gender and geographic diversity of a repository's commit authors. Further, from our survey respondents worldwide, we also identify barriers and motivations to contribute to open-source software. The results of this work provide insights on the current state of gender diversity in open-source software and potential ways to improve participation of developers from under-represented regions and genders, and subsequently improve the open-source software community in general. Such potential ways include the creation of codes of conduct, proximity-based mentorship schemes, and highlighting of women / regional role models.
https://www.law.cornell.edu/uscode/text/17/106
Mathematical models have become increasingly critical due to the rapid advances in computational methods in recent decades. However, the validation of these models often demands extensive and costly data, leading to time-consuming processes. Traditional design of experiments (DoE) methods struggle to choose informative experiments, especially for the typically large-scale, nonlinear, and dynamical science-based models in chemical and biomolecular engineering (CBE). In this dissertation, I propose a sequential model validation workflow powered by novel DoE and measurement optimization (MO) frameworks to improve data acquisition efficiency and accelerate the model building and validation process. The workflow relies on two scalable and tractable frameworks, along with their generalized open-source software tools:
Measurement optimization: determines what to measure in experiments to maximize the experimental information content. It guides apparatus preparation during the experimental setup stage, balancing the information content with practical constraints such as budgets.
Model-based design of experiments: quantifies experimental information content statistically and optimizes experiment selection based on updated model information. This framework is used throughout the model validation process, recommending new experiments after each iteration to update the model.
Both frameworks address the challenges of applying DoE and MO techniques to large-scale, nonlinear, and dynamical models in CBE, providing user-friendly open-source software tools for widespread application.
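To make the idea of quantifying experimental information content concrete, here is a generic sketch (not the dissertation's actual software) of greedy D-optimal experiment selection via the Fisher information matrix; the sensitivity values are random placeholders:

import numpy as np

# Each row of S is the parameter-sensitivity vector of one candidate experiment
# (d output / d parameters); values here are random placeholders.
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 4))          # 20 candidate experiments, 4 model parameters

selected, fim = [], 1e-6 * np.eye(4)  # small prior keeps the FIM invertible
for _ in range(5):                    # greedily pick 5 experiments
    gains = [np.linalg.slogdet(fim + np.outer(s, s))[1] for s in S]
    best = int(np.argmax(gains))      # D-optimality: maximise log det of the FIM
    selected.append(best)
    fim += np.outer(S[best], S[best])

print("selected experiments:", selected)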
In this dissertation, I describe the development of the model-based DoE and the MO frameworks and their generalized tools to streamline the model validation workflow for complex models such as partial differential algebraic equations (PDAEs). I briefly discuss how these frameworks and the open-source software tools contribute to the broad DoE technique paradigm and its applications. I demonstrate the tractability and scalability of the frameworks with laboratory and pilot-scale carbon capture experiments. Moreover, generalized open-source software tools are developed and applied to carbon capture experiments, highlighting their versatility and practicality. This dissertation lays the groundwork for a sequential MO and MBDoE workflow that can be readily applied to various challenging problems in CBE and beyond, offering potential benefits to broader science, technology, engineering, and math (STEM) communities. I conclude with a discussion of future directions, and provide preliminary work on some of them as a starting point.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data set based on the city of Amsterdam, which has been used in various simulations using the open-source agent-based transport simulation model MATSim (https://www.matsim.org/). It contains networks, agents' plans (and a description of how these have been derived from the ALBATROSS data set), configuration files and additional information material.
Regarding the original ALBATROSS data set, please contact Prof. Soora Rasouli (TU Eindhoven).
egon-data provides a transparent and reproducible, open-data-based data processing pipeline for generating data models suitable for energy system modeling. The data is customized for the requirements of the research project eGon. The research project aims to develop tools for open and cross-sectoral planning of transmission and distribution grids. For further information, please visit the eGon project website or its GitHub repository.
egon-data retrieves and processes data from several different external input sources. As not all data dependencies can be downloaded automatically from external sources, we provide a data bundle to be downloaded by egon-data.
The following data sets are part of the available data bundle:
These data comprise the data of four chapters from the PhD thesis of Frauendorf (2022), entitled 'Causes for spatiotemporal variation in reproductive performance of Eurasian oystercatchers in a human-dominated landscape'. The thesis focusses on quantifying the anthropogenic impacts on the reproductive performance of oystercatchers across the Netherlands. The dataset contains data from chapters 3, 5 and 6, which were used in the thesis but have not been published open access yet.
For chapter 3, oystercatchers were caught during winter across their wintering grounds (Wadden Sea and Delta estuary) and their condition was measured (physiological measurements through blood samples and biometric measurements). The data also include resighting data from ringed individuals that were used for mark-recapture survival analysis (state and age matrices). In addition, environmental variables were collected from open-source data. Next, the birds were followed from their wintering ground to the breeding ground, where we measured their reproductive success.
Chapter 5 includes data about the reproductive performance of oystercatchers (available from several different data sources across the Netherlands). Next, we collected data on the environment from different (open access) data sources on, for instance, habitat type, land use intensity, predation and food availability.
In chapter 6, we used data on bill shape as a proxy for feeding specialization based on data from winter catches (chapter 3 in PhD thesis) to illustrate the proportion of birds with different feeding specialization in the studied population.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All Raw and Processed Data + written Thesis. Data and Figures are stored in the 'Figures_and_Data' Directory. Experimental Measurements were done by means of BLS Microscopy (group of H. Schultheiß at HZDR). Micromagnetic Simulations were done at the Hemera Cluster (Dr. A. Kakay at HZDR). Data Analysis was done in Python or Jupyter Notebooks (Open Source). All scripts are included. Graphics were done using OmniGraffle and Blender. Plotting was done using Python and 'Plot2' (Mac Only!). All Files/Data/Scripts are sorted by Figure! The entire LaTeX package is stored under 'Thesis_Hula' - Dissertation.tex is the main file and shows all required dependencies.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/YYGB58
In a democracy, the relationship between the preferences of the citizens and the policies of the government is, in principle, fundamental. Whether this principle holds in practice has been the subject of a long but inconclusive debate in the political science literature. This dissertation focuses primarily on a different question, namely, what are the determinants of mass preferences over welfare state policies? To answer this question, new quantitative methods are developed, implemented in a Free, Libre, and Open Source Software package, and applied to relatively recent data. The primary contributions of this dissertation to the social science literature are two-fold. First, we present new empirical results on mass political preferences that will be of interest to political scientists, economists, and researchers in other fields. Second, those empirical results are obtained from new estimators that are especially useful for modeling preferences but are also useful for modeling other multivariate phenomena. The strength of these empirical results will hopefully spur innovation on a third front, namely the way in which political economists develop theoretical models of the process by which political preferences are aggregated in democracies. The first chapter is largely empirical and tests traditional political economy theories of preferences for redistribution against theories of inequality aversion, using the method developed in the second chapter. The main empirical conclusion of the first chapter is that a plurality of the variance in preferences for redistribution is attributable to differences in inequality aversion. The second chapter is methodological and attempts to answer the question of how many explanatory variables went into the data-generating process for the outcome variables we observe. The third chapter develops another new estimator and applies it to empirical data on preferences for redistribution and immigration. The main empirical conclusion of the third chapter is that not only is inequality aversion important to our understanding of preferences for redistribution but that it is mostly exogenous to other factors in the model.