Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
My PhD thesis
Computational medical image analysis - With a focus on real-time fMRI and non-parametric statistics
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains high-resolution versions of all graphical visualizations of the data analysis in my doctoral dissertation. The graphs are organized by chapter and subchapter and titled accordingly. The dataset also provides, in XLSX format, all dataframes (German, English, and Armenian) of the manual semantic annotation on which the graphs are based. The visualizations include (Multiple) Correspondence Analysis (MCA vs. CA), Mosaic Plots, Conditional Inference Trees (CIT), and Context-Conditional Correlations Graphs (CCCG).
Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Probabilistic models such as logistic regression, Bayesian classification, neural networks, and models for natural language processing are increasingly present in both undergraduate and graduate statistics and data science curricula due to their wide range of applications. In this article, we present a one-week course module for students in advanced undergraduate and applied graduate courses on variational inference, a popular optimization-based approach for approximate inference with probabilistic models. Our proposed module is guided by active learning principles: in addition to lecture materials on variational inference, we provide an accompanying class activity, an R shiny app, and guided labs based on real data applications of logistic regression and clustering documents using Latent Dirichlet Allocation with R code. The main goal of our module is to expose students to a method that facilitates statistical modeling and inference with large datasets. Using our proposed module as a foundation, instructors can adopt and adapt it to introduce more realistic case studies and applications in data science, Bayesian statistics, multivariate analysis, and statistical machine learning courses.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This folder contains data generated during the PhD project: "Computational modeling and optimization of biopharmaceutical downstream processes".
By Daphne Keulen at Delft University of Technology and GSK
Supervisors: Marcel Ottens & Martin Pabst
Department of Biotechnology, section of Bioprocess Engineering
When using the data, please reference the dissertation by Keulen et al. (2024), "Computational modeling and optimization of biopharmaceutical downstream processes".
Chapter 4 - Comparing in silico flowsheet optimization strategies in biopharmaceutical downstream processes:
Contains the experimental data of the ultra- and diafiltration experiment using BSA (bovine serum albumin), related to Figure 4.5
UFDF_BSA_protein contains the experimental data of the measured protein amount in mAU.
The measured protein amount in mAU can be converted to concentration (mg/mL) using the calibration line, provided in the document calibration lines.
UFDF_BSA_salt contains the experimental data of the measured salt concentration, recorded as conductivity in mS/cm.
The measured salt concentration in mS/cm can be converted to salt concentration (mg/mL) using the calibration line, provided in the document calibration lines.
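The two conversions described above can be sketched as the inversion of a straight calibration line. This is a minimal illustration only: it assumes the calibration has the linear form signal = slope × concentration + intercept, and the slope and intercept below are hypothetical placeholders, not values from the calibration lines document.

```python
def signal_to_concentration(signal, slope, intercept=0.0):
    """Invert a linear calibration line: signal = slope * concentration + intercept."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (signal - intercept) / slope


# Hypothetical calibration parameters (NOT taken from the dataset).
EXAMPLE_SLOPE = 250.0     # mAU per (mg/mL)
EXAMPLE_INTERCEPT = 5.0   # mAU

# Hypothetical absorbance readings in mAU.
readings_mau = [5.0, 130.0, 255.0]

concentrations_mg_ml = [
    signal_to_concentration(reading, EXAMPLE_SLOPE, EXAMPLE_INTERCEPT)
    for reading in readings_mau
]
print(concentrations_mg_ml)
```

The same function would apply to the conductivity data (mS/cm to mg/mL) with the parameters of the salt calibration line substituted.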
Chapter 5 - From protein structure to an optimized chromatographic capture step using multiscale modeling:
Contains the experimental data of the mechanistic model validation and the regression plots, related to Figure 5.4, Figure 5.2 and Appendix Figure A2 - A5
Linear gradient experiments using the AKTA were conducted for two protein mixtures at four pH values (pH 3.5, 4.3, 5.0, and 7.0) and various gradient lengths (20, 30, 40, 60, and 80 CV).
Folders pH_3.5, pH_4.3, pH_5, and pH_7 contain the raw data measured during the experiments. pH 5 and pH 7 also contain the converted concentrations used for the mechanistic model validation (chapter_5_chromatographic_data_concentrations_pH_5.0 & chapter_5_chromatographic_data_concentrations_pH_7.0).
For the experimental conditions and methods, more information can be found in the PhD thesis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The files presented here include information on data access (please click on the file titled "READ_ME_FIRST") and data analysis (i.e., the file named "Inequitable_Interactions_Full_Syntax"). These files are presented in support of the article "Inequitable interactions: A critical quantitative analysis of mentorship and psychosocial development within computing graduate school pathways", to be published in a forthcoming issue of AERA Open.
Abstract for the paper:
Mentorship is vital to increasing graduate school access in computing; however, mentorship must be structured in power-conscious, developmental ways to ensure equitable access to and support within computing graduate pathways. I engage a critical quantitative lens to examine mentoring support among undergraduates with reported graduate aspirations, taking a nuanced look at departmental mentorship to investigate how organizational power in computing may maintain inequitable mentoring outcomes. Descriptive and regression analyses draw from a longitudinal sample of 442 graduate aspirants in computing who completed an introductory course survey (between 2015-2017) and a follow-up survey (fall 2019). Results document significant variation in forms of mentoring support and disciplinary psychosocial beliefs (i.e., computing identity and self-efficacy), with key patterns across graduate aspirants’ social identities and mentors’ organizational power (via their departmental roles). I conclude by discussing structural and social inequities in mentorship, which may underscore disparities in students’ realization of their computing graduate aspirations.
The joint UNESCO-OECD-Eurostat (UOE) data collection on formal education systems provides annual data on student participation and completion of educational programmes as well as data on personnel, cost and type of resources devoted to education. The reference period for non-monetary education data is the school year and for monetary data it is the calendar year. The International Statistics of Education and Training Systems – UNESCO-UIS/OECD/Eurostat (UOE) Questionnaire aims to provide the data required by international bodies, in addition to offering results at the national level. It is a synthesis and analysis operation that appears in the National Statistical Plan 2021-2024 (Prog. 8677) and is carried out by the S.G. of Statistics and Studies of the Ministry of Education and Vocational Training in collaboration with the Ministry of Universities and the National Institute of Statistics. Its purpose is to integrate the statistical information of the activity of the educational-training system in its different levels of education in order to meet the demands of international statistics, of the same name, requested by Eurostat, OECD and UNESCO-UIS. A selection of tables with data derived from this statistic is provided below, together with a presentation summary note:
The dataset contains .csv and Excel files, zipped into a tar.gz file, which underpin the predicted crystal structures and calculated properties.
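As a rough sketch of how such an archive could be inspected before extraction, the snippet below lists the CSV and Excel members of a gzip-compressed tar file using Python's standard `tarfile` module. The helper name and the set of file suffixes are assumptions for illustration; the actual archive name and layout are not given in the description.

```python
import tarfile

# Assumed suffixes for "csv and Excel" files; adjust to the real archive contents.
TABULAR_SUFFIXES = (".csv", ".xls", ".xlsx")


def list_tabular_members(archive_path):
    """Return the sorted names of CSV/Excel files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(TABULAR_SUFFIXES)
        )
```

Listing members first avoids unpacking files that are not needed and makes it easy to check the contents against the dataset description.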
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This domain covers statistics and indicators on key aspects of the education systems across Europe. The data show entrants and enrolments in education levels, education personnel and the cost and type of resources dedicated to education.
For a general technical description of the UOE Data Collection see UNESCO OECD Eurostat (UOE) joint data collection – methodology - Statistics Explained (europa.eu).
The standards on international statistics on education and training systems are set by the three international organisations jointly administering the annual UOE data collection:
The following topics are covered:
Data on enrolments in education are disseminated in absolute numbers, with breakdowns available for the following dimensions:
Additionally, the following types of indicators on enrolments are calculated (all indicators using population data use Eurostat’s population database (demo_pjan)):
Data on entrants in education are disseminated in absolute numbers, with breakdowns available for the following dimensions:
Additionally the following indicator on entrants is calculated:
Data on learning mobility is available for degree mobile students, degree mobile graduates and credit mobile graduates. Degree mobility means that students/graduates are/were enrolled as regular students in any semester/term of a programme taught in the country of destination with the intention of graduating from it in the country of destination. Credit mobility is defined as temporary tertiary education or/and study-related traineeship abroad within the framework of enrolment in a tertiary education programme at a "home institution" (usually) for the purpose of gaining academic credit (i.e. credit that will be recognised in that home institution). Further definitions are in Section 2.8 of the UOE manual.
Degree mobile students are referred to as just ‘mobile students’ in UOE learning mobility tables. Data is disseminated for degree mobile students and degree mobile graduates in absolute numbers with breakdowns available for the following dimensions:
Additionally, the following types of indicators on degree mobile students and degree mobile graduates are calculated (all indicators using population data use Eurostat’s population database (demo_pjan)):
For credit mobile graduates, data are disseminated in absolute numbers, with breakdowns available for the following dimensions:
Data on personnel in education are available for classroom teachers/academic staff, teacher aides and school-management personnel. Teachers are employed in a professional capacity to guide and direct the learning experiences of students, irrespective of their training, qualifications or delivery mechanism. Teacher aides support teachers in providing instruction to students. Academic staff are personnel employed at the tertiary level of education whose primary assignment is instruction and/or research. School management personnel covers professional personnel who are responsible for school management/administration (ISCED 0-4) or whose primary or major responsibility is the management of the institution, or a recognised department or subdivision of the institution (tertiary levels). Full definitions of these statistical units are in Section 3.5 of the UOE manual.
Data are disseminated on teachers and academic staff in absolute numbers, with breakdowns available for the following dimensions:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Taiwan Number of Graduate: HE: Female: Computing data was reported at 5,617.000 Person in 2016. This records a decrease from the previous number of 6,052.000 Person for 2015. Taiwan Number of Graduate: HE: Female: Computing data is updated yearly, averaging 7,271.000 Person from Jul 1998 (Median) to 2016, with 19 observations. The data reached an all-time high of 10,973.000 Person in 2005 and a record low of 5,617.000 Person in 2016. Taiwan Number of Graduate: HE: Female: Computing data remains in active status in CEIC and is reported by the Ministry of Education. The data is categorized under Global Database’s Taiwan – Table TW.G056: Number of Graduate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The past decade has seen a rapid increase in the ability of biologists to collect large amounts of data. It is therefore vital that research biologists acquire the necessary skills during their training to visualize, analyze, and interpret such data. To begin to meet this need, we have developed a “boot camp” in quantitative methods for biology graduate students at Harvard Medical School. The goal of this short, intensive course is to enable students to use computational tools to visualize and analyze data, to strengthen their computational thinking skills, and to simulate and thus extend their intuition about the behavior of complex biological systems. The boot camp teaches basic programming using biological examples from statistics, image processing, and data analysis. This integrative approach to teaching programming and quantitative reasoning motivates students’ engagement by demonstrating the relevance of these skills to their work in life science laboratories. Students also have the opportunity to analyze their own data or explore a topic of interest in more detail. The class is taught with a mixture of short lectures, Socratic discussion, and in-class exercises. Students spend approximately 40% of their class time working through both short and long problems. A high instructor-to-student ratio allows students to get assistance or additional challenges when needed, thus enhancing the experience for students at all levels of mastery. Data collected from end-of-course surveys from the last five offerings of the course (between 2012 and 2014) show that students report high learning gains and feel that the course prepares them for solving quantitative and computational problems they will encounter in their research. We outline our course here which, together with the course materials freely available online under a Creative Commons License, should help to facilitate similar efforts by others.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset supports the PhD thesis titled "Learning-Based Control Under Constraints: Towards Safety and Computational Efficiency". The thesis comprises six main chapters (Chapters 2–7), and the data is organized accordingly. All simulations were conducted in a MATLAB environment.
The item contains all supporting data for the PhD thesis. This DVD was created to accompany the thesis "Numerical Solutions of the General Relativistic Equations for Black Hole Fluid Dynamics" by Philip Blakely. These files contain a navigable presentation of the various simulations of Bondi-Hoyle-Lyttleton accretion onto a Kerr black hole. The code used to produce these plots is described in the printed thesis.
The evolution of a software system can be studied in terms of how various properties, as reflected by software metrics, change over time. Current models of software evolution have allowed for inferences to be drawn about certain attributes of the software system, for instance, regarding the architecture, complexity and its impact on the development effort. However, an inherent limitation of these models is that they do not provide any direct insight into where growth takes place. In particular, we cannot assess the impact of evolution on the underlying distribution of size and complexity among the various classes. Such an analysis is needed in order to answer questions such as 'do developers tend to evenly distribute complexity as systems get bigger?', and 'do large and complex classes get bigger over time?'. These are questions of more than passing interest since by understanding what typical and successful software evolution looks like, we can identify anomalous situations and take action earlier than might otherwise be possible. Information gained from an analysis of the distribution of growth will also show if there are consistent boundaries within which a software design structure exists. In our study of metric distributions, we focused on 10 different measures that span a range of size and complexity measures. The raw metric data (4 .txt files and 1 .log file in a .zip file measuring ~0.5MB in total) is provided in comma-separated values (CSV) format, and the first line of each CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
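A minimal sketch of reading one of these comma-separated files straight from the zip archive, treating the first line as the header, might look as follows. The function name, the member name passed to it, and the UTF-8 encoding are assumptions for illustration; the description does not state the actual file names or encoding.

```python
import csv
import io
import zipfile


def read_metric_rows(zip_path, member_name, encoding="utf-8"):
    """Read one comma-separated file from a zip archive; line 1 supplies the column names."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            # Wrap the binary stream so the csv module receives text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))
```

Each returned row is a dictionary keyed by the header names, with all values left as strings so that numeric conversion stays an explicit, per-column decision.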
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Raw data for the PhD study (K. W. Kwong), including computational, electrochemical, photophysical, NMR, X-ray, OLED application and memory data.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This repository contains code and data related to the underlying PhD thesis: Data-driven methods to design, learn, and interpret complex materials across scales. The repository is divided into the individual codes and datasets of each chapter. Chapter 2 explores the inverse design of 2D metamaterials for elastic properties, utilizing machine learning techniques to optimize material structure and performance. Chapter 3 focuses on learning hyperelastic material models without relying on stress data, employing data-driven approaches to predict material behavior under large strains. Chapter 4 extends this by developing interpretable hyperelastic material models, ensuring both accuracy and physical consistency without stress data. Chapter 5 explores the inverse design of 3D metamaterials under finite strains and applies novel ML frameworks to design these complex material structures. Chapter 6 investigates the use of deep learning to uncover key predictors of thermal conductivity in covalent organic frameworks (COFs) and reveals new insights into the relationship between molecular structure and thermal transport. Chapter 7 introduces a graph grammar-based approach for generating novel polymers in data-scarce settings, thus combining computational design with minimal data.
It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place as well as to inform them of the longer-term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, as well as providing useful information as input into a potential release retrospective. In order to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible, but these analysis methods can only be applied after a mature release of the code has been developed. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes. We present a statistical analysis of change in approximately 55,000 unique classes across all projects under investigation. The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) is provided in comma-separated values (CSV) format, and the first line of each CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
In the year 2021, the number of postgraduate admissions in the college of computing and informatics at the Saudi Electronic University (SEU) in Saudi Arabia was ***. About ** percent of admissions occurred in 2020 and 2021. The COVID-19 pandemic in 2019 helped accelerate the transition of the Saudi education system to digital readiness.
Dataset and associated material from an intervention study to test whether online video resources integrated into an action plan will result in greater engagement with the resources than when the resources are not part of a plan. Data were collected in partnership with a wellbeing company specialising in the provision of online wellbeing resources. The data collected were quantitative data about participants’ engagement with the wellbeing resources, including demographic and self-reported variables (N = 67), qualitative text data about participants' perceived barriers to the intervention implementation (N = 67) and anonymised transcripts of qualitative follow-up interviews with study participants (N = 10).
Proper nutrition and healthy diets are a key aspect of health, which mandatory food labelling in the UK tries to address by empowering people with the information to help them make healthier choices. The format of this information (e.g., verbal quantifiers like 'low fat' or numerical quantifiers like '5% fat') affects whether people can easily understand and use food labels. Examining how people's judgements and decisions with respect to food differ depending on food label format therefore has wide-reaching impact for health policy decisions, consumer behaviour, and food industry practice. This project will use computational methods to identify different strategies people use to decide what foods are healthiest (e.g., less fat, or less sugar, etc.). I will evaluate which strategies produce the healthiest choices, use these insights to inform policy and conduct knowledge exchange with my industry partner. The project will consolidate my PhD, which investigated differences in people's decision-making strategies when using verbal and numerical quantifiers on food labels. Using a mixture of behavioural tasks, surveys, and eye-tracking methodology, I identified that different ways of presenting quantities can lead to people relying on different pieces of information to judge food. I intend to extend this research and maximise its impact in four ways. First, I will apply new and advanced statistical modelling to my research. To classify and predict food choice strategies in my data, I will learn two modelling techniques: multinomial processing trees, a probability-based method to classify choices, and machine learning, which makes predictions based on patterns in data. For example, I would expect the models to identify cues on food labelling that predict the choices people will make. Using the results of these analyses, I will submit a planned research protocol (a 'Registered Report') to test my model on real-life products.
Registered Reports receive peer review prior to data collection, so submitting it during the Fellowship supports my future academic research beyond the Fellowship. Second, I will extend the impact of my work through knowledge exchange with the start-up company Keep Fit Eat Fit Wellbeing Ltd (KFEF). As part of a holistic wellness package, KFEF produces healthy eating advice and recipes with nutritional information for their clients. My research will inform the design of their content for clients. In turn, working with them gives me access to usage metrics from their customer portal that I will analyse to determine if the communication formats are effective. These real-world data will reinforce the lab studies from my PhD and help KFEF improve their product offering. Third, I will disseminate my research findings to academic and non-academic audiences. For academic audiences, I will produce three new journal articles and present my work at one local and one international academic conference. I will also engage with non-academic audiences through preparing press releases, submitting a policy brief to present at the All-Party Parliamentary Food and Health Forum, and attending a Westminster Food and Nutrition Forum conference. Engaging with policy-makers through these channels will help me lobby for positive change to food labelling guidelines. Finally, I will prepare a proposal for funding from the Wellcome Trust to create and test a technological system that supports informed food choices. This future proposal will be informed by my PhD data, computational modelling research, and collaborations with: industry (Keep Fit Eat Fit), experts in shaping behavioural policy (at the University of Reading), and experts in technological health interventions (at the University of Konstanz). Ultimately, my research seeks to improve the food choice environment for consumers and empower them to make informed, healthy choices.
Some research has indicated that the relationship between students' study behavior and their academic performance is as strong as the relationship to more common predictors such as past performance and test scores. However, knowledge about students' study behavior, how behavior develops and is influenced by program and course design, and consequently, the effect various design parameters have on learning is limited. This data is part of a PhD project and relates to Study 2. This mixed-method study followed a population of computing students through their first year. Results from in-depth interviews with students throughout their first year found that the educational structure and organization of a study program conditions the students' study behavior. In order to further investigate these tendencies, two surveys (N=215) were conducted within the whole first-year student population at the beginning and end of the year. The dataset for this analysis is included in this repository. A significant difference was found in the use of surface and deep strategies at the beginning and end of the first year, indicating that students shift from deep to surface learning during the year. Even if students initially seek a deep content-driven approach to learning, the structure of the education and other organizational factors may be the cause of a more surface and task-focused approach towards the end of the first year. Students' study behavior is constrained by the educational design, which furthermore may lead to different learning outcomes than desired. Researching and developing learning goals, course content, lectures and assignments is one way to improve computing education; however, this research suggests that taking a comprehensive and integrated approach to educational design might also lead to improvements.