Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is provided in a single .xlsx file named "eucalyptus_growth_environment_data_V2.xlsx" and consists of fifteen sheets:
Codebook: This sheet details the index, values, and descriptions for each field within the dataset, providing a comprehensive guide to understanding the data structure.
ALL NODES: Contains measurements from all devices, totalling 102,916 data points. This sheet aggregates the data across all nodes.
GWD1 to GWD10: These subset sheets include measurements from individual nodes, labelled according to the abbreviation “Generic Wireless Dendrometer” followed by device IDs 1 through 10. Each sheet corresponds to a specific node, representing measurements from ten trees (or nodes).
Metadata: Provides detailed metadata for each node, including species, initial diameter, location, measurement frequency, battery specifications, and irrigation status. This information is essential for identifying and differentiating the nodes and their specific attributes.
Missing Data Intervals: Details gaps in the data stream, including start and end dates and times when data was not uploaded. It includes information on the total duration of each missing interval and the number of missing data points.
Missing Intervals Distribution: Offers a summary of missing data intervals and their distribution, providing insight into data gaps and reasons for missing data.
All nodes utilize LoRaWAN for data transmission. Please note that intermittent data gaps may occur due to connectivity issues between the gateway and the nodes, as well as maintenance activities or experimental procedures.
Software considerations: The provided R code named “Simple_Dendro_Imputation_and_Analysis.R” is a comprehensive analysis workflow that processes and analyses Eucalyptus growth and environmental data from the "eucalyptus_growth_environment_data_V2.xlsx" dataset. The script begins by loading the necessary libraries, setting the working directory, and reading the data from the specified Excel sheet. It then combines date and time information into a unified DateTime format and performs data type conversions for the relevant columns. The analysis focuses on a specified device, allowing for the selection of neighbouring devices for imputation of missing data. A loop checks for gaps in the time series and fills in missing intervals based on a defined threshold, followed by a function that imputes missing values using the average from nearby devices. Outliers are identified and managed through linear interpolation. The code further calculates vapour pressure metrics and applies temperature corrections to the dendrometer data. Finally, it saves the cleaned and processed data into a new Excel file while conducting dendrometer analysis using the dendRoAnalyst package, which includes visualizations and calculations of daily growth metrics and correlations with environmental factors such as vapour pressure deficit (VPD).
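As an illustration of the gap-filling and imputation logic described above (not the published script itself), a condensed R sketch follows; the sheet names GWD2-GWD4, the column names Date, Time, and Dendro, and the 15-minute sampling interval are assumptions for the example:

```r
# Illustrative sketch only: gap expansion, neighbour-average imputation, and
# outlier interpolation in the spirit of Simple_Dendro_Imputation_and_Analysis.R.
# Sheet and column names and the 15-minute interval are assumptions.
library(readxl)
library(dplyr)
library(zoo)

read_node <- function(sheet) {
  d <- read_excel("eucalyptus_growth_environment_data_V2.xlsx", sheet = sheet)
  d$DateTime <- as.POSIXct(paste(d$Date, d$Time), tz = "UTC")  # unify date + time
  d[, c("DateTime", "Dendro")]
}

target <- read_node("GWD3")

# Re-index onto a regular 15-minute grid so every gap becomes an explicit NA
grid <- data.frame(DateTime = seq(min(target$DateTime), max(target$DateTime),
                                  by = "15 min"))
target <- left_join(grid, target, by = "DateTime")

# Fill NAs with the mean of two neighbouring devices at the same timestamp
nb <- full_join(read_node("GWD2"), read_node("GWD4"),
                by = "DateTime", suffix = c("_n1", "_n2"))
target <- left_join(target, nb, by = "DateTime") |>
  mutate(Dendro = ifelse(is.na(Dendro),
                         rowMeans(cbind(Dendro_n1, Dendro_n2), na.rm = TRUE),
                         Dendro))

# Flag points far from a one-day rolling median as outliers, then interpolate
med <- rollmedian(target$Dendro, k = 97, fill = NA)  # 97 x 15 min ~ 1 day
out <- !is.na(med) & !is.na(target$Dendro) &
  abs(target$Dendro - med) > 3 * sd(target$Dendro, na.rm = TRUE)
target$Dendro[out] <- NA
target$Dendro <- na.approx(target$Dendro, na.rm = FALSE)  # linear interpolation
```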
OSU_SnowCourse Summary: Manual snow course observations were collected over WY 2012-2014 from four paired forest-open sites chosen to span a broad elevation range. Study sites were located in the upper McKenzie (McK) River watershed, approximately 100 km east of Corvallis, Oregon, on the western slope of the Cascade Range, and in the Middle Fork Willamette (MFW) watershed, located to the south of the McKenzie. The sites were designated based on elevation, with a range of 1110-1480 m. Distributed snow depth and snow water equivalent (SWE) observations were collected via monthly manual snow courses from 1 November through 1 April and bi-weekly thereafter. Snow courses spanned 500 m of forested terrain and 500 m of adjacent open terrain. Snow depth observations were collected approximately every 10 m, and SWE was measured every 100 m along the snow courses with a federal snow sampler. These data are raw observations and have not been quality controlled in any way. Distance along the transect was estimated in the field.
OSU_SnowDepth Summary: 10-minute snow depth observations collected at OSU met stations in the upper McKenzie River watershed and the Middle Fork Willamette watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and the locations were generally paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These data have undergone basic quality control. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN flags for missing data to NA, and added site attributes such as site name and cover. Quality control then proceeded in three steps. First, because snow depth values in the raw data are negative (i.e., flipped, with some correction to use the height of the sensor as zero), positive values in the raw data are erroneous and were replaced with NA. Second, the sign of the data was switched to make the depths positive. The smooth.m (MATLAB) function was then used to roughly smooth the data, with a moving window of 50 points. Third, outliers were removed: all values higher than the smoothed values +10 were replaced with NA, and in some cases further single-point outliers were removed.
OSU_Met Summary: Raw, 10-minute meteorological observations collected at OSU met stations in the upper McKenzie River watershed and the Middle Fork Willamette watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and the locations were generally paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These stations were deployed to collect numerous meteorological variables, of which snow depth and wind speed are included here. These data are raw datalogger output and have not been quality controlled in any way. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN and 7999 flags for missing data to NA, and added site attributes such as site name and cover.
OSU_Location Summary: Location metadata for manual snow course observations and meteorological sensors. These data are compiled from GPS data for which the horizontal accuracy is unknown, and from processed hemispherical photographs. They have not been quality controlled in any way.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: User Story or Use Case
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent the legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated, describing the size ratio. We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
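Since both definitions reduce to simple ratios of the per-subject tag counts (columns K-O of the raw-data sheet), they can be reproduced directly; a minimal R sketch, with illustrative function names and example counts:

```r
# Correctness and completeness from the per-subject tag counts (AL, WR, SO, OM
# as defined in the tagging scheme above); names and values are illustrative.
correctness  <- function(AL, WR, SO, OM) AL / (AL + OM + SO + WR)
completeness <- function(AL, WR, OM) (AL + WR) / (AL + WR + OM)

# Example: 12 aligned, 3 wrongly represented, 2 system-oriented, 5 omitted
correctness(AL = 12, WR = 3, SO = 2, OM = 5)   # 12/22 ≈ 0.545
completeness(AL = 12, WR = 3, OM = 5)          # 15/20 = 0.75
```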
For sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (a reproducible R alternative is sketched after the sheet list below). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
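For readers who prefer a scriptable alternative to the online effect-size tool cited above, the same statistics can be computed in R with the effsize package; the two vectors below are hypothetical stand-ins for per-group correctness scores, not the study data:

```r
# T-test and Hedges' g for, e.g., correctness by notation; the vectors
# are fabricated per-group correctness scores for illustration only.
library(effsize)

us <- c(0.62, 0.55, 0.71, 0.48, 0.66)  # User Story group (illustrative)
uc <- c(0.51, 0.44, 0.58, 0.49, 0.53)  # Use Case group (illustrative)

t.test(us, uc)                             # significance
cohen.d(us, uc, hedges.correction = TRUE)  # effect size reported as Hedges' g
```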
This entry describes the SoilExcel workflow, a tool designed to optimize soil data analysis. It covers data preparation, statistical analysis methods, and result visualization. SoilExcel integrates various environmental data types and applies advanced techniques to enhance accuracy in soil studies. The results demonstrate its effectiveness in interpreting complex data, aiding decision-making in environmental management projects.
Background
Understanding the intricate relationships and patterns within soil samples is crucial for various environmental and agricultural applications. Principal Component Analysis (PCA) serves as a powerful tool in unraveling the complexity of multivariate soil datasets. Soil datasets often consist of numerous variables representing diverse physicochemical properties, making PCA an invaluable method for:
∙ Dimensionality reduction: Simplifying the analysis without compromising data integrity by reducing the dimensionality of large soil datasets.
∙ Identification of dominant patterns: Revealing dominant patterns or trends within the data, providing insights into key factors contributing to overall variability.
∙ Exploration of variable interactions: Enabling the exploration of complex interactions between different soil attributes, enhancing understanding of their relationships.
∙ Interpretability of data variance: Clarifying how much variance is explained by each principal component, aiding in discerning the significance of different components and variables.
∙ Visualization of data structure: Facilitating intuitive comprehension of data structure through plots such as scatter plots of principal components, helping identify clusters, trends, and outliers.
∙ Decision support for subsequent analyses: Providing a foundation for subsequent analyses by guiding decision-making, whether in identifying influential variables, understanding data patterns, or selecting components for further modeling.
Introduction
The motivation behind this workflow is rooted in the need to conduct a thorough analysis of a diverse soil dataset characterized by an array of physicochemical variables. Comprising multiple rows, each representing distinct soil samples, the dataset encompasses variables such as percentage of coarse sands, percentage of organic matter, hydrophobicity, and others. The intricacies of this dataset demand a strategic approach to preprocessing, analysis, and visualization. To lay the groundwork, the workflow begins with the transformation of an initial Excel file into a CSV format, ensuring improved compatibility and ease of use throughout subsequent analyses. Furthermore, the workflow is designed to empower users in the selection of relevant variables, a task facilitated by user-defined parameters. This flexibility allows for a focused and tailored dataset, essential for meaningful analysis. Acknowledging the inherent challenges of missing data, the workflow offers options for data quality improvement, including optional interpolation of missing values or the removal of rows containing such values. Standardizing the dataset and specifying the target variable are crucial, establishing a robust foundation for subsequent statistical analyses. Incorporating PCA offers a sophisticated approach, enabling users to explore inherent patterns and structures within the data. The adaptability of PCA allows users to customize the analysis by specifying the number of components or desired variance.
The workflow concludes with practical graphical representations, including covariance and correlation matrices, a scree plot, and a scatter plot, offering users valuable visual insights into the complexities of the soil dataset.
Aims
The primary objectives of this workflow are tailored to address specific challenges and goals inherent in the analysis of diverse soil samples:
∙ Data transformation: Efficiently convert the initial Excel file into a CSV format to enhance compatibility and ease of use.
∙ Variable selection: Empower users to extract relevant variables based on user-defined parameters, facilitating a focused and tailored dataset.
∙ Data quality improvement: Provide options for interpolation or removal of missing values to ensure dataset integrity for downstream analyses.
∙ Standardization and target specification: Standardize the dataset values and designate the target variable, laying the groundwork for subsequent statistical analyses.
∙ PCA: Conduct PCA with flexibility, allowing users to specify the number of components or desired variance for a comprehensive understanding of data variance and patterns.
∙ Graphical representations: Generate visual outputs, including covariance and correlation matrices, a scree plot, and a scatter plot, enhancing the interpretability of the soil dataset.
Scientific questions
This workflow addresses critical scientific questions related to soil analysis:
∙ Variable importance: Identify variables contributing significantly to principal components through the covariance matrix and PCA.
∙ Data structure: Explore correlations between variables and gain insights from the correlation matrix.
∙ Optimal component number: Determine the optimal number of principal components using the scree plot for effective representation of data variance.
∙ Target-related patterns: Analyze how selected principal components correlate with the target variable in the scatter plot, revealing patterns based on target variable values.
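A condensed R sketch of the workflow's core steps (CSV import, variable selection, removal of incomplete rows, standardization, PCA, and the listed plots); the file and column names are assumptions, since SoilExcel's actual parameters are configured by the user:

```r
# Condensed sketch of the SoilExcel steps; file and column names are assumed.
soil <- read.csv("soil_data.csv")              # Excel file pre-converted to CSV

vars <- c("coarse_sand", "organic_matter", "hydrophobicity")  # user-defined selection
keep <- complete.cases(soil[, vars])           # or interpolate instead of dropping
X <- soil[keep, vars]

pca <- prcomp(X, center = TRUE, scale. = TRUE) # standardize, then PCA

cov(scale(X))                                  # covariance matrix (standardized)
cor(X)                                         # correlation matrix
summary(pca)                                   # variance explained per component
screeplot(pca, type = "lines")                 # scree plot: choose component count
plot(pca$x[, 1], pca$x[, 2],                   # scatter of PC1 vs PC2,
     col = factor(soil$target[keep]),          # coloured by the target variable
     xlab = "PC1", ylab = "PC2")
```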
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is provided in the form of an Excel file with 5 tabs. The first three tabs constitute demonstration data on the setup of consumer wearable devices for exposure and health monitoring in population studies, while the last two tabs include the full dataset with actual data collected using the consumer wearable devices in Cyprus and Greece, respectively, during the spring of 2020. The data from the last two tabs were used to assess the compliance of asthmatic schoolchildren (n=108) from both countries with public health intervention levels in response to the COVID-19 pandemic (lockdown and social distancing measures), using wearable sensors to continuously track personal location and physical activity. Asthmatic children were recruited from primary schools in Cyprus and Greece (Heraklion district, Crete) and were enrolled in the LIFE-MEDEA public health intervention project (ClinicalTrials.gov Identifier: NCT03503812). The LIFE-MEDEA project aimed to evaluate the efficacy of behavioral recommendations to reduce exposure to particulate matter during desert dust storm (DDS) events and thus mitigate disease-specific adverse health effects in vulnerable groups of patients. However, during the COVID-19 pandemic, the collected data were analysed using a mixed effect model, adjusted for confounders, to estimate the changes in 'fraction time spent at home' and 'total steps/day' during the enforcement of gradually more stringent lockdown measures. Results of this analysis were first presented in the manuscript titled “Use of wearable sensors to assess compliance of asthmatic children in response to lockdown measures for the COVID-19 epidemic”, published in Scientific Reports (https://doi.org/10.1038/s41598-021-85358-4). The dataset from LIFE-MEDEA participants (asthmatic children) from Cyprus and Greece includes the variables: Study ID, gender, age, study year, ambient temperature, ambient humidity, recording day, percentage of time staying at home, steps per day, calendar day, calendar week, date, lockdown status (phase 1, 2, or 3) due to the COVID-19 pandemic, and whether the date fell on a weekend (binary variable). All data were collected following approvals from relevant authorities in both Cyprus and Greece, according to national legislation. In Cyprus, approvals were obtained from the Cyprus National Bioethics Committee (EEBK EΠ 2017.01.141), the Data Protection Commissioner (No. 3.28.223), and the Ministry of Education (No 7.15.01.23.5). In Greece, approvals were obtained from the Scientific Committee (25/04/2018, No: 1748) and the Governing Board of the University General Hospital of Heraklion (25/22/08/2018).
Overall, wearable sensors, often embedded in commercial smartwatches, allow for continuous and non-invasive health measurements and exposure assessment in clinical studies. Nevertheless, the real-life application of these technologies in studies involving many participants over a significant observation period may be hindered by several practical challenges. In the first excel tab of the dataset, we provide demonstration data from a small subset of asthmatic children (n=17) that participated in the LIFE MEDEA study and were equipped with a smartwatch for the assessment of physical activity (heart rate, pedometer, accelerometer) and location (exposure to indoor or outdoor microenvironment using GPS signal). Participants were required to wear the smartwatch, equipped with a data collection application, daily, and data were transmitted via a wireless network to a centrally administered data collection platform. The main technical challenges identified ranged from restricting access to standard smartwatch features such as gaming, internet browser, camera, and audio recording applications, to technical challenges such as loss of GPS signal, especially in indoor environments, and internal smartwatch settings interfering with the data collection application. The dataset includes information on the percentage of time with collected data before and after the implementation of a protocol that relied on setting up the smartwatch device using publicly available Application Locker and Device Automation applications to address most of these challenges. In addition, the dataset includes example single-day observations that demonstrate how the inclusion of a Wi-Fi received signal strength indicator significantly improved indoor localization and largely minimised GPS signal misclassification (excel tab 2). Finally, excel tab 3 shows the tasks involved in the device setup protocol. Overall, the implementation of these protocols during the roll-out of the LIFE MEDEA study in the spring of 2020 led to significantly improved results in terms of data completeness and data quality. The protocol and the representative results have been submitted for publication to the Journal of Visualized Experiments (submission: JoVE63275). The variables included in the first three excel tabs are the following: Participant ID (unique serial number for each patient participating in the study), % Time Before (percentage of time with data before protocol implementation), % Time After (percentage of time with data after protocol implementation), Timestamp (date and time of event occurrence), Indoor/Outdoor (categorical: classification of GPS signals to Indoor, Outdoor, or null (missing value) based on distance from the participant's home), Filling algorithm (imputation algorithm), SSID (wireless network name connected to the smartwatch), Wi-Fi Signal Strength (connection strength via Wi-Fi between the smartwatch and the home's wireless network; 0 = maximum strength), IMEI (international mobile equipment identity; device serial number), GPS_LAT (latitude), GPS_LONG (longitude), Accuracy of GPS coordinates (accuracy in meters of GPS coordinates), Timestamp of GPS coordinates (date and time the GPS coordinates were obtained), Battery Percentage (battery life), Charger (connected-to-charger status).
Important notes on data collection methodology: Global positioning system (GPS) and physical activity data were recorded using the LEMFO-LM25 smartwatch device, which was equipped with the embrace™ data collection application. The smartwatch worked as a stand-alone device that was able to transmit data at 5-minute intervals to a cloud-based database via Wi-Fi data transfer. The software was able to synchronize the data collected from the different sensors, so the data are transferred to the cloud with the same timestamp. Data synchronization with the cloud-based database is performed automatically when the smartwatch contacts the Wi-Fi network inside the participants' homes. According to the study aims, GPS coordinates were used to estimate the fraction of time spent in or out of the participants' residences. The time spent outside was defined as the duration of time with a GPS signal outside a 100-meter radius around the participant's residence, to account for the signal accuracy of commercially available GPS receivers. Additionally, to address the limitation that signal accuracy in urban and especially indoor environments is diminished, 5-minute intervals with missing GPS signals were classified as either "indoor" or "outdoor" based on the most recent available GPS recording. The implementation of this GPS data filling algorithm allowed replacing the missing 5-minute intervals with estimated values. Via the described protocol, and through the use of a Device Automation application, information on Wi-Fi connectivity, Wi-Fi signal strength, battery capacity, and whether the device was charging or not was also made available. Data on these additional variables were not automatically synchronised with the cloud-based database but had to be manually downloaded from each smartwatch via Bluetooth after the end of the study period.
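A minimal R sketch of the described classification and filling logic; the coordinates, column names, and example records are fabricated for illustration, and the actual pipeline ran on the cloud platform rather than in R:

```r
# Fabricated example records; column names follow the variable list above.
library(zoo)

gps <- data.frame(
  GPS_LAT  = c(35.1857, NA, 35.1920, NA, 35.1858),
  GPS_LONG = c(33.3824, NA, 33.3901, NA, 33.3824)
)
home <- c(lat = 35.1856, lon = 33.3823)  # participant residence (made up)

# Great-circle distance in metres (haversine formula)
haversine_m <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371000 * asin(sqrt(a))
}

gps$dist_m <- haversine_m(gps$GPS_LAT, gps$GPS_LONG, home["lat"], home["lon"])

# Outside = GPS signal beyond the 100 m radius around the residence
gps$location <- ifelse(gps$dist_m > 100, "outdoor", "indoor")

# Missing 5-minute intervals take the most recent available classification
gps$location <- na.locf(gps$location, na.rm = FALSE)
```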
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This research investigated the use of auricular neuromodulation, a specialized acupuncture technique involving electrostimulation of nerves in the ear, as a potential treatment for acute low back pain in emergency department settings. The aim was to determine its feasibility and effects as an adjuvant therapy alongside usual care. The study utilized a pilot pragmatic parallel randomized controlled trial design, meaning participants were randomly assigned to either receive usual care alone or usual care plus auricular neuromodulation. The researchers evaluated various aspects including feasibility, intervention delivery, and outcome assessments. Despite challenges in recruitment due to the COVID-19 pandemic and organizational changes, the study managed to enroll 40 participants (20 per group) at a slower rate than anticipated. Feasibility of the intervention was established, although there were difficulties in outcome assessment, with up to 25% missing data and unblinded assessors. The pilot trial indicated a potential benefit of auricular neuromodulation, with the group receiving this intervention showing greater reduction in acute pain compared to the control group. However, due to the assessment difficulties and the small sample size, further research with a larger sample size and improved outcome evaluation methods is recommended to better evaluate the efficacy of this technique. In summary, while the pilot trial demonstrated the feasibility of conducting a modified future trial, it also highlighted the need for refinement in outcome evaluation methods. This lays the groundwork for future research to delve deeper into the effectiveness of auricular neuromodulation in managing acute low back pain.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This thesis-mpc-dataset-public-readme.txt file was generated on 2020-10-20 by Masud Petronia
GENERAL INFORMATION
1. Title of Dataset: Data underlying the thesis: Multiparty Computation: The effect of multiparty computation on firms' willingness to contribute protected data
2. Author Information
A. Principal Investigator Contact Information
Name: Masud Petronia
Institution: TU Delft, Faculty of Technology, Policy and Management
Address: Mekelweg 5, 2628 CD Delft, Netherlands
Email: masud.petronia@gmail.com
ORCID: https://orcid.org/0000-0003-2798-046X
3. Description of dataset: This dataset contains perceptual data of firms' willingness to contribute protected data through multiparty computation (MPC). Petronia (2020, ch. 6) draws several conclusions from this dataset and provides recommendations for future research (Petronia, 2020, ch. 7.4).
4. Date of data collection: July-August 2020
5. Geographic location of data collection: Netherlands
6. Information about funding sources that supported the collection of the data: Horizon 2020 Research and Innovation Programme, Grant Agreement no 825225 – Safe Data Enabled Economic Development (SAFE-DEED), from the H2020-ICT-2018-2
SHARING/ACCESS INFORMATION
1. Licenses/restrictions placed on the data: CC0
2. Links to publications that cite or use the data: Petronia, M. N. (2020). Multiparty Computation: The effect of multiparty computation on firms' willingness to contribute protected data (Master's thesis). Retrieved from http://resolver.tudelft.nl/uuid:b0de4a4b-f5a3-44b8-baa4-a6416cebe26f
3. Was data derived from another source? No
4. Citation for this dataset: Petronia, M. N. (2020). Multiparty Computation: The effect of multiparty computation on firms' willingness to contribute protected data (Master's thesis). Retrieved from https://data.4tu.nl/. doi:10.4121/13102430
DATA & FILE OVERVIEW
1. File List: thesis-mpc-dataset-public.xlsx; thesis-mpc-dataset-public-readme.txt (this document)
2. Relationship between files: Dataset metadata and instructions
3. Additional related data collected that was not included in the current data package: Occupation and role of respondents (traceable to unique reference), removed for privacy reasons.
4. Are there multiple versions of the dataset? No
METHODOLOGICAL INFORMATION
1. Description of methods used for collection/generation of data: A pre- and post-test experimental design. For more information, see Petronia (2020, ch. 5).
2. Methods for processing the data: Full instructions are provided by Petronia (2020, ch. 6)
3. Instrument- or software-specific information needed to interpret the data: Microsoft Excel can be used to convert the dataset to other formats.
4. Environmental/experimental conditions: This dataset comprises three datasets collected through three channels. These channels are Prolific (incentivized), LinkedIn/Twitter (voluntary), and respondents in a lab setting (voluntary). For more information, see Petronia (2020, ch. 6.1).
5. Describe any quality-assurance procedures performed on the data: A thorough examination of consistency and reliability was performed. For more information, see Petronia (2020, ch. 6).
6. People involved with sample collection, processing, analysis and/or submission: See Petronia (2020, ch. 6)
DATA-SPECIFIC INFORMATION
1. Number of variables: see worksheet experiment_matrix of thesis-mpc-dataset-public.xlsx
2. Number of cases/rows: see worksheet experiment_matrix of thesis-mpc-dataset-public.xlsx
3. Variable List: see worksheet labels of thesis-mpc-dataset-public.xlsx
4. Missing data codes: see worksheet comments of thesis-mpc-dataset-public.xlsx
5. Specialized formats or other abbreviations used: Multiparty computation (MPC) and Trusted Third Party (TTP).
INSTRUCTIONS
1. Petronia (2020, ch. 6) describes associated tests and respective syntax.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Filename: MDCDatacitationReuse2021Codebookv2.pdf Codebook
Filename: MDCDataCitationReuse2021surveydatav2.csv Dataset format in csv
Filename: MDCDataCitationReuse2021surveydatav2.sav Dataset format in SPSS
Filename: MDCDataCitationReuseSurvey2021QNR.pdf Questionnaire
Additional related data collected that was not included in the current data package: Open ended questions asked to respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses, an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails, and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The codebook is required to interpret the values.
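For users without an SPSS license, the .sav file can also be read in R; a minimal sketch using the haven package (the recode of the 999 "Not asked" flag follows the missing-data codes listed below):

```r
# Read the survey data without SPSS; haven preserves the coded value labels
# that the codebook documents.
library(haven)

svy <- read_sav("MDCDataCitationReuse2021surveydatav2.sav")

# Alternatively, use the coded CSV and recode the 999 "Not asked" flag to NA
svy_csv <- read.csv("MDCDataCitationReuse2021surveydatav2.csv")
svy_csv[svy_csv == 999] <- NA
```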
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 95
Number of cases/rows: 2,492
Missing data codes: 999 = Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel spreadsheet containing, in separate sheets, the underlying numerical data and statistical analysis for figure panels 9A-9F, 10A-10D, 11A-11B, S5B, S6B, S7C-S7D, S8C-S8D, S10, and S11.
Spatial extent: California
Spatial unit: Census tract
Created: Oct 20, 2021
Updated: Oct 20, 2021
Source: California Office of Environmental Health Hazard Assessment
Contact Email: CalEnviroScreen@oehha.ca.gov
Source Link: https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-40
Microsoft Excel spreadsheet and PDF with a Data Dictionary: There are two files in this zipped folder: 1) a spreadsheet showing raw data and calculated percentiles for individual indicators and combined CalEnviroScreen scores for individual census tracts, with additional demographic information; 2) a PDF document including the data dictionary and information on zeros and missing values: CalEnviroScreen 4.0 Excel and Data Dictionary PDF.
These Excel spreadsheets provide the USGS and EPA background dataset (climatic variables, pathogen detection results, geochemical analysis results, topological variables, and animal location data) for the data points collected through the NASGLP project. There is also a metadata file to explain what each variable is and where the data were retrieved from, as well as an Excel file with data that were excluded from the larger dataset (due to missing or incomplete data, etc.).
Data from a small meteorological station set up near 21 plots in 2013. Instrumentation: Campbell Scientific CR10 datalogger, Campbell 215 temp/humidity sensor, two Apogee PAR sensors (one facing up, another facing down), soil temperature with a type T thermocouple, and a Campbell CS616 soil reflectometer for soil water content. Data were collected between DOY 153 and DOY 224. The logger collected a measurement every 60 seconds and averaged to a 5-min data table. Post-processing produced 60-min averages and daily mean, max, and min. MS Excel (.xls) workbook with three worksheets. Worksheet 5_min data columns: year, day of year, hour, minute, fractional day of year, incoming PAR (umol m-2 s-1), reflected PAR (umol m-2 s-1), albedo calculated as (par_out/par_in)*100, air temperature (C), relative humidity (%), soil temp (C), raw reflectance time reported by CS616, calculated volumetric water content corrected for soil temperature (v/v), battery voltage. Worksheet 60_min data columns (units as above): day of year, hour, fractional day of year, week of year, air temperature, relative humidity, incoming PAR, outgoing PAR, albedo, soil temperature, and volumetric water content. Worksheet daily columns (units as above unless indicated): date, day of year, air temperature min, air temperature max, air temperature mean, relative humidity min, relative humidity max, relative humidity mean, soil temperature mean, soil water content mean, total incoming PAR (mol m-2 d-1), outgoing PAR (mol m-2 d-1), albedo, minimum battery voltage. Missing values are -6999 or 6999. Soil temperature and VWC are not valid until the instruments could be installed in the soil on DOY 163. The RH sensor failed on DOY 177 and did not function again. Battery issue on DOY 183.
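A short R sketch of the described post-processing from the 5-minute table to 60-minute and daily summaries; the CSV export, column names, and variable selection are assumptions for illustration:

```r
# Sketch: 5-minute records to 60-minute means and daily summaries. Assumes the
# 5_min worksheet was exported to CSV with the column names used below; the
# -6999/6999 missing-value flags are converted to NA first.
library(dplyr)

met <- read.csv("met_5min.csv")
met[met == -6999 | met == 6999] <- NA

hourly <- met %>%
  group_by(day_of_year, hour) %>%
  summarise(across(c(air_temp, rel_humidity, par_in, par_out),
                   ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

daily <- met %>%
  group_by(day_of_year) %>%
  summarise(air_temp_mean = mean(air_temp, na.rm = TRUE),
            air_temp_min  = min(air_temp,  na.rm = TRUE),
            air_temp_max  = max(air_temp,  na.rm = TRUE),
            .groups = "drop")
```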
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row, filtered to the book Fear Excel for Windows no more. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the raw data used for a research study that examined university students' music listening habits while studying. There are two experiments in this research study. Experiment 1 is a retrospective survey, and Experiment 2 is a mobile experience sampling research study. This repository contains five Microsoft Excel files with data obtained from both experiments. The files are as follows:
onlineSurvey_raw_data.xlsx
esm_raw_data.xlsx
esm_music_features_analysis.xlsx
esm_demographics.xlsx
index.xlsx
Files Description
File: onlineSurvey_raw_data.xlsx
This file contains the raw data from Experiment 1, including the (anonymised) demographic information of the sample. The sample characteristics recorded are:
studentship
area of study
country of study
type of accommodation a participant was living in
age
self-identified gender
language ability (mono- or bi-/multilingual)
(various) personality traits
(various) musicianship
(various) everyday music uses
(various) music capacity
The file also contains raw data of responses to the questions about participants' music listening habits while studying in real life. These pieces of data are:
likelihood of listening to specific (rated across 23) music genres while studying and during everyday listening
likelihood of listening to music with specific acoustic features (e.g., with/without lyrics, loud/soft, fast/slow) while studying and during everyday listening
general likelihood of listening to music while studying in real life
(verbatim) responses to participants' written answers to the open-ended questions about their real-life music listening habits while studying
File: esm_raw_data.xlsx
This file contains the raw data from Experiment 2, including the following variables:
information on the music tracks (track name, artist name, and, if available, Spotify ID) each participant was listening to during each music episode (both while studying and during everyday listening)
level of arousal at the onset of music playing and the end of the 30-minute study period
level of valence at the onset of music playing and the end of the 30-minute study period
specific mood at the onset of music playing and the end of the 30-minute study period
whether participants were studying
their location at that moment (if studying)
whether they were studying alone (if studying)
the types of study tasks (if studying)
the perceived level of difficulty of the study task
whether participants were planning to listen to music while studying
(various) reasons for music listening
(various) perceived positive and negative impacts of studying with music
Each row represents the data for a single participant. Rows with a record of a participant ID but no associated data indicate that the participant did not respond to the questionnaire (i.e., missing data).
File: esm_music_features_analysis.xlsx
This file presents the music features of each recorded music track during both the study-episodes and the everyday-episodes (retrieved from Spotify's "Get Track's Audio Features" API). These features are:
energy level
loudness
valence
tempo
mode
The contextual details of the moments each track was being played are also presented here, which include:
whether the participant was studying
their location (e.g., at home, cafe, university)
whether they were studying alone
the type of study tasks they were engaging with (e.g., reading, writing)
the perceived difficulty level of the task
File: esm_demographics.xlsx
This file contains the demographics of the sample in Experiment 2 (N = 10), which are the same as in Experiment 1 (see above). Each row represents the data for a single participant. Rows with a record of a participant ID but no associated demographic data indicate that the participant did not respond to the questionnaire (i.e., missing data).
File: index.xlsx
Finally, this file contains all the abbreviations used in each document as well as their explanations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set supports the journal paper "Manipulating the consequences of tests: How Shanghai teens react to different consequences", published in Educational Research and Evaluation, v26 (n5-6), pp. 221-251. The data were obtained to test the impact of different levels of consequence for taking a test on student test-taking effort. The data are part of the PhD project of Anran Zhao, supervised by Brown & Meissel. The data set is in MS Excel format. Sheet 1 provides an anonymous wide-format data set post-cleaning and missing value analysis of the data. Sheet 2 provides a description of each variable.
Objective: To determine if health sciences library workers were familiar with best practices regarding informed consent for the publication of medical images and if they incorporate the recommendations into their professional work.
Methods: A survey was developed by the authors and distributed to library listservs in the United States. The results of the survey were tabulated in R.
Results: A total of 90 respondents were included in the data analysis, with all respondents reporting multiple responsibilities in their professional role. While the majority of library workers (59%) were familiar with the best practices, few incorporated the recommendations into their everyday professional work.
Conclusions: The professional work of health sciences library workers does not appear to include a significant inclusion of the best practices for the informed consent for the publication of medical images. There is a need for future research to better understand how library workers can better incorporat...
# Survey data and code
https://doi.org/10.5061/dryad.vmcvdnd17
The cleaned survey results, as well as the R code used to analyze the data, are made available here.
Cleaned data.xlsx: This Excel sheet contains the results of the survey conducted for our manuscript titled The Role of Health Sciences Libraries in Supporting Medical Image Consent Standards Survey. The research study was about informed consent standards for the publication of medical images. The survey participants were health sciences library workers. They were given the option to skip questions as well as stop the survey at any time. Based on their specific role in the library, questions were grouped into different sections, so there will be missing values in the Excel sheet, either due to the role of the librarian or their decision to skip questions. The missing values have been taken into account in the analysis b...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the results of a survey about the use of open government data applied to public agents working in public institutions in Brazil. It has two sets, one with questionnaire responses and metadata and the second with a coding table with interview extracts: 1) In the first dataset, each row holds a response to a questionnaire about the public agent's perceptions of the use and reuse of open government data in Brazilian public institutions. Columns store the questionnaire questions. Data were collected between 8 June and 13 July 2021, and this sample is composed of responses from 40 federal, state, and municipal public administrators. Thus, this dataset contains 40 rows and 158 columns. Data were collected on the LimeSurvey platform, where they were screened for missing values and incomplete responses. After cleaning, data were exported to Excel in tabular format. Questionnaire responses are provided in two files, ResultsSurvey_OGDUseBRPubInstitutions_DataSet_PT and ResultsSurvey_OGDUseBRPubInstitutions_DataSet_EN. They contain the same information in Portuguese and English. 2) The second dataset records the code table of the interviews about the benefits, barriers, enablers, and drivers of open government data (OGD) use in Brazilian public institutions. A questionnaire applied to public agents working in Brazilian public institutions was followed up by interviews to broaden an understanding of the use of OGD. Nine interviews were conducted between May 17-31, 2022. This dataset represents the perspective of these public agents. The dataset contains 97 rows and six columns. Each row of the dataset lists the factor code used in the questionnaire, the factor descriptions in Portuguese and English, the interviewee code, the transcription extract of an interviewee narration collected in Portuguese, and the English translation. After collection in Portuguese, interviews were automatically transcribed using the NVivo Transcription software. Then, they were anonymized, and a human reviewed the transcriptions. Interviews were coded using NVivo, using the questionnaire factors to guide coding. Coded extracts were translated to English using Google and Microsoft translators. Then, translated extracts were revised by a human and were used for reporting. The coding table was exported to Excel. Interview extracts are provided in one file, InterviewsExtracts_OGDUseBR_PublicInstitutions_Dataset.
[Note 2023-08-14 - Supersedes version 1, https://doi.org/10.15482/USDA.ADC/1528086] This dataset contains all code and data necessary to reproduce the analyses in the manuscript: Mengistu, A., Read, Q. D., Sykes, V. R., Kelly, H. M., Kharel, T., & Bellaloui, N. (2023). Cover crop and crop rotation effects on tissue and soil population dynamics of Macrophomina phaseolina and yield under no-till system. Plant Disease. https://doi.org/10.1094/pdis-03-23-0443-re
The .zip archive cropping-systems-1.0.zip contains data and code files.
Data:
stem_soil_CFU_by_plant.csv: Soil disease load (SoilCFUg) and stem tissue disease load (StemCFUg) for individual plants in CFU per gram, with columns indicating year, plot ID, replicate, row, plant ID, previous crop treatment, cover crop treatment, and comments. Missing data are indicated with .
yield_CFU_by_plot.csv: Yield data (YldKgHa) at the plot level in units of kg/ha, with columns indicating year, plot ID, replicate, and treatments, as well as means of soil and stem disease load at the plot level.
Code:
cropping_system_analysis_v3.0.Rmd: RMarkdown notebook with all data processing, analysis, and visualization code
equations.Rmd: RMarkdown notebook with formatted equations
formatted_figs_revision.R: R script to produce figures formatted exactly as they appear in the manuscript
The RStudio project file cropping-systems.Rproj is used to organize the project. Scripts and notebooks used in older versions of the analysis are found in the testing/ subdirectory. Excel spreadsheets containing the raw data from which the cleaned CSV files were created are found in the raw_data subdirectory.
Supplementary_File_S1 (geografic_location): Geographic location of D. alata populations.
Supplementary_File_S2 (ex-situ_data): Excel file containing D. alata ex-situ genotype data (UFG-AS germplasm collection). Individuals are identified in each row. Loci are designated by column, with the nine microsatellites listed. (?) refers to missing data.
Supplementary_File_S3 (in-situ_data): Excel file containing D. alata in-situ genotype data. Individuals are identified by population in each row. Loci are designated by column, with the nine microsatellites listed. (?) refers to missing data.