14 datasets found
  1. Cross Regional Eucalyptus Growth and Environmental Data

    • data.mendeley.com
    Updated Oct 7, 2024
    + more versions
    Cite
    Christopher Erasmus (2024). Cross Regional Eucalyptus Growth and Environmental Data [Dataset]. http://doi.org/10.17632/2m9rcy3dr9.3
    Explore at:
    Dataset updated
    Oct 7, 2024
    Authors
    Christopher Erasmus
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is provided in a single .xlsx file named "eucalyptus_growth_environment_data_V2.xlsx" and consists of fifteen sheets:

    Codebook: This sheet details the index, values, and descriptions for each field within the dataset, providing a comprehensive guide to understanding the data structure.

    ALL NODES: Contains measurements from all devices, totalling 102,916 data points. This sheet aggregates the data across all nodes.

    GWD1 to GWD10: These subset sheets include measurements from individual nodes, labelled according to the abbreviation “Generic Wireless Dendrometer” followed by device IDs 1 through 10. Each sheet corresponds to a specific node, representing measurements from ten trees (or nodes).

    Metadata: Provides detailed metadata for each node, including species, initial diameter, location, measurement frequency, battery specifications, and irrigation status. This information is essential for identifying and differentiating the nodes and their specific attributes.

    Missing Data Intervals: Details gaps in the data stream, including start and end dates and times when data was not uploaded. It includes information on the total duration of each missing interval and the number of missing data points.

    Missing Intervals Distribution: Offers a summary of missing data intervals and their distribution, providing insight into data gaps and reasons for missing data.

    All nodes utilize LoRaWAN for data transmission. Please note that intermittent data gaps may occur due to connectivity issues between the gateway and the nodes, as well as maintenance activities or experimental procedures.

    Software considerations: The provided R code named “Simple_Dendro_Imputation_and_Analysis.R” is a comprehensive analysis workflow that processes and analyses Eucalyptus growth and environmental data from the "eucalyptus_growth_environment_data_V2.xlsx" dataset. The script begins by loading necessary libraries, setting the working directory, and reading the data from the specified Excel sheet. It then combines date and time information into a unified DateTime format and performs data type conversions for relevant columns. The analysis focuses on a specified device, allowing for the selection of neighbouring devices for imputation of missing data. A loop checks for gaps in the time series and fills in missing intervals based on a defined threshold, followed by a function that imputes missing values using the average from nearby devices. Outliers are identified and managed through linear interpolation. The code further calculates vapour pressure metrics and applies temperature corrections to the dendrometer data. Finally, it saves the cleaned and processed data into a new Excel file while conducting dendrometer analysis using the dendRoAnalyst package, which includes visualizations and calculations of daily growth metrics and correlations with environmental factors such as vapour pressure deficit (VPD).
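
    The original workflow is the R script above; as an illustration only, here is a rough pandas sketch of the neighbour-average imputation and outlier interpolation it describes. Sheet names follow the dataset, but the column names and the 30-minute sampling interval are assumptions:

    ```python
    import pandas as pd

    XLSX = "eucalyptus_growth_environment_data_V2.xlsx"

    def load(sheet):
        df = pd.read_excel(XLSX, sheet_name=sheet)
        # combine the separate date and time fields into one DateTime index
        # (column names are assumptions; see the Codebook sheet for the real ones)
        df["DateTime"] = pd.to_datetime(df["Date"].astype(str) + " " + df["Time"].astype(str))
        return df.set_index("DateTime").resample("30min").mean(numeric_only=True)

    target = load("GWD1")                              # device under analysis
    neighbours = [load(s) for s in ("GWD2", "GWD3")]   # nearby devices for imputation

    # fill gaps in the target series with the average of the nearby devices
    neighbour_mean = pd.concat([n["Dendro"] for n in neighbours], axis=1).mean(axis=1)
    target["Dendro"] = target["Dendro"].fillna(neighbour_mean)

    # flag outliers against a rolling mean, then repair them by linear interpolation
    roll = target["Dendro"].rolling(48, center=True, min_periods=1).mean()
    outliers = (target["Dendro"] - roll).abs() > 3 * target["Dendro"].std()
    target.loc[outliers, "Dendro"] = None
    target["Dendro"] = target["Dendro"].interpolate(method="time")
    ```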

  2. Data from: Data and code from: Cover crop and crop rotation effects on...

    • catalog.data.gov
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Data and code from: Cover crop and crop rotation effects on tissue and soil population dynamics of Macrophomina phaseolina and yield in no-till system - V2 [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-cover-crop-and-crop-rotation-effects-on-tissue-and-soil-population-dyna-831b9
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    [Note 2023-08-14 - Supersedes version 1, https://doi.org/10.15482/USDA.ADC/1528086] This dataset contains all code and data necessary to reproduce the analyses in the manuscript: Mengistu, A., Read, Q. D., Sykes, V. R., Kelly, H. M., Kharel, T., & Bellaloui, N. (2023). Cover crop and crop rotation effects on tissue and soil population dynamics of Macrophomina phaseolina and yield under no-till system. Plant Disease. https://doi.org/10.1094/pdis-03-23-0443-re

    The .zip archive cropping-systems-1.0.zip contains the data and code files.

    Data:
    - stem_soil_CFU_by_plant.csv: Soil disease load (SoilCFUg) and stem tissue disease load (StemCFUg) for individual plants in CFU per gram, with columns indicating year, plot ID, replicate, row, plant ID, previous crop treatment, cover crop treatment, and comments. Missing data are indicated with ".".
    - yield_CFU_by_plot.csv: Yield data (YldKgHa) at the plot level in units of kg/ha, with columns indicating year, plot ID, replicate, and treatments, as well as means of soil and stem disease load at the plot level.

    Code:
    - cropping_system_analysis_v3.0.Rmd: RMarkdown notebook with all data processing, analysis, and visualization code
    - equations.Rmd: RMarkdown notebook with formatted equations
    - formatted_figs_revision.R: R script to produce figures formatted exactly as they appear in the manuscript

    The R project file cropping-systems.Rproj is used to organize the RStudio project. Scripts and notebooks used in older versions of the analysis are found in the testing/ subdirectory. Excel spreadsheets containing raw data from which the cleaned CSV files were created are found in the raw_data subdirectory.
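
    For example, a minimal pandas sketch for loading the two CSV files while treating the "." missing-data marker as NA (file names follow the description; the "Year" grouping column is an assumption):

    ```python
    import pandas as pd

    # "." marks missing data in both CSV files
    plants = pd.read_csv("stem_soil_CFU_by_plant.csv", na_values=".")
    plots = pd.read_csv("yield_CFU_by_plot.csv", na_values=".")

    print(plants[["SoilCFUg", "StemCFUg"]].describe())  # per-plant disease loads
    print(plots.groupby("Year")["YldKgHa"].mean())      # yearly mean yield; "Year" assumed
    ```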

  3. EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for...

    • zenodo.org
    zip
    Updated May 31, 2025
    + more versions
    Cite
    Matteo Barigozzi; Matteo Barigozzi; Claudio Lissona; Claudio Lissona (2025). EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for Macroeconomic Research [Dataset]. http://doi.org/10.5281/zenodo.15564854
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matteo Barigozzi; Matteo Barigozzi; Claudio Lissona; Claudio Lissona
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    May 30, 2025
    Description

    EA-MD-QD is a collection of large monthly and quarterly EA and EA member countries datasets for macroeconomic analysis.
    The EA member countries covered are: AT, BE, DE, EL, ES, FR, IE, IT, NL, PT.

    The formal reference to this dataset is:

    Barigozzi, M. and Lissona, C. (2024) "EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for Macroeconomic Research". Zenodo.

    Please refer to it when using the data.

    Each zip file contains:

    - Excel files for the EA and the countries covered, each containing an unbalanced panel of raw de-seasonalized data.

    - MATLAB code that takes the raw data as input and performs various operations:
    choosing the frequency, filling in missing values, transforming the data to stationarity, and controlling for COVID outliers.

    - A PDF file with all information about the series names, sources, and transformation codes.
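
    As an illustration only, a hypothetical Python sketch of the kind of per-series stationarity transformation the MATLAB code performs; the actual transformation codes are documented in the accompanying PDF, so the mapping below is an assumption (FRED-MD-style codes):

    ```python
    import numpy as np
    import pandas as pd

    def transform(series: pd.Series, code: int) -> pd.Series:
        """Apply an assumed FRED-MD-style transformation code; the real codes
        for EA-MD-QD are listed in the PDF shipped with each zip file."""
        if code == 1:    # level, already stationary
            return series
        if code == 2:    # first difference
            return series.diff()
        if code == 4:    # logarithm
            return np.log(series)
        if code == 5:    # log first difference (growth rate)
            return np.log(series).diff()
        raise ValueError(f"unhandled transformation code {code}")
    ```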

    This version (05.2025):

    Updated data as of 30-May-2025.

  4. Exploring soil sample variability through principal component analysis (PCA)...

    • metadatacatalogue.lifewatch.eu
    Updated Jun 1, 2024
    Cite
    (2024). Exploring soil sample variability through principal component analysis (PCA) using excel data [Dataset]. https://metadatacatalogue.lifewatch.eu/geonetwork/search?keyword=Scree%20plot
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    SoilExcel is a workflow designed to optimize soil data analysis. It covers data preparation, statistical analysis methods, and result visualization. SoilExcel integrates various environmental data types and applies advanced techniques to enhance accuracy in soil studies. The results demonstrate its effectiveness in interpreting complex data, aiding decision-making in environmental management projects.

    Background: Understanding the intricate relationships and patterns within soil samples is crucial for various environmental and agricultural applications. Principal Component Analysis (PCA) serves as a powerful tool in unraveling the complexity of multivariate soil datasets. Soil datasets often consist of numerous variables representing diverse physicochemical properties, making PCA an invaluable method for:
    - Dimensionality reduction: Simplifying the analysis without compromising data integrity by reducing the dimensionality of large soil datasets.
    - Identification of dominant patterns: Revealing dominant patterns or trends within the data, providing insights into key factors contributing to overall variability.
    - Exploration of variable interactions: Enabling the exploration of complex interactions between different soil attributes, enhancing understanding of their relationships.
    - Interpretability of data variance: Clarifying how much variance is explained by each principal component, aiding in discerning the significance of different components and variables.
    - Visualization of data structure: Facilitating intuitive comprehension of data structure through plots such as scatter plots of principal components, helping identify clusters, trends, and outliers.
    - Decision support for subsequent analyses: Providing a foundation for subsequent analyses by guiding decision-making, whether in identifying influential variables, understanding data patterns, or selecting components for further modeling.

    Introduction: The motivation behind this workflow is rooted in the need to conduct a thorough analysis of a diverse soil dataset characterized by an array of physicochemical variables. Comprising multiple rows, each representing a distinct soil sample, the dataset encompasses variables such as percentage of coarse sands, percentage of organic matter, hydrophobicity, and others. The intricacies of this dataset demand a strategic approach to preprocessing, analysis, and visualization. To lay the groundwork, the workflow begins with the transformation of an initial Excel file into CSV format, ensuring improved compatibility and ease of use throughout subsequent analyses. The workflow is also designed to empower users in the selection of relevant variables, a task facilitated by user-defined parameters; this flexibility allows for a focused and tailored dataset, essential for meaningful analysis. Acknowledging the inherent challenges of missing data, the workflow offers options for data quality improvement, including optional interpolation of missing values or the removal of rows containing them. Standardizing the dataset and specifying the target variable establish a robust foundation for subsequent statistical analyses. Incorporating PCA enables users to explore inherent patterns and structures within the data, and its adaptability allows users to customize the analysis by specifying the number of components or the desired variance. The workflow concludes with practical graphical representations, including covariance and correlation matrices, a scree plot, and a scatter plot, offering users valuable visual insights into the complexities of the soil dataset.

    Aims: The primary objectives of this workflow are tailored to address specific challenges and goals inherent in the analysis of diverse soil samples:
    - Data transformation: Efficiently convert the initial Excel file into CSV format to enhance compatibility and ease of use.
    - Variable selection: Empower users to extract relevant variables based on user-defined parameters, facilitating a focused and tailored dataset.
    - Data quality improvement: Provide options for interpolation or removal of missing values to ensure dataset integrity for downstream analyses.
    - Standardization and target specification: Standardize the dataset values and designate the target variable, laying the groundwork for subsequent statistical analyses.
    - PCA: Conduct PCA with flexibility, allowing users to specify the number of components or desired variance for a comprehensive understanding of data variance and patterns.
    - Graphical representations: Generate visual outputs, including covariance and correlation matrices, a scree plot, and a scatter plot, enhancing the interpretability of the soil dataset.

    Scientific questions: This workflow addresses critical scientific questions related to soil analysis:
    - Variable importance: Identify variables contributing significantly to principal components through the covariance matrix and PCA.
    - Data structure: Explore correlations between variables and gain insights from the correlation matrix.
    - Optimal component number: Determine the optimal number of principal components using the scree plot for effective representation of data variance.
    - Target-related patterns: Analyze how selected principal components correlate with the target variable in the scatter plot, revealing patterns based on target variable values.
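
    A minimal Python sketch of the described pipeline (standardization, PCA with a variance target, scree plot, and a target-coloured scatter); the CSV filename and column names are hypothetical placeholders, not part of the SoilExcel workflow itself:

    ```python
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("soil_samples.csv")   # CSV converted from the Excel source
    X = df.drop(columns=["target"])        # user-selected physicochemical variables
    X = X.interpolate().dropna()           # optional interpolation, then drop the rest
    Z = StandardScaler().fit_transform(X)  # standardize the dataset

    pca = PCA(n_components=0.95)           # keep components explaining 95% of variance
    scores = pca.fit_transform(Z)

    # scree plot: explained variance per principal component
    plt.plot(range(1, pca.n_components_ + 1), pca.explained_variance_ratio_, marker="o")
    plt.xlabel("Principal component"); plt.ylabel("Explained variance ratio")

    # scatter of the first two components, coloured by the target variable
    plt.figure()
    plt.scatter(scores[:, 0], scores[:, 1], c=df.loc[X.index, "target"])
    plt.xlabel("PC1"); plt.ylabel("PC2")
    plt.show()
    ```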

  5. Manual snow course observations, raw met data, raw snow depth observations,...

    • catalog.data.gov
    Updated Jun 15, 2024
    + more versions
    Cite
    Climate Adaptation Science Centers (2024). Manual snow course observations, raw met data, raw snow depth observations, locations, and associated metadata for Oregon sites [Dataset]. https://catalog.data.gov/dataset/manual-snow-course-observations-raw-met-data-raw-snow-depth-observations-locations-and-ass
    Explore at:
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Climate Adaptation Science Centers
    Area covered
    Oregon
    Description

    OSU_SnowCourse Summary: Manual snow course observations were collected over WY 2012-2014 from four paired forest-open sites chosen to span a broad elevation range. Study sites were located in the upper McKenzie (McK) River watershed, approximately 100 km east of Corvallis, Oregon, on the western slope of the Cascade Range, and in the Middle Fork Willamette (MFW) watershed, located to the south of the McKenzie. The sites were designated based on elevation, with a range of 1110-1480 m. Distributed snow depth and snow water equivalent (SWE) observations were collected via monthly manual snow courses from 1 November through 1 April and bi-weekly thereafter. Snow courses spanned 500 m of forested terrain and 500 m of adjacent open terrain. Snow depth observations were collected approximately every 10 m, and SWE was measured every 100 m along the snow courses with a federal snow sampler. These data are raw observations and have not been quality controlled in any way. Distance along the transect was estimated in the field.

    OSU_SnowDepth Summary: 10-minute snow depth observations collected at OSU met stations in the upper McKenzie River Watershed and the Middle Fork Willamette Watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and generally the locations were paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These data have undergone basic quality control. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN flags for missing data to NA, and added site attributes such as site name and cover. First, we replaced positive values with NA, since snow depth values in the raw data are negative (the sign is flipped, with some correction to use the height of the sensor as zero), so a positive value in the raw data corresponds to a physically negative snow depth. Second, the sign of the data was switched to make them positive. Then, the smooth.m (MATLAB) function was used to roughly smooth the data, with a moving window of 50 points. Third, outliers were removed: all values higher than the smoothed values + 10 were replaced with NA. In some cases, further single-point outliers were removed.

    OSU_Met Summary: Raw, 10-minute meteorological observations collected at OSU met stations in the upper McKenzie River Watershed and the Middle Fork Willamette Watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and generally the locations were paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These stations were deployed to collect numerous meteorological variables, of which snow depth and wind speed are included here. These data are raw datalogger output and have not been quality controlled in any way. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN and 7999 flags for missing data to NA, and added site attributes such as site name and cover.

    OSU_Location Summary: Location metadata for manual snow course observations and meteorological sensors. These data are compiled from GPS data for which the horizontal accuracy is unknown, and from processed hemispherical photographs. They have not been quality controlled in any way.
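
    A rough pandas re-expression of the snow-depth quality control steps described above (the original processing used MATLAB's smooth.m; the column names, delimiter, and timestamp origin here are assumptions):

    ```python
    import numpy as np
    import pandas as pd

    # one site-year raw file; column names and delimiter are assumptions
    df = pd.read_csv("RawData.txt", sep="\t")
    # Excel-based timestamp (seconds since origin) -> date; origin assumed
    df["date"] = pd.to_datetime(df["timestamp"], unit="s", origin="1899-12-30")

    depth = df["snow_depth"].replace(7999, np.nan)  # flag values -> NA
    depth = depth.mask(depth > 0)  # raw sign convention is inverted: valid raw
    depth = -depth                 # values are negative, so drop positives, flip sign

    smoothed = depth.rolling(50, center=True, min_periods=1).mean()
    depth = depth.mask(depth > smoothed + 10)  # outliers above smoothed + 10 -> NA
    ```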

  6. EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for...

    • zenodo.org
    zip
    Updated Feb 26, 2024
    Cite
    Matteo Barigozzi; Matteo Barigozzi; Claudio Lissona; Claudio Lissona (2024). EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for Macroeconomic Research [Dataset]. http://doi.org/10.5281/zenodo.10706766
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matteo Barigozzi; Matteo Barigozzi; Claudio Lissona; Claudio Lissona
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Feb 26, 2024
    Description

    EA-MD-QD is a collection of large monthly and quarterly EA and EA member countries datasets for macroeconomic analysis.
    The EA member countries covered are: AT, BE, DE, EL, ES, FR, IE, IT, NL, PT.

    The formal reference to this dataset is:

    Barigozzi, M. and Lissona, C. (2024) "EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for Macroeconomic Research". Zenodo.

    Please refer to it when using the data.

    Each zip file contains:

    - Excel files for the EA and the countries covered, each containing an unbalanced panel of raw de-seasonalized data.

    - MATLAB code that takes the raw data as input and performs various operations:
    choosing the frequency, filling in missing values, transforming the data to stationarity, and controlling for COVID outliers.

    - A PDF file with all information about the series names, sources, and transformation codes.

    This version (02-2024):

    Updated data as of 26-Feb-2024.
    Debugging of the MATLAB code.

  7. Demonstration data on the set up of consumer wearable device for exposure...

    • figshare.com
    xlsx
    Updated Jun 19, 2023
    Cite
    Antonis Michanikou; Panayiotis Kouis; Panayiotis K. Yiallouros (2023). Demonstration data on the set up of consumer wearable device for exposure and health monitoring in population studies [Dataset]. http://doi.org/10.6084/m9.figshare.21601371.v3
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    figshare
    Authors
    Antonis Michanikou; Panayiotis Kouis; Panayiotis K. Yiallouros
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is provided in the form of an Excel file with 5 tabs. The first three tabs constitute demonstration data on the set-up of a consumer wearable device for exposure and health monitoring in population studies, while the last two tabs include the full dataset with actual data collected using the consumer wearable devices in Cyprus and Greece, respectively, during the spring of 2020. The data from the last two tabs were used to assess the compliance of asthmatic schoolchildren (n=108) from both countries with public health intervention levels in response to the COVID-19 pandemic (lockdown and social distancing measures), using wearable sensors to continuously track personal location and physical activity. Asthmatic children were recruited from primary schools in Cyprus and Greece (Heraklion district, Crete) and were enrolled in the LIFE-MEDEA public health intervention project (ClinicalTrials.gov Identifier: NCT03503812). The LIFE-MEDEA project aimed to evaluate the efficacy of behavioral recommendations to reduce exposure to particulate matter during desert dust storm (DDS) events and thus mitigate disease-specific adverse health effects in vulnerable groups of patients. During the COVID-19 pandemic, however, the collected data were analysed using a mixed effect model adjusted for confounders to estimate the changes in 'fraction time spent at home' and 'total steps/day' during the enforcement of gradually more stringent lockdown measures. Results of this analysis were first presented in the manuscript titled “Use of wearable sensors to assess compliance of asthmatic children in response to lockdown measures for the COVID-19 epidemic” published by Scientific Reports (https://doi.org/10.1038/s41598-021-85358-4). The dataset from LIFE-MEDEA participants (asthmatic children) from Cyprus and Greece includes the variables: Study ID, gender, age, study year, ambient temperature, ambient humidity, recording day, percentage of time staying at home, steps per day, calendar day, calendar week, date, lockdown status (phase 1, 2, or 3) due to the COVID-19 pandemic, and whether the date fell on a weekend (binary variable). All data were collected following approvals from relevant authorities in both Cyprus and Greece, according to national legislation. In Cyprus, approvals were obtained from the Cyprus National Bioethics Committee (EEBK EΠ 2017.01.141), the Data Protection Commissioner (No. 3.28.223), and the Ministry of Education (No 7.15.01.23.5). In Greece, approvals were obtained from the Scientific Committee (25/04/2018, No: 1748) and the Governing Board of the University General Hospital of Heraklion (25/22/08/2018).

    Overall, wearable sensors, often embedded in commercial smartwatches, allow for continuous and non-invasive health measurements and exposure assessment in clinical studies. Nevertheless, the real-life application of these technologies in studies involving many participants over a significant observation period may be hindered by several practical challenges. In the first Excel tab of the dataset, we provide demonstration data from a small subset of asthmatic children (n=17) who participated in the LIFE MEDEA study and were equipped with a smartwatch for the assessment of physical activity (heart rate, pedometer, accelerometer) and location (exposure to indoor or outdoor microenvironments using the GPS signal). Participants were required to wear the smartwatch, equipped with a data collection application, daily, and data were transmitted via a wireless network to a centrally administered data collection platform. The main technical challenges identified ranged from restricting access to standard smartwatch features such as gaming, internet browser, camera, and audio recording applications, to technical challenges such as loss of GPS signal, especially in indoor environments, and internal smartwatch settings interfering with the data collection application. The dataset includes information on the percentage of time with collected data before and after the implementation of a protocol that relied on setting up the smartwatch device using publicly available application locker and device automation applications to address most of these challenges. In addition, the dataset includes example single-day observations that demonstrate how the inclusion of a Wi-Fi received signal strength indicator significantly improved indoor localization and largely minimised GPS signal misclassification (Excel tab 2). Finally, Excel tab 3 shows the associated tasks. Overall, the implementation of these protocols during the roll-out of the LIFE MEDEA study in the spring of 2020 led to significantly improved results in terms of data completeness and data quality. The protocol and the representative results have been submitted for publication to the Journal of Visualized Experiments (submission: JoVE63275). The variables included in the first three Excel tabs are the following: Participant ID (unique serial number for a patient participating in the study), % Time Before (percentage of time with data before protocol implementation), % Time After (percentage of time with data after protocol implementation), Timestamp (date and time of event occurrence), Indoor/Outdoor (categorical; classification of GPS signals to Indoor, Outdoor, or null (missing value) based on distance from the participant's home), Filling algorithm (imputation algorithm), SSID (wireless network name connected to the smartwatch), Wi-Fi Signal Strength (connection strength via Wi-Fi between the smartwatch and the home's wireless network; 0 = maximum strength), IMEI (international mobile equipment identity; device serial number), GPS_LAT (latitude), GPS_LONG (longitude), Accuracy of GPS coordinates (accuracy in meters of the GPS coordinates), Timestamp of GPS coordinates (date and time the GPS coordinates were obtained), Battery Percentage (battery life), Charger (connected-to-charger status).

    Important notes on data collection methodology: Global positioning system (GPS) and physical activity data were recorded using the LEMFO-LM25 smartwatch device, which was equipped with the embrace™ data collection application. The smartwatch worked as a stand-alone device that was able to transmit data across 5-minute intervals to a cloud-based database via Wi-Fi data transfer. The software was able to synchronize the data collected from the different sensors, so the data are transferred to the cloud with the same timestamp. Data synchronization with the cloud-based database is performed automatically when the smartwatch contacts the Wi-Fi network inside the participants’ homes. According to the study aims, GPS coordinates were used to estimate the fraction of time spent in or out of the participants' residences. The time spent outside was defined as the duration of time with a GPS signal outside a 100-meter radius around the participant’s residence, to account for the signal accuracy in commercially available GPS receivers. Additionally, to address the limitation that signal accuracy in urban and especially indoor environments is diminished, 5-minute intervals with missing GPS signals were classified as either “indoor classification” or “outdoor classification” based on the most recent available GPS recording. The implementation of this GPS data filling algorithm allowed replacing the missing 5-minute intervals with estimated values. Via the described protocol, and through the use of a device automation application, information on Wi-Fi connectivity, Wi-Fi signal strength, battery capacity, and whether the device was charging or not was also made available. Data on these additional variables were not automatically synchronised with the cloud-based database but had to be manually downloaded from each smartwatch via Bluetooth after the end of the study period.
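
    A simplified Python sketch of the indoor/outdoor classification and gap-filling rule described above; the field names follow the variable list, while the home coordinates and interval handling are assumptions:

    ```python
    import pandas as pd
    from math import radians, sin, cos, asin, sqrt

    def haversine_m(lat1, lon1, lat2, lon2):
        """Great-circle distance in metres between two lat/lon points."""
        dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
        a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
        return 6_371_000 * 2 * asin(sqrt(a))

    def classify_intervals(df, home_lat, home_lon, radius_m=100):
        """Label each 5-minute interval Indoor/Outdoor relative to a 100 m home
        radius, then fill intervals with no GPS fix from the most recent
        available recording, as the description above outlines."""
        dist = df.apply(lambda r: haversine_m(r["GPS_LAT"], r["GPS_LONG"],
                                              home_lat, home_lon), axis=1)
        df["in_out"] = dist.apply(lambda d: None if pd.isna(d) else
                                  ("Indoor" if d <= radius_m else "Outdoor"))
        df["in_out"] = df["in_out"].ffill()  # GPS data filling algorithm
        return df
    ```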

  8. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures used, as described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the notation used: user stories (US) or use cases (UC)
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present

    Tagging scheme:
    - Aligned (AL): A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    - Wrongly represented (WR): A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    - System-oriented (SO): A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    - Omitted (OM): A class in CM-Expert that does not appear in any way in CM-Stud;
    - Missing (MI): A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provided the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
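
    Expressed directly in code, the two ratios are simple functions of the tag counts; a minimal sketch:

    ```python
    def correctness(AL, WR, SO, OM):
        """Fraction of the student's tagged classes that align with the expert model."""
        return AL / (AL + OM + SO + WR)

    def completeness(AL, WR, OM):
        """Fraction of expert classes represented (correctly or not) in the student model."""
        return (AL + WR) / (AL + WR + OM)
    ```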

    For Sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.

  9. Soil Temperature Station Data from Permafrost Regions of Russia (Selection...

    • data.globalchange.gov
    Updated Feb 17, 2011
    + more versions
    Cite
    (2011). Soil Temperature Station Data from Permafrost Regions of Russia (Selection of Five Stations), 1880s - 2000 [Dataset]. https://data.globalchange.gov/dataset/nsidc-g02189
    Explore at:
    Dataset updated
    Feb 17, 2011
    Description

    This data set includes soil temperature data from boreholes located at five stations in Russia: Yakutsk, Verkhoyansk, Pokrovsk, Isit', and Churapcha. The data have been compiled into five Microsoft Excel files, one for each station. Each Excel file contains three worksheets:

    • G02189info worksheet: Contains the same content in each Excel file - lat/lon info and notes on the stations
    • Jan soil & surface temp worksheet: Contains winter (January) soil temperature and air temperature (except for the Churapcha Excel file that only contains soil temperature - air temperature was not available)
    • Jul soil & surface temp worksheet: Contains summer (July) soil temperature and air temperature (except for the Churapcha Excel file)
    There are two different versions of the Excel files: a complete version and a subsetted version. Both versions exist for each of the five stations, for a total of 10 files. The complete versions of the files reside in the directory called complete and have the word full in their filenames. These files contain borehole temperature data at all available standard depths: 0.2 m, 0.4 m, 0.6 m, 0.8 m, 1.2 m, 1.6 m, 2.0 m, 2.4 m, and 3.2 m. The subsetted versions of the files reside in the subset directory and have subset in their filenames. These files contain data from the 0.8 m and 3.2 m depths only. Missing data are indicated by the value -999.0. The complete version is more applicable to scientific investigation. The subset version is provided for K-12 teachers and is featured in a classroom activity called "How Permanent is Permafrost?"

    We have included air temperature measured at these five stations when it is available. There are two sources for the surface air temperature data: the NCAR World Monthly Surface Station Climatology, 1738-cont, and the NOAA Global Historical Climatology Network (GHCN) Monthly data set. These two sources both draw on the same single original source: data from the World Meteorological Organization (WMO) station network. The complete files have data from one or both sources, while the subset files only include data from the source with the most complete record.

    These data are being offered as is. NOAA@NSIDC believes these data to be of value but is unable to research and document these data as we do most data sets we publish.
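
    For example, a short pandas sketch for reading one station's complete file while honouring the -999.0 missing-data flag (the filename is an assumption; the sheet names follow the description above):

    ```python
    import pandas as pd

    FLAGS = [-999.0]  # missing data are indicated by the value -999.0

    january = pd.read_excel("yakutsk_full.xlsx", sheet_name="Jan soil & surface temp",
                            na_values=FLAGS)
    july = pd.read_excel("yakutsk_full.xlsx", sheet_name="Jul soil & surface temp",
                         na_values=FLAGS)
    print(january.describe())
    ```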

  10. 3 second abiotic environmental raster data for the NARCLIM region of...

    • data.csiro.au
    • researchdata.edu.au
    Updated Dec 2, 2020
    Cite
    Tom Harwood; Darran King; Martin Nolan; John Gallant; Chris Ware; Jenet Austin; Kristen Williams (2020). 3 second abiotic environmental raster data for the NARCLIM region of Australia aggregated from various sources for modelling biodiversity patterns [Dataset]. http://doi.org/10.25919/8ecs-g970
    Explore at:
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Tom Harwood; Darran King; Martin Nolan; John Gallant; Chris Ware; Jenet Austin; Kristen Williams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1975 - Jan 1, 2016
    Dataset funded by
    Office of Environment and Heritage, New South Wales
    CSIRO (http://www.csiro.au/)
    Description

    This collection of 9-second raster data was compiled for use in modelling biodiversity pattern by developers engaged in supporting the New South Wales Biodiversity Indicators Program. Substrate and landform data derive from existing collections and have been altered from their native format to fill missing and erroneous data gaps as described in the lineage. Climate data were derived using existing methods as described in the lineage. Masks derived or adopted for use in processing the data are included in this collection. Data are supplied in ESRI float grid format, GCS GDA94 Geographic Coordinate System Geocentric Datum of Australia (GDA) 1994.
    Lineage: The abiotic environmental data in this collection are grouped by broad type - climate, substrate, and landform. Datasets are provided in separate compressed folders (*.zip or *.7z). An Excel spreadsheet included with the collection lists and briefly describes all datasets and their source URLs, and the processing location of the data in the CSIRO project archive. A lineage document summarises the mask and gap filling processes. Mask data were developed from existing spatial boundary data, including the Australian coastline, state and administration boundaries, and previous raster modelling masks for the NARCLIM region. The data gap filling process was conducted in three stages (Python processing scripts are included in this collection). In the first stage, the process used a 10-cell Inverse Distance Weighted (IDW) algorithm to fill NoData areas with data. The IDW algorithm used the distance of data values in the search radius as inverse weights in a neighbourhood average. To deal with remaining larger gaps, a second-stage IDW was run on the outputs of the first stage with an increased radius of 500 cells. Any remaining data gaps were filled with a global data average. This process of data filling may make the data unsuitable for other uses and should be carefully considered before use. Images of each dataset are provided in the collection for ease of reference. Data are supplied in ESRI float grid format, GCS GDA94 Geographic Coordinate System Geocentric Datum of Australia (GDA) 1994.
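
    The staged gap filling can be sketched in a few lines of numpy; this toy window-based version is only illustrative, and the collection's own Python scripts remain the authoritative implementation:

    ```python
    import numpy as np

    def idw_fill(grid, radius):
        """Fill NaN cells with an inverse-distance-weighted average of the
        valid cells inside a square window of the given cell radius."""
        out = grid.copy()
        for y, x in zip(*np.where(np.isnan(grid))):
            y0, x0 = max(0, y - radius), max(0, x - radius)
            win = grid[y0:y + radius + 1, x0:x + radius + 1]
            yy, xx = np.indices(win.shape)
            d = np.hypot(yy - (y - y0), xx - (x - x0))
            valid = ~np.isnan(win) & (d > 0)
            if valid.any():
                out[y, x] = np.average(win[valid], weights=1.0 / d[valid])
        return out

    # grid: 2-D float array read from an ESRI float grid, NaN where NoData
    grid = np.array([[1.0, np.nan, 3.0], [np.nan, np.nan, 4.0]])
    grid = idw_fill(grid, 10)                # stage 1: 10-cell IDW
    grid = idw_fill(grid, 500)               # stage 2: wider 500-cell radius
    grid[np.isnan(grid)] = np.nanmean(grid)  # stage 3: global average
    ```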

  11. The role of health sciences libraries in supporting medical image consent...

    • search.dataone.org
    • zenodo.org
    Updated Apr 4, 2024
    Cite
    Sarah McClung; Rachel Keiko Stark; Megan DeArmond; Sherli Koshy-Chenthittayil (2024). The role of health sciences libraries in supporting medical image consent standards survey documentation [Dataset]. http://doi.org/10.5061/dryad.vmcvdnd17
    Explore at:
    Dataset updated
    Apr 4, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Sarah McClung; Rachel Keiko Stark; Megan DeArmond; Sherli Koshy-Chenthittayil
    Description

    Objective: To determine if health sciences library workers were familiar with best practices regarding informed consent for the publication of medical images, and if they incorporate the recommendations into their professional work. Methods: A survey was developed by the authors and distributed to library listservs in the United States. The results of the survey were tabulated in R. Results: A total of 90 respondents were included in the data analysis, with all respondents reporting multiple responsibilities in their professional role. While the majority of library workers (59%) were familiar with the best practices, few incorporated the recommendations into their everyday professional work. Conclusions: The professional work of health sciences library workers does not appear to include a significant inclusion of the best practices for informed consent for the publication of medical images. There is a need for future research to better understand how library workers can better incorporat...

    Survey data and code

    https://doi.org/10.5061/dryad.vmcvdnd17

    The cleaned survey results, as well as the R code used to analyze the data, are made available here.

    Description of the data and file structure

    Cleaned data.xlsx: This Excel sheet contains the results of the survey conducted for our manuscript titled The Role of Health Sciences Libraries in Supporting Medical Image Consent Standards Survey. The research study was about informed consent standards for the publication of medical images. The survey participants were health sciences library workers. The survey participants were given the option to skip questions as well as to stop the survey at any time. Based on their specific role in the library, questions were grouped into different sections. So, there will be missing values in the Excel sheet, either due to the role of the librarian or their decision to skip questions. The missing values have been taken into account in the analysis b...

  12. Help Me study! Music Listening Habits While Studying (Dataset)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 4, 2024
    Cite
    Cheah, Yiting (2024). Help Me study! Music Listening Habits While Studying (Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10085103
    Explore at:
    Dataset updated
    Apr 4, 2024
    Dataset authored and provided by
    Cheah, Yiting
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the raw data used for a research study that examined university students' music listening habits while studying. There are two experiments in this research study. Experiment 1 is a retrospective survey, and Experiment 2 is a mobile experience sampling research study. This repository contains five Microsoft Excel files with data obtained from both experiments. The files are as follows:

    - onlineSurvey_raw_data.xlsx
    - esm_raw_data.xlsx
    - esm_music_features_analysis.xlsx
    - esm_demographics.xlsx
    - index.xlsx

    Files Description

    File: onlineSurvey_raw_data.xlsx
    This file contains the raw data from Experiment 1, including the (anonymised) demographic information of the sample. The sample characteristics recorded are:
    - studentship
    - area of study
    - country of study
    - type of accommodation a participant was living in
    - age
    - self-identified gender
    - language ability (mono- or bi-/multilingual)
    - (various) personality traits
    - (various) musicianship
    - (various) everyday music uses
    - (various) music capacity

    The file also contains raw data of responses to the questions about participants' music listening habits while studying in real life. These pieces of data are:
    - likelihood of listening to specific (rated across 23) music genres while studying and during everyday listening
    - likelihood of listening to music with specific acoustic features (e.g., with/without lyrics, loud/soft, fast/slow) while studying and during everyday listening
    - general likelihood of listening to music while studying in real life
    - participants' (verbatim) written responses to the open-ended questions about their real-life music listening habits while studying

    File: esm_raw_data.xlsx
    This file contains the raw data from Experiment 2, including the following variables:
    - information on the music tracks (track name, artist name, and, if available, Spotify ID) each participant was listening to during each music episode (both while studying and during everyday listening)
    - level of arousal at the onset of music playing and at the end of the 30-minute study period
    - level of valence at the onset of music playing and at the end of the 30-minute study period
    - specific mood at the onset of music playing and at the end of the 30-minute study period
    - whether participants were studying
    - their location at that moment (if studying)
    - whether they were studying alone (if studying)
    - the types of study tasks (if studying)
    - the perceived level of difficulty of the study task
    - whether participants were planning to listen to music while studying
    - (various) reasons for music listening
    - (various) perceived positive and negative impacts of studying with music

    Each row represents the data for a single participant. Rows with a record of a participant ID but no associated data indicate that the participant did not respond to the questionnaire (i.e., missing data).

    File: esm_music_features_analysis.xlsx
    This file presents the music features of each recorded music track during both the study-episodes and the everyday-episodes (retrieved from Spotify's "Get Track's Audio Features" API). These features are:
    - energy level
    - loudness
    - valence
    - tempo
    - mode

    The contextual details of the moments each track was being played are also presented here, which include:
    - whether the participant was studying
    - their location (e.g., at home, cafe, university)
    - whether they were studying alone
    - the type of study tasks they were engaging with (e.g., reading, writing)
    - the perceived difficulty level of the task

    File: esm_demographics.xlsx
    This file contains the demographics of the sample in Experiment 2 (N = 10), which are the same as in Experiment 1 (see above). Each row represents the data for a single participant. Rows with a record of a participant ID but no associated demographic data indicate that the participant did not respond to the questionnaire (i.e., missing data).

    File: index.xlsx
    Finally, this file contains all the abbreviations used in each document as well as their explanations.
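
    The audio features were retrieved from Spotify's "Get Track's Audio Features" API; a minimal sketch of such a request against the documented v1 endpoint (token acquisition elided; this is an illustration, not the authors' retrieval code):

    ```python
    import requests

    FEATURES = ("energy", "loudness", "valence", "tempo", "mode")

    def get_audio_features(track_id: str, token: str) -> dict:
        """Fetch the five features used in the analysis for a single track."""
        r = requests.get(f"https://api.spotify.com/v1/audio-features/{track_id}",
                         headers={"Authorization": f"Bearer {token}"})
        r.raise_for_status()
        body = r.json()
        return {k: body[k] for k in FEATURES}
    ```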

  13. Energy Consumption of United States Over Time

    • kaggle.com
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). Energy Consumption of United States Over Time [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-the-energy-consumption-of-united-state
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Energy Consumption of United States Over Time

    Building Energy Data Book

    By Department of Energy [source]

    About this dataset

    The Building Energy Data Book (2011) is an invaluable resource for gaining insight into the current state of energy consumption in the buildings sector. This dataset provides comprehensive data on residential, commercial, and industrial building energy consumption, construction techniques, building technologies, and characteristics. With this resource, you can get an in-depth understanding of how energy is used in various types of buildings - from single-family homes to large office complexes - as well as its impact on the environment. The BTO within the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy developed this dataset to provide a wealth of knowledge for researchers, policy makers, engineers, and even everyday observers who are interested in learning more about our built environment and its energy usage patterns.


    How to use the dataset

    This dataset provides comprehensive information regarding energy consumption in the buildings sector of the United States. It contains a number of key variables which can be used to analyze and explore the relations between energy consumption and building characteristics, technologies, and construction. The data is provided both in CSV format and in tabular form, which can be helpful for those who prefer to use programs like Excel or other statistical modeling software.

    In order to get started with this dataset we've developed a guide outlining how to effectively use it for your research or project needs.

    • Understand what's included: Before you start analyzing the data, you should read through the provided documentation so that you fully understand what is included in the datasets. You'll want to be aware of any potential limitations or requirements associated with each type of data point so that your results are valid and reliable when drawing conclusions from them.

    • Clean up any outliers: Investigate suspicious outliers in your dataset before using it in any further analyses; otherwise they can skew results, and they make complex statistical modeling more difficult, since values of large magnitude artificially inflate estimates (a single outlier can affect an entire model's prior distributions). Missing values should also be accounted for, since they are not always obvious at first glance in a table or graphical representation, but accurate statistics must still be obtained either way (see the sketch after this list).

    • Exploratory data analysis: After cleaning up your dataset, do some basic exploration by visualizing different types of summaries, such as boxplots, histograms, and scatter plots. This will give you an initial sense of what trends might exist within certain demographic/geographic regions and variables, which can then help inform future predictive models. This step will also highlight any clear discontinuities over time, helping ensure that predictors contribute meaningful signal rather than noise to overall predictions.

    • Analyze key metrics & observations: Once exploratory analyses have been carried out on raw samples, post-processing steps come next, such as analyzing metrics like correlations amongst explanatory variables, performing significance testing on regression models, and imputing missing or outlier values, depending upon the specific project needs at hand. Additionally – interpretation efforts based
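
    As a concrete starting point for the cleaning and exploration steps above, a short pandas sketch (the file and column names are placeholders):

    ```python
    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.read_csv("building_energy.csv")  # placeholder filename
    print(df.isna().sum())                   # account for missing values first

    col = "energy_consumption"               # placeholder column name
    z = (df[col] - df[col].mean()) / df[col].std()
    print(f"{(z.abs() > 3).sum()} potential outliers in {col}")  # simple z-score rule

    # quick exploratory views: boxplot and histogram
    fig, (ax1, ax2) = plt.subplots(1, 2)
    df[col].plot(kind="box", ax=ax1)
    df[col].plot(kind="hist", ax=ax2)
    plt.show()
    ```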

    Research Ideas

    • Creating an energy efficiency rating system for buildings - Using the dataset, an organization can develop a metric to rate the energy efficiency of commercial and residential buildings in a standardized way.
    • Developing targeted campaigns to raise awareness about energy conservation - Analyzing data from this dataset can help organizations identify areas of high energy consumption and create targeted campaigns and incentives to encourage people to conserve energy in those areas.
    • Estimating costs associated with upgrading building technologies - By evaluating various trends in building technologies and their associated costs, decision-makers can determine the most cost-effective option when it comes time to upgrade their structures' energy efficiency...
  14. RAAAP-2 Datasets (17 linked datasets)

    • figshare.com
    bin
    Updated May 30, 2023
    Cite
    Simon Kerridge; Patrice Ajai-Ajagbe; Cindy Kiel; Jennifer Shambrook; BRYONY WAKEFIELD (2023). RAAAP-2 Datasets (17 linked datasets) [Dataset]. http://doi.org/10.6084/m9.figshare.18972935.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Simon Kerridge; Patrice Ajai-Ajagbe; Cindy Kiel; Jennifer Shambrook; BRYONY WAKEFIELD
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This collection contains the 17 anonymised datasets from the RAAAP-2 international survey of research management and administration professionals undertaken in 2019. To preserve anonymity, the data are presented in 17 datasets linked only by AnalysisRegionofEmployment, as many of the textual responses, even though redacted to remove institutional affiliation, could be used to identify some individuals if linked to the other data. Each dataset is presented in the original SPSS format, suitable for further analyses, as well as an Excel equivalent for ease of viewing. There are additional files in this collection showing the questionnaire and the mappings to the datasets, together with the SPSS scripts used to produce the datasets. These data follow on from, but are not directly linked to, the first RAAAP survey undertaken in 2016, data from which can also be found in FigShare. Errata (16/5/23): an error in v13 of the main Data Cleansing syntax file (now updated to v14) meant that two variables were missing their value labels (the underlying codes were correct); a new version (SPSS & Excel) of the Main Dataset has been updated.
