Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: user stories or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is at least 65 and below 80, and L is otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or (ii) using a generic term
(e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent a legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between student and expert model.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
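As a minimal illustration of these two definitions, a Python sketch computing both metrics from the per-subject counts in columns K-O (function names are ours, not part of the dataset):

def correctness(al, wr, so, om):
    # AL / (AL + OM + SO + WR), as defined for Sheet 4
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    # (AL + WR) / (AL + WR + OM), as defined for Sheet 4
    return (al + wr) / (al + wr + om)

correctness(10, 2, 1, 3)   # -> 0.625
completeness(10, 2, 3)     # -> 0.8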
For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (a reference sketch of the calculation follows the sheet list below). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
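For reference, a Python sketch of the Hedges' g calculation reported at the bottom of sheets 5-8 (two independent groups with the usual small-sample correction; the online tool cited above may differ in details):

import math
import statistics

def hedges_g(group1, group2):
    n1, n2 = len(group1), len(group2)
    # Pooled standard deviation across the two groups
    s_pooled = math.sqrt(((n1 - 1) * statistics.stdev(group1) ** 2 +
                          (n2 - 1) * statistics.stdev(group2) ** 2) / (n1 + n2 - 2))
    d = (statistics.mean(group1) - statistics.mean(group2)) / s_pooled  # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    return d * correction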
OSU_SnowCourse Summary: Manual snow course observations were collected over WY 2012-2014 from four paired forest-open sites chosen to span a broad elevation range. Study sites were located in the upper McKenzie (McK) River watershed, approximately 100 km east of Corvallis, Oregon, on the western slope of the Cascade Range, and in the Middle Fork Willamette (MFW) watershed, located to the south of the McKenzie. The sites were designated based on elevation, with a range of 1110-1480 m. Distributed snow depth and snow water equivalent (SWE) observations were collected via monthly manual snow courses from 1 November through 1 April and bi-weekly thereafter. Snow courses spanned 500 m of forested terrain and 500 m of adjacent open terrain. Snow depth observations were collected approximately every 10 m, and SWE was measured every 100 m along the snow courses with a federal snow sampler. These data are raw observations and have not been quality controlled in any way. Distance along the transect was estimated in the field.

OSU_SnowDepth Summary: 10-minute snow depth observations collected at OSU met stations in the upper McKenzie River Watershed and the Middle Fork Willamette Watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and generally the locations were paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These data have undergone basic quality control. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN flags for missing data to NA, and added site attributes such as site name and cover. The raw snow depth values are negative (i.e., flipped, with some correction to use the height of the sensor as zero), so positive values in the raw data correspond to physically impossible negative depths; we therefore replaced positive values with NA. The sign of the remaining data was then switched to make the depths positive. Next, the smooth.m (MATLAB) function was used to roughly smooth the data, with a moving window of 50 points. Finally, outliers were removed: all values higher than the smoothed values +10 were replaced with NA, and in some cases further single-point outliers were removed.

OSU_Met Summary: Raw, 10-minute meteorological observations collected at OSU met stations in the upper McKenzie River Watershed and the Middle Fork Willamette Watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and generally the locations were paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These stations were deployed to collect numerous meteorological variables, of which snow depth and wind speed are included here.
These data are raw datalogger output and have not been quality controlled in any way. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN and 7999 flags for missing data to NA, and added site attributes such as site name and cover.

OSU_Location Summary: Location metadata for manual snow course observations and meteorological sensors. These data are compiled from GPS data for which the horizontal accuracy is unknown, and from processed hemispherical photographs. They have not been quality controlled in any way.
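A rough Python analogue of the snow-depth cleaning steps described above (a sketch only; the original processing used MATLAB's smooth.m, approximated here by a centred rolling mean, and the function name is ours):

import numpy as np
import pandas as pd

def clean_snow_depth(raw: pd.Series) -> pd.Series:
    # Positive raw values correspond to impossible negative depths: set to NA
    depth = raw.where(raw <= 0, np.nan)
    # Flip the sign so snow depths are positive
    depth = -depth
    # Roughly smooth with a 50-point moving window (smooth.m analogue)
    smoothed = depth.rolling(window=50, center=True, min_periods=1).mean()
    # Replace outliers above the smoothed values +10 with NA
    return depth.where(depth <= smoothed + 10, np.nan)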
A summary of the raw data. Visit https://dataone.org/datasets/sha256%3Ad2b14d6a9da46e707296080c0c4a17242ca7b713e14be24a256c85693535a891 for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a structured workflow for Lattice Light-Sheet Microscopy image processing, including raw data acquisition (.czi), summarised data (extracted from the .zarr compressed file), metadata extraction, and image enhancement techniques such as deskewing and deconvolution, implemented in a script (main.py). The dataset is intended for researchers working with high-resolution microscopy data.
Raw Data: Original microscopy images in CZI format
Metadata: Embedded metadata extracted by the Zeiss software is available directly after processing the .czi file, while external metadata is synthetically generated (https://github.com/onionsp/Synthetic-WGS-Dataset-Generator/).
Processing Scripts: Python scripts (as found in main.py) for deskewing, deconvolution, and data summarization.
Summarized Data: Processed image outputs in .zarr/.tiff format, reducing storage overhead while maintaining key insights.
Data Transfer Agreement: Documentation regarding data sharing policies and agreements.
Deskewing: Corrects image distortions caused during acquisition.
Deconvolution: Enhances image clarity and sharpness.
Downsampling: Reduces resolution for efficient processing and sharing.
Conversion: CZI to Zarr or TIFF format for optimized storage and computational use (see the sketch below).
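As a rough sketch of the conversion step (not the dataset's main.py; assumes the third-party czifile, tifffile, and zarr packages, and an illustrative file name):

import czifile
import tifffile
import zarr

# Read the raw CZI stack into a NumPy array
stack = czifile.imread("sample.czi")  # illustrative file name

# Save as chunked, compressed Zarr for efficient storage and access
zarr.save("sample.zarr", stack)

# Alternatively, write a TIFF copy
tifffile.imwrite("sample.tif", stack)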
The dataset, including raw and processed files, is hosted on Zenodo.
Users are encouraged to download downsampled versions for testing before using full-resolution data.
Processing scripts enable reproducibility and customization for different research applications.
Data transfer policies are outlined in the included Data Transfer Agreement.
https://github.com/DBK333/Omero-DataPortal/tree/main/OmeroImageSamples
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template are presented and made freely available, allowing the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. A variety of example applications of the syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SDS-PAGE data of amphiphilic proteins for protein capsule construction
UIEF_wind Summary: Within the Flat Creek Unit of the University of Idaho Experimental Forest (UIEF) near Moscow, ID, 30-minute snow depth and meteorological data were collected at seven locations across the Lawler Landing site (elevation 880 m) from February to May of WY 2008. A 70 m north-south oriented transect of 5 snow depth sensors was deployed to record sub-daily snow depth, with co-located meteorological instruments. The sensors traversed a 40 m long elliptical forest gap and the adjacent forest in both directions. The locations were the same as those used previously to quantify how shortwave and longwave radiation vary across a forest gap [Lawler and Link, 2011]. Two additional snow depth sensors and meteorological stations were deployed at “interior forest reference” and “open reference” sites, situated 80 m southeast and 1200 m west, respectively, from the main transect. Whereas the forest reference site was similar to the surrounding forest, the open reference site was much more exposed than the forest gap. These data are generally raw datalogger output and have not been quality controlled in any way unless specifically designated in the variable name. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files provided by UI, along with approximate coordinates of the sensor locations. Collaborators at the University of Washington (Jessica Lundquist) converted the timestamps given in fractional Julian days to dates and added site attributes such as Location ID and cover.

UIEF_snowdepth Summary: Observed snow depth from acoustic sensors. Measurements were taken within the Lawler Landing gap, as part of the University of Idaho Experimental Forest. Sensor data were collected half-hourly during February through May 2008 at 7 different points. See the location metadata and data citation for a description of locations. These data include raw values and values that were smoothed by Diana Carson; see the data citation for details.

UIEF_Location Summary: Within the Flat Creek Unit of the University of Idaho Experimental Forest (UIEF) near Moscow, ID, 30-minute snow depth and meteorological data were collected at seven locations across the Lawler Landing site (elevation 880 m) from February to May of WY 2008. These location metadata are associated with each unique location identification, which ties to the time series data. See Figure 1 of the data citation for a schematic map of locations. These coordinates are estimated from Google Earth based on Dr. Timothy Link's memory of where the sensors were located. Other attributes of each location were recorded as field notes as part of the study design.
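A minimal sketch of that fractional-Julian-day conversion (assuming day 1.0 is 00:00 on 1 January of the water-year calendar year; the actual origin used may differ):

from datetime import datetime, timedelta

def julian_to_datetime(frac_jday, year=2008):
    # Day 1.0 = 00:00 on 1 January of the given year (assumed origin)
    return datetime(year, 1, 1) + timedelta(days=frac_jday - 1)

julian_to_datetime(45.5)  # -> 2008-02-14 12:00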
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content:
* Nuclear protein fractions: NSP, NMP and NTP R1-R3 (peptide counts)
* Summary nuclear proteins geLC
* Cellular protein fractions: CSP, CMP R1-R3 (peptide counts)
* Summary cellular proteins geLC
* Summary nuclear protein fractions NSP R1-R3 (iBAQs)
* Summary cellular protein fractions CSP R1-R3 (iBAQs)
* 15 columns annotation
* Final summary of protein data
* Prediction results
* Annotation (biological processes and categories)
Tab color: raw data, green; summarized data, orange; final data, red; prediction results, blue
Data Description
Managed turfgrass is a common component of urban landscapes that is expanding under current land use trends. Previous studies have reported high rates of soil carbon sequestration in turfgrass, but no systematic review has summarized these rates nor evaluated how they change as turfgrass ages. We conducted a meta-analysis of soil carbon sequestration rates from 63 studies. Those data, as well as the code used to analyze them and create figures, are shared here.

Dataset Development
We conducted a systematic review from Nov 2020 to Jan 2021 using Google Scholar, Web of Science, and the Michigan Turfgrass Information File Database. The search terms targeted were "soil carbon", "carbon sequestration", "carbon storage", or "carbon stock", with "turf", "turfgrass", "lawn", "urban ecosystem", "residential", "Fescue", "Zoysia", "Poa", "Cynodon", "Bouteloua", "Lolium", or "Agrostis". We included only peer-reviewed studies written in English that measured SOC change over one year or longer, and where grass was managed as turf (mowed or clipped regularly). We included studies that sampled to any soil depth, and included several methodologies: small-plot research conducted over a few years (22 datasets from 4 articles), chronosequences of golf courses or residential lawns (39 datasets from 16 articles), and one study that was a variation on a chronosequence method and compiled long-term soil test data provided by golf courses of various ages (3 datasets from Qian & Follett, 2002). In total, 63 datasets from 21 articles met the search criteria. We excluded 1) duplicate reports of the same data, 2) small plot studies that did not report baseline SOC stocks, and 3) pure modeling studies. We included five papers that only measured changes in SOC concentrations, but not areal stocks (i.e., SOC in Mg ha-1). For these papers, we converted from concentrations to stocks using several approaches. For two papers (Law & Patton, 2017; Y. Qian & Follett, 2002) we used estimated bulk densities provided by the authors. For the chronosequences reported in Selhorst & Lal (2011), we used the average bulk density reported by the author. For the 13 chronosequences reported in Selhorst & Lal (2013), we estimated bulk density from the average relationship between percent C and bulk density reported by Selhorst (2011). For Wang et al. (2014), we used bulk density values from official soil survey descriptions.

Data provenance
In most cases we contacted authors of the studies to obtain the original data. If authors did not reply after two inquiries, or no longer had access to the data, we captured data from published figures using WebPlotDigitizer (Rohatgi, 2021). For three manuscripts the data were already available, or partially available, in public data repositories. Data provenance information is provided in the document "Dataset summaries and citations.docx".

Recommended Uses
We recommend the following to data users:
- Consult and cite the original manuscripts for each dataset, which often provide additional information about turfgrass management, experimental methods, and environmental context. Original citations are provided in the document "Dataset summaries and citations.docx".
- For datasets that were previously published in public repositories, consult and cite the original datasets, which may provide additional data on turfgrass management practices, soil nitrogen, and natural reference sites. Links to repositories are in the document "Dataset summaries and citations.docx".
- Consider contacting the dataset authors to notify them of your plans to use the data, and to offer co-authorship as appropriate.
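For reference, the standard concentration-to-stock conversion underlying the approaches above multiplies concentration, bulk density, and sampling depth; a sketch with illustrative values:

def soc_stock_mg_per_ha(soc_percent, bulk_density_g_cm3, depth_cm):
    # SOC stock (Mg ha^-1) = SOC (%) x bulk density (g cm^-3) x depth (cm);
    # 1 g cm^-3 over 1 cm of depth is 100 Mg of soil per hectare.
    return soc_percent * bulk_density_g_cm3 * depth_cm

soc_stock_mg_per_ha(1.5, 1.2, 20)  # -> 36.0 Mg ha^-1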
This dataset is the source dataset and contains raw data values. It will replace the current data download (https://safetydata.fra.dot.gov/OfficeofSafety/publicsite/on_the_fly_download.aspx) when the safetydata.fra.dot.gov site is decommissioned in 2024. To download the data in a user-friendly, human-readable format, please reference https://data.transportation.gov/Railroads/Injury-Illness-Summary-Operational-Data/m8i6-zdsy.
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is available only as summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole. Update frequency: Historic (none)
United States Census Bureau
-- Ten most populous zip codes in the 2010 census
-- (rows with gender = '' hold the totals across genders)
SELECT
  zipcode,
  population
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE
  gender = ''
ORDER BY
  population DESC
LIMIT
  10
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data
Supplementary and additional datasets (raw and preprocessed) in relation to future work of the paper "Towards Detecting Inauthentic Coordination in Twitter Likes Data" by Laura Jahn and Rasmus K. Rendsvig. The supplementary data contain liking and retweeting user data and tweet IDs, supplemented with, e.g., Botometer botscores and later lookups regarding account existence. A README facilitates navigation.

## Repository Structure

- [1] Data from Danish Twitter on National Election
- [2] Data from German Twitter
- [3] Supplementary data to paper "Towards Detecting Inauthentic Coordination in Twitter Likes Data"

## Folder content

- [1]
  - Raw Data: raw data of liking and retweeting users (you might come across #fv22 in file naming: the hashtag #fv22 is an election hashtag about the Danish National Election)
  - Preprocessed Data: binary like-user and retweet-user matrices (see the sketch after this list)
  - Botscores: Botometer v4 and lite scores for all likers and retweeters, also conveniently summarized in feature-frame tables
  - Clusters: bins of perfectly correlated users
  - Later User and Tweets Lookups: later (January, February 2023) lookup of previously collected users and the tweets they liked/retweeted
  - Likers Retweeters Pagination: later (January, February 2023) lookup of likers and retweeters using the new pagination parameter
- [2]
  - Raw Data: raw data of liking and retweeting users (you might come across #bundestag in file naming: the hashtag #bundestag is a German political hashtag)
  - Preprocessed Data: binary like-user and retweet-user matrices
- [3]
  - Additional dataset dkpol July
    - Raw Data: raw data of liking and retweeting users
    - Preprocessed Data: binary like-user and retweet-user matrices
  - Supplementary data to the data used in the paper "Towards Detecting Inauthentic Coordination in Twitter Likes Data"
    - Botscores: Botometer v4 and lite scores for all likers and retweeters, also conveniently summarized in feature-frame tables
    - Later User and Tweets Lookups: later (January, February 2023) lookup of previously collected users and the tweets they liked/retweeted
    - Likers Retweeters Pagination: later (January, February 2023) lookup of likers and retweeters using the new pagination parameter
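A sketch of how binary like-user matrices and clusters of perfectly correlated users can be built (column and variable names are ours, not the repository's):

import pandas as pd

# One row per like event -- hypothetical column names
likes = pd.DataFrame({
    "tweet_id": [1, 1, 2, 2, 3, 3],
    "user_id":  ["a", "b", "a", "b", "a", "c"],
})

# Binary like-user matrix: rows = tweets, columns = users
matrix = pd.crosstab(likes["tweet_id"], likes["user_id"]).clip(upper=1)

# Perfectly correlated users are those with identical like columns
clusters = matrix.T.groupby(list(matrix.T.columns)).groups
print(clusters)  # maps each like pattern to the users sharing it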
The following submission includes raw and processed data from the in-water deployment of NREL's Hydraulic and Electric Reverse Osmosis Wave Energy Converter (HERO WEC), in the form of parquet files, TDMS files, CSV files, bag files, and MATLAB workspaces. This dataset was collected in March 2024 at the Jennette's Pier test site in North Carolina. This submission includes the following:

Data description document (HERO WEC FY24 Hydraulic Deployment Data Descriptions.doc) - This document includes detailed descriptions of the type of data and how it was processed and/or calculated.

Processed MATLAB workspace - The processed data is provided in the form of a single MATLAB workspace containing data from the full deployment. This workspace contains data from all sensors down-sampled to 10 Hz along with all array Value Added Products (VAPs).

MATLAB visualization scripts - The MATLAB workspaces can be visualized using the file "HERO_WEC_2024_Hydraulic_Config_Data_Viewer.m/mlx". The user simply needs to download the processed MATLAB workspaces, specify the desired start and end times, and run this file. Both the .m and .mlx file formats have been provided, depending on the user's preference.

Summary Data - The fully processed data was used to create a summary data set with averages and important calculations performed on 30-minute intervals to align with the intervals of wave resource data reported from nearby CDIP ocean observing buoys located 20 km east of Jennette's Pier and 40 km northeast of Jennette's Pier. The wave resource data provided in this data set is to be used for reference only, due to the difference in water depth and proximity to shore between the Jennette's Pier test site and the locations of the ocean observing buoys. This data is provided in the Summary Data zip folder, which includes this data set in the form of a MATLAB workspace, parquet file, and Excel spreadsheet.

Processed Parquet File - The processed data is provided in the form of a single parquet file containing data from all HERO WEC sensors collected during the full deployment. Data in these files has been down-sampled to 10 Hz and all array VAPs are included.

Interim Filtered Data - Raw data from each sensor group partitioned into 30-minute parquet files. These files are outputs from an intermediate stage of data processing and contain the raw data with no Quality Control (QC) or calculations performed, in a format that is easier to use than the raw data.

Raw Data - Raw, unprocessed data from this deployment can be found in the Raw Data zip folder. This data is provided in the form of TDMS, CSV, and bag files in the original format output by the MODAQ system.

Python Data Processing Script - This links to an NREL public GitHub repository containing the Python script used to go from raw data to fully processed parquet files. Additional documentation on how to use this script is included in the GitHub repository.

This data set has been developed by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Water Power Technologies Office.
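A quick-start sketch for loading the processed parquet file in Python (assumes pandas with a parquet engine such as pyarrow; the file name is illustrative):

import pandas as pd

# Load the full-deployment processed data (10 Hz, all sensors plus VAPs)
df = pd.read_parquet("HERO_WEC_2024_processed.parquet")  # illustrative name

# Inspect the available sensor channels and value-added products
print(df.columns)
print(df.describe())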
https://ottawa.ca/en/city-hall/get-know-your-city/open-data#open-data-licence-version-2-0
Provides a summary of water quality results for raw, treated, and distribution groundwater for the City of Ottawa’s Munster Communal Well.
Accuracy: There are no known errors with this data report.
Update Frequency: Annually
Contact: Gwyn Norman
We provide data on ecological community responses to wildfire, collected three years post-fire, across three burn conditions (unburned, moderate severity, and high severity) in the Eldorado National Forest, California. The data were collected with 19 sampling methods deployed across 27 sites (nine in each burn condition) used to estimate richness, body size, abundance, and biomass density for 849 species (including 107 primary producers, 634 invertebrates, and 94 vertebrates). The sampling methods are detailed in a companion data paper. To maximize transparency and ease of use we have made our data available in four formats: raw, tidy, temporary, and summary. Raw data is as close as possible to the form in which it was collected. As such, raw data is not ready for analysis. We have provided tidy versions of each raw data set that are ready for analysis. Temporary data files are included for transparency but are used to create summary data files, and are not intended to be informative as stand-alone files.
https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is available only as summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
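A sketch answering the second question with the BigQuery Python client (assumes default credentials and the companion population_by_zip_2000 table; as in the sample query above, gender = '' selects the total rows):

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT
  a.zipcode,
  b.population - a.population AS population_change
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2000` AS a
JOIN
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS b
ON
  a.zipcode = b.zipcode
WHERE
  a.gender = '' AND b.gender = ''
ORDER BY
  ABS(b.population - a.population) DESC
LIMIT
  10
"""
print(client.query(sql).to_dataframe())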
Census population map: https://cloud.google.com/bigquery/images/census-population-map.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Retail Trade Index operation (ICIm) provides a short-term indicator of the evolution of trade and employment in the sector, based on the total turnover and the personnel employed in a selection of commercial establishments in the Basque Country, with representation in all three provinces.
NA. This dataset is not publicly accessible because: The data used in this manuscript were obtained under Data Use Agreements with the NCS Vanguard Data and Sample Archive and Access System and the NICHD Data and Specimen Hub (DASH). Because of the requirements of the DUA, we are unable to provide raw data; thus, only the summary data included in the manuscript are provided. It can be accessed through the following means: The manuscript contains tables of the summary statistics. For the original data, users must have an approved DUA with NICHD DASH. Format: Word file of tables with summary statistics for maternal blood Pb, urine Pb, Pb surface wipe loading, and Pb vacuum bag dust. This dataset is associated with the following publication: Stanek, L., N. Grokhowsky, B. George, and K. Thomas. Assessing lead exposure in U.S. pregnant women using biological and residential measurements. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, (905): 167135, (2023).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Intro
Dataset from the publication "Lithium-ion battery degradation: comprehensive cycle ageing data and analysis for commercial 21700 cells", DOI: https://doi.org/10.1016/j.jpowsour.2024.234185
Full details of the study can be found in the publication, including thorough descriptions of the experimental methods and structure. A basic description of the experimental procedure and data structure is included here for ease of use.
Commercial 21700 cylindrical cells (LG M50T, LG GBM50T2170) were cycle aged under 3 different temperatures [10, 25, 40] °C and 4 different SoC ranges [0-30, 70-85, 85-100, 0-100]%, as well as a further [0-100]% SoC range experiment which utilised a drive-cycle discharge instead of constant-current. The same C-rates (0.3C / 1C for charge / discharge) were used in all tests; multiple cells were tested under each condition. These are listed in the table below.
Experiment | SOC Window | Cycles per ageing set | Current | Number of Cells (10°C / 25°C / 40°C)
1 | 0-30% | 257 | 0.3C / 1D | 3 / 3 / 3
2 | 70-85% | 515 | 0.3C / 1D | 2 / 2 / 2
3 | 85-100% | 515 | 0.3C / 1D | 3 / 3 / 3
4 | 0-100% (drive-cycle) | 78 | 0.3C / noisy D | 3 / 2 / 3
5 | 0-100% | 78 | 0.3C / 1D | 3 / 2 / 3
Cells were base-cooled at set temperatures using bespoke test rigs (see our linked publications for details; the supporting information file contains detailed descriptions and photographs). Cells were subject to break-in cycles prior to beginning of life (BoL) performance tests using the ‘Reference Performance Test’ (RPT) procedures. They were then alternately subject to ageing sets and RPTs until the end of testing. Full details of each of these procedures are described in the linked publication.
The data contained in this repository is then described in the Data section below. This includes a description of the folder structure and naming conventions, file formats, and data analysis methods used for the ‘Processed Data’ which has been calculated from the raw data.
An 'experimental_metadata' .xlsx file is included to aid parsing of the data. A Jupyter notebook has also been included to demonstrate how to access some of the data.
Data
Data are organised according to their parent ‘Experiment’, as defined above, with a folder for each. Within each Experiment folder, there are 3 subfolders: ‘Summary Data’, ‘Processed Timeseries Data’, and ‘Raw Data’.
Summary Data
This folder contains data which has been extracted by processing the raw data in the ‘Degradation Cycling’ and ‘Performance Checks’ folders. In most cases, the data you are looking for will be stored here.
It contains:
Performance Summary
A summary file for each cell which details key ageing metrics such as number of ageing cycles, charge throughput, cell capacity, resistance, and degradation mode analysis results. Each row of data corresponds to a different SoH.
Degradation Mode Analysis (DMA) was also performed on the C/10 discharge data at each RPT. This analysis uses an optimisation function to determine the capacities and offset of the positive and negative electrodes by calculating a full cell voltage vs capacity curve using half-cell data and comparing against the experimentally measured voltage vs capacity data from the C/10 discharge. See our ACS publication for more details.
Data includes:
· Ageing Set: numbered 0 (BoL) to x, where x is the number of ageing sets the cell has been subject to.
· Ageing Cycles: number of ageing cycles the cell has been subject to. Note that this is not equivalent to full cycles.
· Ageing Set Start Date/ End date: The date that each ageing set began/ ended.
· Days of degradation: Number of days between the date of the first ageing set beginning and the current ageing set ending.
· Age set average temperature: average recorded surface temperature of the cell during cycle ageing. Temperature was recorded approximately halfway up the length of the cell (i.e. between positive and negative caps).
· Charge throughput: total accumulated charge recorded during all cycles during ageing (i.e. sum of charge and discharge). This is the cumulative total since BoL (not including RPTs, and not including break-in cycles).
· Energy throughput: as with "charge throughput", but for energy.
· C/10 Capacity: the capacity recorded during the C/10 discharge test of each RPT.
· C/2 Capacity: the capacity recorded during the C/2 discharge test of each even-numbered RPT.
· 0.1s Resistance: The resistance calculated from the 25-pulse GITT test of each even-numbered RPT. This value is taken from the 12th pulse of the procedure (which corresponds to ~52% SoC at BoL). The resistance is calculated by dividing the voltage drop by the current at a timescale of 0.1 seconds after the current pulse is applied (the fastest timescale possible under the 10 Hz recording condition); see the sketch after this list.
· Fitting parameters: output from the DMA optimisation function; 5 parameters which detail the upper/lower SoCs of each electrode, and the capacity fraction of graphite in the negative electrode.
· Capacity and offset data: calculated based on the fitting parameters above alongside the measured C/10 discharge capacity.
· DM data: Quantities of LLI, LAM-PE, LAM-NE, LAM-NE-Gr, and LAM-NE-Si calculated from the change in capacities/offset of each electrode since BoL.
· RMSE data: the root mean squared error of the optimisation function calculated from the residual between the measured and simulated voltage vs capacity profiles.
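A minimal sketch of that 0.1 s resistance calculation (variable names are ours):

def pulse_resistance_ohm(v_before_pulse, v_at_0p1s, pulse_current_a):
    # R = voltage drop / current, 0.1 s after the pulse is applied
    return (v_before_pulse - v_at_0p1s) / pulse_current_a

pulse_resistance_ohm(3.70, 3.65, 1.5)  # -> ~0.033 Ohm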
Ageing Sets Summary
Data from the ageing cycles, summarised on an average per cycle and an average per ageing set basis. Metrics include mean/ max/ min temperatures, voltages etc.
Processed Timeseries data
Timeseries data (voltage, current, temperature, etc.) from each subtest (pOCV, GITT, etc.) of the RPTs, all grouped by subtest-type and by cell ID.
Contains the same data as in the ‘Performance Checks’ subfolder of the 'Raw Data' folder, but has been processed to slice into relevant subtests from the RPT procedure and includes only limited variables (time, voltage, current, charge, temperature). These are all saved as .csv files. In general this data will be easier to access than the raw data, but perhaps not as rich.
Raw Data
These are the raw data from the performance checks and from the degradation cycles themselves. The data from here has already been processed by me to get values of ‘energy throughput’, ‘charge throughput’, ‘average ageing temperature’, etc., which are all saved in the ‘Summary Data’ folder as described in the relevant section above.
The data in the ‘Degradation Cycling’ folder are organised by ageing set (where an ageing set is a defined number of ageing cycles, as described in the paper). In theory, each cell should have one datafile in each ageing set subfolder. However, due to experimental issues, tests can sometimes be interrupted midway through, requiring the test to be subsequently resumed. In this case, there may be multiple datafiles for each cell in a given ageing set; during analysis, these should be concatenated according to the descriptor in the filename (e.g., ‘cycling7’ + ‘cycling7 (part 2)').
Similarly, the unprocessed raw data from the performance checks (i.e. RPTs) is stored in the 'Performance Checks' folder, and structured in the same way.
The raw data are saved in the .mpr format produced by the Biologic battery cycler. This is a binary format which is storage-efficient but can be more difficult to process for analysis purposes. We have therefore also exported the data into .txt files (called .mpt) for the performance checks (RPTs), which make analysis easier. However, the exported .mpt files could not be included for the degradation cycling files due to their larger size. If you require access to these degradation cycle data, the .mpr binary file can be parsed using the Galvani package in Python, or you can use Biologic’s (proprietary) BT-Lab software to export the data into .txt files.
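For example, a minimal sketch using Galvani and pandas (the file name is illustrative):

from galvani import BioLogic
import pandas as pd

# Parse the binary Biologic .mpr file and convert to a DataFrame
mpr = BioLogic.MPRfile("cycling7.mpr")  # illustrative file name
df = pd.DataFrame(mpr.data)
print(df.columns)  # recorded channels, e.g. time, voltage, current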
File Naming Convention
The raw datafiles are named with a standard format. This is:
NDK - LG M50 deg - exp 1 - rig 1 - 10degC - cell A - RPT1_01_MB_CB1
{NDK - LG M50 deg} - {exp 1} - {rig 1} - {10degC} - {cell A} - {RPT1}_{01}_{MB}_{CB1}
{Standard prefix} - {experiment number} - {ID of test rig} - {control temperature} - {Cell ID} - {RPT number or ageing cycle number}_{step number for the characterisation procedure (see above)}_{experimental technique name (will always be "MB")}_{battery cycler channel ID used (always the same for a particular cell/experiment)}
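A sketch of parsing this convention with a regular expression (group names are ours):

import re

PATTERN = re.compile(
    r"(?P<prefix>.+?) - exp (?P<experiment>\d+) - rig (?P<rig>\d+) - "
    r"(?P<temperature>\d+)degC - cell (?P<cell>\w+) - "
    r"(?P<test>.+?)_(?P<step>\d+)_MB_(?P<channel>\w+)"
)

m = PATTERN.match("NDK - LG M50 deg - exp 1 - rig 1 - 10degC - cell A - RPT1_01_MB_CB1")
print(m.groupdict())
# {'prefix': 'NDK - LG M50 deg', 'experiment': '1', 'rig': '1',
#  'temperature': '10', 'cell': 'A', 'test': 'RPT1', 'step': '01', 'channel': 'CB1'}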
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This research aims to develop a principle-based framework for audit analytics (AA) implementation, which addresses the challenges of AA implementation and acknowledges its socio-technical complexities and the interdependencies among challenges. This research relies on mixed methods to capture the phenomena from the research's participants through various approaches, i.e., MICMAC-ISM, case study, and interviews with practitioners, with literature exploration as the starting point. The raw data collected consists of multimedia data (audio and video recordings of interviews and focus group discussions), which is then transformed into text files (transcripts), complemented with softcopies of documents from the case study object.
The published data in this dataset consist of the summarized or analyzed data, as the raw data (including transcripts) are not allowed to be published according to the decision by the Human Research Ethics Committee pertinent to this research (Approval #1979, 14 February 2022). This dataset's published data are text files representing the summarized/analyzed raw data, serving as online appendices to the thesis.