Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file performs a statistical test of whether two ROC curves differ from each other, based on the Area Under the Curve. You'll need the coefficient from the table presented in the following article to enter the correct AUC value for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: User Stories or Use Cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is at least 65 and below 80, and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets originate from that raw data.
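For reference, the grade categorisation in column F can be reproduced in a few lines of R (a minimal sketch; the function name is illustrative, not part of the dataset):

grade_category <- function(grade) {
  ifelse(grade >= 80, "H",        # H: grade >= 80
  ifelse(grade >= 65, "M", "L"))  # M: 65 <= grade < 80; L: otherwise
}
grade_category(c(85, 70, 50))  # "H" "M" "L"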
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes in the student model divided by the number of classes in the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
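A minimal R sketch of this calculation and the accompanying box plots (column names are illustrative, not the sheet's actual headers):

library(ggplot2)
# hypothetical extract of the raw data: case assignment and class counts
raw <- data.frame(
  case            = c("IFA", "Sim", "Hos", "IFA", "Sim", "Hos"),
  classes_student = c(12, 9, 15, 11, 10, 13),
  classes_expert  = c(10, 10, 14, 10, 10, 14)
)
raw$size_ratio <- raw$classes_student / raw$classes_expert  # student / expert
ggplot(raw, aes(x = case, y = size_ratio)) +
  geom_boxplot()  # one box per group, as in Sheet 3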
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
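Expressed as a minimal R sketch over the situation counts from columns K-O (the function names are illustrative, not the workbook's own):

correctness  <- function(AL, WR, SO, OM) AL / (AL + OM + SO + WR)
completeness <- function(AL, WR, OM) (AL + WR) / (AL + WR + OM)
# example: 8 aligned, 2 wrongly represented, 1 system-oriented, 3 omitted
correctness(AL = 8, WR = 2, SO = 1, OM = 3)  # 8 / 14 = 0.571
completeness(AL = 8, WR = 2, OM = 3)         # 10 / 13 = 0.769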
For Sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (see the R sketch after the sheet list below for an equivalent computation). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
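As referenced above, both the t-test and Hedges' g can be computed directly in R. This is a minimal sketch using the standard Hedges' g formula on two hypothetical vectors of per-student scores, not the workbook's own formulas:

hedges_g <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  d <- (mean(x) - mean(y)) / s_pooled  # Cohen's d
  d * (1 - 3 / (4 * (n1 + n2) - 9))    # small-sample correction gives Hedges' g
}
uc <- c(0.62, 0.71, 0.55, 0.68)  # hypothetical Use Case group scores
us <- c(0.49, 0.58, 0.52, 0.61)  # hypothetical User Story group scores
t.test(uc, us)   # significance (Welch t-test)
hedges_g(uc, us) # effect size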
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014.

This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

[Figure: TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).]
TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)
options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep, ]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
General information
This data is supplementary material to the paper by Watson et al. on sex differences in global reporting of adverse drug reactions [1]. Readers are referred to this paper for a detailed description of the context in which the data was generated. Anyone intending to use this data for any purpose should read the publicly available information on the VigiBase source data [2, 3]. The conditions specified in the caveat document [3] must be adhered to.
Source dataset
The dataset published here is based on analyses performed in VigiBase, the WHO global database of individual case safety reports [4]. All reports entered into VigiBase from its inception in 1967 up to 2 January 2018 with patient sex coded as either female or male have been included, except suspected duplicate reports [5]. In total, the source dataset contained 9,056,566 female and 6,012,804 male reports.
Statistical analysis
The characteristics of the female reports were compared to those of the male reports using a method called vigiPoint [6]. This is a method for comparing two or more sets of reports (here, female and male reports) on a large set of reporting variables, and highlighting any feature in which the sets differ in a statistically and clinically relevant manner. For example, patient age group is a reporting variable, and the different age groups (0-27 days, 28 days-23 months, et cetera) are features within this variable. The statistical analysis is based on shrinkage log odds ratios, computed as a comparison between the two sets of reports for each feature, including all reports without missing information for the variable under consideration. The specific output from vigiPoint is defined precisely below. Here, results for 18 different variables with a total of 44,486 features are presented. 74 of these features were highlighted as so-called vigiPoint key features, suggesting a statistically and clinically significant difference between female and male reports in VigiBase.
Description of published dataset
The dataset is provided in the form of a MS Excel spreadsheet (.xlsx file) with nine columns and 44,486 rows (excluding the header), each corresponding to a specific feature. Below follows a detailed description of the data included in the different columns.
Variable: This column indicates the reporting variable to which the specific feature belongs. Six of these variables are described in the original publication by Watson et al.: country of origin, geographical region of origin, type of reporter, patient age group, MedDRA SOC, ATC level 2 of reported drugs, seriousness, and fatality [1]. The remaining 12 are described here:
The Variable column can be useful for filtering the data, for example if one is interested in one or a few specific variables.
Feature: This column contains each of the 44,486 included features. The vast majority should be self-explanatory, or else they have been explained above, or in the original paper [1].
Female reports and Male reports: These columns show the number of female and male reports, respectively, for which the specific feature is present.
Proportion among female reports and Proportion among male reports: These columns show the proportions within the female and male reports, respectively, for which the specific feature is present. Comparing these crude proportions is the simplest and most intuitive way to contrast the female and male reports, and a useful complement to the specific vigiPoint output.
Odds ratio: The odds ratio is a basic measure of association between the classification of reports into female and male reports and a given reporting feature, and hence can be used to compare female and male reports with respect to this feature. It is formally defined as a / (bc / d), where
This crude odds ratio can also be computed as (p_female / (1 - p_female)) / (p_male / (1 - p_male)), where p_female and p_male are the proportions described earlier. If the odds ratio is above 1, the feature is more common among the female than the male reports; if below 1, the feature is less common among the female than the male reports. Note that the odds ratio can be mathematically undefined, in which case it is missing in the published data.
vigiPoint score: This score is defined based on an odds ratio with added statistical shrinkage, defined as (a + k) / ((bc / d) + k), where k is 1% of the total number of female reports, or about 9,000. While the shrinkage adds robustness to the measure of association, it makes interpretation more difficult, which is why the crude proportions and unshrunk odds ratios are also presented. Further, 99% credibility intervals are computed for the shrinkage odds ratios, and these intervals are transformed onto a log2 scale [6]. The vigiPoint score is then defined as the lower endpoint of the interval, if that endpoint is above 0; as the higher endpoint of the interval, if that endpoint is below 0; and otherwise as 0. The vigiPoint score is useful for sorting the features from strongest positive to strongest negative associations, and/or to filter the features according to some user-defined criteria.
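For illustration, the crude and shrinkage odds ratios can be written out in R. This is a minimal sketch, assuming a = female reports with the feature, b = female reports without it, c = male reports with the feature, and d = male reports without it (these letter definitions are inferred from the formulas above):

odds_ratio <- function(a, b, c, d) (a / b) / (c / d)  # equals a / (b*c/d)
shrunk_log2_or <- function(a, b, c, d, n_female) {
  k <- 0.01 * n_female               # shrinkage constant: 1% of all female reports
  log2((a + k) / ((b * c / d) + k))  # vigiPoint works on a log2 scale
}
# example: a feature on 1,200 of 9,056,566 female and 500 of 6,012,804 male reports
a <- 1200; b <- 9056566 - a; c <- 500; d <- 6012804 - c
odds_ratio(a, b, c, d)               # crude OR, about 1.59
shrunk_log2_or(a, b, c, d, 9056566)  # near 0: shrinkage pulls rare features toward the null

Note that this sketch omits the 99% credibility interval that turns the shrinkage estimate into the final vigiPoint score.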
vigiPoint key feature: Features are classified as vigiPoint key features if their vigiPoint score is either above 0.5 or below -0.5. The specific threshold of 0.5 is arbitrary, but chosen to identify features where the two sets of reports (here, female and male reports) differ in a clinically significant way.
References
The dataset includes a PDF file containing the results and an Excel file with the following tables:
Table S1 Results of comparing the performance of MetaFetcheR to MetaboAnalystR using Diamanti et al. Table S2 Results of comparing the performance of MetaFetcheR to MetaboAnalystR for Priolo et al. Table S3 Results of comparing the performance of MetaFetcheR to MetaboAnalyst 5.0 webtool using Diamanti et al. Table S4 Results of comparing the performance of MetaFetcheR to MetaboAnalyst 5.0 webtool for Priolo et al. Table S5 Data quality test results for running 100 iterations on HMDB database. Table S6 Data quality test results for running 100 iterations on KEGG database. Table S7 Data quality test results for running 100 iterations on ChEBI database. Table S8 Data quality test results for running 100 iterations on PubChem database. Table S9 Data quality test results for running 100 iterations on LIPID MAPS database. Table S10 The list of metabolites that were not mapped by MetaboAnalystR for Diamanti et al. Table S11 An example of an input matrix for MetaFetcheR. Table S12 Results of comparing the performance of MetaFetcheR to MS_targeted using Diamanti et al. Table S13 Data set from Diamanti et al. Table S14 Data set from Priolo et al. Table S15 Results of comparing the performance of MetaFetcheR to CTS using KEGG identifiers available in Diamanti et al. Table S16 Results of comparing the performance of MetaFetcheR to CTS using LIPID MAPS identifiers available in Diamanti et al. Table S17 Results of comparing the performance of MetaFetcheR to CTS using KEGG identifiers available in Priolo et al. Table S18 Results of comparing the performance of MetaFetcheR to CTS using KEGG identifiers available in Priolo et al. (See the "index" tab in the Excel file for more information)
Small-compound databases contain a large amount of information about metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Failing to establish means of data access at the early stages of a project can lead to mislabelled compounds, reduced statistical power, and large delays in the delivery of results.
We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies and covers a variety of use-cases of data fetching. We showed that the performance of MetaFetcheR was superior to existing approaches and databases by benchmarking the performance of the algorithm in three independent case studies based on two published datasets.
The dataset was originally published in DiVA and moved to SND in 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a benchmark dataset for comparing a number of methods of spatial prediction, used in: "A comparison of spatial predictors when datasets could be very large" by Jonathan R. Bradley, Noel Cressie, and Tao Shi. This dataset reports level-2 mid-tropospheric CO2 values at a 17.6 km × 17.6 km spatial resolution, obtained from Atmospheric Infrared Sounder (AIRS) data retrieved from 1–9 February 2010. AIRS is a remote sensing instrument on board the Aqua satellite administered by the National Aeronautics and Space Administration (NASA). Among other measurements, it collects CO2 measurements in the form of spectra (level 1) that are then converted to mid-tropospheric CO2 values (level 2) given in units of parts per million (ppm). This dataset is in the form given by Bradley et al. (2016) and is freely available under the Creative Commons Attribution 4.0 Australia License. The ZIP file contains three folders, "Small," "Large," and "VeryLarge"; the data in these folders are used in a comparison study in Section 4 of Bradley et al. (2016). Each folder contains two CSV files, one for the training dataset and one for the validation dataset. In each file, the first two columns are the latitude and longitude, respectively, and the third column is mid-tropospheric CO2 in ppm.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains one Excel sheet and five Word documents. In this dataset, Simulation.xlsx describes the parameter values used for the numerical analysis based on empirical data. In this Excel sheet, we calculated the values of each capped call-option model parameter. "Computation of Table 2.docx" and the other documents show the results of the comparative statics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the raw data used for a systematic review of the impact of background music on cognitive task performance (Cheah et al., 2022). Our intention is to facilitate future updates to this work.
Contents description
This repository contains eight Microsoft Excel files: one for each of the six cognitive domains analysed in the review, one for task difficulty, and one for population characteristics:
raw-data-attention
raw-data-inhibition
raw-data-language
raw-data-memory
raw-data-thinking
raw-data-processing-speed
raw-data-task-difficulty
raw-data--population
Files description
Tabs organisation
The files pertaining to each cognitive domain include individual tabs for each cognitive task analysed (cf. Figure 2 in the original paper for the list of cognitive tasks). The file with the population characteristics data also contains separate tabs for each characteristic (extraversion, music training, gender, and working memory capacity).
Tabs contents
In all files and tabs, each row corresponds to the data of a test. The same article can have more than one row if it reports multiple tests. For instance, the study by Cassidy and MacDonald (2007; cf. Memory.xlsx, tab: Memory-all) contains two experiments (immediate and delayed free recall), each with multiple tests (immediate free recall: tests 25-32; delayed free recall: tests 58-61). Each test (one per row) in this experiment pertains to comparisons between conditions where the background music has different levels of arousal, between groups of participants with different extraversion levels, between different task materials (words or paragraphs), and different combinations of the previous (e.g., a high arousing music vs silence test among extraverts whilst completing an immediate free recall task involving paragraphs; cf. test 30). The columns are organised as follows:
"TESTS": the index of the test in a particular tab (for easy reference); "ID": abbreviation of the cognitive tasks involved in a specific experiment (see glossary for meaning); "REFERENCE": the article where the data was taken from (see main publications for list of articles); "CONDITIONS": an abbreviated description of the music condition of a given test; "MEANS (music)": the average performance across all participants in a given experiment with background music; "MEANS (silence)": the average performance across all participants in a given experiment without background music. Then, in horizontal arrangement, we also include groups of two columns that breakdown specific comparisons related to each test (i.e., all tests comparing the same two types of condition, e.g., L-BgM vs I-BgM, will appear under the same set of columns). For each one, we indicate mean difference between the respective conditions ("MD" column) and the direction of effect ("Standard Metric" column). Each file also contains a "Glossary" tab that explains all the abbreviations used in each document. Bibliography Cheah, Y., Wong, H. K., Spitzer, M., & Coutinho, E. (2022). Background music and cognitive task performance: A systematic review of task, music and population impact. Music & Science, 5(1), 1-38. https://doi.org/10.1177/20592043221134392
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.

Abstract
Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS became widely accepted internationally and has become the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe that there are inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich's website has only one pictogram. A chemical information tool which identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1,000 and 3,000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.

Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the collected data rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions are used to convert GHS information into binary strings of data called "bitstrings". This approach is also used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare on their face. Bitstrings generated using the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values from 0, for strings that have no similarity, to 1, for strings that are the same. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques with minor modifications could be used to tackle more GHS information such as pictograms.

Intellectual Merit
This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings that are obtained from the non-numeric entity of 2D structure.
This structural fingerprint allows comparison of 2D structures through the use of the Tanimoto coefficient. The same approach can be extended to safety fingerprints, which can be created by converting a non-numeric entity such as GHS information into a binary bit string and comparing data through the use of the Tanimoto coefficient.

Broader Impact
Extensions of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other pieces of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel without needing a large programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
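As a small illustration of the comparison step, the Tanimoto coefficient for two hypothetical GHS bitstrings can be computed in R (the fingerprints below are made up; they are not from any actual SDS):

tanimoto <- function(x, y) {
  both   <- sum(x & y)         # bits set in both fingerprints
  either <- sum(x | y)         # bits set in either fingerprint
  if (either == 0) return(NA)  # undefined when both strings are empty
  both / either                # 0 = no similarity, 1 = identical
}
# illustrative fingerprints: each position marks one GHS hazard statement
source_a <- c(1, 1, 0, 1, 0, 0, 1)
source_b <- c(1, 0, 0, 1, 0, 1, 1)
tanimoto(source_a, source_b)   # 3 shared / 5 total = 0.6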
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo repository contains the data supporting the findings of the journal paper, titled "A Platform-Agnostic Approach for Automatically Identifying Real-Life Performance Issue Reports with Heuristic Linguistic Patterns", published in IEEE Transactions on Software Engineering, including:
This section contains detailed data findings from six research questions (RQ1 to RQ6).
The RQ1 tab provides an evaluation of our HLP-based approach, showing the precision, recall, and F1-Score of eight classifiers. These results are juxtaposed with the corresponding values from baseline methods, at both sentence and issue levels for automatic tagging.
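For reference, the three reported metrics relate as follows (a minimal R sketch from confusion-matrix counts; the numbers are illustrative):

precision <- function(tp, fp) tp / (tp + fp)     # fraction of tagged items that are correct
recall    <- function(tp, fn) tp / (tp + fn)     # fraction of true items that are found
f1        <- function(p, r) 2 * p * r / (p + r)  # harmonic mean of the two
p <- precision(tp = 80, fp = 20)  # 0.80
r <- recall(tp = 80, fn = 10)     # ~0.89
f1(p, r)                          # ~0.84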
The RQ2 tab illustrates the precision, recall, and F1-Score of eight classifiers under two training conditions: a balanced training dataset (BT+HLP) and an imbalanced training dataset (UBT+HLP). These outcomes are contrasted with the equivalent values from baseline methods, also trained under balanced (BT+BLM) and imbalanced (UBT+BLM) conditions. The results are shown at both sentence and issue levels for automatic tagging.
The RQ3 tab evaluates the dataset transferability of our HLP-based approach in comparison to baseline methods. It achieves this by analyzing the precision, recall, and F1-Score metrics for eight classifiers under two different "training/testing" dataset conditions, i.e., 'D1/D1' and 'D1/D3'. These conditions allow for a direct comparison of performance when applied to the same dataset ('D1/D1') versus when transferred to a different dataset ('D1/D3'). Additionally, the tab includes an 'Avg Change' and 'p-value' section, summarizing the statistical change in performance metrics between the two dataset conditions.
The RQ4 tab presents a direct comparison between strict and fuzzy HLP matching approaches, assessed through precision, recall, and F1-Score metrics across eight issue classifiers.
The RQ5 tab examines the influence of sentence order on the accuracy of eight classifiers within our approach. It shows the change in precision, recall, and F1-Score when the sentence order feature is taken into consideration versus when it is not.
The RQ6 tab explores the impact of feature selection algorithms on both issue and sentence-level tagging accuracy. This tab presents the average precision, recall, and F1-Score for three experiments: Boruta, Recursive Feature Elimination (RFE), and the usage of all 80 features.
This spreadsheet offers a comprehensive examination of the data supporting Section 6.1, which focuses on Qualitative Analysis. It is organized into several tabs, each dedicated to specific research questions (RQs) as outlined below:
Tab "RQ-1" showcases performance issue reports accurately detected by our High-Level Performance (HLP) approach's top model, XGBoost, which were not identified by the benchmark method's leading model, BERT. This highlights the comparative advantage of our approach in identifying nuanced performance issues.
Tab "RQ-2" continues the exploration of performance issue reports, presenting cases with specific details (to be added).
Tab "RQ-3" delves into the unique capabilities of XGBoost, the leading model in our HLP approach, showcasing its ability to detect performance issues missed by the baseline's top model, BERT. This comparison is drawn under distinct conditions: with pre-training (Dataset 1) and without pre-training (Dataset 3), illustrating the robustness and adaptability of our model.
Tab "RQ-4" focuses on performance issue reports uniquely identified through the implementation of Fuzzy HLP Matching within our HLP approach. This method underscores the innovative matching techniques that enhance issue detection.
Tab "RQ-5" presents performance issue reports pinpointed exclusively by applying the Issue HLP Matrix within our approach. This tab demonstrates the effectiveness of our matrix-based analysis in isolating and identifying specific performance concerns.
Tab "RQ-6" is dedicated to performance issue reports uniquely detected by incorporating feature selection techniques into our HLP approach. This illustrates the value of advanced feature selection in improving the precision of performance issue identification.
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/2.2/customlicense?persistentId=doi:10.15139/S3/12157
This study consists of data files that code the data availability policies of top-20 academic journals in the fields of Business & Finance, Economics, International Relations, Political Science, and Sociology. Journals that were ranked as top-20 titles based on 2003-vintage ISI Impact Factor scores were coded on their data policies in 2003 and on their data policies in 2015. In addition, journals that were ranked as top-20 titles based on the most recent ISI Impact Factor scores were likewise coded on their data policies in 2015. The included Stata .do file imports the contents of each of the Excel files, cleans and labels the data, and produces two tables: one comparing the data policies of 2003-vintage top-20 journals in 2003 to those journals' policies in 2015, and one comparing the data policies of 2003-vintage top-20 journals in 2003 to the data policies of current top-20 journals in 2015.
https://www.usa.gov/government-works
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC, including demographics, any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors; it includes no geographic data.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.
For more information:
NNDSS Supports the COVID-19 Response | CDC.
The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.
All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.
To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.
CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
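A minimal sketch of that suppression rule (illustrative only, not CDC's actual processing pipeline):

# re-code low-frequency (<5) cell counts to NA instead of removing the records
suppress_low_freq <- function(count) ifelse(count < 5, NA, count)
suppress_low_freq(c(0, 3, 5, 12))  # NA NA 5 12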
For questions, please contact Ask SRRG (eocevent394@cdc.gov).
COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
Note: Data files will be made available upon manuscript publication.

This dataset contains all code and data needed to reproduce the analyses in the manuscript: IDENTIFICATION OF A KEY TARGET FOR ELIMINATION OF NITROUS OXIDE, A MAJOR GREENHOUSE GAS. Blake A. Oakley (1), Trevor Mitchell (2), Quentin D. Read (3), Garrett Hibbs (1), Scott E. Gold (2), Anthony E. Glenn (2)
(1) Department of Plant Pathology, University of Georgia, Athens, GA, USA
(2) Toxicology and Mycotoxin Research Unit, U.S. National Poultry Research Center, United States Department of Agriculture-Agricultural Research Service, Athens, GA, USA
(3) Southeast Area, United States Department of Agriculture-Agricultural Research Service, Raleigh, NC, USA
Citation will be updated upon acceptance of the manuscript.

Brief description of study aims
Denitrification is a chemical process that releases nitrous oxide (N2O), a potent greenhouse gas. The NOR1 gene is part of the denitrification pathway in Fusarium. Three experiments were conducted for this study. (1) The N2O comparative experiment compares denitrification rates, as measured by N2O production, of a variety of Fusarium spp. strains with and without the NOR1 gene. (2) The N2O substrate experiment compares denitrification rates of selected strains on different growth media (substrates). For parts 1 and 2, linear models are fit comparing N2O production between strains and/or substrates. (3) The Bioscreen growth assay tests whether there is a pleiotropic effect of the NOR1 gene. In this portion of the analysis, growth curves are fit to assess differences in growth rate and carrying capacity between selected strains with and without the NOR1 gene.

Code
All code is included in a .zip archive generated from a private git repository on 2022-10-13 and archived as part of this dataset. The code is contained in R scripts and RMarkdown notebooks. There are two components to the analysis: the denitrification analysis (comprising parts 1 and 2 described above) and the Bioscreen growth analysis (part 3). The scripts for each are listed and described below.

Analysis of results of denitrification experiments (parts 1 and 2)
NOR1_denitrification_analysis.Rmd: The R code to analyze the experimental data comparing nitrous oxide emissions is all contained in a single RMarkdown notebook. This script analyzes the results from the comparative study and the substrate study.
n2o_subgroup_figures.R: R script to create additional figures using the output from the RMarkdown notebook.

Analysis of results of Bioscreen growth assay (part 3)
bioscreen_analysis.Rmd: This RMarkdown notebook contains all R code needed to analyze the results of the Bioscreen assay comparing growth of the different strains. It could be run as is. However, the model-fitting portion was run on a high-performance computing cluster with the following scripts:
bioscreen_fit_simpler.R: R script containing only the model-fitting portion of the Bioscreen analysis, fit using the Stan modeling language interfaced with R through the brms and cmdstanr packages.
job_bssimple.sh: Job submission shell script used to submit the model-fitting R job to be run on the USDA SciNet high-performance computing cluster.

Additional scripts developed as part of the analysis but not required to reproduce the analyses in the manuscript are in the deprecated/ folder. Also note the files nor1-denitrification.Rproj (RStudio project file) and gtstyle.css (stylesheet for formatting the tables in the notebooks) are included.
Data
Data required to run the analysis scripts are archived in this dataset, other than strain_lookup.csv, a lookup table of strain abbreviations and full names included in the code repository for convenience. They should be placed in a folder or symbolic link called project within the unzipped code repository directory.
N2O_data_2022-08-03/N2O_Comparative_Study_Trial_(n)(date range).xlsx: Data from the N2O comparative study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
N2O_data_2022-08-03/Nitrogen_Substrate_Study_Trial(n)(date range).xlsx: Data from the N2O substrate study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
Outliers_NOR1_2022/Bioscreen_NOR1_Fungal_Growth_Assay(substrate)(oxygen level)_Outliers_BAO(date).xlsx: The raw Bioscreen data files in MS Excel format. The format of each file name includes the substrate (minimal medium with nitrite or nitrate and lysine), oxygen level (hypoxia or normoxia), and date of the run. This repository includes code to process these files, but the processed data are also included on Ag Data Commons, so it is not necessary to run the data processing portion of the code.
clean_data/bioscreen_clean_data.csv: An intermediate output file in CSV format generated by bioscreen_analysis.Rmd. It includes all the data from the Bioscreen assays in a clean analysis-ready format.
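As an illustration of the growth-curve analysis in part 3, the sketch below fits a logistic curve to made-up data with base R's nls; the authors' actual models were fit with brms/Stan, so this is only a simplified stand-in:

# estimate carrying capacity (K), growth rate (r), and inflection time (t0)
time <- seq(0, 48, by = 4)  # hours (illustrative)
set.seed(1)
od <- 1 / (1 + exp(-0.25 * (time - 20))) + rnorm(length(time), 0, 0.02)
fit <- nls(od ~ K / (1 + exp(-r * (time - t0))),
           start = list(K = 1, r = 0.2, t0 = 18))
coef(fit)  # differences in K and r distinguish strains with vs. without NOR1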
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains the results from questionnaires gathered during user testing of the SELFEX solution, a training system utilizing motion-tracking gloves, augmented reality (AR), and screen-based interfaces. Participants were asked to complete paper- and tablet-based questionnaires after interacting with both AR and screen-guided training environments. The data provided allows for a comparative analysis between the two training methods (AR vs. screen) and assesses the suitability of the MAGOS hand-tracking gloves for this application. Additionally, it facilitates the exploration of correlations between various user experience factors, such as ease of use, usefulness, satisfaction, and ease of learning.
The folder is divided into two types of files:
- PDF files: These contain the three questionnaires administered during testing.
- "dataset.xlsx": This file includes the questionnaire results.
Within the Excel file, the data is organized across three sheets:
- "Results with AR glasses": Displays data from the experiment conducted using Hololens 2 AR glasses. Participants are anonymized and coded by gender (e.g., M01 for the first male participant).
- "Results without AR glasses": Shows data from the experiment conducted with five participants using a TV screen instead of Hololens 2 to follow the assembly training instructions.
- "Demographic data": Contains demographic information related to the participants.
This dataset enables comprehensive evaluation and comparison of the training methods and user experiences.
https://www.usa.gov/government-works
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (total case counts) as the present dataset; however, NCHS Death Counts are based on death certificates that use information reported by physicians, medical examiners, or coroners in the cause-of-death section of each certificate. Data from each of these pages are considered provisional (not complete and pending verification) and are therefore subject to change. Counts from previous weeks are continually revised as more records are received and processed.
Number of Jurisdictions Reporting There are currently 60 public health jurisdictions reporting cases of COVID-19. This includes the 50 states, the District of Columbia, New York City, the U.S. territories of American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands, as well as three independent countries in compacts of free association with the United States: the Federated States of Micronesia, the Republic of the Marshall Islands, and the Republic of Palau. New York State's reported case and death counts do not include New York City's counts as they separately report nationally notifiable conditions to CDC.
CDC COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths, available by state and by county. These and other data on COVID-19 are available from multiple public locations, such as:
https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
https://www.cdc.gov/covid-data-tracker/index.html
https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
https://www.cdc.gov/coronavirus/2019-ncov/php/open-america/surveillance-data-analytics.html
Additional COVID-19 public use datasets, including line-level (patient-level) data, are available at: https://data.cdc.gov/browse?tags=covid-19.
Archived Data Notes:
November 3, 2022: Due to a reporting cadence issue, case rates for Missouri counties are calculated based on 11 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 3, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Due to a reporting cadence change, case rates for Alabama counties are calculated based on 13 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 10, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Per the request of the jurisdiction, cases and deaths among non-residents have been removed from all Hawaii county totals throughout the entire time series. Cumulative case and death counts reported by CDC will no longer match Hawaii’s COVID-19 Dashboard, which still includes non-resident cases and deaths.
November 17, 2022: Two new columns, weekly historic cases and weekly historic deaths, were added to this dataset on November 17, 2022. These columns reflect case and death counts that were reported that week but were historical in nature and not reflective of the current burden within the jurisdiction. These historical cases and deaths are not included in the new weekly case and new weekly death columns; however, they are reflected in the cumulative totals provided for each jurisdiction. These data are used to account for artificial increases in case and death totals due to batched reporting of historical data.
December 1, 2022: Due to cadence changes over the Thanksgiving holiday, case rates for all Ohio counties are reported as 0 in the data released on December 1, 2022.
January 5, 2023: Due to North Carolina’s holiday reporting cadence, aggregate case and death data will contain 14 days’ worth of data instead of the customary 7 days. As a result, case and death metrics will appear higher than expected in the January 5, 2023, weekly release.
January 12, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0. As a result, case and death metrics will appear lower than expected in the January 12, 2023, weekly release.
January 19, 2023: Due to a reporting cadence issue, Mississippi’s aggregate case and death data will be calculated based on 14 days’ worth of data instead of the customary 7 days in the January 19, 2023, weekly release.
January 26, 2023: Due to a reporting backlog of historic COVID-19 cases, case rates for two Michigan counties (Livingston and Washtenaw) were higher than expected in the January 19, 2023 weekly release.
January 26, 2023: Due to a backlog of historic COVID-19 cases being reported this week, aggregate case and death counts in Charlotte County and Sarasota County, Florida, will appear higher than expected in the January 26, 2023 weekly release.
January 26, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0 in the weekly release posted on January 26, 2023.
February 2, 2023: As of the data collection deadline, CDC observed an abnormally large increase in aggregate COVID-19 cases and deaths reported for Washington State. In response, totals for new cases and new deaths released on February 2, 2023, have been displayed as zero at the state level until the issue is addressed with state officials. CDC is working with state officials to address the issue.
February 2, 2023: Due to a decrease reported in cumulative case counts by Wyoming, case rates will be reported as 0 in the February 2, 2023, weekly release. CDC is working with state officials to verify the data submitted.
February 16, 2023: Due to data processing delays, Utah’s aggregate case and death data will be reported as 0 in the weekly release posted on February 16, 2023. As a result, case and death metrics will appear lower than expected and should be interpreted with caution.
February 16, 2023: Due to a reporting cadence change, Maine’s
This comparative research has examined two political insurgency cases, paramilitary groups in the Northern Ireland conflict and the Red Brigades in Italy, and two organised crime groups in Italy. By comparing illegal violent political groups and criminals, this project has first shown the many variables affecting recruitment under the primary conditions dictated by violence and illegality, when the size, the military strategy and the goal of the groups are substantially different. Second, this research shows how, despite those many differences, common patterns can be identified in how organisations screen their recruits and how potential members signal their "fitness" to join. Primary and secondary data have been gathered from in-depth qualitative interviews, judicial reports, newspaper articles and published biographies. This dataset contains only the data for Bari, Italy, with 740 male records and 161 female records. The aim of this project was to analyse recruitment into illegal organisations that use violence for political or criminal goals. While these groups differ in their aims, structures and constraints, and these differences must of course affect the way they recruit, they all share the need to find trustworthy, loyal and competent members. Moreover, the crude reality of life in the underworld, where there is little evidence of "loyalty amongst thieves", makes the need for a selection process more immediate and therefore clearer to observe. This project investigated recruitment from both 'supply' and 'demand' perspectives by addressing two core questions: what are the features of those who volunteer to join an illegal organisation, and how do recruiters and volunteers assess each other's trustworthiness? Four case studies were selected: (1) Catholic and Protestant paramilitary organizations in Northern Ireland: the Irish Republican Army (IRA), the Ulster Volunteer Force (UVF) and the Ulster Defence Army (UDA); (2) the Italian Red Brigades; (3) the Sicilian mafia (Cosa Nostra); (4) organized crime groups in Apulia, southern Italy. These groups vary in their aims, structures, size and local constraints, but they all share the need to find trustworthy, loyal and competent members under the key conditions of illegality, asymmetrical information, the use of violence, varying risks of infiltration and the high cost of error. Furthermore, these groups have outlasted their rivals, suggesting that they have relatively good solutions to the recruitment problems they face.
They also share key similarities regarding their memberships: they maintain a key distinction between members and non-members; there is a formal initiation or ritual entry (apart from the Red Brigades, where membership was marked by the disclosure of the identity of other underground members); members have exclusive knowledge about the organisation; members are subject to a code of behaviour; and they receive the protection of the organisation. Due to a lack of existing data, this research has required the large-scale collection, coding and analysis of primary and secondary data gathered from in-depth qualitative interviews, judicial reports, newspaper articles and published biographies. The following data have been collected:
Northern Ireland: 40 qualitative interviews with IRA, UVF and UDA members; 10 interviews with the police and intelligence services; also data from biographies, newspaper and judicial reports (150 individual cases).
Red Brigades: court papers of 17 RB trials (1980-1984); 32 qualitative interviews of former violent radical left-wing militants from the Istituto Carlo Cattaneo DOTE archive in Bologna; 7 interviews released from secondary sources; 10 biographies of former RB members; additional data from biographies, newspaper and judicial reports (470 individual cases).
Organised crime groups: 13 qualitative interviews with social workers, judiciary and law and order officials; 51 state witness statements of former members of Cosa Nostra; 16 state witness statements of former members of an organized crime group in Apulia; court papers of 25 major organized crime trials in Palermo and Bari, 1984-2006; DIA annual reports, 1998-2009; additional data from newspaper reports and the Commissione Parlamentare sul Fenomeno della Mafia (1,738 individual cases).
A total of 3,056 cases have been coded and entered into SPSS and Excel datasets to be deposited with ESDS. The qualitative interviews and digitalized judicial papers have been coded and analysed with the support of MAXQDA.
Ethical issues
Interviews were anonymised and data held securely; the purpose of the research was made transparent to all interviewees, and verbal rather than written consent was sought lest their association with the research questions could have left them vulnerable to the police or other security agencies. The British Sociological Association and University of Oxford ethical guidelines were conformed to throughout.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
-----------------------------------------------------------------------------------------------------------------
Data for "Comparing Machine Learning Classifiers and Linear/Logistic Regression to Explore the Relationship between Hand Dimensions and Demographic Characteristics" (PLOSONE)
Oscar Miguel-Hurtado (1), Richard Guest (1), Sarah V. Stevenage (2), Greg J. Neil (2), Sue Black (3)
-----------------------------------------------------------------------------------------------------------------
For more information please contact: O.Miguel-Hurtado-98@kent.ac.uk (Oscar Miguel)
-----------------------------------------------------------------------------------------------------------------
The zip contains right and left hand geometry images from 112 participants. The images were captured using a Nikon D200 SLR camera (format: jpg, size: 3504x2336 pixels), with both the palm of the hand and camera facing downwards. Participants placed each hand on an acetate sheet with a series of positioning pegs.
-----------------------------------------------------------------------------------------------------------------
The Excel file contains a series of length measurements (based on the underlying skeleton of the hand) manually extracted (see Figure 1 for details), along with demographic information from the participants: sex (male or female), height (in cm), weight (in kg) and foot size (in UK sizes).
The dataset deposited here underpins a PhD thesis successfully defended by Stian Kjeksrud on June 6, 2019, at the University of Oslo, Faculty of Social Sciences, Department of Political Science. This dataset consists of two data files, each in its original Excel file format as well as in a Unicode text format:
1. UN_POC_Operations_UNPOCO_2019-01-25: This file (United Nations Protection of Civilians Operations (UNPOCO)) captures and codes the core empirical characteristics of 200 UN military operations to protect civilians from violence in African conflicts between 1999 and 2017.
2. UN_POC_Operations_UNPOCO_fsQCA_2019-01-25: This file (UNPOCO fsQCA) builds directly on the UNPOCO dataset, but consists of a sub-set of 126 cases tailored to fuzzy-set Qualitative Comparative Analysis (fsQCA), and therefore includes a QCA matrix and some additional information for each case.
Both data files were built by Stian Kjeksrud to support the analysis of variations in outcomes of operations and to explore success factors of UN military protection operations across time and UN missions. The data are captured from the United Nations Secretary-General's openly available reporting to the United Nations Security Council.
The data is the output of two studies. In study 1 we tested the hypotheses that: (1) men are more likely to display the better-than-average effect than women, especially when comparing themselves to women; and (2) men who score high on the better-than-average effect also have a bigger tendency to explain things, especially when speaking with women. In study 2 we tested the hypothesis that women are more prone to the interpretation bias than men, especially when speaking with men.
Data files:
- Cleaned Excel file with the output of the better-than-average survey
- Cleaned Excel file with the output of the better-than-average survey 2
- Cleaned Excel file with the output of the interpretation bias survey
Supplemental material:
- Better-than-average questionnaire 1
- Better-than-average questionnaire 2
- Interpretation Bias questionnaire
Method: Qualtrics survey
Universe: General population
Country / Nation: the Netherlands