Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file performs a statistical test of whether two ROC curves differ from each other, based on the Area Under the Curve. You'll need the coefficient from the table presented in the following article to enter the correct AUC value for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: User Stories or Use Cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is at least 65 and below 80, and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets originate from that raw data.
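For reference, the grade categorisation in column F can be reproduced in a few lines of R (a minimal sketch; the function name is illustrative, not part of the dataset):

grade_category <- function(grade) {
  ifelse(grade >= 80, "H",        # H: grade >= 80
  ifelse(grade >= 65, "M", "L"))  # M: 65 <= grade < 80; L: otherwise
}
grade_category(c(85, 70, 50))  # "H" "M" "L"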
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes in the student model divided by the number of classes in the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
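A minimal R sketch of this calculation and the accompanying box plots (column names are illustrative, not the sheet's actual headers):

library(ggplot2)
# hypothetical extract of the raw data: case assignment and class counts
raw <- data.frame(
  case            = c("IFA", "Sim", "Hos", "IFA", "Sim", "Hos"),
  classes_student = c(12, 9, 15, 11, 10, 13),
  classes_expert  = c(10, 10, 14, 10, 10, 14)
)
raw$size_ratio <- raw$classes_student / raw$classes_expert  # student / expert
ggplot(raw, aes(x = case, y = size_ratio)) +
  geom_boxplot()  # one box per group, as in Sheet 3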
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
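Expressed as a minimal R sketch over the situation counts from columns K-O (the function names are illustrative, not the workbook's own):

correctness  <- function(AL, WR, SO, OM) AL / (AL + OM + SO + WR)
completeness <- function(AL, WR, OM) (AL + WR) / (AL + WR + OM)
# example: 8 aligned, 2 wrongly represented, 1 system-oriented, 3 omitted
correctness(AL = 8, WR = 2, SO = 1, OM = 3)  # 8 / 14 = 0.571
completeness(AL = 8, WR = 2, OM = 3)         # 10 / 13 = 0.769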
For Sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (see the R sketch after the sheet list below for an equivalent computation). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
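As referenced above, both the t-test and Hedges' g can be computed directly in R. This is a minimal sketch using the standard Hedges' g formula on two hypothetical vectors of per-student scores, not the workbook's own formulas:

hedges_g <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  d <- (mean(x) - mean(y)) / s_pooled  # Cohen's d
  d * (1 - 3 / (4 * (n1 + n2) - 9))    # small-sample correction gives Hedges' g
}
uc <- c(0.62, 0.71, 0.55, 0.68)  # hypothetical Use Case group scores
us <- c(0.49, 0.58, 0.52, 0.61)  # hypothetical User Story group scores
t.test(uc, us)   # significance (Welch t-test)
hedges_g(uc, us) # effect size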
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014.

This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

[Figure: TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).]
TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)
options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep, ]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
General information
This data is supplementary material to the paper by Watson et al. on sex differences in global reporting of adverse drug reactions [1]. Readers are referred to this paper for a detailed description of the context in which the data was generated. Anyone intending to use this data for any purpose should read the publicly available information on the VigiBase source data [2, 3]. The conditions specified in the caveat document [3] must be adhered to.
Source dataset
The dataset published here is based on analyses performed in VigiBase, the WHO global database of individual case safety reports [4]. All reports entered into VigiBase from its inception in 1967 up to 2 January 2018 with patient sex coded as either female or male have been included, except suspected duplicate reports [5]. In total, the source dataset contained 9,056,566 female and 6,012,804 male reports.
Statistical analysis
The characteristics of the female reports were compared to those of the male reports using a method called vigiPoint [6]. This is a method for comparing two or more sets of reports (here, female and male reports) on a large set of reporting variables, and highlighting any feature in which the sets differ in a statistically and clinically relevant manner. For example, patient age group is a reporting variable, and the different age groups (0-27 days, 28 days-23 months, et cetera) are features within this variable. The statistical analysis is based on shrinkage log odds ratios, computed as a comparison between the two sets of reports for each feature, including all reports without missing information for the variable under consideration. The specific output from vigiPoint is defined precisely below. Here, results for 18 different variables with a total of 44,486 features are presented. 74 of these features were highlighted as so-called vigiPoint key features, suggesting a statistically and clinically significant difference between female and male reports in VigiBase.
Description of published dataset
The dataset is provided in the form of a MS Excel spreadsheet (.xlsx file) with nine columns and 44,486 rows (excluding the header), each corresponding to a specific feature. Below follows a detailed description of the data included in the different columns.
Variable: This column indicates the reporting variable to which the specific feature belongs. Six of these variables are described in the original publication by Watson et al.: country of origin, geographical region of origin, type of reporter, patient age group, MedDRA SOC, ATC level 2 of reported drugs, seriousness, and fatality [1]. The remaining 12 are described here:
The Variable column can be useful for filtering the data, for example if one is interested in one or a few specific variables.
Feature: This column contains each of the 44,486 included features. The vast majority should be self-explanatory, or else they have been explained above, or in the original paper [1].
Female reports and Male reports: These columns show the number of female and male reports, respectively, for which the specific feature is present.
Proportion among female reports and Proportion among male reports: These columns show the proportions within the female and male reports, respectively, for which the specific feature is present. Comparing these crude proportions is the simplest and most intuitive way to contrast the female and male reports, and a useful complement to the specific vigiPoint output.
Odds ratio: The odds ratio is a basic measure of association between the classification of reports into female and male reports and a given reporting feature, and hence can be used to compare female and male reports with respect to this feature. It is formally defined as a / (bc / d), where
This crude odds ratio can also be computed as (p_female / (1 - p_female)) / (p_male / (1 - p_male)), where p_female and p_male are the proportions described earlier. If the odds ratio is above 1, the feature is more common among the female than the male reports; if below 1, the feature is less common among the female than the male reports. Note that the odds ratio can be mathematically undefined, in which case it is missing in the published data.
vigiPoint score: This score is defined based on an odds ratio with added statistical shrinkage, defined as (a + k) / ((bc / d) + k), where k is 1% of the total number of female reports, or about 9,000. While the shrinkage adds robustness to the measure of association, it makes interpretation more difficult, which is why the crude proportions and unshrunk odds ratios are also presented. Further, 99% credibility intervals are computed for the shrinkage odds ratios, and these intervals are transformed onto a log2 scale [6]. The vigiPoint score is then defined as the lower endpoint of the interval, if that endpoint is above 0; as the higher endpoint of the interval, if that endpoint is below 0; and otherwise as 0. The vigiPoint score is useful for sorting the features from strongest positive to strongest negative associations, and/or to filter the features according to some user-defined criteria.
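For illustration, the crude and shrinkage odds ratios can be written out in R. This is a minimal sketch, assuming a = female reports with the feature, b = female reports without it, c = male reports with the feature, and d = male reports without it (these letter definitions are inferred from the formulas above):

odds_ratio <- function(a, b, c, d) (a / b) / (c / d)  # equals a / (b*c/d)
shrunk_log2_or <- function(a, b, c, d, n_female) {
  k <- 0.01 * n_female               # shrinkage constant: 1% of all female reports
  log2((a + k) / ((b * c / d) + k))  # vigiPoint works on a log2 scale
}
# example: a feature on 1,200 of 9,056,566 female and 500 of 6,012,804 male reports
a <- 1200; b <- 9056566 - a; c <- 500; d <- 6012804 - c
odds_ratio(a, b, c, d)               # crude OR, about 1.59
shrunk_log2_or(a, b, c, d, 9056566)  # near 0: shrinkage pulls rare features toward the null

Note that this sketch omits the 99% credibility interval that turns the shrinkage estimate into the final vigiPoint score.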
vigiPoint key feature: Features are classified as vigiPoint key features if their vigiPoint score is either above 0.5 or below -0.5. The specific threshold of 0.5 is arbitrary, but chosen to identify features where the two sets of reports (here, female and male reports) differ in a clinically significant way.
References
The dataset includes a PDF file containing the results and an Excel file with the following tables:
Table S1 Results of comparing the performance of MetaFetcheR to MetaboAnalystR using Diamanti et al. Table S2 Results of comparing the performance of MetaFetcheR to MetaboAnalystR for Priolo et al. Table S3 Results of comparing the performance of MetaFetcheR to MetaboAnalyst 5.0 webtool using Diamanti et al. Table S4 Results of comparing the performance of MetaFetcheR to MetaboAnalyst 5.0 webtool for Priolo et al. Table S5 Data quality test results for running 100 iterations on HMDB database. Table S6 Data quality test results for running 100 iterations on KEGG database. Table S7 Data quality test results for running 100 iterations on ChEBI database. Table S8 Data quality test results for running 100 iterations on PubChem database. Table S9 Data quality test results for running 100 iterations on LIPID MAPS database. Table S10 The list of metabolites that were not mapped by MetaboAnalystR for Diamanti et al. Table S11 An example of an input matrix for MetaFetcheR. Table S12 Results of comparing the performance of MetaFetcheR to MS_targeted using Diamanti et al. Table S13 Data set from Diamanti et al. Table S14 Data set from Priolo et al. Table S15 Results of comparing the performance of MetaFetcheR to CTS using KEGG identifiers available in Diamanti et al. Table S16 Results of comparing the performance of MetaFetcheR to CTS using LIPID MAPS identifiers available in Diamanti et al. Table S17 Results of comparing the performance of MetaFetcheR to CTS using KEGG identifiers available in Priolo et al. Table S18 Results of comparing the performance of MetaFetcheR to CTS using KEGG identifiers available in Priolo et al. (See the "index" tab in the Excel file for more information)
Small-compound databases contain a large amount of information about metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Failing to establish means of data access at the early stages of a project can lead to mislabelled compounds, reduced statistical power, and large delays in the delivery of results.
We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies and covers a variety of use-cases of data fetching. We showed that the performance of MetaFetcheR was superior to existing approaches and databases by benchmarking the performance of the algorithm in three independent case studies based on two published datasets.
The dataset was originally published in DiVA and moved to SND in 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a benchmark dataset for comparing a number of methods of spatial prediction, used in: "A comparison of spatial predictors when datasets could be very large" by Jonathan R. Bradley, Noel Cressie, and Tao Shi. This dataset reports level-2 mid-tropospheric CO2 values at a 17.6 km × 17.6 km spatial resolution, obtained from Atmospheric Infrared Sounder (AIRS) data retrieved from 1–9 February 2010. AIRS is a remote sensing instrument on board the Aqua satellite administered by the National Aeronautics and Space Administration (NASA). Among other measurements, it collects CO2 measurements in the form of spectra (level 1) that are then converted to mid-tropospheric CO2 values (level 2) given in units of parts per million (ppm). This dataset is in the form given by Bradley et al. (2016) and is freely available under the Creative Commons Attribution 4.0 Australia License. The ZIP file contains three folders, "Small," "Large," and "VeryLarge"; the data in these folders are used in a comparison study in Section 4 of Bradley et al. (2016). Each folder contains two CSV files, one for the training dataset and one for the validation dataset. In each file, the first two columns are the latitude and longitude, respectively, and the third column is mid-tropospheric CO2 in ppm.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains one Excel sheet and five Word documents. In this dataset, Simulation.xlsx describes the parameter values used for the numerical analysis based on empirical data. In this Excel sheet, we calculated the values of each capped call-option model parameter. "Computation of Table 2.docx" and the other documents show the results of the comparative statics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the raw data used for a systematic review of the impact of background music on cognitive task performance (Cheah et al., 2022). Our intention is to facilitate future updates to this work.
Contents description
This repository contains eight Microsoft Excel files: one for each of the six cognitive domains analysed in the review, one for task difficulty, and one for population characteristics:
raw-data-attention
raw-data-inhibition
raw-data-language
raw-data-memory
raw-data-thinking
raw-data-processing-speed
raw-data-task-difficulty
raw-data--population
Files description
Tabs organisation
The files pertaining to each cognitive domain include individual tabs for each cognitive task analysed (cf. Figure 2 in the original paper for the list of cognitive tasks). The file with the population characteristics data also contains separate tabs for each characteristic (extraversion, music training, gender, and working memory capacity).
Tabs contents
In all files and tabs, each row corresponds to the data of a test. The same article can have more than one row if it reports multiple tests. For instance, the study by Cassidy and MacDonald (2007; cf. Memory.xlsx, tab: Memory-all) contains two experiments (immediate and delayed free recall), each with multiple tests (immediate free recall: tests 25-32; delayed free recall: tests 58-61). Each test (one per row) in this experiment pertains to comparisons between conditions where the background music has different levels of arousal, between groups of participants with different extraversion levels, between different task materials (words or paragraphs), and different combinations of the previous (e.g., a high arousing music vs silence test among extraverts whilst completing an immediate free recall task involving paragraphs; cf. test 30). The columns are organised as follows:
"TESTS": the index of the test in a particular tab (for easy reference); "ID": abbreviation of the cognitive tasks involved in a specific experiment (see glossary for meaning); "REFERENCE": the article where the data was taken from (see main publications for list of articles); "CONDITIONS": an abbreviated description of the music condition of a given test; "MEANS (music)": the average performance across all participants in a given experiment with background music; "MEANS (silence)": the average performance across all participants in a given experiment without background music. Then, in horizontal arrangement, we also include groups of two columns that breakdown specific comparisons related to each test (i.e., all tests comparing the same two types of condition, e.g., L-BgM vs I-BgM, will appear under the same set of columns). For each one, we indicate mean difference between the respective conditions ("MD" column) and the direction of effect ("Standard Metric" column). Each file also contains a "Glossary" tab that explains all the abbreviations used in each document. Bibliography Cheah, Y., Wong, H. K., Spitzer, M., & Coutinho, E. (2022). Background music and cognitive task performance: A systematic review of task, music and population impact. Music & Science, 5(1), 1-38. https://doi.org/10.1177/20592043221134392
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.

Abstract
Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS became widely accepted internationally and has become the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe that there are inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich's website has only one pictogram. A chemical information tool which identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1,000 and 3,000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.

Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the collected data rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions are used to convert GHS information into binary strings of data called "bitstrings". This approach is also used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare on their face. Bitstrings generated using the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values from 0, for strings that have no similarity, to 1, for strings that are the same. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques with minor modifications could be used to tackle more GHS information such as pictograms.

Intellectual Merit
This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings that are obtained from the non-numeric entity of 2D structure.
This structural fingerprint allows comparison of 2D structures through the use of the Tanimoto coefficient. The same approach can be extended to safety fingerprints, which can be created by converting a non-numeric entity such as GHS information into a binary bit string and comparing data through the use of the Tanimoto coefficient.

Broader Impact
Extensions of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other pieces of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel without needing a large programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
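As a small illustration of the comparison step, the Tanimoto coefficient for two hypothetical GHS bitstrings can be computed in R (the fingerprints below are made up; they are not from any actual SDS):

tanimoto <- function(x, y) {
  both   <- sum(x & y)         # bits set in both fingerprints
  either <- sum(x | y)         # bits set in either fingerprint
  if (either == 0) return(NA)  # undefined when both strings are empty
  both / either                # 0 = no similarity, 1 = identical
}
# illustrative fingerprints: each position marks one GHS hazard statement
source_a <- c(1, 1, 0, 1, 0, 0, 1)
source_b <- c(1, 0, 0, 1, 0, 1, 1)
tanimoto(source_a, source_b)   # 3 shared / 5 total = 0.6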
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo repository contains the data supporting the findings of the journal paper, titled "A Platform-Agnostic Approach for Automatically Identifying Real-Life Performance Issue Reports with Heuristic Linguistic Patterns", published in IEEE Transactions on Software Engineering, including:
This section contains detailed data findings from six research questions (RQ1 to RQ6).
The RQ1 tab provides an evaluation of our HLP-based approach, showing the precision, recall, and F1-Score of eight classifiers. These results are juxtaposed with the corresponding values from baseline methods, at both sentence and issue levels for automatic tagging.
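For reference, the three reported metrics relate as follows (a minimal R sketch from confusion-matrix counts; the numbers are illustrative):

precision <- function(tp, fp) tp / (tp + fp)     # fraction of tagged items that are correct
recall    <- function(tp, fn) tp / (tp + fn)     # fraction of true items that are found
f1        <- function(p, r) 2 * p * r / (p + r)  # harmonic mean of the two
p <- precision(tp = 80, fp = 20)  # 0.80
r <- recall(tp = 80, fn = 10)     # ~0.89
f1(p, r)                          # ~0.84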
The RQ2 tab illustrates the precision, recall, and F1-Score of eight classifiers under two training conditions: a balanced training dataset (BT+HLP) and an imbalanced training dataset (UBT+HLP). These outcomes are contrasted with the equivalent values from baseline methods, also trained under balanced (BT+BLM) and imbalanced (UBT+BLM) conditions. The results are shown at both sentence and issue levels for automatic tagging.
The RQ3 tab evaluates the dataset transferability of our HLP-based approach in comparison to baseline methods. It achieves this by analyzing the precision, recall, and F1-Score metrics for eight classifiers under two different "training/testing" dataset conditions, i.e., 'D1/D1' and 'D1/D3'. These conditions allow for a direct comparison of performance when applied to the same dataset ('D1/D1') versus when transferred to a different dataset ('D1/D3'). Additionally, the tab includes an 'Avg Change' and 'p-value' section, summarizing the statistical change in performance metrics between the two dataset conditions.
The RQ4 tab presents a direct comparison between strict and fuzzy HLP matching approaches, assessed through precision, recall, and F1-Score metrics across eight issue classifiers.
The RQ5 tab examines the influence of sentence order on the accuracy of eight classifiers within our approach. It shows the change in precision, recall, and F1-Score when the sentence order feature is taken into consideration versus when it is not.
The RQ6 tab explores the impact of feature selection algorithms on both issue and sentence-level tagging accuracy. This tab presents the average precision, recall, and F1-Score for three experiments: Boruta, Recursive Feature Elimination (RFE), and the usage of all 80 features.
This spreadsheet offers a comprehensive examination of the data supporting Section 6.1, which focuses on Qualitative Analysis. It is organized into several tabs, each dedicated to specific research questions (RQs) as outlined below:
Tab "RQ-1" showcases performance issue reports accurately detected by our High-Level Performance (HLP) approach's top model, XGBoost, which were not identified by the benchmark method's leading model, BERT. This highlights the comparative advantage of our approach in identifying nuanced performance issues.
Tab "RQ-2" continues the exploration of performance issue reports, presenting cases with specific details (to be added).
Tab "RQ-3" delves into the unique capabilities of XGBoost, the leading model in our HLP approach, showcasing its ability to detect performance issues missed by the baseline's top model, BERT. This comparison is drawn under distinct conditions: with pre-training (Dataset 1) and without pre-training (Dataset 3), illustrating the robustness and adaptability of our model.
Tab "RQ-4" focuses on performance issue reports uniquely identified through the implementation of Fuzzy HLP Matching within our HLP approach. This method underscores the innovative matching techniques that enhance issue detection.
Tab "RQ-5" presents performance issue reports pinpointed exclusively by applying the Issue HLP Matrix within our approach. This tab demonstrates the effectiveness of our matrix-based analysis in isolating and identifying specific performance concerns.
Tab "RQ-6" is dedicated to performance issue reports uniquely detected by incorporating feature selection techniques into our HLP approach. This illustrates the value of advanced feature selection in improving the precision of performance issue identification.
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/2.2/customlicense?persistentId=doi:10.15139/S3/12157
This study consists of data files that code the data availability policies of top-20 academic journals in the fields of Business & Finance, Economics, International Relations, Political Science, and Sociology. Journals that were ranked as top-20 titles based on 2003-vintage ISI Impact Factor scores were coded on their data policies in 2003 and on their data policies in 2015. In addition, journals that were ranked as top-20 titles based on the most recent ISI Impact Factor scores were likewise coded on their data policies in 2015. The included Stata .do file imports the contents of each of the Excel files, cleans and labels the data, and produces two tables: one comparing the data policies of 2003-vintage top-20 journals in 2003 to those journals' policies in 2015, and one comparing the data policies of 2003-vintage top-20 journals in 2003 to the data policies of current top-20 journals in 2015.
https://www.usa.gov/government-works
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC, including demographics, any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors; it includes no geographic data.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.
For more information:
NNDSS Supports the COVID-19 Response | CDC.
The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.
All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.
To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.
CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
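A minimal sketch of that suppression rule (illustrative only, not CDC's actual processing pipeline):

# re-code low-frequency (<5) cell counts to NA instead of removing the records
suppress_low_freq <- function(count) ifelse(count < 5, NA, count)
suppress_low_freq(c(0, 3, 5, 12))  # NA NA 5 12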
For questions, please contact Ask SRRG (eocevent394@cdc.gov).
COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
Note: Data files will be made available upon manuscript publication.

This dataset contains all code and data needed to reproduce the analyses in the manuscript: IDENTIFICATION OF A KEY TARGET FOR ELIMINATION OF NITROUS OXIDE, A MAJOR GREENHOUSE GAS. Blake A. Oakley (1), Trevor Mitchell (2), Quentin D. Read (3), Garrett Hibbs (1), Scott E. Gold (2), Anthony E. Glenn (2)
(1) Department of Plant Pathology, University of Georgia, Athens, GA, USA
(2) Toxicology and Mycotoxin Research Unit, U.S. National Poultry Research Center, United States Department of Agriculture-Agricultural Research Service, Athens, GA, USA
(3) Southeast Area, United States Department of Agriculture-Agricultural Research Service, Raleigh, NC, USA
Citation will be updated upon acceptance of the manuscript.

Brief description of study aims
Denitrification is a chemical process that releases nitrous oxide (N2O), a potent greenhouse gas. The NOR1 gene is part of the denitrification pathway in Fusarium. Three experiments were conducted for this study. (1) The N2O comparative experiment compares denitrification rates, as measured by N2O production, of a variety of Fusarium spp. strains with and without the NOR1 gene. (2) The N2O substrate experiment compares denitrification rates of selected strains on different growth media (substrates). For parts 1 and 2, linear models are fit comparing N2O production between strains and/or substrates. (3) The Bioscreen growth assay tests whether there is a pleiotropic effect of the NOR1 gene. In this portion of the analysis, growth curves are fit to assess differences in growth rate and carrying capacity between selected strains with and without the NOR1 gene.

Code
All code is included in a .zip archive generated from a private git repository on 2022-10-13 and archived as part of this dataset. The code is contained in R scripts and RMarkdown notebooks. There are two components to the analysis: the denitrification analysis (comprising parts 1 and 2 described above) and the Bioscreen growth analysis (part 3). The scripts for each are listed and described below.

Analysis of results of denitrification experiments (parts 1 and 2)
NOR1_denitrification_analysis.Rmd: The R code to analyze the experimental data comparing nitrous oxide emissions is all contained in a single RMarkdown notebook. This script analyzes the results from the comparative study and the substrate study.
n2o_subgroup_figures.R: R script to create additional figures using the output from the RMarkdown notebook.

Analysis of results of Bioscreen growth assay (part 3)
bioscreen_analysis.Rmd: This RMarkdown notebook contains all R code needed to analyze the results of the Bioscreen assay comparing growth of the different strains. It could be run as is. However, the model-fitting portion was run on a high-performance computing cluster with the following scripts:
bioscreen_fit_simpler.R: R script containing only the model-fitting portion of the Bioscreen analysis, fit using the Stan modeling language interfaced with R through the brms and cmdstanr packages.
job_bssimple.sh: Job submission shell script used to submit the model-fitting R job to be run on the USDA SciNet high-performance computing cluster.

Additional scripts developed as part of the analysis but not required to reproduce the analyses in the manuscript are in the deprecated/ folder. Also note the files nor1-denitrification.Rproj (RStudio project file) and gtstyle.css (stylesheet for formatting the tables in the notebooks) are included.
Data
Data required to run the analysis scripts are archived in this dataset, other than strain_lookup.csv, a lookup table of strain abbreviations and full names included in the code repository for convenience. They should be placed in a folder or symbolic link called project within the unzipped code repository directory.
N2O_data_2022-08-03/N2O_Comparative_Study_Trial_(n)(date range).xlsx: Data from the N2O comparative study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
N2O_data_2022-08-03/Nitrogen_Substrate_Study_Trial(n)(date range).xlsx: Data from the N2O substrate study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
Outliers_NOR1_2022/Bioscreen_NOR1_Fungal_Growth_Assay(substrate)(oxygen level)_Outliers_BAO(date).xlsx: The raw Bioscreen data files in MS Excel format. The format of each file name includes the substrate (minimal medium with nitrite or nitrate and lysine), oxygen level (hypoxia or normoxia), and date of the run. This repository includes code to process these files, but the processed data are also included on Ag Data Commons, so it is not necessary to run the data processing portion of the code.
clean_data/bioscreen_clean_data.csv: An intermediate output file in CSV format generated by bioscreen_analysis.Rmd. It includes all the data from the Bioscreen assays in a clean analysis-ready format.
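As an illustration of the growth-curve analysis in part 3, the sketch below fits a logistic curve to made-up data with base R's nls; the authors' actual models were fit with brms/Stan, so this is only a simplified stand-in:

# estimate carrying capacity (K), growth rate (r), and inflection time (t0)
time <- seq(0, 48, by = 4)  # hours (illustrative)
set.seed(1)
od <- 1 / (1 + exp(-0.25 * (time - 20))) + rnorm(length(time), 0, 0.02)
fit <- nls(od ~ K / (1 + exp(-r * (time - t0))),
           start = list(K = 1, r = 0.2, t0 = 18))
coef(fit)  # differences in K and r distinguish strains with vs. without NOR1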
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains the results from questionnaires gathered during user testing of the SELFEX solution, a training system utilizing motion-tracking gloves, augmented reality (AR), and screen-based interfaces. Participants were asked to complete paper- and tablet-based questionnaires after interacting with both AR and screen-guided training environments. The data provided allows for a comparative analysis between the two training methods (AR vs. screen) and assesses the suitability of the MAGOS hand-tracking gloves for this application. Additionally, it facilitates the exploration of correlations between various user experience factors, such as ease of use, usefulness, satisfaction, and ease of learning.
The folder is divided into two types of files:
- PDF files: These contain the three questionnaires administered during testing.
- "dataset.xlsx": This file includes the questionnaire results.
Within the Excel file, the data is organized across three sheets:
- "Results with AR glasses": Displays data from the experiment conducted using Hololens 2 AR glasses. Participants are anonymized and coded by gender (e.g., M01 for the first male participant).
- "Results without AR glasses": Shows data from the experiment conducted with five participants using a TV screen instead of Hololens 2 to follow the assembly training instructions.
- "Demographic data": Contains demographic information related to the participants.
This dataset enables comprehensive evaluation and comparison of the training methods and user experiences.
https://www.usa.gov/government-works
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (total case counts) as the present dataset; however, NCHS Death Counts are based on death certificates that use information reported by physicians, medical examiners, or coroners in the cause-of-death section of each certificate. Data from each of these pages are considered provisional (not complete and pending verification) and are therefore subject to change. Counts from previous weeks are continually revised as more records are received and processed.
Number of Jurisdictions Reporting There are currently 60 public health jurisdictions reporting cases of COVID-19. This includes the 50 states, the District of Columbia, New York City, the U.S. territories of American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands, as well as three independent countries in compacts of free association with the United States: the Federated States of Micronesia, the Republic of the Marshall Islands, and the Republic of Palau. New York State's reported case and death counts do not include New York City's counts as they separately report nationally notifiable conditions to CDC.
CDC COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths, available by state and by county. These and other data on COVID-19 are available from multiple public locations, such as:
https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
https://www.cdc.gov/covid-data-tracker/index.html
https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
https://www.cdc.gov/coronavirus/2019-ncov/php/open-america/surveillance-data-analytics.html
Additional COVID-19 public use datasets, including line-level (patient-level) data, are available at: https://data.cdc.gov/browse?tags=covid-19.
Archived Data Notes:
November 3, 2022: Due to a reporting cadence issue, case rates for Missouri counties are calculated based on 11 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 3, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Due to a reporting cadence change, case rates for Alabama counties are calculated based on 13 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 10, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Per the request of the jurisdiction, cases and deaths among non-residents have been removed from all Hawaii county totals throughout the entire time series. Cumulative case and death counts reported by CDC will no longer match Hawaii’s COVID-19 Dashboard, which still includes non-resident cases and deaths.
November 17, 2022: Two new columns, weekly historic cases and weekly historic deaths, were added to this dataset on November 17, 2022. These columns reflect case and death counts that were reported that week but were historical in nature and not reflective of the current burden within the jurisdiction. These historical cases and deaths are not included in the new weekly case and new weekly death columns; however, they are reflected in the cumulative totals provided for each jurisdiction. These data are used to account for artificial increases in case and death totals due to batched reporting of historical data.
December 1, 2022: Due to cadence changes over the Thanksgiving holiday, case rates for all Ohio counties are reported as 0 in the data released on December 1, 2022.
January 5, 2023: Due to North Carolina’s holiday reporting cadence, aggregate case and death data will contain 14 days’ worth of data instead of the customary 7 days. As a result, case and death metrics will appear higher than expected in the January 5, 2023, weekly release.
January 12, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0. As a result, case and death metrics will appear lower than expected in the January 12, 2023, weekly release.
January 19, 2023: Due to a reporting cadence issue, Mississippi’s aggregate case and death data will be calculated based on 14 days’ worth of data instead of the customary 7 days in the January 19, 2023, weekly release.
January 26, 2023: Due to a reporting backlog of historic COVID-19 cases, case rates for two Michigan counties (Livingston and Washtenaw) were higher than expected in the January 19, 2023 weekly release.
January 26, 2023: Due to a backlog of historic COVID-19 cases being reported this week, aggregate case and death counts in Charlotte County and Sarasota County, Florida, will appear higher than expected in the January 26, 2023 weekly release.
January 26, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0 in the weekly release posted on January 26, 2023.
February 2, 2023: As of the data collection deadline, CDC observed an abnormally large increase in aggregate COVID-19 cases and deaths reported for Washington State. In response, totals for new cases and new deaths released on February 2, 2023, have been displayed as zero at the state level until the issue is addressed with state officials. CDC is working with state officials to address the issue.
February 2, 2023: Due to a decrease reported in cumulative case counts by Wyoming, case rates will be reported as 0 in the February 2, 2023, weekly release. CDC is working with state officials to verify the data submitted.
February 16, 2023: Due to data processing delays, Utah’s aggregate case and death data will be reported as 0 in the weekly release posted on February 16, 2023. As a result, case and death metrics will appear lower than expected and should be interpreted with caution.
February 16, 2023: Due to a reporting cadence change, Maine’s
This comparative research has examined two political insurgency cases, paramilitary groups in the Northern Ireland conflict and the Red Brigades in Italy, and two organised crime groups in Italy. By comparing illegal violent political groups and criminals, this project has first shown the many variables affecting recruitment under the primary conditions dictated by violence and illegality, when the size, the military strategy and the goal of the groups are substantially different. Second, this research shows how, despite those many differences, common patterns can be identified in how organisations screen their recruits and how potential members signal their "fitness" to join. Primary and secondary data have been gathered from in-depth qualitative interviews, judicial reports, newspaper articles and published biographies. This dataset contains only the data for Bari, Italy, with 740 male records and 161 female records. The aim of this project was to analyse recruitment into illegal organisations that use violence for political or criminal goals. While these groups differ in their aims, structures and constraints, and these differences must of course affect the way they recruit, they all share the need to find trustworthy, loyal and competent members. Moreover, the crude reality of life in the underworld, where there is little evidence of "loyalty amongst thieves", makes the need for a selection process more immediate and therefore clearer to observe. This project investigated recruitment from both 'supply' and 'demand' perspectives by addressing two core questions: what are the features of those who volunteer to join an illegal organisation, and how do recruiters and volunteers assess each other's trustworthiness? Four case studies were selected: (1) Catholic and Protestant paramilitary organizations in Northern Ireland: the Irish Republican Army (IRA), the Ulster Volunteer Force (UVF) and the Ulster Defence Army (UDA); (2) the Italian Red Brigades; (3) the Sicilian mafia (Cosa Nostra); (4) organized crime groups in Apulia, southern Italy. These groups vary in their aims, structures, size and local constraints, but they all share the need to find trustworthy, loyal and competent members under the key conditions of illegality, asymmetrical information, the use of violence, varying risks of infiltration and the high cost of error. Furthermore, these groups have outlasted their rivals, suggesting that they have relatively good solutions to the recruitment problems they face.
They also share key similarities regarding their memberships: they maintain a key distinction between members and non-members; there is a formal initiation or ritual entry (apart from the Red Brigades, where membership was marked by the disclosure of the identity of other underground members); members have exclusive knowledge about the organisation; members are subject to a code of behaviour; and they receive the protection of the organisation. Due to a lack of existing data, this research has required the large-scale collection, coding and analysis of primary and secondary data gathered from in-depth qualitative interviews, judicial reports, newspaper articles and published biographies. The following data have been collected:
Northern Ireland: 40 qualitative interviews with IRA, UVF and UDA members; 10 interviews with the police and intelligence services; also data from biographies, newspaper and judicial reports (150 individual cases).
Red Brigades: court papers of 17 RB trials (1980-1984); 32 qualitative interviews of former violent radical left-wing militants from the Istituto Carlo Cattaneo DOTE archive in Bologna; 7 interviews released from secondary sources; 10 biographies of former RB members; additional data from biographies, newspaper and judicial reports (470 individual cases).
Organised crime groups: 13 qualitative interviews with social workers, judiciary and law and order officials; 51 state witness statements of former members of Cosa Nostra; 16 state witness statements of former members of an organized crime group in Apulia; court papers of 25 major organized crime trials in Palermo and Bari, 1984-2006; DIA annual reports, 1998-2009; additional data from newspaper reports and the Commissione Parlamentare sul Fenomeno della Mafia (1,738 individual cases).
A total of 3,056 cases have been coded and entered into SPSS and Excel datasets to be deposited with ESDS. The qualitative interviews and digitalized judicial papers have been coded and analysed with the support of MAXQDA.
Ethical issues
Interviews were anonymised and data held securely; the purpose of the research was made transparent to all interviewees, and verbal rather than written consent was sought lest their association with the research questions could have left them vulnerable to the police or other security agencies. The British Sociological Association and University of Oxford ethical guidelines were conformed to throughout.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
-----------------------------------------------------------------------------------------------------------------
Data for "Comparing Machine Learning Classifiers and Linear/Logistic Regression to Explore the Relationship between Hand Dimensions and Demographic Characteristics" (PLOSONE)
Oscar Miguel-Hurtado (1), Richard Guest (1), Sarah V. Stevenage (2), Greg J. Neil (2), Sue Black (3)
-----------------------------------------------------------------------------------------------------------------
For more information please contact: O.Miguel-Hurtado-98@kent.ac.uk (Oscar Miguel)
-----------------------------------------------------------------------------------------------------------------
The zip contains right and left hand geometry images from 112 participants. The images were captured using a Nikon D200 SLR camera (format: jpg, size: 3504x2336 pixels), with both the palm of the hand and camera facing downwards. Participants placed each hand on an acetate sheet with a series of positioning pegs.
-----------------------------------------------------------------------------------------------------------------
The Excel file contains a series of length measurements (based on the underlying skeleton of the hand) manually extracted (see Figure 1 for details), along with demographic information from the participants: sex (male or female), height (in cm), weight (in kg) and foot size (in UK sizes).
The dataset deposited here underpins a PhD thesis successfully defended by Stian Kjeksrud on June 6, 2019, at the University of Oslo, Faculty of Social Sciences, Department of Political Science. This dataset consists of two data files, each in its original Excel file format as well as in a Unicode text format:
1. UN_POC_Operations_UNPOCO_2019-01-25: This file (United Nations Protection of Civilians Operations (UNPOCO)) captures and codes the core empirical characteristics of 200 UN military operations to protect civilians from violence in African conflicts between 1999 and 2017.
2. UN_POC_Operations_UNPOCO_fsQCA_2019-01-25: This file (UNPOCO fsQCA) builds directly on the UNPOCO dataset, but consists of a sub-set of 126 cases tailored to fuzzy-set Qualitative Comparative Analysis (fsQCA), and therefore includes a QCA matrix and some additional information for each case.
Both data files were built by Stian Kjeksrud to support the analysis of variations in outcomes of operations and to explore success factors of UN military protection operations across time and UN missions. The data are captured from the United Nations Secretary-General's openly available reporting to the United Nations Security Council.
The data is the output of two studies. In study 1 we tested the hypotheses that: (1) men are more likely to display the better-than-average effect than women, especially when comparing themselves to women; and (2) men who score high on the better-than-average effect also have a bigger tendency to explain things, especially when speaking with women. In study 2 we tested the hypothesis that women are more prone to the interpretation bias than men, especially when speaking with men.
Data files:
- Cleaned Excel file with the output of the better-than-average survey
- Cleaned Excel file with the output of the better-than-average survey 2
- Cleaned Excel file with the output of the interpretation bias survey
Supplemental material:
- Better-than-average questionnaire 1
- Better-than-average questionnaire 2
- Interpretation Bias questionnaire
Method: Qualtrics survey
Universe: General population
Country / Nation: the Netherlands