78 datasets found
  1. Statistical Comparison of Two ROC Curves

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Yaacov Petscher (2023). Statistical Comparison of Two ROC Curves [Dataset]. http://doi.org/10.6084/m9.figshare.860448.v1
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yaacov Petscher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This Excel file performs a statistical test of whether two ROC curves differ from each other based on the Area Under the Curve (AUC). You'll need the correlation coefficient from the table presented in the following article to enter the correct values for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
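The Hanley-McNeil comparison the spreadsheet implements can be sketched in Python. This is a minimal sketch only; the AUCs, sample sizes, and correlation value below are illustrative, not taken from the dataset.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    # Standard error of a single AUC (Hanley & McNeil, 1982).
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

def compare_aucs(auc1, auc2, n_pos, n_neg, r):
    # z-test for two correlated AUCs derived from the same cases
    # (Hanley & McNeil, 1983); r is the correlation coefficient
    # read from the table in the article.
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)
    se2 = hanley_mcneil_se(auc2, n_pos, n_neg)
    se_diff = math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)
    return (auc1 - auc2) / se_diff

# Illustrative inputs: 50 positive and 50 negative cases, r = 0.4.
z = compare_aucs(0.85, 0.80, n_pos=50, n_neg=50, r=0.4)
```

A two-sided p-value then follows from the standard normal distribution of z.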

  2. Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...

    • dataverse.harvard.edu
    Updated Jul 8, 2024
    Cite
    Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Georgios Boumis; Brad Peter
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

    TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations within the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year on the x-axis. In the example below, the cell in the top-right corner gives the direction of the slope for the temporal range 2001–2019. The red line corresponds to the temporal range 2010–2019, and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an Excel template that produces the same visualizations without any need to interact with code, though minor modifications will need to be made to accommodate year ranges other than the one provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

    TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).

    TSMx R script (as included in the entry; the script is truncated in the listing):

    ```r
    # import packages
    library(dplyr)
    library(readr)
    library(ggplot2)
    library(tibble)
    library(tidyr)
    library(forcats)
    library(Kendall)

    options(warn = -1) # disable warnings

    # read data (.csv file with "Year" and "Value" columns)
    data <- read_csv("EVI.csv")

    # prepare row/column names for output matrices
    years <- data %>% pull("Year")
    r.names <- years[-length(years)]
    c.names <- years[-1]
    years <- years[-length(years)]

    # initialize output matrices
    sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

    # function to return remaining years given a start year
    getRemain <- function(start.year) {
      years <- data %>% pull("Year")
      start.ind <- which(data[["Year"]] == start.year) + 1
      remain <- years[start.ind:length(years)]
      return(remain)
    }

    # function to subset data for a start/end year combination
    splitData <- function(end.year, start.year) {
      keep <- which(data[["Year"]] >= start.year & data[["Year"]] <= end.year)
      batch <- data[keep, ]
      return(batch)
    }

    # function to fit linear regression and return slope direction
    fitReg <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(sign(slope))
    }

    # function to fit linear regression and return slope magnitude
    fitRegv2 <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(slope)
    }

    # function to implement Mann-Kendall (MK) trend test and return significance
    # the test is implemented only for n >= 8
    getMann <- function(batch) {
      if (nrow(batch) >= 8) {
        mk <- MannKendall(batch[["Value"]])
        pval <- mk[["sl"]]
      } else {
        pval <- NA
      }
      return(pval)
    }

    # function to return slope direction for all combinations given a start year
    getSign <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      signs <- lapply(combs, fitReg)
      return(signs)
    }

    # function to return MK significance for all combinations given a start year
    getPval <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      pvals <- lapply(combs, getMann)
      return(pvals)
    }

    # function to return slope magnitude for all combinations given a start year
    getMagn <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      magns <- lapply(combs, fitRegv2)
      return(magns)
    }

    # retrieve slope direction, MK significance, and slope magnitude
    signs <- lapply(years, getSign)
    pvals <- lapply(years, getPval)
    magns <- lapply(years, getMagn)

    # fill in output matrices
    dimension <- nrow(sign.matrix)
    for (i in 1:dimension) {
      sign.matrix[i, i:dimension] <- unlist(signs[i])
      pval.matrix[i, i:dimension] <- unlist(pvals[i])
      slope.matrix[i, i:dimension] <- unlist(magns[i])
    }
    sign.matrix <-...
    ```

  3. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study are provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:

    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the notation used: User Story or Use Case
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement of how well the student explained the derivation process: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present
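The grade banding in column F can be written as a small helper (a sketch only; the function name is hypothetical, not from the dataset):

```python
def grade_category(points):
    # Column F banding: H if grade >= 80, M if 65 <= grade < 80,
    # L otherwise; empty cells (no first exam) stay uncategorized.
    if points is None:
        return None
    if points >= 80:
        return "H"
    if points >= 65:
        return "M"
    return "L"
```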

    Tagging scheme:
    - Aligned (AL): a concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    - Wrongly represented (WR): a class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    - System-oriented (SO): a class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    - Omitted (OM): a class in CM-Expert that does not appear in any way in CM-Stud;
    - Missing (MI): a class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from these raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model; it is calculated by dividing the number of aligned concepts (AL) by the sum of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model; it is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with diverging stacked bar charts that illustrate correctness and completeness.
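The correctness and completeness definitions above can be expressed directly (a sketch with hypothetical function names; the counts are the per-student tallies from Sheet 1):

```python
def correctness(al, wr, so, om):
    # Correctness = AL / (AL + OM + SO + WR), per the sheet definition.
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    # Completeness = (AL + WR) / (AL + WR + OM).
    return (al + wr) / (al + wr + om)
```

For example, a student with 6 aligned, 2 wrongly represented, 1 system-oriented, and 1 omitted concept scores a correctness of 0.6 and a completeness of 8/9.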

    For Sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process was explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
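The Hedges' g values reported at the bottom of Sheets 4-8 were obtained with the online tool above; the standard small-sample-corrected effect size it computes can be sketched as follows (illustrative numbers only, not values from the dataset):

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    # Pooled standard deviation, then Cohen's d, then the
    # small-sample correction factor J that yields Hedges' g.
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d

# Two groups of 20 with means 10 and 8 and equal SD 2: d = 1,
# so g equals the correction factor J.
g = hedges_g(10, 2, 20, 8, 2, 20)
```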

  4. Data and program: Comparison between Machine Learning Models and...

    • zenodo.org
    zip
    Updated Jul 16, 2025
    Cite
    Jinxu Li; Xiang Song; Jiangjiang Xia; Wei Shangguan; Xiaodong Zeng (2025). Data and program: Comparison between Machine Learning Models and Conventional Statistical Models in Predicting Global Tree Canopy Height and Crown Radius [Dataset]. http://doi.org/10.5281/zenodo.15951974
    Available download formats: zip
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jinxu Li; Xiang Song; Jiangjiang Xia; Wei Shangguan; Xiaodong Zeng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The attachment includes three folders:

    The first folder, "Data classification (testing and training)", consists of two subfolders, crown_radius and height. The crown_radius folder contains Excel data for three plant functional types (PFTs): temperate needleleaf trees (MN), temperate broadleaf trees (MB), and tropical broadleaf trees (TB). Each of these Excel files contains data on 19 soil factors and 22 climate factors, plus information such as crown_radius_m, mask, and stem_diameter_cm. The height folder is organized similarly; its contents correspond to Table 1 (Data summary) and Figure 3 for each PFT in the article.

    The second folder, "Feather importance", contains two Excel spreadsheets, crown_radius-FI and height-FI. The crown_radius-FI spreadsheet contains the feature importance values for the three plant functional types (PFTs): temperate needleleaf trees (MN), temperate broadleaf trees (MB), and tropical broadleaf trees (TB). The height-FI spreadsheet is organized similarly; its information corresponds to Figure 5 and Figure S3 in the article.

    The third folder, "program", contains two packages (make_model1 and make_model2) and a calling program, "Source program". The make_model1 package is mainly used to obtain the best parameters for model selection; the make_model2 package builds on the selection made by make_model1 to further analyze the specific FI values of the factors in the best model. The Source program makes the specific calls to the packages as required.

  5. Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 3, 2023
    Cite
    U.S. EPA Office of Research and Development (ORD) (2023). Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture and Flux Model [Dataset]. https://catalog.data.gov/dataset/input-output-data-sets-used-in-the-evaluation-of-the-two-layer-soil-moisture-and-flux-mode
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The Excel file contains the model input-output data sets that were used to evaluate the two-layer soil moisture and flux dynamics model. The model is original and was developed by Dr. Hantush by integrating the well-known Richards equation over the root layer and the lower vadose zone. The input-output data are used for: 1) numerical scheme verification by comparison against the HYDRUS model as a benchmark; 2) model validation by comparison against real site data; and 3) estimation of model predictive uncertainty and sources of modeling error. This dataset is associated with the following publication: He, J., M.M. Hantush, L. Kalin, and S. Isik. Two-Layer Numerical Model of Soil Moisture Dynamics: Model Assessment and Bayesian Uncertainty Estimation. Journal of Hydrology. Elsevier Science Ltd, New York, NY, USA, 613 part A: 128327, (2022).

  6. Data-analysis-EXCEL-POWER-BI

    • kaggle.com
    Updated Jul 27, 2023
    Cite
    Ahmed Samir (2023). Data-analysis-EXCEL-POWER-BI [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/data-analysis-excel-power-bi/discussion
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ahmed Samir
    Description

    In the beginning, the case was just company data that did not reveal any useful information to decision-makers. After collecting revenues and expenses over several months, answers to a number of questions were needed so that important decisions could be made on data rather than intuition.

    The questions, about revenue and expenses:
    - What are the total sales and profit for the whole period? What is the total number of products sold? What is the net profit?
    - In which month was the highest share of revenue achieved, and which day in that month had the largest revenue?
    - In which month was the highest share of expenses incurred, and which day in that month had the largest expenses?
    - How did expenditures change from month to month? What was the percentage change in net profit over the months?

    About distribution:
    - What is the number of products sold each month in the largest state?
    - Which are the top 3 states buying products during the two years?

    Comparisons:
    - Sales by sales method?
    - Sales of men's vs. women's products?
    - Profit by retailer?

    What I did:
    - Understood the data
    - Preprocessed and cleaned the data, solving problems such as missing data or incorrectly typed data
    - Queried the data and made calculations such as "COGS" with Power Query (Excel)
    - Modeled the data and created measures with Power Pivot (Excel)
    - After processing and preparation, built pivot tables to answer the questions
    - Finally, built a dashboard in Power BI to visualize the results

  7. Replication Package - How Do Requirements Evolve During Elicitation? An...

    • zenodo.org
    bin, zip
    Updated Apr 21, 2022
    Cite
    Alessio Ferrari; Paola Spoletini; Sourav Debnath (2022). Replication Package - How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis [Dataset]. http://doi.org/10.5281/zenodo.6472498
    Available download formats: bin, zip
    Dataset updated
    Apr 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alessio Ferrari; Paola Spoletini; Sourav Debnath
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.

    The package contains the following folders and files.

    /R-analysis

    This folder contains the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and tables in the paper is as follows:

    - RQ1-1-analyse-story-rates.R: Table 1, user story rates

    - RQ1-1-analyse-role-rates.R: Table 1, role rates

    - RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates

    - RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates

    - RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2

    - RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2

    - RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.

    The .csv files used for the statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.

    - RQ1-1-story-rates.csv: Figure 4

    - RQ1-1-role-rates.csv: Figure 5

    - RQ1-2-categories-phase-1.csv: Figure 8

    - RQ1-2-role-category-phase-1.csv: Figure 9

    - RQ2-1-user-story-and-roles-phase-2.csv: Figure 13

    - RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14

    - RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17

    - IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15

    - IMG-only-RQ2.2-frequent-roles.csv: Figure 18

    NOTE: The last two .csv files do not have an associated statistical test, but are used solely to produce boxplots.

    /Data-Analysis

    This folder contains all the data used to answer the research questions.

    RQ1.xlsx: includes all the data associated to RQ1 subquestions, two tabs for each subquestion (one for user stories and one for roles). The names of the tabs are self-explanatory of their content.

    RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: for each category of user story, and for each analyst, there are two lines. The first reports the number of user stories in that category for phase 1, and the second reports the number of user stories in that category for phase 2, considering the specific analyst.

    - Data Source-role: for each category of role, and for each analyst, there are two lines. The first reports the number of user stories in that role for phase 1, and the second reports the number of user stories in that role for phase 2, considering the specific analyst.

    - RQ2.1 rates: reports the final rates for RQ2.1.

    NOTE: The other tabs are used to support the computation of the final rates.

    RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: same as RQ2.1.xlsx

    - Data Source-role: same as RQ2.1.xlsx

    - RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14

    - RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17

    - RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18

    NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

    RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated with single categories of user stories. A separate file is used given the complexity of the computations.

    - Data Source-US-category: same as RQ2.1.xlsx

    - Totals: total number of user stories for each analyst in phase 1 and phase 2

    - Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file "img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15

    - Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.

    NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

    RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: same as RQ2.1.xlsx

    - Data Source-role: same as RQ2.1.xlsx

    - RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19

    - RQ2-3-most-frequent-categories: most frequent novel categories

    /Raw-Data-Phase-I

    The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:

    - Evaluation: includes the annotation of the user stories as existing user story in the original categories (annotated with "E"), novel user story in a certain category (refinement, annotated with "N"), and novel user story in novel category (Name of the category in column "New Feature"). **NOTE 1:** It should be noticed that in the paper the case "refinement" is said to be annotated with "R" (instead of "N", as in the files) to make the paper clearer and easy to read.

    - Roles: roles used in the user stories, and count of the user stories belonging to a certain role.

    /Raw-Data-Phaes-II

    The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:

    - Analysis: includes the annotation of the user stories as belonging to an existing original category (X), or to categories introduced after interviews, or to categories introduced after app-store-inspired elicitation (name of category in "Cat. Created in PH1"), or to entirely novel categories (name of category in "New Category").

    - Roles: roles used in the user stories, and count of the user stories belonging to a certain role.

    /Figures

    This folder includes the figures reported in the paper. The boxplots are generated from the data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are produced with Excel, and are also reported in the Excel files listed above.

  8. Oldrieve Excel File Datebase for Computation 3018487 -- Teaching K-3...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    Cite
    Oldrieve, Richard (2024). Oldrieve Excel File Datebase for Computation 3018487 -- Teaching K-3 Multi-Digit Arithmetic Computation to Students with Slow Language Processing [Dataset]. http://doi.org/10.7910/DVN/PDHFKV
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Oldrieve, Richard
    Description

    The attached Excel file contains the database for Studies A & B for the article submitted to the journal "Computation" with submission number 3018487. It also contains data from pilot testing the Blended Arithmetic Curriculum in two classrooms that participated in Study B. There is a pre-test that was administered in January and a post-test given in February covering Chapter 7, the chapter where students learn to compute 2-digit by 2-digit numbers focusing on the "Limited Facts" of adding on 1, adding on 0, adding 5+5, 9+1, as well as 7+7, 7+8, 8+7, and 8+8. The students who completed these seven chapters with accuracy and speed did quite well on the urban school district's 2nd grade math proficiency test. Unfortunately, the section of the paper containing the results of the Chapter 7 assessment had to be cut because it was confusing to explain and reviewers wanted the article shortened.

  9. Excel generated epidemic curves for the paper "A Simple, SIR-like but...

    • data.mendeley.com
    Updated Dec 12, 2020
    Cite
    Xiaoping Liu (2020). Excel generated epidemic curves for the paper "A Simple, SIR-like but Individual-Based Epidemic Model: Application in Comparison of COVID-19 in New York City and Wuhan" [Dataset]. http://doi.org/10.17632/3vg2r3ymgk.3
    Dataset updated
    Dec 12, 2020
    Authors
    Xiaoping Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York, Wuhan
    Description

    The author calculated and plotted all epidemic curves in Excel for the paper "A Simple, SIR-like but Individual-Based Epidemic Model: Application in Comparison of COVID-19 in New York City and Wuhan". The calculated curves are shown in Figures 2-11, each placed in a separate sheet of the Excel file. The values of the parameters l and c are placed in two cells marked in yellow, located in the top one or two rows on the left. After the two parameters are changed, the Excel file recalculates the four variables An, In, Rn and Tn from n=1 to N. The calculated values are listed in four columns of cells below the column labels An, In, Rn and Tn, respectively.

  10. Additional file 2: Table S2. of Comparison, alignment, and synchronization...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 5, 2023
    Cite
    Edison Ong; Sirarat Sarntivijai; Simon Jupp; Helen Parkinson; Yongqun He (2023). Additional file 2: Table S2. of Comparison, alignment, and synchronization of cell line information between CLO and EFO [Dataset]. http://doi.org/10.6084/m9.figshare.5728968.v1
    Available download formats: xlsx
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    figshare
    Authors
    Edison Ong; Sirarat Sarntivijai; Simon Jupp; Helen Parkinson; Yongqun He
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Final EFO-CLO alignment result. The 874 EFO-CLO mapped cell lines were aligned and merged into CLO (Tab. 1 in the Excel file), and 344 EFO unique immortalized permanent cell lines were added to CLO (Tab. 2 in the Excel file). The file is stored in Microsoft Excel spreadsheet (xlsx) format. (XLSX 54 kb)

  11. Hospital Annual Financial Data - Selected Data & Pivot Tables

    • data.chhs.ca.gov
    • data.ca.gov
    • +5more
    csv, data, doc, html +4
    Updated Apr 23, 2025
    Cite
    Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
    Available download formats: csv, data, doc, html, pdf, xls, xlsx, zip (multiple files)
    Dataset updated
    Apr 23, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

    There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
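A minimal sketch of the fiscal-year grouping described above (reports grouped by the July–June fiscal year in which their report period ends). The column names and values are illustrative placeholders, not the dataset's actual schema:

```python
import pandas as pd

# Hypothetical example: assign each hospital report to a July-June fiscal year
# based on its report-period end date.
reports = pd.DataFrame({
    "hospital": ["A", "B", "C"],
    "period_end": pd.to_datetime(["2023-06-30", "2023-07-15", "2024-03-31"]),
})

# A report ending July-December belongs to the fiscal year starting that July;
# one ending January-June belongs to the fiscal year that started the prior July.
reports["fiscal_year"] = reports["period_end"].apply(
    lambda d: f"FY{d.year}-{d.year + 1}" if d.month >= 7 else f"FY{d.year - 1}-{d.year}"
)
print(reports[["hospital", "fiscal_year"]])
```

Calendar-year grouping works the same way, keying on `period_end.year` instead.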

  12. d

    Spreadsheet of best models for each downscaled climate dataset and for all...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    U.S. Geological Survey (2024). Spreadsheet of best models for each downscaled climate dataset and for all downscaled climate datasets considered together (Best_model_lists.xlsx) [Dataset]. https://catalog.data.gov/dataset/spreadsheet-of-best-models-for-each-downscaled-climate-dataset-and-for-all-downscaled-clim
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Description

    The South Florida Water Management District (SFWMD) and the U.S. Geological Survey have developed projected future change factors for precipitation depth-duration-frequency (DDF) curves at 174 National Oceanic and Atmospheric Administration (NOAA) Atlas 14 stations in central and south Florida. The change factors were computed as the ratio of projected future to historical extreme precipitation depths fitted to extreme precipitation data from various downscaled climate datasets using a constrained maximum likelihood (CML) approach. The change factors correspond to the period 2050-2089 (centered in the year 2070) as compared to the 1966-2005 historical period. A Microsoft Excel workbook is provided that tabulates best models for each downscaled climate dataset and for all downscaled climate datasets considered together. Best models were identified based on how well the models capture the climatology and interannual variability of four climate extreme indices using the Model Climatology Index (MCI) and the Model Variability Index (MVI) of Srivastava and others (2020). The four indices consist of annual maxima consecutive precipitation for durations of 1, 3, 5, and 7 days compared against the same indices computed based on the PRISM and SFWMD gridded precipitation datasets for two climate regions: climate region 4 in South Central Florida, and climate region 5 in South Florida. The PRISM dataset is based on the Parameter-elevation Relationships on Independent Slopes Model interpolation method of Daly and others (2008). The South Florida Water Management District’s (SFWMD) precipitation super-grid is a gridded precipitation dataset developed by modelers at the agency for use in hydrologic modeling (SFWMD, 2005). This dataset is considered by the SFWMD as the best available gridded rainfall dataset for south Florida. Best models were selected based on MCI and MVI evaluated within each individual downscaled dataset. 
In addition, best models were selected by comparison across datasets and are referred to as "ALL DATASETS" hereafter. Due to the small sample size, all models in the Weather Research and Forecasting Model (JupiterWRF) dataset were considered best models.
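The change-factor definition above reduces to a simple ratio once the extreme precipitation depths have been fitted. The depths below are made-up placeholders; the real values come from distributions fitted with the constrained maximum likelihood (CML) approach described in the text:

```python
# Illustrative sketch only: change factor = projected future (2050-2089) depth
# divided by historical (1966-2005) depth, per duration/return period.
historical_depth_in = [6.2, 8.1, 9.7]   # e.g. fitted 10-, 25-, 100-yr depths (invented)
future_depth_in = [7.1, 9.5, 11.8]      # projected depths for the same durations (invented)

change_factors = [f / h for f, h in zip(future_depth_in, historical_depth_in)]
print([round(cf, 3) for cf in change_factors])
```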

  13. o

    BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness...

    • explore.openaire.eu
    Updated Jan 26, 2023
    Francisco Estupiñán-Romero; Nina Van Goethem; Marjan Meurisse; Javier González-Galindo; Enrique Bernal-Delgado (2023). BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment - Common Data Model Specification [Dataset]. http://doi.org/10.5281/zenodo.6913045
    Explore at:
    Dataset updated
    Jan 26, 2023
    Authors
    Francisco Estupiñán-Romero; Nina Van Goethem; Marjan Meurisse; Javier González-Galindo; Enrique Bernal-Delgado
    Description

This publication corresponds to the Common Data Model (CDM) specification of the Baseline Use Case proposed in T.5.2 (WP5) in the BY-COVID project on “SARS-CoV-2 Vaccine(s) effectiveness in preventing SARS-CoV-2 infection.”

Research Question: “How effective have the SARS-CoV-2 vaccination programmes been in preventing SARS-CoV-2 infections?”

Intervention (exposure): COVID-19 vaccine(s)

Outcome: SARS-CoV-2 infection

Subgroup analysis: Vaccination schedule (type of vaccine)

Study Design: An observational retrospective longitudinal study to assess the effectiveness of the SARS-CoV-2 vaccine in preventing SARS-CoV-2 infections using routinely collected social, health and care data from several countries. A causal model was established using Directed Acyclic Graphs (DAGs) to map domain knowledge, theories and assumptions about the causal relationship between exposure and outcome. The DAG developed for the research question of interest is shown below.

Cohort definition: All people eligible to be vaccinated (from 5 to 115 years old, included) or with at least one dose of a SARS-CoV-2 vaccine (any of the available brands), having or not a previous SARS-CoV-2 infection.

Inclusion criteria: All people vaccinated with at least one dose of the COVID-19 vaccine (any available brand) in an area of residence. Any person eligible to be vaccinated (from 5 to 115 years old, included) with a positive diagnosis (irrespective of the type of test) for SARS-CoV-2 infection (COVID-19) during the period of study.

Exclusion criteria: People not eligible for the vaccine (from 0 to 4 years old, included).

Study period: From the date of the first documented SARS-CoV-2 infection in each country to the most recent date on which data are available at the time of analysis; roughly from 01-03-2020 to 30-06-2022, depending on the country.
Files included in this publication:

Causal model (responding to the research question):
SARS-CoV-2 vaccine effectiveness causal model v.1.0.0 (HTML) - Interactive report showcasing the structural causal model (DAG) to answer the research question
SARS-CoV-2 vaccine effectiveness causal model v.1.0.0 (QMD) - Quarto RMarkdown script to produce the structural causal model

Common data model specification (following the causal model):
SARS-CoV-2 vaccine effectiveness data model specification (XLSX) - Human-readable version (Excel)
SARS-CoV-2 vaccine effectiveness data model specification dataspice (HTML) - Human-readable version (interactive report)
SARS-CoV-2 vaccine effectiveness data model specification dataspice (JSON) - Machine-readable version

Synthetic dataset (complying with the common data model specification):
SARS-CoV-2 vaccine effectiveness synthetic dataset (CSV) [UTF-8, pipe | separated, N~650,000 registries]
SARS-CoV-2 vaccine effectiveness synthetic dataset EDA (HTML) - Interactive report of the exploratory data analysis (EDA) of the synthetic dataset
SARS-CoV-2 vaccine effectiveness synthetic dataset EDA (JSON) - Machine-readable version of the exploratory data analysis (EDA) of the synthetic dataset
SARS-CoV-2 vaccine effectiveness synthetic dataset generation script (IPYNB) - Jupyter notebook with Python scripting and commenting to generate the synthetic dataset

#### Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment - Common Data Model Specification v.1.1.0 change log ####

Updated the causal model to eliminate the consideration of 'vaccination_schedule_cd' as a mediator
Adjusted the study period to be consistent with the Study Protocol
Updated 'sex_cd' as a required variable
Added 'chronic_liver_disease_bl' as a comorbidity at the individual level
Updated 'socecon_lvl_cd' at the area level as a recommended variable
Added crosswalks for the definition of 'chronic_liver_disease_bl' in a separate sheet
Updated the 'vaccination_schedule_cd' reference to the 'Vaccine' node in the updated DAG
Updated the description of the 'confirmed_case_dt' and 'previous_infection_dt' variables to clarify the definition and the need for a single registry per person

The scripts (software) accompanying the data model specification are offered "as is", without warranty and disclaiming liability for damages resulting from using them. The software is released under the CC-BY-4.0 licence, which permits you to use the content for almost any purpose (but does not grant you any trademark permissions), so long as you note the licence and give credit.
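The synthetic dataset is described as a UTF-8, pipe-separated CSV. A minimal sketch of loading such a file with pandas, using an inline sample in place of the real ~650,000-row file; the column names are placeholders except 'sex_cd' and 'chronic_liver_disease_bl', which the change log mentions:

```python
import io

import pandas as pd

# Inline stand-in for the pipe-separated synthetic dataset (invented rows).
sample = io.StringIO(
    "person_id|sex_cd|chronic_liver_disease_bl|confirmed_case_dt\n"
    "1|F|0|2021-02-11\n"
    "2|M|1|\n"           # empty date -> parsed as NaT (no confirmed case)
)
df = pd.read_csv(sample, sep="|", parse_dates=["confirmed_case_dt"])
print(df.dtypes)
```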

  14. Store Data Analysis using MS excel

    • kaggle.com
    Updated Mar 10, 2024
    NisshaaChoudhary (2024). Store Data Analysis using MS excel [Dataset]. https://www.kaggle.com/datasets/nisshaachoudhary/store-data-analysis-using-ms-excel/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 10, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    NisshaaChoudhary
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

Vrinda Store: Interactive MS Excel dashboard (Feb 2024 - Mar 2024). The owner of Vrinda Store wants to create an annual sales report for 2022 so that employees can understand their customers and grow sales further. Questions asked by the owner of Vrinda Store are as follows: 1) Compare the sales and orders using a single chart. 2) Which month got the highest sales and orders? 3) Who purchased more in 2022 - women or men? 4) What are the different order statuses in 2022?

And some other questions related to the business. The owner of Vrinda Store wanted a visual story of their data, one that depicts the real-time progress and sales insights of the store. This project is an MS Excel dashboard which presents an interactive visual story to help the owner and employees increase their sales. Tasks performed: data cleaning, data processing, data analysis, data visualization, reporting. Tool used: MS Excel. Skills: Data Analysis · Data Analytics · MS Excel · Pivot Tables
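The dashboard answers these questions with pivot tables; the same aggregation can be sketched in code. The mini-table below is invented for illustration (the workbook's actual schema is not published here):

```python
import pandas as pd

# Hypothetical mini-version of the store data: answering "which month got the
# highest sales?" with a group-by, as an Excel pivot table would.
orders = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Mar", "Mar", "Mar"],
    "amount": [120, 80, 150, 90, 110, 130],
})
monthly_sales = orders.groupby("month", sort=False)["amount"].sum()
print(monthly_sales.idxmax(), monthly_sales.max())  # month with highest sales
```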

  15. r

    Data from: Event conceptualisation and aspect in L2 English and Persian: An...

    • researchdata.se
    • demo.researchdata.se
    Updated Nov 7, 2019
    Somaje Abdollahian Barough (2019). Event conceptualisation and aspect in L2 English and Persian: An application of the Heidelberg-Paris model [Dataset]. http://doi.org/10.5878/wz3s-wt38
    Explore at:
    (10147845)Available download formats
    Dataset updated
    Nov 7, 2019
    Dataset provided by
    Stockholm University
    Authors
    Somaje Abdollahian Barough
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Time period covered
    Aug 1, 2010 - Jul 31, 2013
    Area covered
    Islamic Republic of, Iran, Sweden, United States, United Kingdom
    Description

    The data have been used in an investigation for a PhD thesis in English Linguistics on similarities and differences in the use of the progressive aspect in two different language systems, English and Persian, both of which have the grammaticalised progressive. It is an application of the Heidelberg-Paris model of investigation into the impact of the progressive aspect on event conceptualisation. It builds on an analysis of single event descriptions at sentence level and re-narrations of a film clip at discourse level, as presented in von Stutterheim and Lambert (2005) DOI: 10.1515/9783110909593.203; Carroll and Lambert (2006: 54–73) http://libris.kb.se/bib/10266700; and von Stutterheim, Andermann, Carroll, Flecken & Schmiedtová (2012) DOI: 10.1515/ling-2012-0026. However, there are system-based typological differences between these two language systems due to the absence/presence of the imperfective-perfective categories, respectively. Thus, in addition to the description of the status of the progressive aspect in English and Persian and its impact on event conceptualisation, an important part of the investigation is the analysis of the L2 English speakers’ language production as the progressives in the first languages, L1s, exhibit differences in their principles of use due to the typological differences. The question of importance in the L2 context concerns the way they conceptualise ongoing events when the language systems are different, i.e. whether their language production is conceptually driven by their first language Persian.

The data consist of two data sets, as the study includes two linguistic experiments, Experiment 1 and Experiment 2. The data for both experiments were collected by email. Separate forms of instructions and language background questions were prepared for the six different informant groups (three speaker groups and two experimental tasks), and a Nelson English test (https://www.worldcat.org/isbn/9780175551972) on English proficiency was selected and modified for the L2 English speaker group in Experiment 2. Nelson English tests are published in Fowler, W.S. & Coe, N. (1976). Nelson English tests. Middlesex: Nelson and Sons. The test battery provides tests for all levels of proficiency. The graded tests are compiled in ten sets from elementary to very advanced level. Each set includes four graded tests, i.e. A, B, C, and D, resulting in 40 separate tests, each with 50 multiple-choice questions. The test entitled 250C was selected for this project; it belongs to slot 19 of the 40 slots in the total battery. The multiple-choice questions were checked with a native English professional, and 5 inadequate questions relevant to pronunciation were omitted. In addition, a few modifications of the grammar questions were made, aiming to include questions that involve a contrast for the Persian L2 English learner with respect to the grammars of the two languages. The omissions and modifications provide an appropriate grammar test for very advanced Iranian learners of L2 English who have learnt the language in a classroom setting. The data set collected from the informants is characterised as follows: the data from Experiment 1 function as the basis for the description of the progressive aspect in English, Persian and L2 English, while the data from Experiment 2 form the basis for the analysis of its use in a long stretch of discourse/language production for the three speaker groups.
The parameters selected for the investigation comprised, first, phasal decomposition, which involves the use of the progressive in unrelated single motion events and narratives, and uses of begin/start in narratives. Second, granularity in narratives, which relates to the overall amount of language production in narratives. Third, event boundedness (encoded in the use of 2-state verbs and 1-state verbs with an endpoint adjunct) partly in single motion events and partly in temporal shift in narratives. Temporal shift is defined as follows: Events in the narrative which are bounded shift the time line via a right boundary; events with a left boundary also shift the time line, even if they are unbounded. Fourth, left boundary comprising the use of begin/start and try in narratives. Finally, temporal structuring, which involves the use of bounded versus unbounded events preceding the temporal adverbial then in narratives (The tests are described in the documentation files aspectL2English_Persian_Exp2Chi-square-tests-in-SPSS.docx and aspectL2English_Persian_Exp2Chi-square-tests-in-SPSS.rtf). In both experiments the participants watched a video, one relevant for single event descriptions, the other relevant for re-narration of a series of events. Thus, two different videos with stimuli for the different kinds of experimental tasks were used. For Experiment 1, a video of 63 short film clips presenting unrelated single events was provided by Professor Christiane von Stutterheim, Heidelberg University Language & Cognition (HULC) Lab, at Heidelberg University, German, https://www.hulclab.eu/. For Experiment 2, an animation called Quest produced by Thomas Stellmach 1996 was used. It is available online at http://www.youtube.com/watch?v=uTyev6OaThg. Both stimuli have been used in the previous investigations on different languages by the research groups associated with the HULC Lab. 
The informants were asked to describe the events seen in the stimuli videos, to record their language production and to send it to the researcher. For Experiment 2, most of the L1 English data were provided by Prof. von Stutterheim, Heidelberg University, who made available 34 re-narrations of the film Quest in English; 24 of them were selected for the present investigation. The project used six different informant groups, i.e. fully separate groups for the two experiments. The data from single event descriptions in Experiment 1 were analysed quantitatively in Excel. The re-narrations of Experiment 2 were coded in NVivo 10 (2014), providing frequencies of various parametrical features (Ltd, Nv. (2014). NVivo QSR International Pty Ltd, Version 10. Doncaster, Australia: QSR International). The numbers from NVivo 10 were analysed statistically in Excel and SPSS (2017). The tools are appropriate for this research: Excel is well suited to the smaller data load in Experiment 1, while NVivo 10 is practical for the large amount of data and parameters in Experiment 2. Notably, NVivo 10 enabled the analysis of the three data sets to take place in the same manner once the categories of analysis and parameters had been defined under different nodes. As the results were to be extracted in the same fashion from each data set, the L1 English data received from Heidelberg for Experiment 2 were re-analysed according to the criteria employed in this project. Yet, the analysis in the project conforms to the criteria used earlier in the model.
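The chi-square tests mentioned above (run in SPSS) compare frequency counts across groups, e.g. bounded vs unbounded events per speaker group. A minimal sketch of that computation; the 2x2 counts below are invented to show the shape of such a test, not the study's data:

```python
# Invented 2x2 contingency table of event codings per speaker group.
table = [
    [40, 25],  # e.g. group 1: bounded, unbounded
    [22, 43],  # e.g. group 2: bounded, unbounded
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row_total * col_total / n.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
# For a 2x2 table (1 degree of freedom), chi2 > 3.841 means p < 0.05.
print(round(chi2, 2))
```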

  16. Data from: "Ecophysiological variation in two provenances of Pinus flexilis...

    • osti.gov
    Updated Dec 31, 2020
    + more versions
    Castanha, Cristina; Germino, Matthew J.; Kueppers, Lara M.; Reinhardt, Keith (2020). Data from: "Ecophysiological variation in two provenances of Pinus flexilis seedlings across an elevation gradient from forest to alpine" [Dataset]. https://www.osti.gov/dataexplorer/biblio/1804122-data-from-ecophysiological-variation-two-provenances-pinus-flexilis-seedlings-across-elevation-gradient-from-forest-alpine
    Explore at:
    Dataset updated
    Dec 31, 2020
    Dataset provided by
    United States Department of Energyhttp://energy.gov/
    Environmental System Science Data Infrastructure for a Virtual Ecosystem; Subalpine and Alpine Species Range Shifts with Climate Change: Temperature and Soil Moisture Manipulations to Test Species and Population Responses (Alpine Treeline Warming Experiment)
    Authors
    Castanha, Cristina; Germino, Matthew J.; Kueppers, Lara M.; Reinhardt, Keith
    Description

This archive contains data used to support conclusions drawn in “Ecophysiological variation in two provenances of Pinus flexilis seedlings across an elevation gradient from forest to alpine”, by Reinhardt et al., 2011. Data were collected over one summer season in plots within the Alpine Treeline Warming Experiment (ATWE), before climate manipulations began. The experiment was located on Niwot Ridge, in the Front Range of the Colorado Rocky Mountains. This data package includes five comma-separated-values (.csv) files, five Microsoft Excel (.xlsx) files, one .pdf file, and two types of geospatial files: keyhole markup language (.kml) and ESRI shapefiles (.shp). The .csv files can be opened using any simple text-editing software (such as Notepad and TextEdit), R, and Microsoft Excel. The .xlsx files can only be opened using Microsoft Excel. The .pdf file can be opened using Adobe Acrobat Reader or any other compatible file viewing software. The .kml file can be opened using Google Earth and Google Maps, and shapefiles can be opened using any software compatible with the file type, such as ESRI’s ArcGIS suite and QGIS. The archived data contain gas exchange and plant physiology measurements and non-structural carbohydrate data, among others. Geospatial files are also provided for additional locational context. The files and their contents in this data package are summarized under "Data Summary" in the included Data User's Guide. All files (excluding geospatial) are available in both Microsoft Excel and .csv format, as indicated in the Data Summary list.

Climate change is predicted to cause upward shifts in forest tree distributions, which will require seedling recruitment beyond current forest boundaries.
However, predicting the likelihood of successful plant establishment beyond current species’ ranges under changing climate is complicated by the interaction of genetic and environmental controls on seedling establishment. To determine how genetics and climate may interact to affect seedling establishment, we transplanted recently germinated seedlings from high- and low-elevation provenances (HI and LO, respectively) of Pinus flexilis in common gardens arrayed along an elevation and canopy gradient from subalpine forest into the alpine zone and examined differences in physiology and morphology between provenances and among sites. Plant dry mass, projected leaf area and shoot:root ratios were 12–40% greater in LO compared with HI seedlings at each elevation. There were no significant changes in these variables among sites except for decreased dry mass of LO seedlings in the alpine site. Photosynthesis, carbon balance (photosynthesis/respiration) and conductance increased >2× with elevation for both provenances, and were 35–77% greater in LO seedlings compared with HI seedlings. There were no differences in dark-adapted chlorophyll fluorescence (Fv/Fm) among sites or between provenances. Our results suggest that for P. flexilis seedlings, provenances selected for above-ground growth may outperform those selected for stress resistance in the absence of harsh climatic conditions, even well above the species’ range limits in the alpine zone. This indicates that forest genetics may be important to understanding and managing species’ range adjustments due to climate change.

  17. Z

    A dataset from a survey investigating disciplinary differences in data...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jul 12, 2024
    Gregory, Kathleen (2024). A dataset from a survey investigating disciplinary differences in data citation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7555362
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Haustein, Stefanie
    Ninkov, Anton Boudreau
    Peters, Isabella
    Ripp, Chantal
    Gregory, Kathleen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GENERAL INFORMATION

    Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation

    Date of data collection: January to March 2022

    Collection instrument: SurveyMonkey

    Funding: Alfred P. Sloan Foundation

    SHARING/ACCESS INFORMATION

    Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license

    Links to publications that cite or use the data:

    Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437

    Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266

    DATA & FILE OVERVIEW

    File List

    Filename: MDCDatacitationReuse2021Codebookv2.pdf Codebook

    Filename: MDCDataCitationReuse2021surveydatav2.csv Dataset format in csv

    Filename: MDCDataCitationReuse2021surveydatav2.sav Dataset format in SPSS

    Filename: MDCDataCitationReuseSurvey2021QNR.pdf Questionnaire

    Additional related data collected that was not included in the current data package: Open ended questions asked to respondents

    METHODOLOGICAL INFORMATION

    Description of methods used for collection/generation of data:

    The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.

We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).

    Methods for processing the data:

    Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.

    Instrument- or software-specific information needed to interpret the data:

The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.

    DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

    Number of variables: 95

    Number of cases/rows: 2,492

    Missing data codes: 999 Not asked

    Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
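A minimal sketch of applying the documented missing-data convention: per the codebook, 999 means "Not asked", so it should be recoded to NaN before analysis. The column names here are placeholders, not the survey's actual variables:

```python
import numpy as np
import pandas as pd

# Invented two-question excerpt using the survey's missing-data code.
df = pd.DataFrame({"q1": [1, 999, 3], "q2": [999, 2, 2]})

# Recode the documented "Not asked" value (999) as missing.
df = df.replace(999, np.nan)
print(df.isna().sum())
```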

  18. Data for: A systematic review showed no performance benefit of machine...

    • search.datacite.org
    • data.mendeley.com
    Updated Mar 14, 2019
    Ben Van Calster (2019). Data for: A systematic review showed no performance benefit of machine learning over logistic regression for clinical prediction models [Dataset]. http://doi.org/10.17632/sypyt6c2mc
    Explore at:
    Dataset updated
    Mar 14, 2019
    Dataset provided by
    DataCitehttps://www.datacite.org/
    Mendeley
    Authors
    Ben Van Calster
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The uploaded files are: 1) An Excel file containing 6 sheets, in the following order: "Data Extraction" (summarized final data extractions from the three reviewers involved), "Comparison Data" (data related to the comparisons investigated), "Paper level data" (summaries at paper level), "Outcome Event Data" (information on the number of events for every outcome investigated within a paper), "Tuning Classification" (data related to the manner of hyperparameter tuning of machine learning algorithms). 2) The R script used for the analysis. (To read the data, save the "Comparison Data", "Paper level data", and "Outcome Event Data" Excel sheets as txt files. In the R script, srpap refers to the "Paper level data" sheet, srevents refers to the "Outcome Event Data" sheet, and srcompx refers to the "Comparison Data" sheet.) 3) Supplementary Material, including the search string, tables of data, and figures. 4) PRISMA checklist items.

  19. f

    Data from: Consolidating and Managing Data for Drug Development within a...

    • figshare.com
    xlsx
    Updated May 30, 2023
    Arvin Moser; Alexander E. Waked; Joseph DiMartino (2023). Consolidating and Managing Data for Drug Development within a Pharmaceutical Laboratory: Comparing the Mapping and Reporting Tools from Software Applications [Dataset]. http://doi.org/10.1021/acs.oprd.1c00082.s002
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    ACS Publications
    Authors
    Arvin Moser; Alexander E. Waked; Joseph DiMartino
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We present a perspective on drug development for the synthesis of an active pharmaceutical ingredient (e.g., agomelatine) within a commercial technology called Luminata and compare the results to the current method of consolidating the reaction data into Microsoft Excel. The Excel document becomes the ultimate repository of information extracted from multiple sources such as the electronic lab notebook, the laboratory information management system, the chromatography data system, in-house databases, and external data. The major needs of a pharmaceutical company are tracking the stages of multiple reactions, calculating the impurity carryover across the stages, and performing structure dereplication for an unknown impurity. As there is no standardized software available to link the different needs throughout the life cycle of process development, there is a demand for mapping tools to consolidate the route for an API synthesis and link it with analytical data while reducing transcription errors and maintaining an audit trail.

  20.

    Directed network analysis of 2-year-old and 4-year-old children

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    Updated Dec 31, 2024
    Norikazu Hirose; Masanori Kato; Ayumi Maruyama (2024). Directed network analysis of 2-year-old and 4-year-old children [Dataset]. http://doi.org/10.5061/dryad.63xsj3vbx
    Explore at:
    Dataset updated
    Dec 31, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Norikazu Hirose; Masanori Kato; Ayumi Maruyama
    Description

    This dataset includes triaxial acceleration data collected from 60 children (2- and 4-year-olds) across three childcare facilities during a 15-minute free play session. Dyadic (two-person) and triadic (three-person) peer relationships were analyzed using a novel directed network analysis method, offering insights into age and sex differences in peer interactions. The dataset includes normalized connection counts, demographic details, and detailed analyses of interaction directionality. This research validates the application of network analysis in early childhood studies, reducing observational biases and labor-intensive manual coding, and provides a framework for exploring complex social dynamics in naturalistic play settings.

    Participants: The study involved 60 children, with equal representation of 2- and 4-year-olds, across three childcare facilities. Informed consent was obtained from the participants' legal guardians following ethical guidelines.

    Data Collection: Participants wore wristwatch-style triaxial accelerometers (Silmee W22, TDK) during a 15-minute free play session. Acceleration data were recorded at 20 Hz to measure individual movement intensity and to derive dyadic and triadic peer relationships.

    Data Processing:

    Normalization: Acceleration data were aggregated into 1-second intervals and normalized using Sturges’ formula to account for individual variability.

    Network Analysis: Connections were quantified using directed graph analysis, identifying dyads and triads based on movement entropy thresholds. Entropy values ranged from 0 to 1, representing interaction strength.
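    The two processing steps just described, Sturges' formula for choosing bin counts and an entropy score scaled to [0, 1], can be sketched as follows. This is an illustration of the generic formulas only, not a reproduction of the authors' pipeline; the sample sizes are hypothetical.

    ```python
    import math
    from collections import Counter

    def sturges_bins(n_samples):
        """Number of histogram bins by Sturges' formula: ceil(log2(n)) + 1."""
        return math.ceil(math.log2(n_samples)) + 1

    def normalized_entropy(labels):
        """Shannon entropy of a discrete sequence, scaled to [0, 1]."""
        counts = Counter(labels)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        max_h = math.log2(len(counts)) if len(counts) > 1 else 1.0
        return h / max_h

    # A 15-minute session aggregated to 1-second intervals yields 900 samples per child:
    print(sturges_bins(900))  # 11

    # A sequence alternating evenly between two intensity bins has maximal entropy:
    print(normalized_entropy([0, 1] * 450))  # 1.0
    ```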

    Statistical Analysis: Chi-square tests and ANOVAs were performed to analyze age, sex, and directional differences in peer r...

    Directed network analysis of 2-year-old and 4-year-old children

    https://doi.org/10.5061/dryad.63xsj3vbx

    Description of the data and file structure

    Dataset Overview: This dataset contains information about directed networks among 2-year-old and 4-year-old children. The data reflects interactions in terms of directed connections from one child to another, categorized by their unique identifiers and demographics.

    Files and variables

    File: Directed_network_of_2_and_4_YO_childlen.xlsx

    The Excel file consists of four sheets as follows:

    Sheet1: Dyad

    Sheet2: Triad

    Sheet3: Directed NW

    Sheet4: Directed NW among individuals

    Sheet description and variables

    Sheet 1: Dyad

    This sheet includes each child's age, gender, and the number of dyads for each facility. Using these data, the number of dyads was compared across ages and genders.

    Variables in each column

    • Facility...