100+ datasets found

Statistical Comparison of Two ROC Curves
figshare.com
xls
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yaacov Petscher (2023). Statistical Comparison of Two ROC Curves [Dataset]. http://doi.org/10.6084/m9.figshare.860448.v1
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.860448.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Yaacov Petscher
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This excel file will do a statistical tests of whether two ROC curves are different from each other based on the Area Under the Curve. You'll need the coefficient from the presented table in the following article to enter the correct AUC value for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
d
Data from: Delta Neighborhood Physical Activity Study
catalog.data.gov
agdatacommons.nal.usda.gov
+1more
Updated Jun 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note : title changed 9/4/2020 to reflect study name] Resources in this dataset:Resource Title: Dataset One RALA PPA Data Dictionary. File Name: RALA PPA Data Dictionary.csvResource Description: Data dictionary for dataset one collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA Data Dictionary. File Name: RALA TWA Data Dictionary.csvResource Description: Data dictionary for dataset two collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA Data Dictionary. File Name: RALA SSA Data Dictionary.csvResource Description: Data dictionary for dataset three collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT Data Dictionary. File Name: CPAT Data Dictionary.csvResource Description: Data dictionary for dataset four collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset One RALA PPA. File Name: RALA PPA Data.csvResource Description: Data collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA. File Name: RALA TWA Data.csvResource Description: Data collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA. File Name: RALA SSA Data.csvResource Description: Data collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT. File Name: CPAT Data.csvResource Description: Data collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Data Dictionary. File Name: DataDictionary_RALA_PPA_SSA_TWA_CPAT.csvResource Description: This is a combined data dictionary from each of the 4 dataset files in this set.
H
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...
dataverse.harvard.edu
Updated Jul 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/ZZDYM9
Dataset updated
Jul 8, 2024
Dataset provided by
Harvard Dataverse
Authors
Georgios Boumis; Brad Peter
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart—that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624 TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006). TSMx R script: # import packages library(dplyr) library(readr) library(ggplot2) library(tibble) library(tidyr) library(forcats) library(Kendall) options(warn = -1) # disable warnings # read data (.csv file with "Year" and "Value" columns) data <- read_csv("EVI.csv") # prepare row/column names for output matrices years <- data %>% pull("Year") r.names <- years[-length(years)] c.names <- years[-1] years <- years[-length(years)] # initialize output matrices sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years)) pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years)) slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years)) # function to return remaining years given a start year getRemain <- function(start.year) { years <- data %>% pull("Year") start.ind <- which(data[["Year"]] == start.year) + 1 remain <- years[start.ind:length(years)] return (remain) } # function to subset data for a start/end year combination splitData <- function(end.year, start.year) { keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year) batch <- data[keep,] return(batch) } # function to fit linear regression and return slope direction fitReg <- function(batch) { trend <- lm(Value ~ Year, data = batch) slope <- coefficients(trend)[[2]] return(sign(slope)) } # function to fit linear regression and return slope magnitude fitRegv2 <- function(batch) { trend <- lm(Value ~ Year, data = batch) slope <- coefficients(trend)[[2]] return(slope) } # function to implement Mann-Kendall (MK) trend test and return significance # the test is implemented only for n>=8 getMann <- function(batch) { if (nrow(batch) >= 8) { mk <- MannKendall(batch[['Value']]) pval <- mk[['sl']] } else { pval <- NA } return(pval) } # function to return slope direction for all combinations given a start year getSign <- function(start.year) { remaining <- getRemain(start.year) combs <- lapply(remaining, splitData, start.year = start.year) signs <- lapply(combs, fitReg) return(signs) } # function to return MK significance for all combinations given a start year getPval <- function(start.year) { remaining <- getRemain(start.year) combs <- lapply(remaining, splitData, start.year = start.year) pvals <- lapply(combs, getMann) return(pvals) } # function to return slope magnitude for all combinations given a start year getMagn <- function(start.year) { remaining <- getRemain(start.year) combs <- lapply(remaining, splitData, start.year = start.year) magns <- lapply(combs, fitRegv2) return(magns) } # retrieve slope direction, MK significance, and slope magnitude signs <- lapply(years, getSign) pvals <- lapply(years, getPval) magns <- lapply(years, getMagn) # fill-in output matrices dimension <- nrow(sign.matrix) for (i in 1:dimension) { sign.matrix[i, i:dimension] <- unlist(signs[i]) pval.matrix[i, i:dimension] <- unlist(pvals[i]) slope.matrix[i, i:dimension] <- unlist(magns[i]) } sign.matrix <-...
N
Excel Township, Minnesota Annual Population and Growth Analysis Dataset: A...
neilsberg.com
csv, json
Updated Jul 30, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Excel Township, Minnesota Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Excel township from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/excel-township-mn-population-by-year/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Jul 30, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Minnesota, Excel Township
Variables measured
Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
Measurement technique
The data presented in this dataset is derived from the 20 years data of U.S. Census Bureau Population Estimates Program (PEP) 2000 - 2023. To measure the variables, namely (a) population and (b) population change in ( absolute and as a percentage ), we initially analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Excel township population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Excel township across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.

Key observations

In 2023, the population of Excel township was 300, a 0.99% decrease year-by-year from 2022. Previously, in 2022, Excel township population was 303, a decline of 0.98% compared to a population of 306 in 2021. Over the last 20 plus years, between 2000 and 2023, population of Excel township increased by 17. In this period, the peak population was 308 in the year 2020. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

Content

When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

Data Coverage:

From 2000 to 2023

Variables / Data Columns

Year: This column displays the data year (Measured annually and for years 2000 to 2023)

Population: The population for the specific year for the Excel township is shown in this column.

Year on Year Change: This column displays the change in Excel township population for each year compared to the previous year.

Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Excel township Population by Year. You can refer the same here
N
Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...
neilsberg.com
csv, json
Updated Jul 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Jul 24, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Excel
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

Key observations

The largest age group in Excel, AL was for the group of age 45 to 49 years years with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over years with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the Excel is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of Excel total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Excel Population by Age. You can refer the same here
f
UC_vs_US Statistic Analysis.xlsx
figshare.com
xlsx
Updated Jul 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.23644/uu.12631628.v1
Dataset updated
Jul 9, 2020
Dataset provided by
Utrecht University
Authors
F. (Fabiano) Dalpiaz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the used measures described in the paper. For each subject, it includes multiple columns: A. a sequential student ID B an ID that defines a random group label and the notation C. the used notation: user Story or use Cases D. the case they were assigned to: IFA, Sim, or Hos E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam F. a categorical representation of the grade L/M/H, where H is greater or equal to 80, M is between 65 included and 80 excluded, L otherwise G. the total number of classes in the student's conceptual model H. the total number of relationships in the student's conceptual model I. the total number of classes in the expert's conceptual model J. the total number of relationships in the expert's conceptual model K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below) P. the researchers' judgement on how well the derivation process explanation was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping ), or not present.

Tagging scheme: Aligned (AL) - A concept is represented as a class in both models, either

with the same name or using synonyms or clearly linkable names; Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., user'' instead ofurban planner''); System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate; Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud; Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

All the calculations and information provided in the following sheets

originate from that raw data.

Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,

including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

Sheet 3 (Size-Ratio):

The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade) . The primary focus in this study is on the number of classes. However, we also provided the size ratio for the number of relationships between student and expert model.

Sheet 4 (Overall):

Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.

For sheet 4 as well as for the following four sheets, diverging stacked bar

charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated witch solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

Sheet 5 (By-Notation):

Model correctness and model completeness is compared by notation - UC, US.

Sheet 6 (By-Case):

Model correctness and model completeness is compared by case - SIM, HOS, IFA.

Sheet 7 (By-Process):

Model correctness and model completeness is compared by how well the derivation process is explained - well explained, partially explained, not present.

Sheet 8 (By-Grade):

Model correctness and model completeness is compared by the exam grades, converted to categorical values High, Low , and Medium.
Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture...
catalog.data.gov
Updated Mar 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2023). Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture and Flux Model [Dataset]. https://catalog.data.gov/dataset/input-output-data-sets-used-in-the-evaluation-of-the-two-layer-soil-moisture-and-flux-mode
Explore at:
Dataset updated
Mar 3, 2023
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
The Excel file contains the model input-out data sets that where used to evaluate the two-layer soil moisture and flux dynamics model. The model is original and was developed by Dr. Hantush by integrating the well-known Richards equation over the root layer and the lower vadose zone. The input-output data are used for: 1) the numerical scheme verification by comparison against HYDRUS model as a benchmark; 2) model validation by comparison against real site data; and 3) for the estimation of model predictive uncertainty and sources of modeling errors. This dataset is associated with the following publication: He, J., M.M. Hantush, L. Kalin, and S. Isik. Two-Layer numerical model of soil moisture dynamics: Model assessment and Bayesian uncertainty estimation. JOURNAL OF HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 613 part A: 128327, (2022).
o
Data for: Sustainable connectivity in a community repository
explore.openaire.eu
data.niaid.nih.gov
+3more
Updated Dec 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ted Habermann (2023). Data for: Sustainable connectivity in a community repository [Dataset]. http://doi.org/10.5061/dryad.nzs7h44xr
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.nzs7h44xr
Dataset updated
Dec 7, 2023
Authors
Ted Habermann
Description
Data For: Sustainable Connectivity in a Community Repository ## GENERAL INFORMATION This readme.txt file was generated on 30231110 by Ted Habermann ### Title of Dataset Data For: Sustainable Connectivity in a Community Repository ### Author Information Principal Investigator Contact Information Name: Ted Habermann (0000-0003-3585-6733) Institution: Metadata Game Changers () Email: ORCID: 0000-0003-3585-6733 ### Date published or finalized for release: November 10, 2023 ## Date of data collection (single date, range, approximate date) May and June 2023 ### Information about funding sources that supported the collection of the data: National Science Foundation (Crossref Funder ID: 100000001) Award 2134956. ### Overview of the data (abstract): These data are Dryad metadata retrieved from and translated into csv files. There are two datasets: 1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data. 2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data. Each dataset includes four types of metadata: identifiers, funders, keywords, and related works, each in a separate comma (.csv) or tab (.tsv) delimited files. There are also Microsoft Excel files (.xlsx) for the identifier metadata and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files with definitions, counts, unique counts, most frequent values, and completeness. These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people. | Size | FileName | | --------: | :--------------------------------------------------------- | | 90541505 | DryadJournalDataset_Identifiers_20230520_12.csv | | 9017051 | DryadJournalDataset_funders_20230520_12.tsv | | 29108477 | DryadJournalDataset_keywords_20230520_12.tsv | | 8833842 | DryadJournalDataset_relatedWorks_20230520_12.tsv | | | | | 18260935 | DryadOrganizationDataset_funders_20230601_12.tsv | | 240128730 | DryadOrganizationDataset_identifiers_20230601_12.tsv | | 39600659 | DryadOrganizationDataset_keywords_20230601_12.tsv | | 11520475 | DryadOrganizationDataset_relatedWorks_20230601_12.tsv | | | | | 40726143 | DryadJournalDataset_identifiers_20230520_12.xlsx | | 81894301 | DryadOrganizationDataset_identifiers_20230601_12.xlsx | | | | | 842827 | DryadJournalDataset_ConnectivitySummary.html | | 387551 | DryadOrganizationDataset_ConnectivitySummary.html | ### Field Definitions ## SHARING/ACCESS INFORMATION ### Licenses/restrictions placed on the data: Creative Commons Public Domain License (CC0) ### Links to publications that cite or use the data: TBD ### Was data derived from another source? No ## DATA & FILE OVERVIEW ### File List A. *Dataset_identifiers_YYYYMMDD_HH.*sv: Short description: Identifier metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API. B. *Dataset_funders_YYYYMMDD_HH.*sv: Short description: Funder metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API. C. *Dataset_keywords_YYYYMMDD_HH.*sv: Short description: Keyword metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API. D. *Dataset_relatedWorks_YYYYMMDD_HH.*sv: Short description: Related work metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API. E. *Dataset_identifiers_YYYYMMDD_HH.xlsx: Short description: Excel spreadsheet with identifier metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API. F. *Dataset_ConnectivitySummary.html: Short description: Connectivity summary for Dataset. G. summarizeConnectivity.ipynb Short description: Python notebook with code for creating connectivity summaries and plots. ### Relationship between files: All files with the same dataset name make up a dataset. The .*sv are original metadata extracted from Dryad. ## METHODOLOGICAL INFORMATION ### Description of methods used for collection/generation of data: Most of the analysis is simply extracting and comparing counts of various metadata elements. ## DATA-SPECIFIC INFORMATION See connectivity summaries (*ConnectivitySummary.html) for a list of parameters in each file and summaries of their values. ### Identifier Metadata The identifier metadata datasets include the following fields: | Field | Definition | | :------------------------------- | :--------------------------------------------------------------------------------------------------- | | DOI | Digital object identifier for the dataset | | title | Title for the dataset | | datePublished | Date dataset published | | relatedPublicationISSN | International Standard Serial Number for journal with related publication | | primary_article | Digital object identifier for pr...
Age-depth models for Pb-210 datasets (NERC Grant NE/V008269/1)
data-search.nerc.ac.uk
metadata.bgs.ac.uk
+1more
html
Updated Sep 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
British Geological Survey (2022). Age-depth models for Pb-210 datasets (NERC Grant NE/V008269/1) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/api/records/e79e0767-1051-2d82-e053-0937940ae4e8
Explore at:
htmlAvailable download formats
Dataset updated
Sep 18, 2022
Dataset authored and provided by
British Geological Surveyhttps://www.bgs.ac.uk/
License
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitationshttp://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
Time period covered
Dec 1, 2021 - Mar 30, 2022
Description
Age-depth models for Pb-210 datasets. The St Croix Watershed Research Station, of the Science Museum of Minnesota, kindly made available 210Pb datasets that have been measured in their lab over the past decades. The datasets come mostly from North American lakes. These datasets were used to produce both chronologies using the 'classical' CRS (Constant Rate of Supply) approach and also using a recently developed Bayesian alternative called 'Plum'. Both approaches were used in order to compare the two approaches. The 210Pb data will also be deposited in the neotomadb.org database. The dataset consists of 3 files; 1. Rcode_Pb210.R R code to process the data files, produce age-depth models and compare them. 2. StCroix_agemodel_output.zip Output of age-model runs of the St Croix datasets 3. StCroix_xlxs_files.zip Excel files of the St Croix Pb-210 datasets
d
Data from: Pilot Testing of SHRP 2 Reliability Data and Analytical Products:...
datasets.ai
odgavaprod.ogopendata.com
+2more
57
Updated Sep 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Transportation (2024). Pilot Testing of SHRP 2 Reliability Data and Analytical Products: Minnesota [supporting datasets] [Dataset]. https://datasets.ai/datasets/pilot-testing-of-shrp-2-reliability-data-and-analytical-products-minnesota-supporting-data
Explore at:
57Available download formats
Dataset updated
Sep 11, 2024
Dataset authored and provided by
Department of Transportation
Description
The objective of this project was to develop system designs for programs to monitor travel time reliability and to prepare a guidebook that practitioners and others can use to design, build, operate, and maintain such systems. Generally, such travel time reliability monitoring systems will be built on top of existing traffic monitoring systems. The focus of this project was on travel time reliability. The data from the monitoring systems developed in this project – from both public and private sources –included, wherever cost-effective, information on the seven sources of non-recurring congestion. This data was used to construct performance measures or to perform various types of analyses useful for operations management as well as performance measurement, planning, and programming. The datasets in the attached ZIP file support SHRP 2 reliability project L38B, "Pilot testing of SHRP 2 reliability data and analytical products: Minnesota." This report can be accessed via the following URL: https://rosap.ntl.bts.gov/view/dot/3608 This ZIP file package, which is 22.1 MB in size, contains 6 Microsoft Excel spreadsheet files (XLSX). This file package also contains 3 Comma Separated Value files (CSV). The XLSX and CSV files can be opened using Microsoft Excel 2010 and 2016. The CSV files can be opened using most available text editing programs.
C
Hospital Annual Financial Data - Selected Data & Pivot Tables
data.chhs.ca.gov
healthdata.gov
+6more
csv, data, doc, html +4
Updated Apr 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
Explore at:
xlsx(770931), xlsx(752914), xlsx, xls(44933632), pdf(121968), pdf(310420), xlsx(14714368), xls(44967936), xlsx(758089), xlsx(765216), csv(205488092), xls(920576), pdf(383996), xls(51554816), xlsx(750199), xls(51424256), pdf(303198), pdf(258239), xlsx(763636), doc, html, pdf(333268), xlsx(768036), xlsx(754073), xls(16002048), xls(14657536), xlsx(756356), xlsx(769128), data, xls, xls(18301440), xlsx(790979), zip, xlsx(779866), xls(18445312), xls(19650048), xlsx(782546), xls(19599360), xlsx(771275), xls(19625472), xlsx(758376), xlsx(777616), xls(19577856)Available download formats
Dataset updated
Apr 23, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
d
2.20 Employee Vertical Diversity (summary)
catalog.data.gov
open.tempe.gov
+9more
Updated Jun 28, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Tempe (2025). 2.20 Employee Vertical Diversity (summary) [Dataset]. https://catalog.data.gov/dataset/2-20-employee-vertical-diversity-summary-0c1c3
Explore at:
Dataset updated
Jun 28, 2025
Dataset provided by
City of Tempe
Description
It is important to identify any barriers in recruitment, hiring, and employee retention practices that might discourage any segment of our population from applying for positions or continuing employment at the City of Tempe. This information will provide better awareness for outreach efforts and other strategies to attract, hire, and retain a diverse workforce.This page provides data for the Employee Vertical Diversity performance measure. The performance measure dashboard is available at 2.20 Employee Vertical Diversity. Additional InformationSource:PeopleSoft HCM, Maricopa County Labor Market Census DataContact: Lawrence LaVictoireContact E-Mail: lawrence_lavicotoire@tempe.govData Source Type: Excel, PDFPreparation Method: PeopleSoft query and PDF are moved to a pre-formatted Excel spreadsheet.Publish Frequency: Every six monthsPublish Method: ManualData Dictionary
Z
Dairy Supply Chain Sales Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Athanasios Liatifis (2024). Dairy Supply Chain Sales Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7853252
Explore at:
Dataset updated
Jul 12, 2024
Dataset provided by
Dimitris Iatropoulos
Vasileios Argyriou
Konstantinos Georgakidis
Anna Triantafyllou
Christos Chaschatzis
Athanasios Liatifis
Thomas Lagkas
Panagiotis Sarigiannidis
Dimitrios Pliatsios
Ilias Siniosoglou
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1.Introduction

Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer could notice that a particular product is selling well in a certain region, this information could be utilised to develop new products, optimise the supply chain or improve existing ones to meet the changing needs of customers.

This dataset includes information about 7 of MEVGAL’s products [1]. According to the above information the data published will help researchers to understand the dynamics of the dairy market and its consumption patterns, which is creating the fertile ground for synergies between academia and industry and eventually help the industry in making informed decisions regarding product development, pricing and market strategies in the IoT playground. The use of this dataset could also aim to understand the impact of various external factors on the dairy market such as the economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.

Citation

Please cite the following papers when using this dataset:

I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

Dataset Modalities

The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each of the different files include the logistics for that product on a daily basis for three years, from 2020 to 2022.

3.1 Data Collection

The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.

Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.

It is also important for MEVGAL to ensure that the data collection process conducted is in an ethical and compliant manner, adhering to data privacy laws and regulation. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

The published dataset is consisted of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration to the privacy requirement of the data owner (MEVGAL).

File

Period

Number of Samples (days)

product 1 2020.xlsx

01/01/2020–31/12/2020

363

product 1 2021.xlsx

01/01/2021–31/12/2021

364

product 1 2022.xlsx

01/01/2022–31/12/2022

365

product 2 2020.xlsx

01/01/2020–31/12/2020

363

product 2 2021.xlsx

01/01/2021–31/12/2021

364

product 2 2022.xlsx

01/01/2022–31/12/2022

365

product 3 2020.xlsx

01/01/2020–31/12/2020

363

product 3 2021.xlsx

01/01/2021–31/12/2021

364

product 3 2022.xlsx

01/01/2022–31/12/2022

365

product 4 2020.xlsx

01/01/2020–31/12/2020

363

product 4 2021.xlsx

01/01/2021–31/12/2021

364

product 4 2022.xlsx

01/01/2022–31/12/2022

364

product 5 2020.xlsx

01/01/2020–31/12/2020

363

product 5 2021.xlsx

01/01/2021–31/12/2021

364

product 5 2022.xlsx

01/01/2022–31/12/2022

365

product 6 2020.xlsx

01/01/2020–31/12/2020

362

product 6 2021.xlsx

01/01/2021–31/12/2021

364

product 6 2022.xlsx

01/01/2022–31/12/2022

365

product 7 2020.xlsx

01/01/2020–31/12/2020

362

product 7 2021.xlsx

01/01/2021–31/12/2021

364

product 7 2022.xlsx

01/01/2022–31/12/2022

365

3.2 Dataset Overview

The following table enumerates and explains the features included across all of the included files.

Feature

Description

Unit

Day

day of the month

-

Month

Month

-

Year

Year

-

daily_unit_sales

Daily sales - the amount of products, measured in units, that during that specific day were sold

units

previous_year_daily_unit_sales

Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year

units

percentage_difference_daily_unit_sales

The percentage difference between the two above values

%

daily_unit_sales_kg

The amount of products, measured in kilograms, that during that specific day were sold

kg

previous_year_daily_unit_sales_kg

Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold, the previous year

kg

percentage_difference_daily_unit_sales_kg

The percentage difference between the two above values

kg

daily_unit_returns_kg

The percentage of the products that were shipped to selling points and were returned

%

previous_year_daily_unit_returns_kg

The percentage of the products that were shipped to selling points and were returned the previous year

%

points_of_distribution

The amount of sales representatives through which the product was sold to the market for this year

previous_year_points_of_distribution

The amount of sales representatives through which the product was sold to the market for the same day for the previous year

Table 1 – Dataset Feature Description

Structure and Format

4.1 Dataset Structure

The provided dataset has the following structure:

Where:

Name

Type

Property

Readme.docx

Report

A File that contains the documentation of the Dataset.

product X

Folder

A folder containing the data of a product X.

product X YYYY.xlsx

Data file

An excel file containing the sales data of product X for year YYYY.

Table 2 - Dataset File Description

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

References

[1] MEVGAL is a Greek dairy production company
b
Data from: Coarse datasets for the 2002-2010 Tsimane' Amazonian Panel...
scholarworks.brandeis.edu
docx, pdf, xls
Updated Mar 15, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ricardo Godoy; William R. Leonard; Victoria Reyes-Garcia; Tomas Huanca (2022). Coarse datasets for the 2002-2010 Tsimane' Amazonian Panel Study(TAPS) - Introduction and authorization [Dataset]. https://scholarworks.brandeis.edu/esploro/outputs/dataset/Coarse-datasets-for-the-2002-2010-Tsimane/9924097301801921
Explore at:
xls(1472000 bytes), pdf(140365 bytes), docx(32618 bytes)Available download formats
Dataset updated
Mar 15, 2022
Authors
Ricardo Godoy; William R. Leonard; Victoria Reyes-Garcia; Tomas Huanca
Time period covered
Mar 2022
Measurement technique
<p>See Chapter 4 of "Too little, too late" for general methods, and different chapters for methods on different topics</p>
Description
Introduction. This document provides an overview of an archive composed of four sections.
[1] An introduction (this document) which describes the scope of the project
[2] Yearly folder, from 2002 until 2010, of the coarse Microsoft Access datasets + the surveys used to collect information for each year. The word coarse does not mean the information in the Microsoft Access dataset was not corrected for mistakes; it was, but some mistakes and inconsistencies remain, such as with data on age or education. Furthermore, the coarse dataset provides disaggregated information for selected topics, which appear in summary statistics in the clean dataset. For example, in the coarse dataset one can find the different illnesses afflicting a person during the past 14 days whereas in the clean dataset only the total number of illnesses appears.
[3] A letter from the Gran Consejo Tsimane’ authorizing the public use of de-identified data collected in our studies among Tsimane’.
[4] A Microsoft Excel document with the unique identification number for each person in the panel study.

Background. During 2002-2010, a team of international researchers, surveyors, and translators gathered longitudinal (panel) data on the demography, economy, social relations, health, nutritional status, local ecological knowledge, and emotions of about 1400 native Amazonians known as Tsimane’ who lived in thirteen villages near and far from towns in the department of Beni in the Bolivian Amazon. A report titled “Too little, too late” summarizes selected findings from the study and is available to the public at the electronic library of Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926194001921

A copy of the clean, merged, and appended Stata (V17) dataset is available to the public at the following two web addresses:
[a] Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926193901921
[b] Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan (only available to users affiliated with institutions belonging to ICPSR)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/37671/utilization

Chapter 4 of the report “Too little, too late” mentioned above describes the motivation and history of the study, the difference between the coarse and clean datasets, and topics which can be examined only with coarse data.

Aims. The aims of this archive are to:
· Make available in Microsoft Access the coarse de-identified dataset [1] for each of the seven yearly surveys (2004-2010) and [2] one Access data based on quarterly surveys done during 2002 and 2003. Together, these two datasets form one longitudinal dataset of individuals, households, and villages.
· Provide guidance on how to link files within and across years, and
· Make available a Microsoft Excel file with a unique identification number to link individuals across years
The datasets in the archive.
· Eight Microsoft Access datasets with data on a wide range of variables. Except for the Access file for 2002-2003, all the other information in each of the other Access files refers to one year. Within any Access dataset, users will find two types of files:
o Thematic files. The name of a thematic file contains the prefix tbl (e.g., 29_tbl_Demography or tbl_29_Demography). The file name (sometimes in Spanish, sometimes in English) indicates the content of the file. For example, in the Access dataset for one year, the micro file tbl_30_Ventas has all the information on sales for that year. Within each micro file, columns contain information on a variable and the name of the column indicates the content of the variable. For instance, the column heading item in the Sales file would indicate the type of good sold. The exac…
d
Pilot Testing of SHRP 2 Reliability Data and Analytical Products: Florida...
catalog.data.gov
data.virginia.gov
+3more
Updated Dec 7, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Federal Highway Administration (2023). Pilot Testing of SHRP 2 Reliability Data and Analytical Products: Florida [supporting datasets] [Dataset]. https://catalog.data.gov/dataset/pilot-testing-of-shrp-2-reliability-data-and-analytical-products-florida-supporting-datase
Explore at:
Dataset updated
Dec 7, 2023
Dataset provided by
Federal Highway Administration
Area covered
Florida
Description
"SHRP 2 initiated the L38 project to pilot test products from five of the program’s completed projects. The products support reliability estimation and use based on data analyses, analytical techniques, and decision-making framework. The L38 project has two main objectives: (1) to assist agencies in using travel time reliability as a measure in their business practices and (2) to receive feedback from the project research teams on the applicability and usefulness of the products tested, along with their suggested possible refinements. SHRP 2 selected four teams from California, Minnesota, Florida, and Washington. Project L38C tested elements from Projects L02, L05, L07, and L08. Project L02 identified methods to collect, archive, and integrate required data for reliability estimation and methods for analyzing and visualizing the causes of unreliability based on the collected data. Projects L07 and L08 produced analytical techniques and tools for estimating reliability based on developed models and allowing the estimation of reliability and the impacts on reliability of alternative mitigating strategies. Project L05 provided guidance regarding how to use reliability assessments to support the business processes of transportation agencies. The datasets in this zip file, which is 7.83 MB in size, support of SHRP 2 reliability project L38C, "Pilot testing of SHRP 2 reliability data and analytical products: Florida." The accompanying report can be accessed at the following URL: https://rosap.ntl.bts.gov/view/dot/3609 There are 12 datasets in this zip file, including 2 Microsoft Excel worksheets (XLSX) and 10 Comma Separated Values (CSV) files. The Microsoft Excel worksheets can be opened using the 2010 and 2016 versions of Microsoft Word, the CSV files can be opened using most text editors.
m
An Extensive Dataset for the Heart Disease Classification System
data.mendeley.com
Updated Feb 15, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.1
Explore at:
Unique identifier
https://doi.org/10.17632/65gxgy2nmg.1
Dataset updated
Feb 15, 2022
Authors
Sozan S. Maghdid
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Finding a good data source is the first step toward creating a database. Cardiovascular illnesses (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70 A comprehensive database for factors that contribute to a heart attack has been constructed , The main purpose here is to collect characteristics of Heart Attack or factors that contribute to it. As a result, a form is created to accomplish this. Microsoft Excel was used to create this form. Figure 1 depicts the form which It has nine fields, where eight fields for input fields and one field for output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and Test-Troponin are representing the input fields, while the output field pertains to the presence of heart attack, which is divided into two categories (negative and positive).negative refers to the absence of a heart attack, while positive refers to the presence of a heart attack.Table 1 show the detailed information and max and min of values attributes for 1319 cases in the whole database.To confirm the validity of this data, we looked at the patient files in the hospital archive and compared them with the data stored in the laboratories system. On the other hand, we interviewed the patients and specialized doctors. Table 2 is a sample for 1320 cases, which shows 44 cases and the factors that lead to a heart attack in the whole database,After collecting this data, we checked the data if it has null values (invalid values) or if there was an error during data collection. The value is null if it is unknown. Null values necessitate special treatment. This value is used to indicate that the target isn’t a valid data element. When trying to retrieve data that isn't present, you can come across the keyword null in Processing. If you try to do arithmetic operations on a numeric column with one or more null values, the outcome will be null. An example of a null values processing is shown in Figure 2.The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs received equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages. The first is to avoid overshadowing qualities in smaller numeric ranges by employing attributes in larger numeric ranges. The second goal is to avoid any numerical problems throughout the process.After completion of the normalization process, we split the data set into two parts - training and test sets. In the test, we have utilized1060 for train 259 for testing Using the input and output variables, modeling was implemented.
d
Circle Hook analyses MS 2 many excel files
catalog.data.gov
fisheries.noaa.gov
Updated Apr 1, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(Point of Contact, Custodian) (2024). Circle Hook analyses MS 2 many excel files [Dataset]. https://catalog.data.gov/dataset/circle-hook-analyses-ms-2-many-excel-files1
Explore at:
Dataset updated
Apr 1, 2024
Dataset provided by
(Point of Contact, Custodian)
Description
CPUE, hook location, drop back time, damage by hook type
Z
Dataset: A Systematic Literature Review on the topic of High-value datasets
data.niaid.nih.gov
zenodo.org
Updated Jun 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anastasija Nikiforova (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
Explore at:
Dataset updated
Jun 23, 2023
Dataset provided by
Andrea Miletič
Anastasija Nikiforova
Charalampos Alexopoulos
Nina Rizun
Magdalena Ciesielska
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb) It being made public both to act as supplementary data for "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and in order for other researchers to use these data in their own work.

The protocol is intended for the Systematic Literature review on the topic of High-value Datasets with the aim to gather information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected in the result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.

Methodology

To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out to by searching digital libraries covered by Scopus, Web of Science (WoS), Digital Government Research library (DGRL).

These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those, where these objects were primary research objects rather than mentioned in the body, e.g., as a future work. After deduplication, 11 articles were found unique and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.

To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

Test procedure Each study was independently examined by at least two authors, where after the in-depth examination of the full-text of the article, the structured protocol has been filled for each study. The structure of the survey is available in the supplementary file available (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx) The data collected for each study by two researchers were then synthesized in one final version by the third researcher.

Description of the data in this data set

Protocol_HVD_SLR provides the structure of the protocol Spreadsheets #1 provides the filled protocol for relevant studies. Spreadsheet#2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet 2) Complete reference - the complete source information to refer to the study 3) Year of publication - the year in which the study was published 4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter} 5) DOI / Website- a link to the website where the study can be found 6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science 7) Availability in OA - availability of an article in the Open Access 8) Keywords - keywords of the paper as indicated by the authors 9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

Approach- and research design-related information 10) Objective / RQ - the research objective / aim, established research questions 11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analy-sis (country, organisation, specific unit that has been ana-lysed, e.g., the number of use-cases, scope of the SLR etc.) 12) Contributions - the contributions of the study 13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach? 14) Availability of the underlying research data- whether there is a reference to the publicly available underly-ing research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared? 15) Period under investigation - period (or moment) in which the study was conducted 16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited infor-mation about the research methods used)? 18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, sec-ondary - mentioned but not studied (e.g., as part of discus-sion, future work etc.))

HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term? 20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output") 21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the rela-tionships between these components? (detailed description) 22) Stakeholders and their roles - what stakeholders or actors does HVD determination in-volve? What are their roles? 23) Data - what data do HVD cover? 24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx

Licenses or restrictions CC-BY

For more info, see README.txt
Data Set 1 and 2.xlsx
figshare.com
xlsx
Updated Dec 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Niloofar Aflaki (2022). Data Set 1 and 2.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.21747854.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21747854.v1
Dataset updated
Dec 18, 2022
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Niloofar Aflaki
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
two datasets in one excel file to analyse a regression model for distance calculation
The Bushland, Texas, Winter Wheat Datasets
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Jun 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). The Bushland, Texas, Winter Wheat Datasets [Dataset]. https://catalog.data.gov/dataset/the-bushland-texas-winter-wheat-datasets-7e83c
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Area covered
Bushland, Texas
Description
This parent dataset (collection of datasets) describes the general organization of data in the datasets for each growing season (two-year period) when winter wheat (Triticum aestivum L.) was grown for grain at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Winter wheat was grown on two large, precision weighing lysimeters, calibrated to NIST standards (Howell et al., 1995). Each lysimeter was in the center of a 4.44 ha square field on which wheat was also grown (Evett et al., 2000). The two fields were contiguous and arranged with one directly north of the other. See the resource titled "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. Wheat was planted in Autumn and grown over the winter in 1989-1990, 1991-1992, and 1992-1993. Agronomic calendar for the each of the three growing seasons list by date the agronomic practices applied, severe weather, and activities (e.g., planting, thinning, fertilization, pesticide application, lysimeter maintenance, harvest) in and on lysimeters that could influence crop growth, water use, and lysimeter data. These include fertilizer and pesticide applications. Irrigation was by linear move sprinkler system equipped with pressure regulated low pressure sprays (mid-elevation spray application, MESA). Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a field-calibrated (Evett and Steiner, 1995) neutron probe from 0.10- to 2.4-m depth in the field. The lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications. The weighing lysimeters were used to measure relative soil water storage to 0.05 mm accuracy at 5-min intervals, and the 5-min change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), which is reported at 15-min intervals. Each lysimeter was equipped with a suite of instruments to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all of which are reported at 15-min intervals. Instruments used changed from season to season, which is another reason that subsidiary datasets and data dictionaries for each season are required. The Bushland weighing lysimeter research program was described by Evett et al. (2016), and lysimeter design is described by Marek et al. (1988). Important conventions concerning the data-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource titled "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are six datasets in this collection. Common symbols and abbreviations used in the datasets are defined in the resource titled, "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". Datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes conventions and symbols used and lists any instruments used. The remaining tabs in a file consist of dictionary and data tabs. The six datasets are as follows: Agronomic Calendars for the Bushland, Texas Winter Wheat Datasets Growth and Yield Data for the Bushland, Texas Winter Wheat Datasets Weighing Lysimeter Data for The Bushland, Texas Winter Wheat Datasets Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Winter Wheat Datasets Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas See the README for descriptions of each dataset. The soil is a Pullman series fine, mixed, superactive, thermic Torrertic Paleustoll. Soil properties are given in the resource titled "Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets". The land slope in the lysimeter fields is <0.3% and topography is flat. The mean annual precipitation is ~470 mm, the 20-year pan evaporation record indicates ~2,600 mm Class A pan evaporation per year, and winds are typically from the South and Southwest. The climate is semi-arid with ~70% (350 mm) of the annual precipitation occurring from May to September, during which period the pan evaporation averages ~1520 mm. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have described the facilities and research methods (Evett et al., 2016), and have focused on winter wheat ET (Howell et al., 1995, 1997, 1998), and crop coefficients (Howell et al., 2006; Schneider and Howell, 1997, 2001) that have been used by ET networks for irrigation management. The data have utility for developing, calibrating, and testing simulation models of crop ET, growth, and yield (Evett et al., 1994; Kang et al., 2009), and have been used by several universities and for testing, and calibrating models of ET that use satellite and/or weather data. Resources in this dataset: Resource Title: Geographic Coordinates of Experimental Assets, Weighing Lysimeter Experiments, USDA, ARS, Bushland, Texas. File Name: Geographic Coordinates, USDA, ARS, Bushland, Texas.xlsx. Resource Description: The file gives the UTM latitude and longitude of important experimental assets of the Bushland, Texas, USDA, ARS, Conservation & Production Research Laboratory (CPRL). Locations include weather stations [Soil and Water Management Research Unit (SWMRU) and CPRL], large weighing lysimeters, and corners of fields within which each lysimeter was centered. There were four fields designated NE, SE, NW, and SW, and a weighing lysimeter was centered in each field. The SWMRU weather station was adjacent to and immediately east of the NE and SE lysimeter fields. Resource Title: Conventions for Bushland, TX, Weighing Lysimeter Datasets. File Name: Conventions for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Descriptions of conventions and terminology used in the Bushland, TX, weighing lysimeter research program. Resource Title: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets. File Name: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Definitions of symbols and abbreviations used in the Bushland, TX, weighing lysimeter research datasets. Resource Title: Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets. File Name: Bushland_TX_soil_properties.xlsx. Resource Description: Soil properties useful for simulation modeling and for describing the soil are given for the Pullman soil series at the USDA, ARS, Conservation & Production Research Laboratory, Bushland, TX, USA. For each soil layer, soil horizon designation and texture according to USDA Soil Taxonomy, bulk density, porosity, water content at field capacity (33 kPa) and permanent wilting point (1500 kPa), percent sand, percent silt, percent clay, percent organic matter, pH, and van Genuchten-Mualem characteristic curve parameters describing the soil hydraulic properties are given. A separate table describes the soil horizon thicknesses, designations, and textures according to USDA Soil Taxonomy. Another table describes important aspects of the soil hydrologic and rooting behavior. Resource Title: README - Bushland Texas Winter Wheat collection. File Name: README_Bushland_winter_wheat_collection.pdf. Resource Description: Descriptions of the datasets in the Bushland Texas Winter Wheat collection

Facebook

Twitter

Click to copy link

Link copied

Cite

Yaacov Petscher (2023). Statistical Comparison of Two ROC Curves [Dataset]. http://doi.org/10.6084/m9.figshare.860448.v1

Statistical Comparison of Two ROC Curves

Explore at:

10 scholarly articles cite this dataset (View in Google Scholar)

xlsAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.860448.v1

Dataset updated

Jun 3, 2023

Dataset provided by

Figsharehttp://figshare.com/

Authors

Yaacov Petscher

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This excel file will do a statistical tests of whether two ROC curves are different from each other based on the Area Under the Curve. You'll need the coefficient from the presented table in the following article to enter the correct AUC value for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.

Clear search

Close search

Google apps

Main menu

Statistical Comparison of Two ROC Curves

Data from: Delta Neighborhood Physical Activity Study

Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...

Excel Township, Minnesota Annual Population and Growth Analysis Dataset: A...

About this dataset

Content

Inspiration

Recommended for further research

Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

About this dataset

Content

Inspiration

Recommended for further research

UC_vs_US Statistic Analysis.xlsx

Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture...

Data for: Sustainable connectivity in a community repository

Age-depth models for Pb-210 datasets (NERC Grant NE/V008269/1)

Data from: Pilot Testing of SHRP 2 Reliability Data and Analytical Products:...

Hospital Annual Financial Data - Selected Data & Pivot Tables

2.20 Employee Vertical Diversity (summary)

Dairy Supply Chain Sales Dataset

Data from: Coarse datasets for the 2002-2010 Tsimane' Amazonian Panel...

Pilot Testing of SHRP 2 Reliability Data and Analytical Products: Florida...

An Extensive Dataset for the Heart Disease Classification System

Circle Hook analyses MS 2 many excel files

Dataset: A Systematic Literature Review on the topic of High-value datasets

Data Set 1 and 2.xlsx

The Bushland, Texas, Winter Wheat Datasets

Statistical Comparison of Two ROC Curves