60 datasets found
  1. Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2 more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing; it is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of the data it will be tasked with handling.

    The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey that is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to delegate it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. We therefore requested more detail from "Big Data users," the 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used the actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response (a small worked sketch follows the resource list below). For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    Resource Title: Appendix A: ARS data storage survey questions.
    File Name: Appendix A.pdf
    Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
    Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

    Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
    File Name: Machine-readable survey response data.csv
    Resource Description: CSV file that includes the raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).

    Resource Title: Responses from ARS Researcher Data Storage Survey.
    File Name: Data Storage Survey Data for public release.xlsx
    Resource Description: MS Excel worksheet that includes the raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
    Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
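
    A minimal sketch in R of the per-person storage calculation described above; the column names here are hypothetical, not the survey's actual fields:

      library(dplyr)
      library(tibble)
      responses <- tibble(
        range_high_tb = c(10, 100, 0.5),   # high end of the reported storage range (TB)
        group_size    = c(1, 4, 1)         # G; 1 for an individual response
      )
      mutate(responses, per_person_tb = range_high_tb / group_size)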

  2. Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

    • dataverse.harvard.edu
    Updated Jul 8, 2024
    Cite
    Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
    Dataset provided by
    Harvard Dataverse
    Authors
    Georgios Boumis; Brad Peter
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends. TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019, and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014.

    This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

    TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).
    TSMx R script:

      # import packages
      library(dplyr)
      library(readr)
      library(ggplot2)
      library(tibble)
      library(tidyr)
      library(forcats)
      library(Kendall)

      options(warn = -1) # disable warnings

      # read data (.csv file with "Year" and "Value" columns)
      data <- read_csv("EVI.csv")

      # prepare row/column names for output matrices
      years <- data %>% pull("Year")
      r.names <- years[-length(years)]
      c.names <- years[-1]
      years <- years[-length(years)]

      # initialize output matrices
      sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
      pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
      slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

      # function to return remaining years given a start year
      getRemain <- function(start.year) {
        years <- data %>% pull("Year")
        start.ind <- which(data[["Year"]] == start.year) + 1
        remain <- years[start.ind:length(years)]
        return(remain)
      }

      # function to subset data for a start/end year combination
      splitData <- function(end.year, start.year) {
        keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
        batch <- data[keep,]
        return(batch)
      }

      # function to fit linear regression and return slope direction
      fitReg <- function(batch) {
        trend <- lm(Value ~ Year, data = batch)
        slope <- coefficients(trend)[[2]]
        return(sign(slope))
      }

      # function to fit linear regression and return slope magnitude
      fitRegv2 <- function(batch) {
        trend <- lm(Value ~ Year, data = batch)
        slope <- coefficients(trend)[[2]]
        return(slope)
      }

      # function to implement Mann-Kendall (MK) trend test and return significance
      # the test is implemented only for n >= 8
      getMann <- function(batch) {
        if (nrow(batch) >= 8) {
          mk <- MannKendall(batch[['Value']])
          pval <- mk[['sl']]
        } else {
          pval <- NA
        }
        return(pval)
      }

      # function to return slope direction for all combinations given a start year
      getSign <- function(start.year) {
        remaining <- getRemain(start.year)
        combs <- lapply(remaining, splitData, start.year = start.year)
        signs <- lapply(combs, fitReg)
        return(signs)
      }

      # function to return MK significance for all combinations given a start year
      getPval <- function(start.year) {
        remaining <- getRemain(start.year)
        combs <- lapply(remaining, splitData, start.year = start.year)
        pvals <- lapply(combs, getMann)
        return(pvals)
      }

      # function to return slope magnitude for all combinations given a start year
      getMagn <- function(start.year) {
        remaining <- getRemain(start.year)
        combs <- lapply(remaining, splitData, start.year = start.year)
        magns <- lapply(combs, fitRegv2)
        return(magns)
      }

      # retrieve slope direction, MK significance, and slope magnitude
      signs <- lapply(years, getSign)
      pvals <- lapply(years, getPval)
      magns <- lapply(years, getMagn)

      # fill in output matrices
      dimension <- nrow(sign.matrix)
      for (i in 1:dimension) {
        sign.matrix[i, i:dimension] <- unlist(signs[i])
        pval.matrix[i, i:dimension] <- unlist(pvals[i])
        slope.matrix[i, i:dimension] <- unlist(magns[i])
      }
      sign.matrix <-...
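
    The script is truncated above. Purely as an illustration of how the filled matrices might be labelled and inspected (this is not part of the published script):

      # attach the year labels prepared earlier and melt for plotting
      rownames(slope.matrix) <- r.names
      colnames(slope.matrix) <- c.names
      slope.df <- as.data.frame(as.table(slope.matrix))
      names(slope.df) <- c("start.year", "end.year", "slope")
      ggplot(na.omit(slope.df), aes(x = end.year, y = start.year, fill = slope)) +
        geom_tile() +
        scale_fill_gradient2(low = "blue", mid = "white", high = "red")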

  3. Niagara Open Data

    • catalog.civicdataecosystem.org
    Updated May 13, 2025
    Cite
    (2025). Niagara Open Data [Dataset]. https://catalog.civicdataecosystem.org/dataset/niagara-open-data
    Description

    The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it. Ontario's Open Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario's Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will be labelled accordingly.

    If you want to use data you find in the catalogue, that data must have a licence, a set of rules that describes how you can use it. Most of the data available in the catalogue is released under Ontario's Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn't have a licence, you don't have the right to use the data. If you have questions about how you can use a specific dataset, please contact us.

    The Ontario Data Catalogue endeavours to publish open data in a machine-readable format. For machine-readable datasets, you can simply retrieve the file you need using the file URL.

    The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: all Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields; you can also search its custom fields. You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API, which lets you search and access the machine-readable open data in the catalogue. Read the complete documentation for CKAN's Datastore API.

    The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data; others will not, because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk. You can search for a dataset with a word that might describe a dataset or topic. Use words like "taxes" or "hospital locations" to discover what datasets the catalogue contains. You can search for a dataset from three spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left-hand side of the page to limit your search to datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular organization, or datasets released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier.
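
    As a hedged illustration of the CKAN search API described above (the base URL and query are placeholders for this sketch, and field handling may differ on any given catalogue):

      library(httr)
      library(jsonlite)
      base <- "https://data.ontario.ca"   # assumed catalogue address
      res  <- GET(paste0(base, "/api/3/action/package_search"),
                  query = list(q = "hospital locations", rows = 5))
      found <- fromJSON(content(res, as = "text", encoding = "UTF-8"))$result
      found$count           # number of matching datasets
      found$results$title   # titles of the first five hits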
    You can also do a quick search by selecting one of the catalogue's categories on the homepage. These categories can help you see the types of data we have on key topic areas. When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available and, if so, tell you about the data available. An open dataset might contain several data files. These files might represent different periods of time, different subsets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. Read more about previewing data.

    A non-open dataset may be unavailable for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom-of-information requests.

    The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue; no additional software is needed. Have a look at our walk-through of how to make a chart in the catalogue.

    Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization's datasets, or the full catalogue. You don't have to provide any personal information; just subscribe to our feeds using any feed reader you like, via the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated.

    The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc.). Learn about each format and how you can access and use the data each file contains:

    CSV: A file that has a list of items and values separated by commas, without formatting (e.g., colours, italics) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: this format is considered machine-readable and can be easily processed and used by a computer. Files that have visual formatting (e.g., bolded headers and colour-coded rows) can be hard for machines to understand; these elements make a file more human-readable and less machine-readable.

    TXT: A file that provides information without formatted text or extra visual features, and that may not follow a pattern of separated values like a CSV. How to access the data: open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad).

    XLSX: A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to CSV for a non-proprietary format of the same data without formatted text or extra visual features.

    SHP: A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines, and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbf) and might include corresponding files (e.g., .prj). How to access the data: open with a geographic information system (GIS) software program (e.g., QGIS).

    ZIP: A package of files and folders. The package can contain any number of different file types. How to access the data: open with an unzipping software application (e.g., WinZip, 7-Zip). Note: if a ZIP file contains .shp, .shx, and .dbf file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis, and which can be opened with ArcGIS (a geographic information system software program).

    GeoJSON: A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011, etc.) and its geospatial location (i.e., points/lines). How to access the data: open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view the raw information. Note: this format is machine-readable and can be easily processed and used by a computer.

    JSON: A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures, such as complex lists. How to access the data: open with any text editor (e.g., Notepad) or access through a browser. Note: this format is machine-readable and can be easily processed and used by a computer.

    XML: A text-based format used to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: open with any text editor (e.g., Notepad). Note: this format is machine-readable and can be easily processed and used by a computer.

    KML: A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011, etc.) and its geospatial location (i.e., points/lines). How to access the data: open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: this format is machine-readable and can be easily processed and used by a computer.

    IVT (Beyond 20/20): This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: open with the Beyond 20/20 application.

    MDB (Access): A database which links and combines data from different files or applications (including HTML, XML, Excel, etc.). The database file can be converted to CSV/TXT to make the data machine-readable, but human-readable formatting will be lost. How to access the data: open with Microsoft Access (a database management system used to develop application software).

    PDF: A file that keeps the original layout and

  4. 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry.

    • catalog.data.gov
    • data.wu.ac.at
    Updated Aug 17, 2024
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Excel spreadsheets by species (the 4-letter code is the abbreviation for the genus and species used in the study; the year, 2010 or 2011, is the year the data were collected; SH indicates data for Science Hub; the date is the date of file preparation). The data in a file are described in a read-me file, which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read-me description of the columns in the data set for chemical analysis; in that file, one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk, D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee, and M. Plocher. Plant reproduction is altered by simulated herbicide drift to constructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).

  5. Spreadsheet of best models for each downscaled climate dataset and for all downscaled climate datasets considered together (Best_model_lists.xlsx)

    • data.usgs.gov
    • s.cnmilf.com
    • +1 more
    Updated Apr 1, 2022
    Cite
    Michelle Irizarry-Ortiz; John Stamm (2022). Spreadsheet of best models for each downscaled climate dataset and for all downscaled climate datasets considered together (Best_model_lists.xlsx) [Dataset]. http://doi.org/10.5066/P935WRTG
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Michelle Irizarry-Ortiz; John Stamm
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    1981 - 2005
    Description

    The South Florida Water Management District (SFWMD) and the U.S. Geological Survey have developed projected future change factors for precipitation depth-duration-frequency (DDF) curves at 174 National Oceanic and Atmospheric Administration (NOAA) Atlas 14 stations in central and south Florida. The change factors were computed as the ratio of projected future to historical extreme precipitation depths fitted to extreme precipitation data from various downscaled climate datasets using a constrained maximum likelihood (CML) approach. The change factors correspond to the period 2050-2089 (centered in the year 2070) as compared to the 1966-2005 historical period.
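
    The change-factor arithmetic described above is a simple ratio; a hedged illustration in R with made-up depths (the published workbook tabulates the actual fitted values):

      hist_depth_in   <- c(6.2, 7.8, 9.1)    # fitted 1966-2005 depths, e.g. 25-, 50-, 100-yr events (in)
      future_depth_in <- c(7.1, 9.2, 11.0)   # fitted 2050-2089 depths (in)
      round(future_depth_in / hist_depth_in, 2)   # change factors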
    A Microsoft Excel workbook is provided that tabulates best models for each downscaled climate dataset and for all downscaled climate datasets considered together. Best models were identified based on how well the models capture the climatology and interannual variability of four climate extreme indices using the Model Clima ...

  6. Calculating the Wing Lift Distribution With the Diederich Method in Microsoft Excel

    • reposit.haw-hamburg.de
    Updated Mar 30, 2021
    Cite
    Max Schnoor (2021). Calculating the Wing Lift Distribution With the Diederich Method in Microsoft Excel [Dataset]. https://reposit.haw-hamburg.de/handle/20.500.12738/13729
    Authors
    Max Schnoor
    Time period covered
    Mar 30, 2021
    Description

    The aim of this project is to provide the Diederich Method for calculating the lift distribution of a wing in a Microsoft Excel spreadsheet, based on didactic considerations. The Diederich Method is described based on primary and secondary literature. Diagrams are digitized so that the method can run automatically. To optimize the lift distribution of the wing, the elliptical and triangular lift distributions as well as Mason's lift distribution are offered for comparison. A method for calculating the maximum lift coefficient of the wing is integrated into the Diederich Method; to use it, the maximum lift coefficients of the airfoils at the wing root and at the wing tip must be entered in the program. The calculation assumes a trapezoidal wing. Both wing sweep and linear wing twist can be taken into account. The aspect ratio must not assume values that are too small. Subsonic and unseparated flow are assumed. Since only the wing is described, all other influences, such as those from the fuselage or the engines, are not taken into account. The Excel workbook was created for teaching in aircraft preliminary design. At the moment, the Diederich Method is apparently not offered as a spreadsheet anywhere else; with this work, that gap can be closed. The Excel file is set up in English. The project report with background information is written in German. A "User Guide for the Diederich Method Excel-File" is written in English and is available in this upload and as Appendix C of the project report.
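
    For orientation, the elliptical reference distribution offered for comparison follows the classical result sketched below; this snippet is illustrative only and is not extracted from the Excel file:

      b <- 10                                    # wing span in m (assumed)
      y <- seq(-b / 2, b / 2, length.out = 101)  # spanwise stations
      gamma_rel <- sqrt(1 - (2 * y / b)^2)       # elliptical circulation, relative to the root value
      plot(y, gamma_rel, type = "l",
           xlab = "spanwise position y (m)", ylab = "relative circulation")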

  7. Sample Student Data

    • figshare.com
    xls
    Updated Aug 2, 2022
    Cite
    Carrie Ellis (2022). Sample Student Data [Dataset]. http://doi.org/10.6084/m9.figshare.20419434.v1
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Carrie Ellis
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    In "Sample Student Data", there are 6 sheets. There are three sheets with sample datasets, one for each of the three different exercise protocols described (CrP Sample Dataset, Glycolytic Dataset, Oxidative Dataset). Additionally, there are three sheets with sample graphs created using one of the three datasets (CrP Sample Graph, Glycolytic Graph, Oxidative Graph). Each dataset and graph pairs are from different subjects. · CrP Sample Dataset and CrP Sample Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the creatine phosphate system. Here, the subject was a track and field athlete who threw the shot put for the DeSales University track team. The NIRS monitor was placed on the right triceps muscle, and the student threw the shot put six times with a minute rest in between throws. Data was collected telemetrically by the NIRS device and then downloaded after the student had completed the protocol. · Glycolytic Dataset and Glycolytic Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the glycolytic energy system. In this example, the subject performed continuous squat jumps for 30 seconds, followed by a 90 second rest period, for a total of three exercise bouts. The NIRS monitor was place on the left gastrocnemius muscle. Here again, data was collected telemetrically by the NIRS device and then downloaded after he had completed the protocol. · Oxidative Dataset and Oxidative Graph: In this example, the dataset and graph are from an exercise protocol designed to stress the oxidative system. Here, the student held a sustained, light-intensity, isometric biceps contraction (pushing against a table). The NIRS monitor was attached to the left biceps muscle belly. Here, data was collected by a student observing the SmO2 values displayed on a secondary device; specifically, a smartphone with the IPSensorMan APP displaying data. The recorder student observed and recorded the data on an Excel Spreadsheet, and marked the times that exercise began and ended on the Spreadsheet.

  8. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • catalog.ihsn.org
    • microdata.worldbank.org
    Updated Jan 19, 2021
    Cite
    World Bank Group (WBG) (2021). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://catalog.ihsn.org/catalog/9454
    Dataset provided by
    World Bank
    European Bank for Reconstruction and Development (EBRD)
    European Investment Bank (EIB)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and region. The original sample designs, with specific information on the industries and regions chosen, are included in the attached Excel file (Sampling Report.xls) for the Slovenia 2009 ES. For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.

    For the Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail, and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (which includes the core module plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (which includes the core module plus retail-specific questions), and the residual eligible services have been covered using the Services questionnaire (which includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: (a) for sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to record a refusal to respond as (-8); (b) establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

    Finally, for 2019, the number of interviews per contacted establishment was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.

  9. Spreadsheet Tool for an Interactive and Automatic Simulation of the Monty Hall Problem

    • data.mendeley.com
    Updated Jan 27, 2025
    Cite
    Michael Gutiérrez (2025). Spreadsheet Tool for an Interactive and Automatic Simulation of the Monty Hall Problem [Dataset]. http://doi.org/10.17632/nvkc4sgj6m.3
    Authors
    Michael Gutiérrez
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Monty Hall Problem (Three-Door Problem) is a well-known example of a counterintuitive problem in probability theory. This site provides a VBA-based spreadsheet implementation in Excel for an interactive and automatic simulation of the Monty Hall Problem.

    The interactive simulation mode is carried out using Zoom or any other video-conferencing software that enables breakout rooms. In this mode, the game process and the associated simulation based on the Excel tool provided here are deliberately not fully automated; rather, the participants, in the roles of hosts and contestants, carry out essential steps themselves, interact with each other, and thus become an active part of the simulation. The settings allow for different assumptions regarding, among other things, the random or conscious nature of decisions. This allows a range of different game situations to be mapped, from a purely random game (based solely on Excel's random number generator) on the one hand to a purely conscious game (based on possibly tactical decisions and expectations of the participants) on the other. The results template can be used to aggregate the results of the interactive simulation from the breakout rooms, e.g. in combination with Moodle.

    The automatic mode enables fully automatic simulation with different speed and display options, e.g. successive chart creation during simulation. Finally, both modes allow for different assumptions regarding the probabilities for the car location, the contestant’s first choice and the door opened by the host.

    The simulation tool can be used in online teaching. Carrying out the interactive and automatic simulation provides data in the form of absolute and relative frequencies for wins and losses depending on whether the contestant switches doors or not. The results can then be discussed.
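
    For readers who prefer script over spreadsheet, a minimal sketch of the same experiment in R (not part of the published VBA tool):

      set.seed(1)
      n    <- 10000
      car  <- sample(1:3, n, replace = TRUE)   # door hiding the car
      pick <- sample(1:3, n, replace = TRUE)   # contestant's first choice
      # the host always opens a goat door, so switching wins exactly when
      # the first pick was wrong
      c(stay = mean(pick == car), switch = mean(pick != car))   # about 1/3 vs. 2/3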

    Versions of the simulation tool:

    • for Windows: Monty Hall Problem Simulation 4.0 (Win)
    • for Mac: Monty Hall Problem Simulation 4.0 (Mac)

    Please note: Some functions are not available in the Mac version of the simulation tool provided here.

  10. Field Variable Permeability Tests (Slug Tests) in Boreholes Made by Driven Flush-Joint Casings, or Driven Flush-Joint Casing Permeameters, or Between Packers in Cored Rock Boreholes, or in Monitoring Wells ― Overdamped Response

    • borealisdata.ca
    • search.dataone.org
    Updated Oct 29, 2024
    Cite
    Robert P. Chapuis (2024). Field Variable Permeability Tests (Slug Tests) in Boreholes Made by Driven Flush-Joint Casings, or Driven Flush-Joint Casing Permeameters, or Between Packers in Cored Rock Boreholes, or in Monitoring Wells ― Overdamped Response / Essais de perméabilité à niveau variable (Slug Tests) dans des forages faits avec un tubage battu à joints lisses, ou un perméamètre battu à joints lisses, ou entre des obturateurs dans un trou foré dans le roc, ou dans un puits de surveillance ― Cas de la réponse suramortie [Dataset]. http://doi.org/10.5683/SP2/YUAUGX
    Dataset provided by
    Borealis
    Authors
    Robert P. Chapuis
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Civil and geological engineers have used field variable-head permeability tests (VH tests or slug tests) for over a century to assess the local hydraulic conductivity of tested soils and rocks. The water level in the pipe or riser casing reaches, after some rest time, a static position or elevation, z2. Then, the water level position is changed rapidly, by adding or removing some water volume, or by inserting or removing a solid slug. Afterward, the water level position or elevation z1(t) is recorded vs. time t, yielding a difference in hydraulic head or water column defined as Z(t) = z1(t) - z2. The water level at rest is assumed to be the piezometric level or PL for the tested zone, before drilling a hole and installing test equipment. All equations use Z(t) or Z*(t) = Z(t) / Z(t=0). The water-level response vs. time may be a slow return to equilibrium (overdamped test) or an oscillation back to equilibrium (underdamped test). This document deals exclusively with overdamped tests. Their data may be analyzed using several methods, known to yield different results for the hydraulic conductivity. The methods fit in three groups: group 1 neglects the influence of the solid matrix strain, group 2 is for tests in aquitards with delayed strain caused by consolidation, and group 3 takes into account some elastic and instant solid matrix strain. This document briefly explains what is wrong with certain theories and why. It shows three ways to plot the data, which are the three diagnostic graphs. According to experience with thousands of tests, most test data are biased by an incorrect estimate z2 of the piezometric level at rest. The derivative or velocity plot does not depend upon this assumed piezometric level, but can verify its correctness. The document presents experimental results and explains the three-diagnostic-graphs approach, which unifies the theories and, most important, yields a user-independent result. Two free spreadsheet files are provided. The spreadsheet "Lefranc-Test-English-Model" follows the Canadian standards and is used to explain how to treat the test data correctly to reach a user-independent result. The user does not modify this model spreadsheet but can make as many copies as needed, with different names. The user can treat any other data set in a copy, and can also modify any copy if needed. The second Excel spreadsheet contains several sets of data that can be used to practice with the copies of the model spreadsheet.
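
    A hedged sketch in R of the head-difference definitions above, with made-up readings (the document's spreadsheets implement the full diagnostic-graphs treatment):

      t  <- c(0, 30, 60, 120, 240, 480)              # time since the slug was applied (s)
      z1 <- c(5.80, 5.62, 5.48, 5.28, 5.05, 4.88)    # recorded water-level elevations (m)
      z2 <- 4.80                                     # assumed static (piezometric) level (m)
      Z  <- z1 - z2                                  # Z(t) = z1(t) - z2
      Zs <- Z / Z[1]                                 # Z*(t) = Z(t) / Z(t = 0)
      v  <- diff(Z) / diff(t)                        # derivative values for the velocity plot
      plot(t[-1], v, xlab = "t (s)", ylab = "dZ/dt (m/s)")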

  11. Waterworks — intake point_reporting

    • data.europa.eu
    • gimi9.com
    Updated Feb 7, 2022
    Cite
    (2022). Waterworks — intake point_reporting [Dataset]. https://data.europa.eu/data/datasets/https-data-norge-no-node-1499
    License

    https://data.norge.no/nlod/en/2.0/

    Description

    The data sets provide an overview of selected data on waterworks registered with the Norwegian Food Safety Authority. The information has been reported by the waterworks through application processing or other reporting to the Norwegian Food Safety Authority. Drinking water regulations require, among other things, annual reporting, and the Norwegian Food Safety Authority has created a separate form service for such reporting. The data sets include public or private waterworks that supply 50 people or more. In addition, all municipally owned businesses with their own water supply are included regardless of size. The data sets also contain decommissioned facilities, for those who wish to view historical data, i.e. data for previous years. There are data sets for the following supervisory objects:

    1. Water supply system. It also includes analysis of drinking water.
    2. Transport system.
    3. Treatment facility.
    4. Intake point. It also includes analysis of the water source.

    Below you will find the dataset for: 4. Intake point_reporting. In addition, there is a file (information.txt) that provides an overview of when the extracts were produced and how many lines there are in the individual files. The extracts are done weekly. Furthermore, for the data sets water supply system, transport system, and intake point, it is possible to see historical data on what is included in the annual reporting. To make use of that information, the file must be linked to the "mother" file to get names and other static information. These files have the _reporting ending in the file name. Descriptions of the data fields (i.e. metadata) in the individual data sets appear in separate files, available in PDF format.

    If you double-click the CSV file and it opens directly in Excel, the Norwegian characters æøå will not display correctly. To see the character set correctly in Excel, you must:

    1. Start Excel and open a new spreadsheet.
    2. Select Data and then From Text, and press Import.
    3. Select delimited data and file origin 65001: Unicode (UTF-8), tick "My data has headers", and press Next.
    4. Remove Tab as separator, select Semicolon as separator, and press Next.
    5. Otherwise, complete the import as usual.

    The data sets can also be imported into a separate database and compiled as desired. There are link keys in the files that make it possible to link the files together. The waterworks are responsible for the quality of the datasets.
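
    Alternatively, the semicolon-separated, UTF-8 encoded extracts can be read directly in R; a minimal sketch (the file name is illustrative):

      library(readr)
      intake <- read_delim("intake_point_reporting.csv", delim = ";",
                           locale = locale(encoding = "UTF-8"))  # preserves æøå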

    Purpose: Make data for drinking water supply available to the public.

  12. 2011 skills for life survey: small area estimation data

    • gov.uk
    Updated Dec 12, 2012
    Cite
    Department for Business, Innovation & Skills (2012). 2011 skills for life survey: small area estimation data [Dataset]. https://www.gov.uk/government/statistical-data-sets/2011-skills-for-life-survey-small-area-estimation-data
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Business, Innovation & Skills
    Description

    Small area estimation modelling methods have been applied to the 2011 Skills for Life survey data in order to generate local level area estimates of the number and proportion of adults (aged 16-64 years old) in England living in households with defined skill levels in:

    • literacy
    • numeracy
    • information and communication technology (ICT), including emailing, word processing, spreadsheet use and a multiple-choice assessment of ICT awareness

    The number and proportion of adults in households who do not speak English as a first language are also included.

    Two sets of small area estimates are provided for 7 geographies: middle layer super output areas (MSOAs), standard table wards, 2005 statistical wards, 2011 council wards, 2011 parliamentary constituencies, local authorities, and local enterprise partnership areas.

    Regional estimates have also been provided, however, unlike the other geographies, these estimates are based on direct survey estimates and not modelled estimates.

    The files are available as both Excel and csv files – the user guide explains the estimates and modelling approach in more detail.

    How to use the small area estimation files: an example

    To find the estimate for the proportion of adults with entry level 1 or below literacy in the Manchester Central parliamentary constituency, you need to:

    1. select the link to the ‘parliamentary-constituencies-2009-all’ Excel file in the table above
    2. select the ‘literacy proportions’ page of the Excel spreadsheet
    3. use the ‘find’ function to locate ‘Manchester Central’
    4. note the proportion listed for Entry Level and below

    It is estimated that 8.1% of adults aged 16-64 in Manchester Central have entry level or below literacy. The credible interval for this estimate is 7.0% to 9.3% at the 95 per cent level. This means that while the estimate is 8.1%, there is a 95% likelihood that the actual value lies between 7.0% and 9.3%.
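
    The same lookup can be scripted; a hedged sketch in R using the downloaded workbook (the sheet name is taken from the steps above, and the column holding area names is assumed to be the first):

      library(readxl)
      lit <- read_excel("parliamentary-constituencies-2009-all.xlsx",   # file name assumed from the link text
                        sheet = "literacy proportions")
      lit[grepl("Manchester Central", lit[[1]]), ]                      # rows for the constituency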

    Middle layer super output areas: 2001 all skill level estimates (MS Excel Spreadsheet, 14.5 MB): https://assets.publishing.service.gov.uk/media/5a79d91240f0b670a8025dd8/middle-layer-super-output-areas-2001-all_1_.xlsx

  13. Excel: Reformat column layout to plate layout and vice versa (96 and 384 version)

    • figshare.com
    xlsx
    Updated Sep 14, 2018
    Cite
    Kameron Kilchrist (2018). Excel: Reformat column layout to plate layout and vice versa (96 and 384 version) [Dataset]. http://doi.org/10.6084/m9.figshare.7088747.v1
    Dataset provided by
    figshare
    Authors
    Kameron Kilchrist
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is a collection of XLSX sheets containing some of my favorite Excel tricks for reformatting data to make analysis easier. I often use these to reformat column-formatted data into a plate layout, or vice versa, to better visualize and understand my data.
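
    As a rough illustration of the idea (not taken from the workbook itself), the same plate-to-column reshaping can be done in R with tidyr; the 96-well data below are simulated:

      library(tibble)
      library(tidyr)
      # simulated 96-well plate: rows A-H, columns 1-12
      plate <- as_tibble(matrix(rnorm(96), nrow = 8,
                                dimnames = list(LETTERS[1:8], 1:12)),
                         rownames = "row")
      long <- pivot_longer(plate, -row, names_to = "col", values_to = "value")
      long$well <- paste0(long$row, long$col)   # well IDs "A1" ... "H12"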

  14. HarDWR - Raw Water Rights Records

    • osti.gov
    Updated Nov 15, 2023
    Cite
    USDOE Office of Science (SC), Biological and Environmental Research (BER) (2023). HarDWR - Raw Water Rights Records [Dataset]. http://doi.org/10.57931/2004664
    Explore at:
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    USDOE Office of Science (SC), Biological and Environmental Research (BER)
    MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
    Description

    For a detailed description of the database of which this record is only one part, please see the HarDWR meta-record. In order to hold a water right in the western United States, an entity (e.g., an individual, corporation, municipality, sovereign government, or non-profit) must register a physical document with the state's water regulatory agency. State water agencies each maintain their own database containing all registered water right documents within the state, along with relevant metadata such as the point of diversion and place of use of the water. All western U.S. states have digitized their individual water rights databases, along with the geospatial data describing the spatial units where water rights are managed. Each state maintains and provides its own water rights data in accordance with individual state regulations and standards. We collected water rights databases from 11 western U.S. states, either by downloading them from publicly accessible web portals or by contacting state water management representatives; detailed descriptions of where and when the data was collected are provided in the README.txt, as well as in Lisk et al. (in review). This collection comprises those raw water rights records. Each state formats its data differently, meaning that file types, field availability, and names vary from state to state. Note that the data provided here reflects the state of the water rights databases at the time we collected them; updates have likely occurred in many states.

    Some pieces of information are common among all states: priority date, volume or flow of water allowed by the right, stated water use of the right, and some means of identifying the geography and source of the water pertaining to the right - typically the coordinates of the Point of Diversion (PoD) of a waterbody or well.

    Arizona regulates water differently than the other 10 states. Outside of some relatively small critical agricultural areas called Active Management Areas (AMAs), Arizona does not maintain any water rights. However, the state does require registration of surface and groundwater pumping devices, which includes disclosing the mechanical specifics of the devices. We used these records as a proxy for water rights.

    Each state, and their respective water right authorities, have made their water right records available for non-commercial reference uses. The states make no guarantees as to the completeness, accuracy, or timeliness of their respective databases, let alone the modifications which we, the authors of this paper, have made to the collected records. None of the states should be held liable for uses of this data outside of its intended use. In addition, the following states have requested specifically worded disclaimers to be included with their data.

    Colorado: "The data made available here has been modified for use from its original source, which is the State of Colorado. THE STATE OF COLORADO MAKES NO REPRESENTATIONS OR WARRANTY AS TO THE COMPLETENESS, ACCURACY, TIMELINESS, OR CONTENT OF ANY DATA MADE AVAILABLE THROUGH THIS SITE. THE STATE OF COLORADO EXPRESSLY DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. The data is subject to change as modifications and updates are complete. It is understood that the information contained in the Web feed is being used at one's own risk."

    Montana: "The Montana State Library provides this product/service for informational purposes only. The Library did not produce it for, nor is it suitable for legal, engineering, or surveying purposes. Consumers of this information should review or consult the primary data and information sources to ascertain the viability of the information for their purposes. The Library provides these data in good faith but does not represent or warrant its accuracy, adequacy, or completeness. In no event shall the Library be liable for any incorrect results or analysis; any direct, indirect, special, or consequential damages to any party; or any lost profits arising out of or in connection with the use or the inability to use the data or the services provided. The Library makes these data and services available as a convenience to the public, and for no other purpose. The Library reserves the right to change or revise published data and/or services at any time."

    Oregon: "This product is for informational purposes and may not have been prepared for, or be suitable for legal, engineering, or surveying purposes. Users of this information should review or consult the primary data and information sources to ascertain the usability of the information."

    The available data is provided as a series of compressed files, each containing the full data collected from one state. Some of the files have been renamed to make it easier to tell which state the data belongs to; renaming was also required because some files from different states had the same name. In other cases, the data for a state has been placed in a folder indicating which state it belongs to, as the state organized its data by selected subregions. Below is a brief description of the format of the collected data from each state (see the sketch after this list for one way to load some of these files):

    • ArizonaRights_StatementOfClaimants: A folder containing a database of interconnected CSV files. The soc_erd.pdf file contains a visual flowchart of how the various files are connected, beginning with SOC_MAIN.csv in the center of the page.
    • ArizonaRights_SurfaceWaterRightsData: A folder containing a database of a single Shapefile and 10 associated CSVs. SurfaceWater.pdf contains a visual flowchart of how the various files are connected, beginning with ADWR_SW_APPL_REGRY.csv.
    • ArizonaRights_Well55Registry: A folder containing a database of a single Shapefile and 59 associated CSVs. Wells55.pdf contains a visual flowchart of how the various files are connected, beginning with WellRegistry.shp.
    • CaliforniaRights_eWRIMS_directDatabase: A folder containing a collection of four "series" of Microsoft Excel files, as either XLS or XLSX. The four series (byCounty; byEntity, i.e., what type of legal entity holds the right; byUse, i.e., stated water use; and byWatershed) are the methods by which California's water rights are organized within the state's database. Collecting only a single series was observed not to provide all water rights, so the majority of records within each series are copies of records in the others, while each series also contains some unique records.
    • ColoradoRights_NetAmounts: A folder containing 78 CSV files, one per Colorado Water District.
    • IdahoRights_PointOfDiversion: A Shapefile containing the Points of Diversion for the entire state of Idaho.
    • IdahoRights_PlaceOfUse: A Shapefile containing the Place of Use polygons for the entire state of Idaho.
    • MontanaRights_WaterRights: A Geodatabase file containing the Points of Diversion and Places of Use for the entire state of Montana. The Points of Diversion Feature Layer is named "WRDIV" and the Places of Use Feature Layer is named "WRPOU".
    • NevadaRights_POD_Sites: A Shapefile containing the Points of Diversion for the entire state of Nevada.
    • NewMexicoRights_Points_of_Diversion: A Shapefile containing the Points of Diversion for the entire state of New Mexico.
    • OregonRights_state_shp: A folder containing 36 Shapefiles, split between "pod" (Point of Diversion) and "pou" (Place of Use) files for each water management basin within Oregon; each basin has one "pod" file (point shapes) and one "pou" file (polygons).
    • UtahRights_Points_of_Diversion: A Shapefile containing the Points of Diversion for the entire state of Utah.
    • WashingtonRights_WaterDiversions_ECY_NHD: A Geodatabase file containing the Points of Diversion for the entire state of Washington. The Feature Layer within the Geodatabase is named "WaterDiversions_ECY_NHD".
    • WyomingRights: A folder containing four subdirectories, one for each Wyoming Water Division. Each Division directory includes a varying number of subdirectories, one for each Wyoming Water District. Each District folder contains two copies of the Point of Diversion records for that area, one in CSV and one in Microsoft Excel XLS format.
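    As one way to start working with these files (a sketch under assumed paths, not part of the dataset itself), the shapefiles and per-district CSVs can be loaded with geopandas and pandas:

        import glob
        import pandas as pd
        import geopandas as gpd

        # Idaho: statewide Points of Diversion in a single Shapefile
        # (the path is assumed from the folder name described above)
        pods = gpd.read_file("IdahoRights_PointOfDiversion.shp")
        print(pods.crs, len(pods))

        # Colorado: 78 per-district CSVs concatenated into one table
        # (assumes the district files share a common column layout)
        co = pd.concat((pd.read_csv(p) for p in
                        glob.glob("ColoradoRights_NetAmounts/*.csv")),
                       ignore_index=True)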

  15. Statistical Comparison of Two ROC Curves

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Yaacov Petscher (2023). Statistical Comparison of Two ROC Curves [Dataset]. http://doi.org/10.6084/m9.figshare.860448.v1
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yaacov Petscher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This Excel file performs a statistical test of whether two ROC curves differ from each other, based on the Area Under the Curve (AUC). You will need the coefficient from the table presented in the following article to enter the correct AUC value for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
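    For readers without the workbook to hand, the underlying test can be sketched in Python. This assumes the Hanley & McNeil (1982) formula for the standard error of a single AUC and the 1983 z-test, where r is the correlation coefficient read from the table in the article; the AUCs, sample sizes, and r in the example call are invented.

        from math import sqrt
        from scipy.stats import norm

        def hanley_mcneil_se(auc, n_pos, n_neg):
            # Standard error of a single AUC (Hanley & McNeil, 1982)
            q1 = auc / (2 - auc)
            q2 = 2 * auc ** 2 / (1 + auc)
            var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
                   + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
            return sqrt(var)

        def compare_aucs(a1, a2, se1, se2, r):
            # z-test for two correlated AUCs (Hanley & McNeil, 1983)
            z = (a1 - a2) / sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
            return z, 2 * norm.sf(abs(z))  # two-sided p-value

        se1 = hanley_mcneil_se(0.85, 50, 50)
        se2 = hanley_mcneil_se(0.80, 50, 50)
        print(compare_aucs(0.85, 0.80, se1, se2, r=0.45))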

  16. GHS Safety Fingerprints

    • figshare.com
    xlsx
    Updated Oct 25, 2018
    Cite
    Brian Murphy (2018). GHS Safety Fingerprints [Dataset]. http://doi.org/10.6084/m9.figshare.7210019.v3
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Oct 25, 2018
    Dataset provided by
    figshare
    Authors
    Brian Murphy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spreadsheets targeted at the analysis of GHS safety fingerprints.

    Abstract

    Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS became widely accepted internationally and has become the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich's website has only one pictogram. A chemical information tool which identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1000 and 3000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.

    Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the data collected, rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions are used to convert GHS information into binary strings of data called "bitstrings"; the same approach is used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare on their face. Bitstrings generated from the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values from 0, for strings that have no similarity, to 1, for strings that are the same. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques with minor modifications could be used to tackle more GHS information such as pictograms.

    Intellectual Merit. This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings obtained from the non-numeric entity of 2D structure, and they allow comparison of 2D structures through the use of the Tanimoto coefficient. The same idea extends to safety fingerprints, which can be created by converting a non-numeric entity such as GHS information into a binary bit string and comparing data through the use of the Tanimoto coefficient.

    Broader Impact. Extensions of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but it could be further applied to other pieces of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel without needing a large programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
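    As a concrete illustration of the bitstring comparison described above (a sketch, not the author's spreadsheet functions), the Tanimoto coefficient of two fingerprints is the count of shared 1-bits divided by the count of positions where either string has a 1. The fingerprints below are invented stand-ins for the sodium hydroxide example.

        def tanimoto(a, b):
            # |A AND B| / |A OR B| over two equal-length bitstrings
            both = sum(x == "1" and y == "1" for x, y in zip(a, b))
            either = sum(x == "1" or y == "1" for x, y in zip(a, b))
            return both / either if either else 1.0  # two empty prints: identical

        # Hypothetical pictogram fingerprints (1 = pictogram assigned)
        fisher = "110000000"  # two pictograms, as in the Fisher SDS example
        sigma = "100000000"   # one pictogram, as in the Sigma Aldrich example
        print(tanimoto(fisher, sigma))  # 0.5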

  17. HUN AWRA-R simulation nodes v01

    • data.gov.au
    • researchdata.edu.au
    • +1more
    zip
    Updated Apr 13, 2022
    + more versions
    Cite
    Bioregional Assessment Program (2022). HUN AWRA-R simulation nodes v01 [Dataset]. https://data.gov.au/data/dataset/fda20928-d486-49d2-b362-e860c1918b06
    Explore at:
    Available download formats: zip (9874)
    Dataset updated
    Apr 13, 2022
    Dataset authored and provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    The dataset consists of an Excel spreadsheet and a shapefile representing the locations of simulation nodes used in the AWRA-R model. Some of the nodes correspond to gauging station or dam locations, whereas other locations represent river confluences or catchment outlets which have no gauging; these are marked as "Dummy".

    Purpose

    Locations are used as pour points in order to define reach areas for river system modelling.

    Dataset History

    Subset of data for the Hunter that was extracted from the Bureau of Meteorology's Hydstra system; it includes all gauges for which data has been received from the lead water agency of each jurisdiction. Simulation nodes were added at locations where the model will provide simulated streamflow.

    Three files were extracted from the Hydstra database to aid in identifying the sites in each bioregion and the type of data collected from each one. These data were used to determine the simulation node locations where model outputs were generated.

    The 3 files contained within the source dataset used for this determination are:

    Site - lists all sites available in Hydstra from data providers. The data provider is indicated in the #Station field by a suffix of the form _xxx; for example, sites in NSW end in _77 and sites in QLD in _66. Some sites do not have locational information and cannot be plotted.

    Period - the period table lists all the variables that are recorded at each site and the period of record.

    Variable - the variable table shows variable codes and names which can be linked to the period table.

    Relevant location information and other data were extracted to construct the spreadsheet and shapefile within this dataset.
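    A brief pandas sketch of how the three extracts might be linked; the file names, the "#Station" column, and the shared "Variable" key are assumptions based on the description above, not confirmed fields.

        import pandas as pd

        site = pd.read_csv("Site.csv")
        period = pd.read_csv("Period.csv")
        variable = pd.read_csv("Variable.csv")

        # Provider suffix, e.g. _77 (NSW) or _66 (QLD), assumed to trail #Station
        site["Provider"] = site["#Station"].str.extract(r"_(\d+)$", expand=False)

        # Link variable codes and names to the period of record at each site
        period_named = period.merge(variable, on="Variable", how="left")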

    Dataset Citation

    Bioregional Assessment Programme (XXXX) HUN AWRA-R simulation nodes v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/fda20928-d486-49d2-b362-e860c1918b06.

    Dataset Ancestors

  18. Improving the Characterization of several Natural Emissions in CMAQ

    • gimi9.com
    Updated Dec 9, 2024
    + more versions
    Cite
    (2024). Improving the Characterization of several Natural Emissions in CMAQ | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_improving-the-characterization-of-several-natural-emissions-in-cmaq/
    Explore at:
    Dataset updated
    Dec 9, 2024
    Description

    These datasets contain all the data used to make the figures in the associated paper. The Excel files are self-explanatory and can be used directly, while the other files, in netCDF format, need a visualization tool (such as VERDI) or statistical software (such as R) to produce statistical summaries or plots. Portions of this dataset are inaccessible because the data will be uploaded when the paper is accepted by the journal. They can be accessed through the following means: for Excel files, the data can be used directly to make summaries or plots; for netCDF files, a visualization tool or statistical package (such as R) can be used, and all the netCDF files can be visualized using VERDI. Format: two types of data formats. One is the Excel files, which are self-explanatory; the other is the netCDF files, which are used to make the spatial plots in the paper. This dataset is associated with the following publication: Kang, D., J. Willison, G. Sarwar, M. Madden, C. Hogrefe, R. Mathur, B. Gantt, and S. Alfonso. Improving the Characterization of the Natural Emissions in CMAQ. EM Magazine. Air and Waste Management Association, Pittsburgh, PA, USA, (10): 1-7, (2021).
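    For the netCDF files, a scripted alternative to VERDI is a few lines of Python with xarray; the file name below is a placeholder, and the variable chosen is simply the first one present rather than a name confirmed by the dataset.

        import xarray as xr

        ds = xr.open_dataset("cmaq_natural_emissions.nc")  # placeholder name
        print(ds.data_vars)  # list the variables present in the file

        # Quick statistical summary of one variable
        var = ds[list(ds.data_vars)[0]]
        print(float(var.mean()), float(var.min()), float(var.max()))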

  19. A study on real graphs of fake news spreading on Twitter

    • data.niaid.nih.gov
    Updated Aug 20, 2021
    Cite
    Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
    Explore at:
    Dataset updated
    Aug 20, 2021
    Dataset authored and provided by
    Amirhosein Bodaghi
    Description

    *** Fake News on Twitter ***

    These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we focused on those fake news stories which gave rise to a truth spreading simultaneously against them. The story of each fake news item is as follows:

    1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

    2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

    3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

    4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

    5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

    The data collection was done in two stages, each of which provided a new dataset: 1- attaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2- querying the neighbors of tweet spreaders, which provides the Dataset of Graph (DG).

    DD

    DD for each fake news story is an Excel file, named FNx_DD where x is the number of the fake news story, with the following structure:

    Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about the tweet/retweet. From left to right, the columns present the following information:

    User ID (user who has posted the current tweet/retweet)

    The description sentence in the profile of the user who has published the tweet/retweet

    The number of tweets/retweets published by the user at the time of posting the current tweet/retweet

    Date and time of creation of the account by which the current tweet/retweet has been posted

    Language of the tweet/retweet

    Number of followers

    Number of followings (friends)

    Date and time of posting the current tweet/retweet

    Number of likes (favorites) the current tweet had acquired before crawling it

    Number of times the current tweet had been retweeted before crawling it

    Whether there is another tweet inside the current tweet/retweet (for example, this happens when the current tweet is a quote, reply, or retweet)

    The source (device OS) from which the current tweet/retweet was posted

    Tweet/Retweet ID

    Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

    Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

    Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied by the current post)

    Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, as retweets posted by others)

    State of the tweet, which can be one of the following forms (assigned by agreement between the annotators):

    r : The tweet/retweet is a fake news post

    a : The tweet/retweet is a truth post

    q : The tweet/retweet is a question about the fake news; it neither confirms nor denies it

    n : The tweet/retweet is not related to the fake news (even though it contains queries related to the rumor, it does not refer to the given fake news)
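    A minimal sketch of loading one DD file and splitting it by the state codes above; the file name follows the stated FNx_DD convention, and treating the state column as the last column is an assumption based on the column order described.

        import pandas as pd

        dd = pd.read_excel("FN1_DD.xlsx")  # x = 1, the first story

        state = dd.columns[-1]        # the state column is described last
        fake = dd[dd[state] == "r"]   # fake news posts
        truth = dd[dd[state] == "a"]  # truth posts
        print(len(fake), len(truth))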

    DG

    DG for each fake news contains two files:

    A file in graph format (.graph) which includes the structure of the graph, i.e., who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)

    A file in JSON Lines format (.jsonl) which includes the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)

    In the graph file, the label of each node is its order of entry into the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is found at row number 1 of the jsonl file (row number 0 holds the column labels); each subsequent row corresponds to one user ID. Therefore, to find the user ID of, say, node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file.
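    Because that offset is easy to get wrong, here is a small sketch of the lookup, assuming each line of the jsonl file holds one value and that row 0 holds the column labels as described.

        import json

        with open("FN1_Labels.jsonl") as f:
            rows = [json.loads(line) for line in f]

        def user_id_of(node_label):
            # row 0 holds the column labels, so node n sits at row n + 1
            return rows[node_label + 1]

        print(user_id_of(200))  # real user ID of the node labeled 200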

    The user IDs of spreaders in DG (those who have had a post in DD) are available in DD, where extra information about them and their tweets/retweets can be found. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.

  20. Superstore Sales Analysis

    • kaggle.com
    Updated Oct 21, 2023
    Cite
    Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ali Reda Elblgihy
    Description

    Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

    1- Data Import and Transformation:

    • Gather and import relevant sales data from various sources into Excel.
    • Utilize Power Query to clean, transform, and structure the data for analysis.
    • Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

    2- Data Quality Assessment:

    • Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.
    • Standardize data formats and ensure that all data is in a consistent, usable state.

    3- Calculating COGS:

    • Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.
    • Apply appropriate formulas and calculations to determine COGS accurately.

    4- Discount Analysis:

    • Analyze the discount values offered on products to understand their impact on sales and profitability.
    • Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

    5- Sales Metrics:

    • Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.
    • Utilize Excel functions to compute these metrics and create visuals for better insights.

    6- Visualization:

    • Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.
    • Visual representations can help identify trends, outliers, and patterns in the data.

    7- Report Generation:

    • Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

    Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
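    A brief pandas sketch of steps 3 to 5, assuming a typical Superstore-style "Orders" sheet with Sales, Profit, and Discount columns; the column names and the COGS approximation are assumptions, not the project's exact Power Query steps.

        import pandas as pd

        orders = pd.read_excel("superstore.xlsx", sheet_name="Orders")

        # Step 3: a common COGS approximation when unit costs are absent
        orders["COGS"] = orders["Sales"] - orders["Profit"]

        # Step 4: average discount and its relationship to profit
        print(orders["Discount"].mean())
        print(orders.groupby("Discount")["Profit"].mean())

        # Step 5: headline sales metrics
        total_revenue = orders["Sales"].sum()
        profit_margin = orders["Profit"].sum() / total_revenue
        print(total_revenue, profit_margin)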
