Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file performs a statistical test of whether two ROC curves differ from each other based on the area under the curve (AUC). You'll need to enter the correlation coefficient from the table presented in the following article for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
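For readers who prefer scripting the same comparison, here is a minimal Python sketch of the Hanley-McNeil z-test. It assumes the AUC standard-error formula from Hanley & McNeil (1982) and takes the correlation r read from the 1983 table as an input; all function and variable names are illustrative, not part of the spreadsheet.

```python
import math

def auc_se(auc, n_pos, n_neg):
    # Hanley & McNeil (1982) standard error of an AUC estimate
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

def hanley_mcneil_z(auc1, auc2, n_pos, n_neg, r):
    # r: correlation coefficient read from the Hanley & McNeil (1983) table,
    # accounting for both ROC curves being derived from the same cases
    se1 = auc_se(auc1, n_pos, n_neg)
    se2 = auc_se(auc2, n_pos, n_neg)
    return (auc1 - auc2) / math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)

# Example: two AUCs from the same 60 diseased / 60 healthy cases, r = 0.45
z = hanley_mcneil_z(0.85, 0.80, 60, 60, 0.45)
print(f"z = {z:.2f}")  # compare against the standard normal distribution
```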
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.

Abstract: Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS had become widely accepted internationally and has become the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. To assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma-Aldrich's website has only one pictogram. A chemical information tool that identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1,000 and 3,000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.

Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the collected data rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions will be used to convert GHS information into binary strings of data called "bitstrings". This approach is also used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare at face value. Bitstrings generated from the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values ranging from 0 for strings with no similarity to 1 for strings that are identical. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques, with minor modifications, could be used to tackle more GHS information such as pictograms.

Intellectual Merit: This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings obtained from the non-numeric entity of 2D structure. This structural fingerprint allows comparison of 2D structures through the use of the Tanimoto coefficient. The use of this structural fingerprint can be extended to safety fingerprints, which can be created by converting a non-numeric entity such as GHS information into a binary bit string and comparing data through the use of the Tanimoto coefficient.

Broader Impact: Extension of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other pieces of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel without needing a substantial programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
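As an illustration of the bitstring comparison described above, here is a minimal Python sketch of the Tanimoto coefficient applied to two hypothetical GHS hazard-statement bitstrings; the hazard-statement ordering is invented for the example.

```python
def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient of two equal-length bitstrings:
    shared on-bits divided by the total distinct on-bits."""
    both = sum(x == y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return both / either if either else 1.0

# Hypothetical fingerprints: each position flags one hazard statement
# (e.g., position 0 = H290, 1 = H314, 2 = H318, ...)
fisher = "11010"
sigma  = "11000"
print(tanimoto(fisher, sigma))  # 0.666... -> partially consistent GHS info
```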
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: user story or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
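A minimal Python sketch of the two ratios defined above, taking the tag counts as inputs (variable names are illustrative):

```python
def correctness(al: int, wr: int, so: int, om: int) -> float:
    # aligned classes over all classes considered in the comparison
    return al / (al + om + so + wr)

def completeness(al: int, wr: int, om: int) -> float:
    # correctly or incorrectly represented classes over expert-model classes
    return (al + wr) / (al + wr + om)

# Example: 12 aligned, 3 wrongly represented, 2 system-oriented, 4 omitted
print(correctness(12, 3, 2, 4))  # 0.571...
print(completeness(12, 3, 4))    # 0.789...
```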
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
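For readers who want to script these statistics instead of using the online tool, here is a minimal Python sketch of the independent-samples t-test and Hedges' g with the small-sample bias correction (which variant the online tool applies is an assumption); the sample values are invented.

```python
import math
from scipy import stats

def hedges_g(x, y):
    """Hedges' g: pooled-SD standardized mean difference
    with the small-sample bias correction factor J."""
    nx, ny = len(x), len(y)
    sx, sy = stats.tstd(x), stats.tstd(y)  # sample SDs (ddof=1)
    pooled = math.sqrt(((nx - 1) * sx**2 + (ny - 1) * sy**2) / (nx + ny - 2))
    d = (sum(x) / nx - sum(y) / ny) / pooled  # Cohen's d
    j = 1 - 3 / (4 * (nx + ny) - 9)           # bias correction
    return d * j

us = [0.8, 0.7, 0.9, 0.6, 0.75]   # e.g., correctness per student, notation US
uc = [0.6, 0.65, 0.7, 0.5, 0.55]  # e.g., correctness per student, notation UC
t, p = stats.ttest_ind(us, uc)    # the T-test reported at the bottom of each sheet
print(f"t = {t:.2f}, p = {p:.3f}, g = {hedges_g(us, uc):.2f}")
```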
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.
Due to the large size of the complete dataset, a selected set of data, representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.
There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains raw data and the corresponding results files associated with a recent study. Each MS Excel spreadsheet contains the data for one aspect of the study, which is specified by the name of the file. Each spreadsheet presents the information about participants (i.e., personal and demographic details) and their responses for the first SD scale, the second SD scale, and the personal evaluation. The supplemental material (participant information sheet, informed consent form, online questionnaire, risk assessment form) is also enclosed with this dataset. Lastly, for the analysis of the raw data, a statistical test, the independent-samples t-test, was performed. The original SPSS data files are also included.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Directory of Files:
A. Filename: Combine_CCTDI.zip
Short description: Quantitative Data. The zip file contains 6 Excel files which store students' raw data. This raw data set consists of students' input on each CCTDI item. The pre-data were collected through an online survey, while post-data were collected through pen and paper. The data will be analysed by ANOVA to compare the effectiveness of the intervention (a minimal ANOVA sketch appears after this directory of files).
(The California Critical Thinking Disposition Inventory (CCTDI) has been widely employed in the field of education to investigate changes in students' Critical Thinking (CT) attitudes resulting from teaching interventions by comparing pre- and post-tests. This 6-point self-report instrument requires respondents to rate themselves, ranging from "rating 1" for not describing them at all to "rating 6" for describing them extremely well. The instrument has 40 questions categorized in seven subsets covering various CT disposition dimensions, namely: i) truth-seeking, ii) open-mindedness, iii) analyticity, iv) systematicity, v) inquisitiveness, vi) maturity, and vii) self-confidence.)
B. Filename: Combine_TCTSPS.zip
Short description: Quantitative Data. The zip file contains 6 Excel files which store students' raw data, consisting of students' input on each TCTSPS item. The pre-data were collected through an online survey, while post-data were collected through pen and paper. The data will be analysed by ANOVA to compare the effectiveness of the intervention.
(Test of Critical Thinking Skills for Primary and Secondary School Students (TCTS-PS) consists of 24 items divided into five subscales measuring distinct yet correlated aspects of CT skills, namely: (I) differentiating theory from assumptions, (II) deciding evidence, (III) inference, (IV) finding an alternative theory, and (V) evaluation of arguments. The instrument yields a possible total score of 72. The instrument is intended for use in measuring gains in CT skills resulting from instruction, predicting success in programs where CT is crucial, and examining relationships between CT skills and other abilities or traits.)
C. Filename: Combine_SMTSL.zip
Short description: Quantitative Data. The zip file contains 5 Excel files which store students' raw data, consisting of students' input on each SMTSL item. The pre-data were collected through an online survey, while post-data were collected through pen and paper. The data will be analysed by ANOVA to compare the effectiveness of the intervention.
(Students' Motivation Towards Science Learning (SMTSL) defines six factors related to motivation in science learning, used to measure participants' motivation towards science learning: A. Self-efficacy, B. Active learning strategies, C. Science learning value, D. Performance goal, E. Achievement goal, and F. Learning environment stimulation.)
D. Filename: Combine_Discourse Transcription_1.zip and Combine_Discourse Transcription_2.zip
Short description: Qualitative Data. The zip files contain 6 Excel files holding 6 teachers' classroom teaching discourse transcriptions. The data will be analysed by thematic analysis to compare the effectiveness of the intervention.
(38 science classroom discourse videos of 8th graders were transcribed and coded using the Academically Productive Talk framework (APT). APT, drawing from sociological, linguistic, and anthropological perspectives, comprises four primary constructs or objectives.)
E. Filename: Combine_Inquiry Report.zip
Short description: Qualitative Data. The zip file contains 2 Excel files holding 2 schools' inquiry report scores according to rubrics. The data will be analysed by thematic analysis to compare the effectiveness of the intervention.
(To assess the quality of students' arguments, a validated scoring rubric was employed to evaluate each student's written argument. The rubric primarily concentrated on the student's proficiency in five perspectives (Walker & Sampson, 2013, p. 573):
(AR1) Provide a well-articulated, adequate, and accurate claim that answers the research question, (AR2) Use genuine evidence to support the claim and present the evidence in an appropriate manner, (AR3) Provide enough valid and reliable evidence to support the claim, (AR4) Provide a rationale that is sufficient and appropriate, and (AR5) Compare his or her findings with other groups in the project.)
F. Filename: Combined_Interview Transcription.xlsx
Short description: Qualitative Data. The file contains all the students' interview transcriptions. The data will be analysed by thematic analysis to compare the effectiveness of the intervention.
(Semi-structured interviews were conducted to gather interviewees' motivation regarding CT and their learning motivation in the context of science. The interview data would be used to complement the quantitative results, i.e., the TCTS-PS, CCTDI, and SMTSL scores.)
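As referenced above, here is a minimal sketch of the planned ANOVA comparison, assuming a simple one-way layout over pre/post gain scores per group (group names and numbers are invented for illustration):

```python
from scipy import stats

# Hypothetical CCTDI gain scores (post minus pre) for an intervention
# group and a comparison group; the real data live in the Excel files above.
intervention = [12, 8, 15, 10, 9, 14]
comparison   = [5, 7, 4, 9, 6, 8]

f, p = stats.f_oneway(intervention, comparison)
print(f"F = {f:.2f}, p = {p:.3f}")
```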
The Delta Produce Sources Study was an observational study designed to measure and compare food environments of farmers markets (n=3) and grocery stores (n=12) in 5 rural towns located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys from June 2019 to March 2020 using a modified version of the Nutrition Environment Measures Survey (NEMS) Farmers Market Audit tool. The tool was modified to collect information pertaining to source of fresh produce and also for use with both farmers markets and grocery stores. Availability, source, quality, and price information were collected and compared between farmers markets and grocery stores for 13 fresh fruits and 32 fresh vegetables via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.
Resources in this dataset:
Resource Title: Delta Produce Sources Study dataset. File Name: DPS Data Public.csv. Resource Description: The dataset contains variables corresponding to availability, source (country, state and town if country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Resource Title: Delta Produce Sources Study data dictionary. File Name: DPS Data Dictionary Public.csv. Resource Description: This file is the data dictionary corresponding to the Delta Produce Sources Study dataset. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel township population over the last 20 plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Excel township across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing and, if there is a change, when the population peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2023, the population of Excel township was 300, a 0.99% decrease year-over-year from 2022. Previously, in 2022, Excel township's population was 303, a decline of 0.98% compared to a population of 306 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of Excel township increased by 17. In this period, the peak population was 308 in the year 2020. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
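A minimal sketch of how those year-over-year figures can be recomputed (the column names are illustrative, not the dataset's own):

```python
import pandas as pd

df = pd.DataFrame({"year": [2020, 2021, 2022, 2023],
                   "population": [308, 306, 303, 300]})
df["change"] = df["population"].diff()                 # absolute YoY change
df["change_pct"] = df["population"].pct_change() * 100  # YoY change in percent
print(df)  # 2023 row: change = -3, change_pct ≈ -0.99
```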
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel township Population by Year. You can refer to it here.
https://borealisdata.ca/api/datasets/:persistentId/versions/2.1/customlicense?persistentId=doi:10.5683/SP3/SZHJFY
This CD-ROM product is an authoritative reference source of 15 key financial ratios by industry groupings compiled from the North American Industry Classification System (NAICS 2007). It is based on up-to-date, reliable and comprehensive data on Canadian businesses, derived from Statistics Canada databases of financial statements for three reference years. The CD-ROM enables users to compare their enterprise's performance to that of their industry and to address issues such as profitability, efficiency and business risk. Financial Performance Indicators can also be used for inter-industry comparisons. Volume 1 covers large enterprises in both the financial and non-financial sectors, at the national level, with annual operating revenue of $25 million or more. Volume 2 covers medium-sized enterprises in the non-financial sector, at the national level, with annual operating revenue of $5 million to less than $25 million. Volume 3 covers small enterprises in the non-financial sector, at the national, provincial, territorial, Atlantic region and Prairie region levels, with annual operating revenue of $30,000 to less than $5 million. Note: FPICB has been discontinued as of 2/23/2015. Statistics Canada continues to provide information on Canadian businesses through alternative data sources. Information on specific financial ratios will continue to be available through the annual Financial and Taxation Statistics for Enterprises program: CANSIM table 180-0003 ; the Quarterly Survey of Financial Statements: CANSIM tables 187-0001 and 187-0002 ; and the Small Business Profiles, which present financial data for small businesses in Canada, available on Industry Canada's website: Financial Performance Data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary material for the manuscript "A Test to Compare Interval Time Series". This includes figures and tables referred to in the manuscript, as well as details of scripts and data files used for the simulation studies and the application. All scripts are in MATLAB (.m) format, and data files are in MATLAB (.mat) and Excel (.xlsx) formats.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We present a perspective on drug development for the synthesis of an active pharmaceutical ingredient (e.g., agomelatine) within a commercial technology called Luminata and compare the results to the current method of consolidating the reaction data into Microsoft Excel. The Excel document becomes the ultimate repository of information extracted from multiple sources such as the electronic lab notebook, the laboratory information management system, the chromatography data system, in-house databases, and external data. The major needs of a pharmaceutical company are tracking the stages of multiple reactions, calculating the impurity carryover across the stages, and performing structure dereplication for an unknown impurity. As there is no standardized software available to link the different needs throughout the life cycle of process development, there is a demand for mapping tools to consolidate the route for an API synthesis and link it with analytical data while reducing transcription errors and maintaining an audit trail.
When water is pumped slowly from saturated sediment-water interface sediments, the more highly connected, mobile porosity domain is preferentially sampled, compared to less-mobile pore spaces. Changes in fluid electrical conductivity (EC) during controlled downward ionic tracer injections into interface sediments can be assumed to represent mobile porosity dynamics, which are therefore distinguished from the less-mobile porosity dynamics measured using bulk EC geoelectrical methods. Fluid EC samples were drawn at flow rates similar to tracer injection rates to prevent inducing preferential flow. The data were collected using a stainless steel tube with slits cut into the bottom (USGS MINIPOINT style) connected to an EC meter via c-flex or neoprene tubing, and drawn up through the system via a peristaltic pump. The data were compiled into an Excel spreadsheet and time corrected to compare to bulk EC data that were collected simultaneously and contained in another section of this data release. Controlled, downward flow experiments were conducted in a dual-domain porosity apparatus (DDPA). Downward flow rates ranged from 1.2 to 1.4 m/d in DDPA1, and were set at 1 m/d, 3 m/d, 5 m/d, and 0.9 m/d, as described in the publication: Briggs, M.A., Day-Lewis, F.D., Dehkordy, F.M.P., Hampton, T., Zarnetske, J.P., Singha, K., Harvey, J.W. and Lane, J.W., 2018, Direct observations of hydrologic exchange occurring with less-mobile porosity and the development of anoxic microzones in sandy lakebed sediments, Water Resources Research, DOI:10.1029/2018WR022823.
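A minimal sketch of the kind of time correction described above (aligning each fluid EC sample with the nearest-in-time bulk EC reading); the column names and values are invented for the example, not taken from the data release:

```python
import pandas as pd

fluid = pd.DataFrame({"time": pd.to_datetime(["2017-06-01 10:00:05",
                                              "2017-06-01 10:05:12"]),
                      "fluid_ec_uScm": [512.0, 530.5]})
bulk = pd.DataFrame({"time": pd.date_range("2017-06-01 10:00",
                                           periods=12, freq="30s"),
                     "bulk_ec_uScm": range(500, 512)})

# Match each fluid EC sample to the closest-in-time bulk EC reading
aligned = pd.merge_asof(fluid.sort_values("time"), bulk.sort_values("time"),
                        on="time", direction="nearest")
print(aligned)
```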
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the essential files for conducting a dynamic stock market analysis using Power BI. The data is sourced from Yahoo Finance and includes historical stock prices, which can be dynamically updated by adding new stock codes to the provided Excel sheet.
Files Included: Power BI Report (.pbix): The interactive Power BI report that includes various visualizations such as Candle Charts, Line Charts for Support and Resistance, and Technical Indicators like SMA, EMA, Bollinger Bands, and RSI. The report is designed to provide a comprehensive analysis of stock performance over time.
Stock Data Excel Sheet (.xlsx): This Excel sheet is connected to the Power BI report and allows for dynamic data loading. By adding new stock codes to this sheet, the Power BI report automatically refreshes to include the new data, enabling continuous updates without manual intervention.
Snapshots of the Overview and Chart pages are included for a better understanding of the report.
Key Features:
Dynamic Data Loading: Easily update the dataset by adding new stock codes to the Excel sheet. The Power BI report will automatically pull the corresponding data from Yahoo Finance.
Comprehensive Visualizations: Analyze stock trends using Candle Charts, identify key price levels with Support and Resistance lines, and explore market behavior through various technical indicators.
Interactive Analysis: The Power BI report includes slicers and navigation buttons to switch between different time periods and visualizations, providing a tailored analysis experience.
Use Cases: Ideal for financial analysts, traders, or anyone interested in conducting a detailed stock market analysis. Can be used to monitor the performance of individual stocks or compare trends across multiple stocks over time.
Tags: Stock Market, Power BI, Financial Analysis, Yahoo Finance, Data Visualization
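For context on what those indicators compute, here is a minimal pandas sketch of SMA, EMA, Bollinger Bands, and RSI over a close-price series. The report itself computes these inside Power BI; this is only an illustrative re-implementation, with the common 20- and 14-period windows assumed.

```python
import pandas as pd

def add_indicators(close: pd.Series, window: int = 20, rsi_n: int = 14):
    out = pd.DataFrame({"close": close})
    out["sma"] = close.rolling(window).mean()                 # simple MA
    out["ema"] = close.ewm(span=window, adjust=False).mean()  # exponential MA
    std = close.rolling(window).std()
    out["bb_upper"] = out["sma"] + 2 * std                    # Bollinger Bands
    out["bb_lower"] = out["sma"] - 2 * std
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(rsi_n).mean()
    loss = (-delta.clip(upper=0)).rolling(rsi_n).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)                # relative strength
    return out

# Usage with any daily close-price series loaded from the Excel sheet:
# print(add_indicators(close_prices).tail())
```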
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.
One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information could be utilised to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.
This dataset includes information about 7 of MEVGAL's products [1]. According to the above, the published data will help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry and eventually helping the industry make informed decisions regarding product development, pricing, and market strategies in the IoT playground. The use of this dataset could also aim at understanding the impact of various external factors on the dairy market, such as economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.
Please cite the following papers when using this dataset:
I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted
The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each of the files includes the logistics for one product on a daily basis for three years, from 2020 to 2022.
3.1 Data Collection
The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.
The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.
Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.
It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.
The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).
| File | Period | Number of Samples (days) |
|------|--------|--------------------------|
| product 1 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 1 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 1 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 2 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 2 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 2 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 3 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 3 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 3 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 4 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 4 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 4 2022.xlsx | 01/01/2022–31/12/2022 | 364 |
| product 5 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 5 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 5 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 6 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
| product 6 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 6 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 7 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
| product 7 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 7 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
3.2 Dataset Overview
The following table enumerates and explains the features included in all of the files.

| Feature | Description | Unit |
|---------|-------------|------|
| Day | Day of the month | - |
| Month | Month | - |
| Year | Year | - |
| daily_unit_sales | Daily sales - the amount of products, measured in units, sold during that specific day | units |
| previous_year_daily_unit_sales | Previous year's sales - the amount of products, measured in units, sold during that specific day of the previous year | units |
| percentage_difference_daily_unit_sales | The percentage difference between the two values above | % |
| daily_unit_sales_kg | The amount of products, measured in kilograms, sold during that specific day | kg |
| previous_year_daily_unit_sales_kg | Previous year's sales - the amount of products, measured in kilograms, sold during that specific day of the previous year | kg |
| percentage_difference_daily_unit_sales_kg | The percentage difference between the two values above | % |
| daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | % |
| previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned the previous year | % |
| points_of_distribution | The number of sales representatives through which the product was sold to the market for this year | - |
| previous_year_points_of_distribution | The number of sales representatives through which the product was sold to the market on the same day of the previous year | - |

Table 1 – Dataset Feature Description
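A small sketch of how the percentage-difference features can be derived from the sales columns (assuming the conventional relative-change definition; the dataset's exact formula is not stated):

```python
import pandas as pd

df = pd.read_excel("product 1 2020.xlsx")  # one product-year file

# Assumed definition: relative change vs. the same day of the previous year
pct = (100 * (df["daily_unit_sales"] - df["previous_year_daily_unit_sales"])
       / df["previous_year_daily_unit_sales"])

# Compare the recomputed values against the published column
print((pct - df["percentage_difference_daily_unit_sales"]).abs().max())
```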
4.1 Dataset Structure
The provided dataset has the following structure:

| Name | Type | Property |
|------|------|----------|
| Readme.docx | Report | A file that contains the documentation of the dataset. |
| product X | Folder | A folder containing the data of product X. |
| product X YYYY.xlsx | Data file | An Excel file containing the sales data of product X for year YYYY. |

Table 2 - Dataset File Description
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).
References
[1] MEVGAL is a Greek dairy production company
Statistics about homelessness for every local authority in England.
This includes annual data covering 2009-10 to 2017-18 based on CLG live table 784, known as the P1E returns.
There are also quarterly returns (live table 784a), covering April to June, July to September, October to December, and January to March, available since April 2013 on the CLG webpage (see links).
Both are provided in Excel and CSV formats.
These data help us compare trends across the country in the decisions local authorities make when people apply to them as homeless, and in each district's use of temporary accommodation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Files contain 5000 samples of AWARE characterization factors, as well as sampled independent data used in their calculations and selected intermediate results.
AWARE is a consensus-based method developed to assess water use in LCA. It was developed by the WULCA UNEP/SETAC working group. Its characterization factors represent the relative Available WAter REmaining per area in a watershed after the demand of humans and aquatic ecosystems has been met. It assesses the potential of water deprivation, to either humans or ecosystems, building on the assumption that the less water remaining available per area, the more likely another user will be deprived.
The code used to generate the samples can be found here: https://github.com/PascalLesage/aware_cf_calculator/
Samples were updated from v1.0 in 2020 to include model uncertainty associated with the choice of WaterGap as the global hydrological model (GHM).
The following datasets are supplied:
1) AWARE_characterization_factor_samples.zip
Actual characterization factors resulting from the Monte Carlo Simulation. Contains 4 zip files:
* monthly_cf.zip: contains 116,484 arrays of 5000 monthly characterization factor samples, one for each of the 9707 watersheds and each month, in csv format. Names are cf_<watershed_id>_<month>.csv, where <watershed_id> is the watershed id and <month> is the first three letters of the month ('jan', 'feb', etc.).
* average_agri_cf.zip: contains 9707 arrays of 5000 annual average, agricultural use, characterization factor samples for each watershed, in csv format. Names are cf_average_agri_<watershed_id>.csv.
* average_non_agri_cf.zip: contains 9707 arrays of 5000 annual average, non-agricultural use, characterization factor samples for each watershed, in csv format. Names are cf_average_non_agri_<watershed_id>.csv.
* average_unknown_cf.zip: contains 9707 arrays of 5000 annual average, unspecified use, characterization factor samples for each watershed, in csv format. Names are cf_average_unknown_<watershed_id>.csv.
2) AWARE_base_data.xlsx
Excel file with the deterministic data, per watershed and per month, for each of the independent variables used in the calculation of AWARE characterization factors. Specifically, it includes:
Monthly irrigation
Description: irrigation water, per month, per basin
Unit: m3/month
Location in Excel doc: Irrigation
File name once imported: irrigation.pickle
table shape: (11050, 12)
Non-irrigation hwc: electricity, domestic, livestock, manufacturing
Description: non-irrigation uses of water
Unit: m3/year
Location in Excel doc: hwc_non_irrigation
File name once imported: electricity.pickle, domestic.pickle,
livestock.pickle, manufacturing.pickle
table shape: 3 x (11050,)
avail_delta
Description: Difference between "pristine" natural availability reported in PastorXNatAvail and natural availability calculated from "Actual availability as received from WaterGap - after human consumption" (Avail!W:AH) plus HWC. This should be added to calculated water availability to get the water availability used for the calculation of EWR.
Unit: m3/month
Location in Excel doc: avail_delta
File name once imported: avail_delta.pickle
table shape: (11050, 12)
avail_net
Description: Actual availability as received from WaterGap - after human consumption
Unit: m3/month
Location in Excel doc: avail_net
File name once imported: avail_net.pickle
table shape: (11050, 12)
pastor
Description: fraction of PRISTINE water availability that should be reserved for environment
Unit: unitless
Location in Excel doc: pastor
File name once imported: pastor.pickle
table shape: (11050, 12)
area
Description: area
Unit: m2
Location in Excel doc: area
File name once imported: area.pickle
table shape: (11050,)
It also includes:
information (k values) on the distributions used for each variable (uncertainty tab)
information (k values) on the model uncertainty (model uncertainty tab)
two filters used to exclude watersheds that are either in Greenland (polar filter) or without data from the Pastor et al. (2014) method (122 cells), representing small coastal cells with no direct overlap (pastor filter). (filters tab)
3) independent_variable_samples.zip
Samples for each of the independent variables used in the calculation of characterization factors. Only random variables are contained. For all watershed or watershed-months without samples, the Monte Carlo simulation used the deterministic values found in the AWARE_base_data.xlsx file.
The files are in csv format. The first column contains the watershed id (BAS34S_ID) if the data is annual, or the (BAS34S_ID, month) pair for data with a monthly resolution. The other 5000 columns contain the sampled data.
The names of the files are <variable_name>.csv.
4) intermediate_variables.zip
Contains results of intermediate calculations, used in the calculation of characterization factors. The zip file contains 3 zip files:
* AMD_world_over_AMD_i.zip: contains 116,484 arrays (for each watershed-month) of 5000 calculated values of the ratio between the AMD (Availability Minus Demand) for the watershed-month and AMD_glo, the world weighted AMD average. Format is csv.
* AMD_world.zip: contains one array of 5000 calculated values of the world average AMD. Format is csv.
* HWC.zip: contains 116,484 arrays (for each watershed-month) of 5000 calculated values of the total Human Water Consumption. Format is csv.
5) watershedBAS34S_ID.zip
Contains the GIS files to link the watershed ids (BAS34S_ID) to actual spatial data.
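To make the quantities above concrete, here is a hedged Python sketch of how a single characterization factor is assembled from them, assuming the published AWARE formulation (AMD = (availability - human and environmental water demand) / area; CF = AMD_world / AMD_i, clamped to the [0.1, 100] range used by WULCA). The variable names mirror the files above, but all numbers are invented for illustration.

```python
def aware_cf(availability, hwc, ewr, area, amd_world):
    """One watershed-month AWARE CF, assuming the published formulation.
    availability, hwc (human water consumption), ewr (environmental
    water requirement) in m3/month; area in m2; amd_world in m3/(m2*month)."""
    amd_i = (availability - hwc - ewr) / area  # water remaining per area
    if amd_i <= 0:
        return 100.0                           # fully deprived watershed-month
    return min(max(amd_world / amd_i, 0.1), 100.0)

# Invented numbers for illustration only
print(aware_cf(availability=2.0e8, hwc=4.0e7, ewr=1.0e8,
               area=5.0e9, amd_world=0.0136))  # ~1.13
```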
Students use U.S. Geological Survey (USGS) real-time, real-world seismic data from around the planet to identify where earthquakes occur and look for trends in earthquake activity. They explore where and why earthquakes occur, learning about faults and how they influence earthquakes. Looking at the interactive maps and the data, students use Microsoft Excel to conduct detailed analysis of the most-recent 25 earthquakes; they calculate the mean, median, and mode of the data set, as well as identify the minimum and maximum magnitudes. Students compare their predictions with the physical data, and look for trends and patterns in the data. A worksheet serves as a student guide for the activity.
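A minimal Python equivalent of the Excel analysis the students perform (the magnitudes are invented; real values come from the USGS feed):

```python
import statistics as st

magnitudes = [4.6, 5.1, 4.6, 6.2, 4.8, 5.0, 4.6, 5.5]  # most-recent events

print("mean:", st.mean(magnitudes))
print("median:", st.median(magnitudes))
print("mode:", st.mode(magnitudes))
print("min/max:", min(magnitudes), max(magnitudes))
```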
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Poseidon 2.0 is a user-oriented, simple and fast Excel tool which aims to compare different wastewater treatment techniques based on their pollutant removal efficiencies, their costs, and additional assessment criteria. Poseidon can be applied in pre-feasibility studies to assess possible water reuse options and can show decision makers and other stakeholders that implementable solutions are available to comply with local requirements. This upload consists of:
This dataset is linked to the following additional open access resources:
http://standaarden.overheid.nl/owms/terms/licentieonbekend
Financial data of the municipality and city districts are published via Openspending.nl. The OpenSpending platform of the Open State Foundation makes it possible to digitally disclose and compare government expenditure and income. The source data can also be downloaded in (uniform) Excel format and are available through an API.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.
The package contains the following folders and files.
/R-analysis
This is a folder containing all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and tables in the paper is as follows:
- RQ1-1-analyse-story-rates.R: Table 1, user story rates
- RQ1-1-analyse-role-rates.R: Table 1, role rates
- RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates
- RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates
- RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2
- RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2
- RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.
The .csv files used for statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.
- RQ1-1-story-rates.csv: Figure 4
- RQ1-1-role-rates.csv: Figure 5
- RQ1-2-categories-phase-1.csv: Figure 8
- RQ1-2-role-category-phase-1.csv: Figure 9
- RQ2-1-user-story-and-roles-phase-2.csv: Figure 13
- RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14
- RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17
- IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15
- IMG-only-RQ2.2-frequent-roles.csv: Figure 18
NOTE: The last two .csv files do not have an associated statistical tests, but are used solely to produce boxplots.
/Data-Analysis
This folder contains all the data used to answer the research questions.
RQ1.xlsx: includes all the data associated with the RQ1 subquestions, two tabs for each subquestion (one for user stories and one for roles). The names of the tabs are self-explanatory of their content.
RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: for each category of user story, and for each analyst, there are two lines.
The first one reports the number of user stories in that category for phase 1, and the second one reports the
number of user stories in that category for phase 2, considering the specific analyst.
- Data Source-role: for each category of role, and for each analyst, there are two lines.
The first one reports the number of user stories in that role for phase 1, and the second one reports the
number of user stories in that role for phase 2, considering the specific analyst.
- RQ2.1 rates: reports the final rates for RQ2.1.
NOTE: The other tabs are used to support the computation of the final rates.
RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14
- RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17
- RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated with single categories of user stories. A separate file is used given the complexity of the computations.
- Data Source-US-category: same as RQ2.1.xlsx
- Totals: total number of user stories for each analyst in phase 1 and phase 2
- Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file
"img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15
- Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19
- RQ2-3-most-frequent-categories: most frequent novel categories
/Raw-Data-Phase-I
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:
- Evaluation: includes the annotation of each user story as an existing user story in the original categories (annotated with "E"), a novel user story in a certain category (refinement, annotated with "N"), or a novel user story in a novel category (name of the category in column "New Feature"). **NOTE 1:** It should be noted that in the paper the "refinement" case is said to be annotated with "R" (instead of "N", as in the files) to make the paper clearer and easier to read.
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Raw-Data-Phaes-II
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:
- Analysis: includes the annotation of the user stories as belonging to existing original
category (X), or to categories introduced after interviews, or to categories introduced
after app store inspired elicitation (name of category in "Cat. Created in PH1"), or to
entirely novel categories (name of category in "New Category").
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Figures
This folder includes the figures reported in the paper. The boxplots are generated from the
data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are
produced with Excel, and are also reported in the Excel files listed above.