Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file performs a statistical test of whether two ROC curves differ from each other based on the area under the curve (AUC). You'll need to enter the correlation coefficient from the table presented in the following article for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
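For readers who prefer scripting the same comparison, here is a minimal Python sketch of the Hanley-McNeil z-test. It assumes the AUC standard-error formula from Hanley & McNeil (1982) and takes the correlation r read from the 1983 table as an input; all function and variable names are illustrative, not part of the spreadsheet.

```python
import math

def auc_se(auc, n_pos, n_neg):
    # Hanley & McNeil (1982) standard error of an AUC estimate
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

def hanley_mcneil_z(auc1, auc2, n_pos, n_neg, r):
    # r: correlation coefficient read from the Hanley & McNeil (1983) table,
    # accounting for both ROC curves being derived from the same cases
    se1 = auc_se(auc1, n_pos, n_neg)
    se2 = auc_se(auc2, n_pos, n_neg)
    return (auc1 - auc2) / math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)

# Example: two AUCs from the same 60 diseased / 60 healthy cases, r = 0.45
z = hanley_mcneil_z(0.85, 0.80, 60, 60, 0.45)
print(f"z = {z:.2f}")  # compare against the standard normal distribution
```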
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.

Abstract: Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS had become widely accepted internationally and has become the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. To assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma-Aldrich's website has only one pictogram. A chemical information tool that identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1,000 and 3,000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.

Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the collected data rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions will be used to convert GHS information into binary strings of data called "bitstrings". This approach is also used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare at face value. Bitstrings generated from the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values ranging from 0 for strings with no similarity to 1 for strings that are identical. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques, with minor modifications, could be used to tackle more GHS information such as pictograms.

Intellectual Merit: This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings obtained from the non-numeric entity of 2D structure. This structural fingerprint allows comparison of 2D structures through the use of the Tanimoto coefficient. The use of this structural fingerprint can be extended to safety fingerprints, which can be created by converting a non-numeric entity such as GHS information into a binary bit string and comparing data through the use of the Tanimoto coefficient.

Broader Impact: Extension of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other pieces of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel without needing a substantial programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
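As an illustration of the bitstring comparison described above, here is a minimal Python sketch of the Tanimoto coefficient applied to two hypothetical GHS hazard-statement bitstrings; the hazard-statement ordering is invented for the example.

```python
def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient of two equal-length bitstrings:
    shared on-bits divided by the total distinct on-bits."""
    both = sum(x == y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return both / either if either else 1.0

# Hypothetical fingerprints: each position flags one hazard statement
# (e.g., position 0 = H290, 1 = H314, 2 = H318, ...)
fisher = "11010"
sigma  = "11000"
print(tanimoto(fisher, sigma))  # 0.666... -> partially consistent GHS info
```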
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: user story or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
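A minimal Python sketch of the two ratios defined above, taking the tag counts as inputs (variable names are illustrative):

```python
def correctness(al: int, wr: int, so: int, om: int) -> float:
    # aligned classes over all classes considered in the comparison
    return al / (al + om + so + wr)

def completeness(al: int, wr: int, om: int) -> float:
    # correctly or incorrectly represented classes over expert-model classes
    return (al + wr) / (al + wr + om)

# Example: 12 aligned, 3 wrongly represented, 2 system-oriented, 4 omitted
print(correctness(12, 3, 2, 4))  # 0.571...
print(completeness(12, 3, 4))    # 0.789...
```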
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
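For readers who want to script these statistics instead of using the online tool, here is a minimal Python sketch of the independent-samples t-test and Hedges' g with the small-sample bias correction (which variant the online tool applies is an assumption); the sample values are invented.

```python
import math
from scipy import stats

def hedges_g(x, y):
    """Hedges' g: pooled-SD standardized mean difference
    with the small-sample bias correction factor J."""
    nx, ny = len(x), len(y)
    sx, sy = stats.tstd(x), stats.tstd(y)  # sample SDs (ddof=1)
    pooled = math.sqrt(((nx - 1) * sx**2 + (ny - 1) * sy**2) / (nx + ny - 2))
    d = (sum(x) / nx - sum(y) / ny) / pooled  # Cohen's d
    j = 1 - 3 / (4 * (nx + ny) - 9)           # bias correction
    return d * j

us = [0.8, 0.7, 0.9, 0.6, 0.75]   # e.g., correctness per student, notation US
uc = [0.6, 0.65, 0.7, 0.5, 0.55]  # e.g., correctness per student, notation UC
t, p = stats.ttest_ind(us, uc)    # the T-test reported at the bottom of each sheet
print(f"t = {t:.2f}, p = {p:.3f}, g = {hedges_g(us, uc):.2f}")
```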
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.
Due to the large size of the complete dataset, a selected set of data, representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.
There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains raw data and the corresponding results files associated with a recent study. Each MS Excel spreadsheet contains the data for one aspect of the study, which is specified by the name of the file. Each spreadsheet presents the information about participants (i.e., personal and demographic details) and their responses for the first SD scale, the second SD scale, and the personal evaluation. The supplemental material (participant information sheet, informed consent form, online questionnaire, risk assessment form) is also enclosed with this dataset. Lastly, for the analysis of the raw data, a statistical test, the independent-samples t-test, was performed. The original SPSS data files are also included.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Directory of Files:
A. Filename: Combine_CCTDI.zip
Short description: Quantitative Data. The zip file contains 6 Excel files which store students' raw data. This raw data set consists of students' input on each CCTDI item. The pre-data were collected through an online survey, while post-data were collected through pen and paper. The data will be analysed by ANOVA to compare the effectiveness of the intervention (a minimal ANOVA sketch appears after this directory of files).
(The California Critical Thinking Disposition Inventory (CCTDI) has been widely employed in the field of education to investigate changes in students' Critical Thinking (CT) attitudes resulting from teaching interventions by comparing pre- and post-tests. This 6-point self-report instrument requires respondents to rate themselves, ranging from "rating 1" for not describing them at all to "rating 6" for describing them extremely well. The instrument has 40 questions categorized in seven subsets covering various CT disposition dimensions, namely: i) truth-seeking, ii) open-mindedness, iii) analyticity, iv) systematicity, v) inquisitiveness, vi) maturity, and vii) self-confidence.)
B. Filename: Combine_TCTSPS.zip
Short description: Quantitative Data. The zip file contains 6 Excel files which store students' raw data, consisting of students' input on each TCTSPS item. The pre-data were collected through an online survey, while post-data were collected through pen and paper. The data will be analysed by ANOVA to compare the effectiveness of the intervention.
(Test of Critical Thinking Skills for Primary and Secondary School Students (TCTS-PS) consists of 24 items divided into five subscales measuring distinct yet correlated aspects of CT skills, namely: (I) differentiating theory from assumptions, (II) deciding evidence, (III) inference, (IV) finding an alternative theory, and (V) evaluation of arguments. The instrument yields a possible total score of 72. The instrument is intended for use in measuring gains in CT skills resulting from instruction, predicting success in programs where CT is crucial, and examining relationships between CT skills and other abilities or traits.)
C. Filename: Combine_SMTSL.zip
Short description: Quantitative Data. The zip file contains 5 Excel files which store students' raw data, consisting of students' input on each SMTSL item. The pre-data were collected through an online survey, while post-data were collected through pen and paper. The data will be analysed by ANOVA to compare the effectiveness of the intervention.
(Students' Motivation Towards Science Learning (SMTSL) defines six factors related to motivation in science learning, used to measure participants' motivation towards science learning: A. Self-efficacy, B. Active learning strategies, C. Science learning value, D. Performance goal, E. Achievement goal, and F. Learning environment stimulation.)
D. Filename: Combine_Discourse Transcription_1.zip and Combine_Discourse Transcription_2.zip
Short description: Qualitative Data. The zip files contain 6 Excel files holding 6 teachers' classroom teaching discourse transcriptions. The data will be analysed by thematic analysis to compare the effectiveness of the intervention.
(38 science classroom discourse videos of 8th graders were transcribed and coded using the Academically Productive Talk framework (APT). APT, drawing from sociological, linguistic, and anthropological perspectives, comprises four primary constructs or objectives.)
E. Filename: Combine_Inquiry Report.zip
Short description: Qualitative Data. The zip file contains 2 Excel files holding 2 schools' inquiry report scores according to rubrics. The data will be analysed by thematic analysis to compare the effectiveness of the intervention.
(To assess the quality of students' arguments, a validated scoring rubric was employed to evaluate each student's written argument. The rubric primarily concentrated on the student's proficiency in five perspectives (Walker & Sampson, 2013, p. 573):
(AR1) Provide a well-articulated, adequate, and accurate claim that answers the research question, (AR2) Use genuine evidence to support the claim and present the evidence in an appropriate manner, (AR3) Provide enough valid and reliable evidence to support the claim, (AR4) Provide a rationale that is sufficient and appropriate, and (AR5) Compare his or her findings with other groups in the project.)
F. Filename: Combined_Interview Transcription.xlsx
Short description: Qualitative Data. The file contains all the students' interview transcriptions. The data will be analysed by thematic analysis to compare the effectiveness of the intervention.
(Semi-structured interviews were conducted to gather interviewees' motivation regarding CT and their learning motivation in the context of science. The interview data would be used to complement the quantitative results, i.e., the TCTS-PS, CCTDI, and SMTSL scores.)
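As referenced above, here is a minimal sketch of the planned ANOVA comparison, assuming a simple one-way layout over pre/post gain scores per group (group names and numbers are invented for illustration):

```python
from scipy import stats

# Hypothetical CCTDI gain scores (post minus pre) for an intervention
# group and a comparison group; the real data live in the Excel files above.
intervention = [12, 8, 15, 10, 9, 14]
comparison   = [5, 7, 4, 9, 6, 8]

f, p = stats.f_oneway(intervention, comparison)
print(f"F = {f:.2f}, p = {p:.3f}")
```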
The Delta Produce Sources Study was an observational study designed to measure and compare food environments of farmers markets (n=3) and grocery stores (n=12) in 5 rural towns located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys from June 2019 to March 2020 using a modified version of the Nutrition Environment Measures Survey (NEMS) Farmers Market Audit tool. The tool was modified to collect information pertaining to source of fresh produce and also for use with both farmers markets and grocery stores. Availability, source, quality, and price information were collected and compared between farmers markets and grocery stores for 13 fresh fruits and 32 fresh vegetables via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.
Resources in this dataset:
Resource Title: Delta Produce Sources Study dataset. File Name: DPS Data Public.csv. Resource Description: The dataset contains variables corresponding to availability, source (country, state and town if country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Resource Title: Delta Produce Sources Study data dictionary. File Name: DPS Data Dictionary Public.csv. Resource Description: This file is the data dictionary corresponding to the Delta Produce Sources Study dataset. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel township population over the last 20 plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Excel township across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing and, if there is a change, when the population peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2023, the population of Excel township was 300, a 0.99% decrease year-over-year from 2022. Previously, in 2022, Excel township's population was 303, a decline of 0.98% compared to a population of 306 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of Excel township increased by 17. In this period, the peak population was 308 in the year 2020. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
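A minimal sketch of how those year-over-year figures can be recomputed (the column names are illustrative, not the dataset's own):

```python
import pandas as pd

df = pd.DataFrame({"year": [2020, 2021, 2022, 2023],
                   "population": [308, 306, 303, 300]})
df["change"] = df["population"].diff()                 # absolute YoY change
df["change_pct"] = df["population"].pct_change() * 100  # YoY change in percent
print(df)  # 2023 row: change = -3, change_pct ≈ -0.99
```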
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel township Population by Year. You can refer to it here.
https://borealisdata.ca/api/datasets/:persistentId/versions/2.1/customlicense?persistentId=doi:10.5683/SP3/SZHJFY
This CD-ROM product is an authoritative reference source of 15 key financial ratios by industry groupings compiled from the North American Industry Classification System (NAICS 2007). It is based on up-to-date, reliable and comprehensive data on Canadian businesses, derived from Statistics Canada databases of financial statements for three reference years. The CD-ROM enables users to compare their enterprise's performance to that of their industry and to address issues such as profitability, efficiency and business risk. Financial Performance Indicators can also be used for inter-industry comparisons. Volume 1 covers large enterprises in both the financial and non-financial sectors, at the national level, with annual operating revenue of $25 million or more. Volume 2 covers medium-sized enterprises in the non-financial sector, at the national level, with annual operating revenue of $5 million to less than $25 million. Volume 3 covers small enterprises in the non-financial sector, at the national, provincial, territorial, Atlantic region and Prairie region levels, with annual operating revenue of $30,000 to less than $5 million. Note: FPICB has been discontinued as of 2/23/2015. Statistics Canada continues to provide information on Canadian businesses through alternative data sources. Information on specific financial ratios will continue to be available through the annual Financial and Taxation Statistics for Enterprises program: CANSIM table 180-0003 ; the Quarterly Survey of Financial Statements: CANSIM tables 187-0001 and 187-0002 ; and the Small Business Profiles, which present financial data for small businesses in Canada, available on Industry Canada's website: Financial Performance Data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary material for the manuscript "A Test to Compare Interval Time Series". This includes figures and tables referred to in the manuscript, as well as details of scripts and data files used for the simulation studies and the application. All scripts are in MATLAB (.m) format, and data files are in MATLAB (.mat) and Excel (.xlsx) formats.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We present a perspective on drug development for the synthesis of an active pharmaceutical ingredient (e.g., agomelatine) within a commercial technology called Luminata and compare the results to the current method of consolidating the reaction data into Microsoft Excel. The Excel document becomes the ultimate repository of information extracted from multiple sources such as the electronic lab notebook, the laboratory information management system, the chromatography data system, in-house databases, and external data. The major needs of a pharmaceutical company are tracking the stages of multiple reactions, calculating the impurity carryover across the stages, and performing structure dereplication for an unknown impurity. As there is no standardized software available to link the different needs throughout the life cycle of process development, there is a demand for mapping tools to consolidate the route for an API synthesis and link it with analytical data while reducing transcription errors and maintaining an audit trail.
When water is pumped slowly from saturated sediment-water interface sediments, the more highly connected, mobile porosity domain is preferentially sampled, compared to less-mobile pore spaces. Changes in fluid electrical conductivity (EC) during controlled downward ionic tracer injections into interface sediments can be assumed to represent mobile porosity dynamics, which are therefore distinguished from the less-mobile porosity dynamics measured using bulk EC geoelectrical methods. Fluid EC samples were drawn at flow rates similar to tracer injection rates to prevent inducing preferential flow. The data were collected using a stainless steel tube with slits cut into the bottom (USGS MINIPOINT style) connected to an EC meter via c-flex or neoprene tubing, and drawn up through the system via a peristaltic pump. The data were compiled into an Excel spreadsheet and time corrected to compare to bulk EC data that were collected simultaneously and contained in another section of this data release. Controlled, downward flow experiments were conducted in a dual-domain porosity apparatus (DDPA). Downward flow rates ranged from 1.2 to 1.4 m/d in DDPA1, and were set at 1 m/d, 3 m/d, 5 m/d, and 0.9 m/d, as described in the publication: Briggs, M.A., Day-Lewis, F.D., Dehkordy, F.M.P., Hampton, T., Zarnetske, J.P., Singha, K., Harvey, J.W. and Lane, J.W., 2018, Direct observations of hydrologic exchange occurring with less-mobile porosity and the development of anoxic microzones in sandy lakebed sediments, Water Resources Research, DOI:10.1029/2018WR022823.
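A minimal sketch of the kind of time correction described above (aligning each fluid EC sample with the nearest-in-time bulk EC reading); the column names and values are invented for the example, not taken from the data release:

```python
import pandas as pd

fluid = pd.DataFrame({"time": pd.to_datetime(["2017-06-01 10:00:05",
                                              "2017-06-01 10:05:12"]),
                      "fluid_ec_uScm": [512.0, 530.5]})
bulk = pd.DataFrame({"time": pd.date_range("2017-06-01 10:00",
                                           periods=12, freq="30s"),
                     "bulk_ec_uScm": range(500, 512)})

# Match each fluid EC sample to the closest-in-time bulk EC reading
aligned = pd.merge_asof(fluid.sort_values("time"), bulk.sort_values("time"),
                        on="time", direction="nearest")
print(aligned)
```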
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the essential files for conducting a dynamic stock market analysis using Power BI. The data is sourced from Yahoo Finance and includes historical stock prices, which can be dynamically updated by adding new stock codes to the provided Excel sheet.
Files Included: Power BI Report (.pbix): The interactive Power BI report that includes various visualizations such as Candle Charts, Line Charts for Support and Resistance, and Technical Indicators like SMA, EMA, Bollinger Bands, and RSI. The report is designed to provide a comprehensive analysis of stock performance over time.
Stock Data Excel Sheet (.xlsx): This Excel sheet is connected to the Power BI report and allows for dynamic data loading. By adding new stock codes to this sheet, the Power BI report automatically refreshes to include the new data, enabling continuous updates without manual intervention.
Snapshots of the Overview and Chart pages are included for a better understanding of the report.
Key Features:
Dynamic Data Loading: Easily update the dataset by adding new stock codes to the Excel sheet. The Power BI report will automatically pull the corresponding data from Yahoo Finance.
Comprehensive Visualizations: Analyze stock trends using Candle Charts, identify key price levels with Support and Resistance lines, and explore market behavior through various technical indicators.
Interactive Analysis: The Power BI report includes slicers and navigation buttons to switch between different time periods and visualizations, providing a tailored analysis experience.
Use Cases: Ideal for financial analysts, traders, or anyone interested in conducting a detailed stock market analysis. Can be used to monitor the performance of individual stocks or compare trends across multiple stocks over time.
Tags: Stock Market, Power BI, Financial Analysis, Yahoo Finance, Data Visualization
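For context on what those indicators compute, here is a minimal pandas sketch of SMA, EMA, Bollinger Bands, and RSI over a close-price series. The report itself computes these inside Power BI; this is only an illustrative re-implementation, with the common 20- and 14-period windows assumed.

```python
import pandas as pd

def add_indicators(close: pd.Series, window: int = 20, rsi_n: int = 14):
    out = pd.DataFrame({"close": close})
    out["sma"] = close.rolling(window).mean()                 # simple MA
    out["ema"] = close.ewm(span=window, adjust=False).mean()  # exponential MA
    std = close.rolling(window).std()
    out["bb_upper"] = out["sma"] + 2 * std                    # Bollinger Bands
    out["bb_lower"] = out["sma"] - 2 * std
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(rsi_n).mean()
    loss = (-delta.clip(upper=0)).rolling(rsi_n).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)                # relative strength
    return out

# Usage with any daily close-price series loaded from the Excel sheet:
# print(add_indicators(close_prices).tail())
```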
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.
One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information could be utilised to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.
This dataset includes information about 7 of MEVGAL's products [1]. According to the above, the published data will help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry and eventually helping the industry make informed decisions regarding product development, pricing, and market strategies in the IoT playground. The use of this dataset could also aim at understanding the impact of various external factors on the dairy market, such as economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.
Please cite the following papers when using this dataset:
I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted
The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each of the files includes the logistics for one product on a daily basis for three years, from 2020 to 2022.
3.1 Data Collection
The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.
The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.
Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.
It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.
The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).
| File | Period | Number of Samples (days) |
|------|--------|--------------------------|
| product 1 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 1 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 1 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 2 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 2 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 2 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 3 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 3 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 3 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 4 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 4 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 4 2022.xlsx | 01/01/2022–31/12/2022 | 364 |
| product 5 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 5 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 5 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 6 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
| product 6 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 6 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 7 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
| product 7 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 7 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
3.2 Dataset Overview
The following table enumerates and explains the features included in all of the files.

| Feature | Description | Unit |
|---------|-------------|------|
| Day | Day of the month | - |
| Month | Month | - |
| Year | Year | - |
| daily_unit_sales | Daily sales - the amount of products, measured in units, sold during that specific day | units |
| previous_year_daily_unit_sales | Previous year's sales - the amount of products, measured in units, sold during that specific day of the previous year | units |
| percentage_difference_daily_unit_sales | The percentage difference between the two values above | % |
| daily_unit_sales_kg | The amount of products, measured in kilograms, sold during that specific day | kg |
| previous_year_daily_unit_sales_kg | Previous year's sales - the amount of products, measured in kilograms, sold during that specific day of the previous year | kg |
| percentage_difference_daily_unit_sales_kg | The percentage difference between the two values above | % |
| daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | % |
| previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned the previous year | % |
| points_of_distribution | The number of sales representatives through which the product was sold to the market for this year | - |
| previous_year_points_of_distribution | The number of sales representatives through which the product was sold to the market on the same day of the previous year | - |

Table 1 – Dataset Feature Description
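A small sketch of how the percentage-difference features can be derived from the sales columns (assuming the conventional relative-change definition; the dataset's exact formula is not stated):

```python
import pandas as pd

df = pd.read_excel("product 1 2020.xlsx")  # one product-year file

# Assumed definition: relative change vs. the same day of the previous year
pct = (100 * (df["daily_unit_sales"] - df["previous_year_daily_unit_sales"])
       / df["previous_year_daily_unit_sales"])

# Compare the recomputed values against the published column
print((pct - df["percentage_difference_daily_unit_sales"]).abs().max())
```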
4.1 Dataset Structure
The provided dataset has the following structure:

| Name | Type | Property |
|------|------|----------|
| Readme.docx | Report | A file that contains the documentation of the dataset. |
| product X | Folder | A folder containing the data of product X. |
| product X YYYY.xlsx | Data file | An Excel file containing the sales data of product X for year YYYY. |

Table 2 - Dataset File Description
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).
References
[1] MEVGAL is a Greek dairy production company
Statistics about homelessness for every local authority in England.
This includes annual data covering 2009-10 to 2017-18 based on CLG live table 784, known as the P1E returns.
There are also quarterly returns (live table 784a), covering April to June, July to September, October to December, and January to March, available since April 2013 on the CLG webpage (see links).
Both are provided in Excel and CSV formats.
These data help us compare trends across the country in the decisions local authorities make when people apply to them as homeless, and in each district's use of temporary accommodation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Files contain 5000 samples of AWARE characterization factors, as well as sampled independent data used in their calculations and selected intermediate results.
AWARE is a consensus-based method developed to assess water use in LCA. It was developed by the WULCA UNEP/SETAC working group. Its characterization factors represent the relative Available WAter REmaining per area in a watershed after the demand of humans and aquatic ecosystems has been met. It assesses the potential of water deprivation, to either humans or ecosystems, building on the assumption that the less water remaining available per area, the more likely another user will be deprived.
The code used to generate the samples can be found here: https://github.com/PascalLesage/aware_cf_calculator/
Samples were updated from v1.0 in 2020 to include model uncertainty associated with the choice of WaterGap as the global hydrological model (GHM).
The following datasets are supplied:
1) AWARE_characterization_factor_samples.zip
Actual characterization factors resulting from the Monte Carlo Simulation. Contains 4 zip files:
* monthly_cf.zip: contains 116,484 arrays of 5000 monthly characterization factor samples, one for each of the 9707 watersheds and each month, in csv format. Names are cf_<watershed_id>_<month>.csv, where <watershed_id> is the watershed id and <month> is the first three letters of the month ('jan', 'feb', etc.).
* average_agri_cf.zip: contains 9707 arrays of 5000 annual average, agricultural use, characterization factor samples for each watershed, in csv format. Names are cf_average_agri_<watershed_id>.csv.
* average_non_agri_cf.zip: contains 9707 arrays of 5000 annual average, non-agricultural use, characterization factor samples for each watershed, in csv format. Names are cf_average_non_agri_<watershed_id>.csv.
* average_unknown_cf.zip: contains 9707 arrays of 5000 annual average, unspecified use, characterization factor samples for each watershed, in csv format. Names are cf_average_unknown_<watershed_id>.csv.
2) AWARE_base_data.xlsx
Excel file with the deterministic data, per watershed and per month, for each of the independent variables used in the calculation of AWARE characterization factors. Specifically, it includes:
Monthly irrigation
Description: irrigation water, per month, per basin
Unit: m3/month
Location in Excel doc: Irrigation
File name once imported: irrigation.pickle
table shape: (11050, 12)
Non-irrigation hwc: electricity, domestic, livestock, manufacturing
Description: non-irrigation uses of water
Unit: m3/year
Location in Excel doc: hwc_non_irrigation
File name once imported: electricity.pickle, domestic.pickle,
livestock.pickle, manufacturing.pickle
table shape: 3 x (11050,)
avail_delta
Description: Difference between "pristine" natural availability reported in PastorXNatAvail and natural availability calculated from "Actual availability as received from WaterGap - after human consumption" (Avail!W:AH) plus HWC. This should be added to calculated water availability to get the water availability used for the calculation of EWR.
Unit: m3/month
Location in Excel doc: avail_delta
File name once imported: avail_delta.pickle
table shape: (11050, 12)
avail_net
Description: Actual availability as received from WaterGap - after human consumption
Unit: m3/month
Location in Excel doc: avail_net
File name once imported: avail_net.pickle
table shape: (11050, 12)
pastor
Description: fraction of PRISTINE water availability that should be reserved for environment
Unit: unitless
Location in Excel doc: pastor
File name once imported: pastor.pickle
table shape: (11050, 12)
area
Description: area
Unit: m2
Location in Excel doc: area
File name once imported: area.pickle
table shape: (11050,)
It also includes:
information (k values) on the distributions used for each variable (uncertainty tab)
information (k values) on the model uncertainty (model uncertainty tab)
two filters used to exclude watersheds that are either in Greenland (polar filter) or without data from the Pastor et al. (2014) method (122 cells), representing small coastal cells with no direct overlap (pastor filter). (filters tab)
3) independent_variable_samples.zip
Samples for each of the independent variables used in the calculation of characterization factors. Only random variables are contained. For all watershed or watershed-months without samples, the Monte Carlo simulation used the deterministic values found in the AWARE_base_data.xlsx file.
The files are in csv format. The first column contains the watershed id (BAS34S_ID) if the data is annual, or the (BAS34S_ID, month) pair for data with a monthly resolution. The other 5000 columns contain the sampled data.
The names of the files are <variable_name>.csv.
4) intermediate_variables.zip
Contains results of intermediate calculations, used in the calculation of characterization factors. The zip file contains 3 zip files:
* AMD_world_over_AMD_i.zip: contains 116,484 arrays (for each watershed-month) of 5000 calculated values of the ratio between the AMD (Availability Minus Demand) for the watershed-month and AMD_glo, the world weighted AMD average. Format is csv.
* AMD_world.zip: contains one array of 5000 calculated values of the world average AMD. Format is csv.
* HWC.zip: contains 116,484 arrays (for each watershed-month) of 5000 calculated values of the total Human Water Consumption. Format is csv.
5) watershedBAS34S_ID.zip
Contains the GIS files to link the watershed ids (BAS34S_ID) to actual spatial data.
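To make the quantities above concrete, here is a hedged Python sketch of how a single characterization factor is assembled from them, assuming the published AWARE formulation (AMD = (availability - human and environmental water demand) / area; CF = AMD_world / AMD_i, clamped to the [0.1, 100] range used by WULCA). The variable names mirror the files above, but all numbers are invented for illustration.

```python
def aware_cf(availability, hwc, ewr, area, amd_world):
    """One watershed-month AWARE CF, assuming the published formulation.
    availability, hwc (human water consumption), ewr (environmental
    water requirement) in m3/month; area in m2; amd_world in m3/(m2*month)."""
    amd_i = (availability - hwc - ewr) / area  # water remaining per area
    if amd_i <= 0:
        return 100.0                           # fully deprived watershed-month
    return min(max(amd_world / amd_i, 0.1), 100.0)

# Invented numbers for illustration only
print(aware_cf(availability=2.0e8, hwc=4.0e7, ewr=1.0e8,
               area=5.0e9, amd_world=0.0136))  # ~1.13
```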
Students use U.S. Geological Survey (USGS) real-time, real-world seismic data from around the planet to identify where earthquakes occur and look for trends in earthquake activity. They explore where and why earthquakes occur, learning about faults and how they influence earthquakes. Looking at the interactive maps and the data, students use Microsoft Excel to conduct detailed analysis of the most-recent 25 earthquakes; they calculate the mean, median, and mode of the data set, as well as identify the minimum and maximum magnitudes. Students compare their predictions with the physical data, and look for trends and patterns in the data. A worksheet serves as a student guide for the activity.
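A minimal Python equivalent of the Excel analysis the students perform (the magnitudes are invented; real values come from the USGS feed):

```python
import statistics as st

magnitudes = [4.6, 5.1, 4.6, 6.2, 4.8, 5.0, 4.6, 5.5]  # most-recent events

print("mean:", st.mean(magnitudes))
print("median:", st.median(magnitudes))
print("mode:", st.mode(magnitudes))
print("min/max:", min(magnitudes), max(magnitudes))
```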
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Poseidon 2.0 is a user-oriented, simple and fast Excel tool which aims to compare different wastewater treatment techniques based on their pollutant removal efficiencies, their costs, and additional assessment criteria. Poseidon can be applied in pre-feasibility studies to assess possible water reuse options and can show decision makers and other stakeholders that implementable solutions are available to comply with local requirements. This upload consists of:
This dataset is linked to the following additional open access resources:
http://standaarden.overheid.nl/owms/terms/licentieonbekend
Financial data of the municipality and city districts are published via Openspending.nl. The OpenSpending platform of the Open State Foundation makes it possible to digitally disclose and compare government expenditure and income. The source data can also be downloaded in (uniform) Excel format and are available through an API.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.
The package contains the following folders and files.
/R-analysis
This is a folder containing all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and tables in the paper is as follows:
- RQ1-1-analyse-story-rates.R: Table 1, user story rates
- RQ1-1-analyse-role-rates.R: Table 1, role rates
- RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates
- RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates
- RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2
- RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2
- RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.
The .csv files used for statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.
- RQ1-1-story-rates.csv: Figure 4
- RQ1-1-role-rates.csv: Figure 5
- RQ1-2-categories-phase-1.csv: Figure 8
- RQ1-2-role-category-phase-1.csv: Figure 9
- RQ2-1-user-story-and-roles-phase-2.csv: Figure 13
- RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14
- RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17
- IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15
- IMG-only-RQ2.2-frequent-roles.csv: Figure 18
NOTE: The last two .csv files do not have an associated statistical tests, but are used solely to produce boxplots.
/Data-Analysis
This folder contains all the data used to answer the research questions.
RQ1.xlsx: includes all the data associated with the RQ1 subquestions, two tabs for each subquestion (one for user stories and one for roles). The names of the tabs are self-explanatory of their content.
RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: for each category of user story, and for each analyst, there are two lines.
The first one reports the number of user stories in that category for phase 1, and the second one reports the
number of user stories in that category for phase 2, considering the specific analyst.
- Data Source-role: for each category of role, and for each analyst, there are two lines.
The first one reports the number of user stories in that role for phase 1, and the second one reports the
number of user stories in that role for phase 2, considering the specific analyst.
- RQ2.1 rates: reports the final rates for RQ2.1.
NOTE: The other tabs are used to support the computation of the final rates.
RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14
- RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17
- RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated with single categories of user stories. A separate file is used given the complexity of the computations.
- Data Source-US-category: same as RQ2.1.xlsx
- Totals: total number of user stories for each analyst in phase 1 and phase 2
- Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file
"img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15
- Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.
NOTE: the other tabs are used to support the computation of the values reported in the tabs above.
RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:
- Data Source-US-category: same as RQ2.1.xlsx
- Data Source-role: same as RQ2.1.xlsx
- RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19
- RQ2-3-most-frequent-categories: most frequent novel categories
/Raw-Data-Phase-I
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:
- Evaluation: includes the annotation of each user story as an existing user story in the original categories (annotated with "E"), a novel user story in a certain category (refinement, annotated with "N"), or a novel user story in a novel category (name of the category in column "New Feature"). **NOTE 1:** It should be noted that in the paper the "refinement" case is said to be annotated with "R" (instead of "N", as in the files) to make the paper clearer and easier to read.
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Raw-Data-Phaes-II
The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:
- Analysis: includes the annotation of the user stories as belonging to existing original
category (X), or to categories introduced after interviews, or to categories introduced
after app store inspired elicitation (name of category in "Cat. Created in PH1"), or to
entirely novel categories (name of category in "New Category").
- Roles: roles used in the user stories, and count of the user stories belonging to a certain role.
/Figures
This folder includes the figures reported in the paper. The boxplots are generated from the
data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are
produced with Excel, and are also reported in the Excel files listed above.