Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Raw data, clean data, and SQL query output tables as spreadsheets to support the Tableau story; GitHub repository available at https://github.com/brittabeta/Bellabeat-Case-Study-SQL-Excel-Tableau
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Unlocking Data to Inform Public Health Policy and Practice: WP1 Mapping Review Supplementary Excel S1
The data extracted into Excel Tab "S1 Case studies (extracted)" represents information from 31 case studies as part of the "Unlocking Data to Inform Public Health Policy and Practice" project, Workpackage (WP) 1 Mapping Review.
Details about the WP1 mapping review can be found in the "Unlocking Data to Inform Public Health Policy and Practice" project report, which can be found via this DOI link: https://doi.org/10.15131/shef.data.21221606
Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeat, Inc., using a locally installed program, Excel, for both my data analysis and my visualizations. This choice was made primarily because I live in a remote area with limited bandwidth and inconsistent internet access, so completing a capstone project using web-based programs such as RStudio, SQL Workbench, or Google Sheets was not feasible. My choice of option was further constrained because the datasets for the ride-share project option were larger than my version of Excel would accept.

In the scenario provided, I will be acting as a junior data analyst in support of the Bellabeat, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in the hope that findings from that dataset will reveal insights to assist Bellabeat's marketing strategies for future growth. My task is to provide data-driven answers to the business tasks set by Bellabeat's executive and data analytics teams.

To accomplish this task, I will complete all parts of the data analysis process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the process down into three sections to provide clarity and accountability: Guiding Questions, Key Tasks, and Deliverables. To save space and avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task, using an asterisk (*) as an identifier.
Section 1 - Ask:
A. Guiding Questions:
1. Who are the key stakeholders and what are their goals for the data analysis project?
2. What is the business task that this data analysis project is attempting to solve?
B. Key Tasks:
1. Identify key stakeholders and their goals for the data analysis project.
*The key stakeholders for this project are as follows:
-Urška Sršen and Sando Mur, co-founders of Bellabeat, Inc.
-The Bellabeat marketing analytics team, of which I am a member.
Section 2 - Prepare:
A. Guiding Questions:
1. Where is the data stored and organized?
2. Are there any problems with the data?
3. How does the data help answer the business question?
B. Key Tasks:
1. Research and communicate to stakeholders the source of the data and how it is stored and organized.
*The data source used for our case study is the FitBit Fitness Tracker Data. This dataset is hosted on Kaggle and was made available by the user Mobius in an open-source format; the data is therefore public and may be copied, modified, and distributed without asking the user for permission. These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk, reportedly (see the credibility notes below) between 03/12/2016 and 05/12/2016.
*Reportedly (see the credibility notes below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled, broken down into minute-, hour-, and day-level totals. The data is stored in 18 CSV documents. I downloaded all 18 documents onto my laptop and decided to use 2 of them for this project, as they merged the activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were:
-sleepDay_merged.csv
-dailyActivity_merged.csv
2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
*As will be presented more specifically in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of Fitbit tracking; however, upon my initial data processing, I found that only 1 month of data was reported.
*As will be presented more specifically in the Process section, the data has credibility issues related to the number of individuals who reported Fitbit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
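Although this case study was carried out in Excel, the distinct-user discrepancy above is easy to verify programmatically. A minimal sketch in R (R is used elsewhere in this collection; file names as listed above, counts as reported by the author):

```r
# Sketch only: verify the number of distinct Fitbit users per file.
library(readr)
library(dplyr)

daily <- read_csv("dailyActivity_merged.csv")
sleep <- read_csv("sleepDay_merged.csv")

n_distinct(daily$Id)  # the author reports 33 distinct users, not the 30 stated in the metadata
n_distinct(sleep$Id)  # the sleep file covers fewer users than the activity file
```

In Excel, the same check can be done by applying Remove Duplicates to a copy of the Id column and counting the remaining rows.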
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: user stories or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is at least 65 and below 80, and L is otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or (ii) using a generic term
(e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent the legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
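Concretely, per the sheet's own definition:

```latex
\text{size ratio} = \frac{\#\,\text{classes in the student model (CM-Stud)}}{\#\,\text{classes in the expert model (CM-Expert)}}
```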
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
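In formula form, using the tagging-scheme counts defined above:

```latex
\text{correctness} = \frac{AL}{AL + WR + SO + OM},
\qquad
\text{completeness} = \frac{AL + WR}{AL + WR + OM}
```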
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
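For reference, a common formulation of Hedges' g is the bias-corrected standardized mean difference below; the online tool may use a slightly different variant:

```latex
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
\qquad
g \approx \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right)\frac{\bar{x}_1 - \bar{x}_2}{s_p}
```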
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
Data collected for the Coursera data analytics case study, topic 8. The steps followed came from both the course and YouTube video tutorials, such as the Google Data Analytics Professional Certificate capstone Case Study in Excel by Matt Bratting.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
This dataset was created by mgbecheta paschal
Released under Apache 2.0
Dataset related to the case study of the AI Agent integrated in the databases educational escape room. One Excel file with the pre- and post-test grades; one Excel file with the survey results.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
This dataset was created by Okorigwe Clinton
Released under Apache 2.0
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This contains the dataset (in MS Excel format) and codebook for the Axshya SAMVAD study assessing the effects of active case finding for TB, compared with passive case finding, on costs due to TB diagnosis and on catastrophic costs due to TB diagnosis (prevalence, intensity, and inequity). The study was published in Global Health Action in 2018.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The present study recorded indigenous knowledge of medicinal plants in Shahrbabak, Iran. We describe a method using data mining algorithms to predict medicinal plants' mode of application. Twenty-one individuals aged 28 to 81 were interviewed. First, data were collected and analyzed based on quantitative indices such as the informant consensus factor (ICF), the cultural importance index (CI), and the relative frequency of citation (RFC). Second, the data were classified by support vector machines, J48 decision trees, neural networks, and logistic regression. In total, 141 medicinal plants from 43 botanical families were documented. Lamiaceae, with 18 species, was the dominant family, and plant leaves were most frequently used for medicinal purposes. Decoction was the most commonly used preparation method (56%), and therophytes were the most dominant life form (48.93%). Regarding the RFC index, the most important species are Adiantum capillus-veneris L. and Plantago ovata Forssk., while Artemisia aucheri Boiss. ranked first based on the CI index. The ICF index demonstrated that metabolic disorders are the most common ailment category in the Shahrbabak region. Finally, the J48 decision tree algorithm consistently outperformed the other methods, achieving 95% accuracy in both 10-fold cross-validation and 70-30 data split scenarios. The developed model predicts with maximum accuracy how medicinal plants are to be consumed.
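The quantitative indices named above have standard definitions in the ethnobotany literature; the variants below are the commonly used ones and are given for orientation only (the paper itself should be consulted for the exact forms used):

```latex
ICF = \frac{N_{ur} - N_t}{N_{ur} - 1},
\qquad
RFC = \frac{FC}{N},
\qquad
CI = \frac{\sum UR}{N}
```

where $N_{ur}$ is the number of use reports in an ailment category, $N_t$ the number of taxa used in that category, $FC$ the number of informants citing a species, $UR$ the use reports for a species, and $N$ the total number of informants.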
GNU Free Documentation License, v1.3: http://www.gnu.org/licenses/fdl-1.3.html
This project provides insight into the data analytics process followed for the Google Data Analytics Case Study – BellaBeat and covers the following deliverables:
It covers the 6 phases of data analysis: Ask, Prepare, Process, Analyze, Share, Act.
To complete this case study, a variety of data analysis tools were used, including Google Sheets, Excel, SQL, RStudio, and Kaggle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Background: Countries are well advised to prepare for future pandemic risks (e.g., pandemic influenza, novel emerging agents, or synthetic bioweapons). These preparations do not typically include planning for complete border closure. Even though border closure may not be instituted in time, and can fail, there might still be plausible chances of success for well-organized island nations.
Objective: To estimate the costs and benefits of complete border closure in response to new pandemic threats, at an initial proof-of-concept level. New Zealand was used as a case study for an island country.
Methods: An Excel spreadsheet model was developed to estimate costs and benefits. Case-study-specific epidemiological data was sourced from past influenza pandemics. Country-specific healthcare cost data, valuation of life, and lost tourism revenue were imputed (with lost trade also in scenario analyses).
Results: For a new pandemic equivalent to the 1918 influenza pandemic (albeit with half the mortality rate, "Scenario A"), it was estimated that successful border closure for 26 weeks provided a net societal benefit (e.g., of NZ$11.0 billion, USD$7.3 billion). Even in the face of a complete end to trade, a net benefit was estimated for scenarios where the mortality rate was high (e.g., at 10 times the mortality impact of "Scenario A", or 2.75% of the country's population dying), giving a net benefit of NZ$54 billion (USD$36 billion). But for some other pandemic scenarios where trade ceased, border closure resulted in a net negative societal value (e.g., for "Scenario A" times three for 26 weeks of border closure - but not for only 12 weeks of closure, when it would still be beneficial).
Conclusions: This "proof-of-concept" work indicates that more detailed cost-benefit analysis of border closure in very severe pandemic situations for some island nations is probably warranted, as this course of action might sometimes be worthwhile from a societal perspective.
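Reading from the Methods, the spreadsheet's net societal benefit appears to weigh pandemic losses averted against economic losses incurred; schematically (my paraphrase of the abstract, not the authors' exact model):

```latex
\text{net benefit} \approx
\underbrace{\text{value of lives saved} + \text{healthcare costs averted}}_{\text{benefits of closure}}
-
\underbrace{\text{lost tourism revenue} + \text{lost trade (scenario analyses)}}_{\text{costs of closure}}
```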
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
This data was obtained from a controlled, multiple-case study involving six professional developers and four real-life industrial systems. The study was designed to control for the moderator factors programmer skill, maintenance task, and learning effect. The primary data set contains multiple sets of defects, in the form of reports (Excel files) extracted from six issue tracking systems. The secondary data consists of a series of attributes extracted from the software systems (i.e., code smells) and their evolution (i.e., code churn), and a log specifying the dates on which developers worked on each of the systems/tasks, in the form of Excel files. Details on the controlled, multiple-case study can be found in the doctoral dissertation by Yamashita titled "Assessing the Capability of Code Smells to Support Software Maintainability Assessments: Empirical Inquiry and Methodological Approach", available online at: https://www.duo.uio.no/handle/10852/34525
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This excel spreadsheet contains the full answers given and later corrected by respondents to the original questionnaire applied in the case study of the work: "Design Thinking for Requirements Engineering: Problems and Opportunities on Non-functional Requirements".
MIT License: https://opensource.org/licenses/MIT
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Project Name: Divvy Bikeshare Trip Data_Year2020
Date Range: April 2020 to December 2020
Analyst: Ajith
Software: R, Microsoft Excel
IDE: RStudio
The following are the basic system requirements necessary for the project:
Processor: Intel i3 or AMD Ryzen 3 and higher
Internal RAM: 8 GB or higher
Operating System: Windows 7 or above, macOS
Data Usage License: https://ride.divvybikes.com/data-license-agreement
Introduction:
In this case study, we aim to utilize different data analysis techniques and tools to understand the rental patterns of the Divvy bike-sharing company and to identify key business improvement suggestions. This case study is a mandatory project for achieving the Google Data Analytics Certification. The data utilized in this case study is licensed under the data usage license above. Trips between April 2020 and December 2020 are used in the analysis.
Scenario: The marketing team needs to design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ.
Objective: The main objective of this case study is to understand customer usage patterns and the breakdown of customers based on their subscription status and the average duration of rental bike usage.
Introduction to Data: The data provided for this project adheres to the data usage license laid down by the source company. The source data was provided as CSV files, broken down by month and by quarter. A total of 13 columns was provided in each CSV file.
The following are the columns initially observed across the datasets:
Ride_id, Ride_type, Start_station_name, Start_station_id, End_station_name, End_station_id, Usertype, Start_time, End_time, Start_lat, Start_lng, End_lat, End_lng
Documentation, Cleaning and Preparing Data for Analysis: The total size of the datasets for the year 2020 is approximately 450 MB, which makes uploading them to a SQL database and visualizing them with BI tools a tiring job. I wanted to improve my skills in the R environment, and this was the best and most practical opportunity to use R for the data analysis.
For installation procedures for R and RStudio and additional information, please refer to the following URLs:
R Projects Document: https://www.r-project.org/other-docs.html RStudio Download: https://www.rstudio.com/products/rstudio/ Installation Guide: https://www.youtube.com/watch?v=TFGYlKvQEQ4
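As an illustrative sketch only (column names as listed in the introduction above; the file path and cleaning rules are assumptions, not the project's actual code), the core duration-by-usertype summary might look like this in R:

```r
library(readr)
library(dplyr)

# Combine the monthly/quarterly CSV exports (path is hypothetical).
files <- list.files("divvy_2020", pattern = "\\.csv$", full.names = TRUE)
trips <- bind_rows(lapply(files, read_csv))

# Ride duration in minutes from the Start_time / End_time columns,
# then ride counts and average duration by subscription status (Usertype).
trips %>%
  mutate(duration_min = as.numeric(difftime(End_time, Start_time,
                                            units = "mins"))) %>%
  filter(duration_min > 0) %>%   # drop zero/negative durations
  group_by(Usertype) %>%
  summarise(rides = n(),
            avg_duration_min = mean(duration_min, na.rm = TRUE))
```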
Description of building parts in three case study buildings to study the effects that interdependent parts have on life cycle material flows and the associated environmental impact of the building. This data relates to fictitious buildings (pavilions). The Excel file contains three datasets for three fictitious cases used in the analysis of interdependencies:
- BASE: the base case
- PACE: a pace-layered variation of the BASE case
- MODU: a modular variation of the BASE case
The three cases can be visualized in the associated paper (DOI will be provided in References when known).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This replication package consists of the detailed results of the safety assessment process of Apollo 7.0’s perception system for its use on a 3.4-kilometer segment of the Dutch highway, A270.
‘Operational design domain description’ consists of a detailed description of the operational area (operational design domain).
‘Hazard Analysis and Risk Assessment’ is a Microsoft Excel workbook of 6 sheets comprising all intermediate results from the first two steps of safety requirement elicitation (hazard analysis, risk assessment), along with the final result (safety goals and their risk levels).
‘Safety Requirements’ is a Microsoft Excel workbook of 2 sheets comprising the final result of the safety requirement elicitation process, i.e., safety requirements (1) for traditional software and (2) specific to ML-based systems.
‘Design assessment’ is a Microsoft Excel workbook of 2 sheets comprising the design assessment results. Specifically, the sheets consist of (1) the safety requirements and the applicable design choices for each requirement; (2) where each requirement was assessed; (3) the final verdict for the assessment of each requirement; and (4) the reason for the verdict and the design decisions found in the artifacts related to Apollo's perception system.
This is the data collection sheet for the study "Deciphering the significance of neutrophil to lymphocyte and monocyte to lymphocyte ratios in tuberculosis: A case-control study from southern India".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This zip file contains data files for 3 activities described in the accompanying PPT slides:
1. An Excel spreadsheet for analysing gain scores in a 2-group, 2-times data array. This activity requires access to https://campbellcollaboration.org/research-resources/effect-size-calculator.html to calculate effect size.
2. An AMOS path model and SPSS data set for an autoregressive, bivariate path model with cross-lagging. This activity is related to the following article: Brown, G. T. L., & Marshall, J. C. (2012). The impact of training students how to write introductions for academic essays: An exploratory, longitudinal study. Assessment & Evaluation in Higher Education, 37(6), 653-670. doi:10.1080/02602938.2011.563277
3. An AMOS latent curve model and SPSS data set for a 3-time latent factor model with an interaction mixed model that uses GPA as a predictor of the LCM start and slope or change factors. This activity makes use of data reported previously and a published data analysis case: Peterson, E. R., Brown, G. T. L., & Jun, M. C. (2015). Achievement emotions in higher education: A diary study exploring emotions across an assessment event. Contemporary Educational Psychology, 42, 82-96. doi:10.1016/j.cedpsych.2015.05.002 and Brown, G. T. L., & Peterson, E. R. (2018). Evaluating repeated diary study responses: Latent curve modeling. In SAGE Research Methods Cases Part 2. Retrieved from http://methods.sagepub.com/case/evaluating-repeated-diary-study-responses-latent-curve-modeling doi:10.4135/9781526431592