Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeat, Inc., using a locally installed spreadsheet program, Excel, for both my data analysis and my visualizations. I made this choice primarily because I live in a remote area with limited bandwidth and inconsistent internet access, so completing the capstone project with web-based programs such as RStudio, SQL Workbench, or Google Sheets was not feasible. I was further limited in which option to choose because the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a junior data analyst in support of the Bellabeat, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in the hope that findings from that dataset will reveal insights to support Bellabeat's marketing strategies for future growth. My task is to provide data-driven answers to the business tasks set by Bellabeat's executive and data analytics teams. To accomplish this, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process into three sections to provide clarity and accountability: Guiding Questions, Key Tasks, and Deliverables. To save space and avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task, using an asterisk (*) as an identifier.
Section 1 - Ask:
A. Guiding Questions:
1. Who are the key stakeholders and what are their goals for the data analysis project?
2. What is the business task that this data analysis project is attempting to solve?
B. Key Tasks:
1. Identify key stakeholders and their goals for the data analysis project.
*The key stakeholders for this project are as follows:
-Urška Sršen and Sando Mur - co-founders of Bellabeat, Inc.
-The Bellabeat marketing analytics team, of which I am a member.
Section 2 - Prepare:
A. Guiding Questions:
1. Where is the data stored and organized?
2. Are there any problems with the data?
3. How does the data help answer the business question?
B. Key Tasks:
1. Research and communicate to stakeholders the source of the data and how it is stored and organized.
*The data source used for this case study is the FitBit Fitness Tracker Data. The dataset is stored on Kaggle and was made available by the user Mobius in an open format, so the data is public and may be copied, modified, and distributed without asking for permission. The datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk, reportedly (see the credibility notes below) between 03/12/2016 and 05/12/2016.
*Reportedly (see the credibility notes below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled, broken down into minute-, hour-, and day-level totals. The data is stored in 18 CSV files. I downloaded all 18 files onto my laptop and chose to use 2 of them for this project, because those files already merged the daily activity and sleep data from the other documents. All unused files were permanently deleted from the laptop. The 2 files used were:
-sleepDay_merged.csv
-dailyActivity_merged.csv
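Although I completed my processing in Excel, the same initial profiling of these two files could be scripted. The following is a minimal pandas sketch, offered only as an illustration and not as part of my actual workflow; it assumes the two CSVs are in the working directory and use the standard column names (Id, ActivityDate, SleepDay) from the Kaggle files:

```python
# Minimal sketch (not my actual Excel workflow): load the two Kaggle CSVs and
# profile them. Column names Id, ActivityDate, and SleepDay are those used in
# the public FitBit Fitness Tracker Data files; adjust if your copies differ.
import pandas as pd

activity = pd.read_csv("dailyActivity_merged.csv", parse_dates=["ActivityDate"])
sleep = pd.read_csv("sleepDay_merged.csv", parse_dates=["SleepDay"])

# How many distinct users actually reported data, and over what date range?
print("Activity users:", activity["Id"].nunique())
print("Sleep users:   ", sleep["Id"].nunique())
print("Activity dates:", activity["ActivityDate"].min(), "to", activity["ActivityDate"].max())
```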
2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
*As will be presented more specifically in the Process section, the data appears to have a credibility issue related to the reported time frame of the data collected. The metadata indicates that the data covers roughly 2 months of Fitbit tracking; however, my initial processing found that only 1 month of data was reported.
*As will be presented more specifically in the Process section, the data has a credibility issue related to the number of individuals who reported Fitbit data. Specifically, the metadata states that 30 individual users agreed to report their tracking data, but my initial data processing uncovered 33 individual ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: user stories or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see the tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than class, or
(ii) using a generic term (e.g., "user" instead of "urban
planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent a legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
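For reference, both ratios can be computed directly from the per-subject counts of the tagging scheme (AL, WR, SO, OM). A minimal Python sketch, using illustrative counts rather than values from the raw data sheet:

```python
def correctness(al: int, wr: int, so: int, om: int) -> float:
    # Share of classes the student aligned correctly:
    # AL / (AL + OM + SO + WR), as defined in Sheet 4.
    return al / (al + om + so + wr)

def completeness(al: int, wr: int, om: int) -> float:
    # Share of expert classes covered (correctly or incorrectly) by the student model:
    # (AL + WR) / (AL + WR + OM), as defined in Sheet 4.
    return (al + wr) / (al + wr + om)

# Illustrative counts only (not taken from the raw data sheet)
print(correctness(al=12, wr=3, so=2, om=5))   # 0.545...
print(completeness(al=12, wr=3, om=5))        # 0.75
```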
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (a scripted equivalent is sketched after the sheet descriptions below). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
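As noted above, a scripted equivalent of the per-sheet significance test and effect size. This is a minimal sketch, assuming two arrays of per-student correctness (or completeness) scores; the numbers below are placeholders rather than values from the dataset, and scipy's independent-samples t-test stands in for the spreadsheet computation:

```python
import numpy as np
from scipy import stats

def hedges_g(a, b):
    """Standardized mean difference with the small-sample correction factor."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    # Pooled standard deviation (Bessel-corrected variances)
    sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
    d = (a.mean() - b.mean()) / sp          # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)         # Hedges' correction factor
    return d * j

# Placeholder correctness scores for two groups (e.g., UC vs. US notation)
uc = [0.55, 0.60, 0.72, 0.48, 0.66]
us = [0.70, 0.81, 0.64, 0.77, 0.69]

t, p = stats.ttest_ind(uc, us)              # two-sided independent-samples t-test
print(f"t = {t:.3f}, p = {p:.3f}, g = {hedges_g(uc, us):.3f}")
```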
During a survey carried out in 2024, roughly one in three marketing managers from France, Germany, and the United Kingdom stated that they based every marketing decision on data. Under ** percent of respondents in all five surveyed countries said they struggled to incorporate data analytics into their decision-making process.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran an EDA to provide statistical facts and inform conclusions; the mined facts support arguments that inform the Systematic Literature Review of DL4SE.
The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning for Software Engineering literature. Such hidden relationships are collected and analyzed to illustrate the state of the art of DL techniques employed in the software engineering context.
Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases process, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database, which was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:
1. Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features, or attributes, that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.
2. Preprocessing. The preprocessing applied consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”; “Other Metrics” refers to unconventional metrics found during the extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the papers by the data mining tasks or methods.
3. Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis: we performed a Principal Component Analysis to reduce the 35 features to 2 components for visualization purposes (a Python sketch of this step appears after the stage list). Furthermore, PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance, in other words, the number of clusters to be used when tuning the explainable models.
4. Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented toward uncovering hidden relationships among the extracted features (correlations and association rules) and toward categorizing the DL4SE papers for a better segmentation of the state of the art (clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.
5. Interpretation/Evaluation. We used the knowledge discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes, which produces an argument support analysis (see this link).
We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.
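The published pipelines use RapidMiner; for readers more comfortable with Python, the PCA step from stage 3 (and the cluster-count check it supports) might look roughly like the following sketch, where X is a placeholder one-hot-encoded matrix rather than the repository's actual feature table:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# X: papers x one-hot-encoded features (placeholder random data for the sketch)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(128, 35)).astype(float)

# Project the 35 engineered features onto 2 principal components for plotting
coords = PCA(n_components=2).fit_transform(X)
print(coords.shape)  # (128, 2)

# Pick the number of clusters where adding another cluster stops reducing
# within-cluster variance much (elbow on the KMeans inertia curve)
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, 9)}
print(inertias)
```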
Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.
Support = the number of occurrences in which the statement is true, divided by the total number of statements
Confidence = the support of the statement divided by the number of occurrences of the premise
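In code, the two measures reduce to simple frequency counts over the extracted papers. A minimal sketch with invented attribute names (not the repository's feature schema):

```python
def support_confidence(rows, premise, conclusion):
    """rows: list of dicts of boolean attributes; premise/conclusion: attribute names."""
    n = len(rows)
    prem = sum(1 for r in rows if r[premise])
    both = sum(1 for r in rows if r[premise] and r[conclusion])
    support = both / n                           # fraction of papers where the whole rule holds
    confidence = both / prem if prem else 0.0    # of papers with the premise, how many satisfy the rule
    return support, confidence

# Invented toy records: "supervised" learning vs. "irreproducible" flag
papers = [
    {"supervised": True,  "irreproducible": True},
    {"supervised": True,  "irreproducible": False},
    {"supervised": True,  "irreproducible": True},
    {"supervised": False, "irreproducible": False},
]
print(support_confidence(papers, "supervised", "irreproducible"))  # (0.5, 0.666...)
```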
The GitHub repository, https://github.com/jodhernandezbe/TRI4PLADS/tree/v1.0.0, is publicly available and referenced in the supplementary information. The repository describes the computational framework overview, software requirements, model use, model output, and disclaimer. It presents a multi-scale framework that combines data engineering with process systems engineering (PSE) to enhance the precision of chemical flow analysis (CFA) at the end-of-life (EoL) stage. The focus is on chemicals used in plastic manufacturing, tracing their flows through the supply chain and EoL pathways. Additionally, this study examines potential discharges from material recovery facilities to publicly owned treatment works (POTW) facilities, recognizing their relevance to human and environmental health. Tracking these discharges is critical, as industrial EoL material transfers to POTWs can interfere with biological treatment processes, leading to unintended environmental chemical releases. By integrating data-driven methodologies with mechanistic modeling, this framework supports the identification, quantification, and regulatory assessment of chemical discharges, providing a science-based foundation for industrial and policy decision-making in sustainable material and water management. The attached file "CoU - Metadata File.xlsx" contains the datasets used to build Figure 3 and describes a qualitative flow diagram of methyl methacrylate from manufacturing to potential consumer products, generated with the Chemical Conditions of Use Locator methodology (https://doi.org/10.1111/jiec.13626). The attached file "MMA POTW Dataset.xlsx" contains the datasets needed to run the Chemical Tracker and Exposure Assessor in Publicly Owned Treatment Works Model (ChemTEAPOTW), as described in the GitHub repository https://github.com/gruizmer/ChemTEAPOTW. The attached file "Plastic Data-Calculations-Assumptions.docx" contains all calculations and assumptions used to estimate the methyl methacrylate (MMA) releases from plastic recycling. Finally, users can generate Figures 4 and 5 by following the step-by-step process described in the main GitHub repository for the MMA case study. This dataset is associated with the following publication: Hernandez-Betancur, J.D., J.D. Chea, D. Perez, and G.J. Ruiz-Mercado. Integrating data engineering and process systems engineering for end-of-life chemical flow analysis. COMPUTERS AND CHEMICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 204: 109414, (2026).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset containing process models, event logs, LCA data for the evaluation of SOPA, as well as additional figures.
https://www.promarketreports.com/privacy-policy
The global Process Analytics Service Market is expected to reach a value of USD 674.52 million by 2033, exhibiting a CAGR of 8.41% during the forecast period (2025-2033). The market growth is attributed to the increasing adoption of process analytics to optimize operations, reduce costs, and enhance decision-making. The rising demand for data-driven insights, growing adoption of cloud-based analytics solutions, and increasing investments in digital transformation initiatives are also driving market growth. North America is the largest market for process analytics services, followed by Europe and Asia Pacific. The high adoption of process analytics in the manufacturing, financial services, and healthcare industries in these regions is fueling market growth. The Asia Pacific region is expected to witness significant growth in the coming years due to the rapid adoption of digital technologies and increasing government initiatives to promote data analytics across various industries. Key players in the market include Domo, IBM, Oracle, Cisco, SAP, Microsoft, and others. These companies offer a wide range of process analytics services, including consulting, implementation, support, and maintenance, to help organizations leverage process analytics to improve their operational efficiency and business outcomes. The global Process Analytics Service market is projected to reach USD 802.7 million by 2027, growing at a CAGR of 28.3%. Key drivers for this market are AI and machine learning integration, growing demand for real-time analytics, increasing focus on process optimization, rising adoption of cloud-based solutions, and an expanding market in emerging economies. Potential restraints include increasing demand for data-driven insights, adoption of AI and machine learning, the need for operational efficiency, growing regulatory compliance requirements, and the shift towards cloud-based solutions.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This file contains the dataset we used for the evaluation of the multi-perspective Declare analysis.
Logs
In particular, it contains logs with different sizes and different trace lengths. We generated traces with 10, 20, 30, 40, and 50 events and, for each of these lengths, we generated logs with 25000, 50000, 75000, and 100000 traces. Therefore, in total, there are 20 logs.
Declare models
In addition, the dataset contains 10 Declare models. In particular, we prepared two models with 10 constraints, one only containing constraints on the control-flow (without conditions on data and time), and another one including real multi-perspective constraints (with conditions on time and data). We followed the same procedure to create models with 20, 30, 40, and 50 constraints.
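For clarity, the 20 log configurations and 10 model configurations can be enumerated as follows; this sketch only lists the combinations and does not reproduce the actual log and model generation:

```python
from itertools import product

trace_lengths = [10, 20, 30, 40, 50]           # events per trace
log_sizes = [25_000, 50_000, 75_000, 100_000]  # traces per log
logs = list(product(trace_lengths, log_sizes))
print(len(logs))  # 20 logs

constraint_counts = [10, 20, 30, 40, 50]
variants = ["control-flow only", "multi-perspective (data and time conditions)"]
models = list(product(constraint_counts, variants))
print(len(models))  # 10 Declare models
```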
This dataset was created by Dr. Khurram Shahzad
http://opendatacommons.org/licenses/dbcl/1.0/
As part of my journey to earn the Google Data Analytics certificate, I will practice on a real-world example by following the steps of the data analysis process: ask, prepare, process, analyze, share, and act. I have picked the Bellabeat example.
https://www.datainsightsmarket.com/privacy-policy
The size of the process analytical technology market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.
https://www.mordorintelligence.com/privacy-policy
The Process Oil Market report is segmented by type (aromatic, paraffinic, and naphthenic), application (rubber, polymers, personal care, and other applications), and geography (Asia-Pacific, North America, Europe, South America, and Middle East and Africa). The market size and forecast are provided in terms of volume (tons) for all the above segments.
Background: Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements, either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of the changes. In lieu of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic. Without such estimates, overly conservative choices for significance must be enforced.
Results: A simple statistical method for estimating variances from microarray control data which does not require multiple replicates is presented. Comparison of datasets from two commercial entities using this difference-averaging method demonstrates that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Application of the method to a dataset related to the β-catenin pathway yields a larger number of biologically reasonable genes whose expression is altered than the ratio method.
Conclusions: The difference-averaging method enables determination of variances as a function of signal intensities by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.
Models developed with this Open Source tool.
https://creativecommons.org/publicdomain/zero/1.0/
This notebook presents the Bellabeat Google Data Analytics Capstone case study. The analysis uses Fitbit smart device data to uncover patterns in steps, sleep, and calories, applying these insights to Bellabeat Time, a wellness-focused smartwatch.
The work follows the six-step analysis process (Ask, Prepare, Process, Analyze, Share, Act), providing both insights and actionable recommendations.
Key Insights:
Users average ~7,000 steps/day (below 10,000 recommended).
Average sleep is 6–7 hours/night (less than the recommended 8).
Steps strongly correlate with calories burned (a minimal check is sketched after this list).
Activity and sleep vary by weekday vs weekend.
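As noted in the list above, a minimal sketch of how such figures can be checked, assuming the standard Fitbit CSV columns (Id, ActivityDate, TotalSteps, Calories, SleepDay, TotalMinutesAsleep); the file names are those of the public Kaggle export:

```python
import pandas as pd

activity = pd.read_csv("dailyActivity_merged.csv", parse_dates=["ActivityDate"])
sleep = pd.read_csv("sleepDay_merged.csv", parse_dates=["SleepDay"])

print("Mean daily steps:", round(activity["TotalSteps"].mean()))
print("Mean sleep (hours):", round(sleep["TotalMinutesAsleep"].mean() / 60, 1))
# Pearson correlation between steps and calories burned per user-day
print("Steps vs. calories r:", round(activity["TotalSteps"].corr(activity["Calories"]), 2))

# Weekday vs. weekend comparison of average steps
activity["is_weekend"] = activity["ActivityDate"].dt.dayofweek >= 5
print(activity.groupby("is_weekend")["TotalSteps"].mean())
```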
Deliverables:
Exploratory analysis in RMarkdown
Visualizations of activity and sleep patterns
Actionable marketing recommendations for Bellabeat
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The research hypothesis of this study revolves around employing Multi-Criteria Decision Analysis (MCDA) techniques, particularly the Analytic Hierarchy Process (AHP), to optimize technology selection in metal additive manufacturing. The data collected and analyzed include the survey results and the evaluation of criteria relevant to the decision-making process, such as reliability, finish of the part after printing, complexity of post-processing, sustainability of the process, user preferences, machine price, manufacturing cost, and productivity. The AHP methodology involves constructing a hierarchy in which the goal or objective, the criteria, and the alternatives are systematically organized. Pairwise comparisons are then made among criteria and among alternatives, using a relative importance scale ranging from 1 to 9. These comparisons are recorded in a positive reciprocal matrix, which is normalized to obtain numerical weights for decision-making. The priority vector, or normalized principal eigenvector, is computed, representing the relative importance of the criteria, and the maximum eigenvalue is determined. Finally, a global ranking of decision alternatives is obtained by additive aggregation and normalization of the sum of the local priorities of criteria and alternatives.
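A minimal sketch of the core AHP arithmetic described above, using an illustrative 3x3 pairwise comparison matrix rather than the study's actual judgements; the consistency ratio at the end is the standard use of the maximum eigenvalue and is included only for illustration:

```python
import numpy as np

# Illustrative pairwise comparison matrix (1-9 scale, positive reciprocal);
# example criteria: reliability, machine price, productivity (not the study's data)
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])
n = A.shape[0]

# Normalize each column, then average the rows -> priority vector
# (an approximation of the normalized principal eigenvector)
weights = (A / A.sum(axis=0)).mean(axis=1)

# Maximum eigenvalue estimate and Saaty's consistency index/ratio
lam_max = float(np.mean((A @ weights) / weights))
ci = (lam_max - n) / (n - 1)
ri = 0.58  # random index for n = 3
print("weights:", weights.round(3), "lambda_max:", round(lam_max, 3), "CR:", round(ci / ri, 3))
```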
https://www.datainsightsmarket.com/privacy-policy
The size of the Process Data Historian Software market was valued at USD XXX million in 2023 and is projected to reach USD XXX million by 2032, with an expected CAGR of XX% during the forecast period.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: This work was the result of a cooperation agreement between the University of Brasilia (UnB) and a security organization. Its goal was to model the logistic processes of the company to assist in the modernization of a control system for the management of materials. The project was managed with a dynamic model, through the monitoring and control of activities executed during its various stages. The development of a control system enabled the detection of discrepancies between what was planned and what was performed, identifying their causes and which actions to take to ensure that the project got back on track according to the planned schedule and budget. The main objective of this article was to identify which elements controlled by the project affected its execution time. With that knowledge, it was possible to improve the planning of the next phases of the project. To this end, we performed a case study of an exploratory and quantitative nature to provide information on the object and guide the formulation of hypotheses. The qualitative analysis of the execution time of the modeling identified two dependent variables, systematic version and team, out of the four evaluated. The quantitative analysis studied two variables, number of modifications and number of elements, which did not indicate evidence of correlation with the aforementioned time.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Software tools used to collect and analyze data. Parentheses for analysis software indicate the tools participants were taught to use as part of their education in research methods and statistics. "Other" responses for data collection software largely consisted of survey tools (e.g., SurveyMonkey, LimeSurvey) and tools for building and running behavioral experiments (e.g., Gorilla, jsPsych). "Other" responses for data analysis software largely consisted of neuroimaging-related tools (e.g., SPM, AFNI).
https://www.datainsightsmarket.com/privacy-policy
The size of the Business Process Management Platforms market was valued at USD 2714 million in 2023 and is projected to reach USD 4897.92 million by 2032, with an expected CAGR of 8.8% during the forecast period.