License: http://opendatacommons.org/licenses/dbcl/1.0/
Dataset for Continuous Stress Monitoring of Hospital Nurses
The growing accessibility of wearable tech has opened doors to continuously monitor various physiological factors. Detecting stress early has become pivotal, aiding individuals in proactively managing their health against the detrimental effects of prolonged stress exposure. This paper presents an exclusive stress detection dataset cultivated within the natural environment of a hospital. Compiled during the COVID-19 outbreak, this dataset encompasses the biometric data of nurses. Analyzing stress in a workplace setting is intricate due to the multifaceted social, cultural, and psychological elements inherent in dealing with stressful circumstances. Hence, our dataset not only encompasses physiological data but also contextual information surrounding stress events. Key physiological metrics such as electrodermal activity, heart rate, and skin temperature of the nurse subjects were continuously monitored. Additionally, a periodic survey administered via smartphones captured contributing factors linked to detected stress events. The database housing these signals, stress occurrences, and survey responses is publicly accessible on Dryad.
Project Overview:
This project uses physiological signals from wearable devices to gauge stress levels among nurses working in a hospital environment. The dataset comprises readings from nurses who wore watches that tracked their heart rate, skin temperature, and electrodermal activity (EDA) while they simultaneously reported their stress levels.
The primary goal revolves around evaluating various machine learning models to forecast stress levels based on recorded physiological signals. Additionally, the project investigates the most pertinent physiological indicators for stress detection and offers insights to enhance the accuracy and dependability of stress detection via wearable tech.
Dataset Description:
Data Collection Context:
- Period: Data gathered over one week from 15 female nurses aged 30 to 55 years, during regular shifts at a hospital.
- Collection Phases: Two phases, Phase-I (April 15, 2020, to August 6, 2020) and Phase-II (October 8, 2020, to December 11, 2020).
- Exclusion Criteria: Pregnancy, heavy smoking, mental disorders, chronic or cardiovascular diseases.
Data Captured:
- Physiological Variables Monitored: Electrodermal activity, heart rate, and skin temperature of the nurse subjects.
- Survey Responses: Periodic smartphone-administered surveys capturing contributing factors to detected stress events.
- Measurement Technologies: Utilized Empatica E4 for data collection, specifically focusing on Galvanic Skin Response and Blood Volume Pulse (BVP) readings.
Study Procedure:
- Approval: University's Institutional Review Board approved the study protocol (FA19–50 INFOR).
- Consent and Enrollment: Nurse subjects were enrolled after expressing interest and obtaining hospital compliance.
- Study Design: Conducted in three phases, each including 7 nurses. No incentives were provided, and anonymization of data was ensured.
Data Availability:
- Public Release: A database containing signals, stress events, and survey responses is publicly available on Dryad.
- Anonymization: Unique identifiers assigned to subjects to maintain anonymity.
Merge CSV File Information: This dataset comprises approximately 11.5 million entries across nine columns:
- X, Y, Z: Orientation data (256 unique entries each).
- EDA, HR, TEMP: Physiological measurements (EDA: 274,452 unique, HR: 6,268 unique, TEMP: 599 unique).
- id: 18 categorical identifiers.
- datetime: Extensive date and time entries (10.6 million unique).
- label: Categorical states or classes (three unique entries).
The dataset offers a wide array of continuous physiological measurements alongside orientation data, facilitating stress detection, health monitoring, and related research endeavours.
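As a rough illustration of the nine-column layout described above, the sketch below parses a few made-up rows and counts the distinct values per column, which is the kind of summary the description reports. The sample values, the datetime format, and the helper name `unique_counts` are assumptions for illustration only; the real merged CSV is far larger and would more commonly be read with pandas.

```python
import csv
import io

# Hypothetical sample rows using the nine column names listed above.
# All values (and the datetime format) are invented for illustration.
SAMPLE = """X,Y,Z,EDA,HR,TEMP,id,datetime,label
-13,-61,5,6.769995,99.43,31.17,15,2020-07-08 14:03:00.000000,2
-20,-69,-3,6.769995,99.43,31.17,15,2020-07-08 14:03:00.031250,2
-31,-78,-15,6.771276,99.20,31.15,5C,2020-07-08 14:03:00.062500,0
"""

EXPECTED_COLUMNS = ["X", "Y", "Z", "EDA", "HR", "TEMP", "id", "datetime", "label"]


def unique_counts(handle):
    """Return {column: number of distinct values} for a CSV file object."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    seen = {name: set() for name in EXPECTED_COLUMNS}
    for row in reader:
        for name in EXPECTED_COLUMNS:
            seen[name].add(row[name])
    return {name: len(values) for name, values in seen.items()}


counts = unique_counts(io.StringIO(SAMPLE))
```

Checking the header before counting is a cheap way to catch a differently named or reordered export early.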
Requirements: Python 3.7 or higher and Jupyter Notebook are prerequisites. The necessary Python packages are listed in the requirements.txt file. Running the code requires the following libraries: pandas, numpy, scikit-learn, and matplotlib.
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global Wearable Technology market size was USD 117514.5 million in 2024. It will expand at a compound annual growth rate (CAGR) of 18.70% from 2024 to 2031.
North America held the largest share, more than 40% of global revenue, with a market size of USD 47005.80 million in 2024, and will grow at a compound annual growth rate (CAGR) of 16.9% from 2024 to 2031.
Europe accounted for a market share of over 30% of the global revenue with a market size of USD 35254.35 million.
Asia Pacific held a market share of around 23% of the global revenue with a market size of USD 27028.34 million in 2024 and will grow at a compound annual growth rate (CAGR) of 20.7% from 2024 to 2031.
Latin America had a market share of more than 5% of the global revenue with a market size of USD 5875.73 million in 2024 and will grow at a compound annual growth rate (CAGR) of 18.1% from 2024 to 2031.
Middle East and Africa had a market share of around 2% of the global revenue and was estimated at a market size of USD 2350.29 million in 2024 and will grow at a compound annual growth rate (CAGR) of 18.4% from 2024 to 2031.
Wristwear held the highest Wearable Technology market revenue share in 2024.
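The regional figures above can be cross-checked with simple arithmetic. The sketch below computes each region's share of the 2024 global figure and applies the standard compound-growth formula, value × (1 + CAGR)^years, to the global number. The report itself does not quote a 2031 figure, so the projection is only what the stated rate implies, assuming seven full compounding periods from 2024 to 2031; the function name `project` is illustrative.

```python
# Figures quoted in the report (USD million, 2024).
GLOBAL_2024 = 117514.5
REGIONS_2024 = {
    "North America": 47005.80,
    "Europe": 35254.35,
    "Asia Pacific": 27028.34,
    "Latin America": 5875.73,
    "Middle East and Africa": 2350.29,
}


def project(value, cagr, years):
    """Compound `value` forward by `years` at annual growth rate `cagr`."""
    return value * (1.0 + cagr) ** years


# Regional shares of the 2024 global figure.
shares = {name: size / GLOBAL_2024 for name, size in REGIONS_2024.items()}

# Global projection: 18.70% CAGR over the 7 years from 2024 to 2031
# (assumes seven full compounding periods).
global_2031 = project(GLOBAL_2024, 0.1870, 2031 - 2024)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Implied 2031 global size: USD {global_2031:,.0f} million")
```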
Market Dynamics of Wearable Technology Market
Key Drivers for Wearable Technology Market
Increasing consumer demand for health and fitness tracking
The market for wearable technology is rapidly expanding due to growing consumer focus on health and fitness. Despite busy lifestyles, individuals around the world remain committed to investing in health and fitness. As more people become aware of the benefits of a healthy and active lifestyle, more of them are taking part in physical activities like sports, yoga, athletics and gym workouts. The popularity of wearable fitness tracking devices, such as fitness watches and rings that let individuals proactively manage their health by tracking heart rate, sleep quality and activity levels, is further encouraging people to build better habits. Wearable devices and fitness apps collect large volumes of user data, allowing for the creation of data-driven fitness programs that provide highly customized insights and suggestions.
For instance, several developed markets are achieving new benchmarks in fitness engagement. Countries such as the U.S., U.K., Spain, and Switzerland have all reported record-high penetration rates and fitness facility memberships.
Advancements in Technology to Propel Market Growth
The Wearable Technology market has witnessed steady growth, driven by advancements in technology. Component miniaturization, longer battery life, and greater sensor capabilities have all contributed to the development of increasingly complex and user-friendly systems. These technical advancements have broadened the possibilities for wearable technology, ranging from simple fitness trackers to sophisticated health monitoring systems. Furthermore, the integration of artificial intelligence and machine learning has created new opportunities for data analysis and tailored experiences, accelerating market growth. As technology advances, the wearable technology sector is primed for continued expansion.
Restraint Factor for the Wearable Technology Market
Growing concerns around data privacy and security
Users today are more concerned about the security of their data and the potential for misuse. Wearable devices collect significant amounts of personal data, raising concerns about unauthorized access and about how the collected data is shared and stored. Because wearables hold so much personal information, they are a prime target for cybercriminals, which makes data breaches a serious risk.
For instance, the sensitivity of wearable data was highlighted by a 2021 data breach that revealed over 61 million activity tracker records from Fitbit and Apple.
Impact of Covid-19 on the Wearable Technology Market
The COVID-19 pandemic catalyzed the wearable technology market, propelling it into a period of unprecedented growth and transformation. While the initial stages of...
Multiple wearable devices that purport to measure physical activity are widely available to consumers. While they may support increases in physical activity among people with multiple sclerosis (MS) by providing feedback on their performance, there is little information about the validity and acceptability of these devices. Providing devices that are perceived as inaccurate and difficult to use may have negative consequences for people with MS, rather than supporting participation in physical activity. The aim of this study was, therefore, to assess the validity and acceptability of commercially available devices for monitoring step-count and activity time among people with MS. Nineteen ambulatory adults with MS [mean (SD) age 52.1 (11.9) years] participated in the study. Step-count was assessed using five commercially available devices (Fitbit Alta, Fitbit Zip, Garmin Vivofit 4, Yamax Digi Walker SW200, and Letscom monitor) and an activPAL3μ while completing nine everyday activities. Step-count was also manually counted. Time in light activity, moderate-to-vigorous activity, and total activity were measured during activities using an Actigraph GT3X accelerometer. Fifteen of the 19 participants who completed the validity study also wore the five commercially available devices for three consecutive days each and participated in a semi-structured interview regarding their perception of the acceptability of the monitors. Mean percentage error for step-count ranged from 12.1% for the Yamax SW200 to −112.3% for the Letscom. Mean step-count as manually determined differed from mean step-count measured by the Fitbit Alta (p = 0.002), Garmin Vivofit 4 (p < 0.001), Letscom (p < 0.001) and the research standard device, the activPAL3μ (p < 0.001). However, 95% limits of agreement were smallest for the activPAL3μ and largest for the Fitbit Alta.
Median percentage error for activity minutes was 52.9% for the Letscom and 100% for the Garmin Vivofit 4 and Fitbit Alta compared to minutes in total activity. Three inductive themes were generated from participant accounts: Interaction with device; The way the device looks and feels; Functionality. In conclusion, commercially available devices demonstrated poor criterion validity when measuring step-count and activity time in people with MS. This negatively affected the acceptability of devices, with perceived inaccuracies causing distrust and frustration. Additional considerations when designing devices for people with MS include an appropriately sized and lit display and ease of attaching and charging devices.
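For readers unfamiliar with the two agreement measures named in the abstract, the sketch below shows one common way to compute a mean percentage error and Bland-Altman 95% limits of agreement. The sign convention (device minus reference), the function names, and the step counts are all assumptions made for illustration; the study may have defined its error differently, and none of the numbers come from it.

```python
from statistics import mean, stdev


def mean_percentage_error(device, reference):
    """Mean of (device - reference) / reference, as a percentage."""
    errors = [(d - r) / r * 100.0 for d, r in zip(device, reference)]
    return mean(errors)


def limits_of_agreement(device, reference):
    """Bland-Altman 95% limits: mean difference +/- 1.96 * SD of differences."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


# Invented step counts for four activities (not data from the study).
manual_count = [100, 200, 150, 250]
device_count = [90, 210, 120, 250]

mpe = mean_percentage_error(device_count, manual_count)
lower, upper = limits_of_agreement(device_count, manual_count)
```

A small mean error can coexist with wide limits of agreement, which is why the abstract reports both.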
https://ora.ox.ac.uk/objects/uuid:99d7c092-d865-4a19-b096-cc16440cd001
This dataset contains Axivity AX3 wrist-worn activity tracker data that were collected from 151 participants in 2014-2016 around the Oxfordshire area. Participants were asked to wear the device in daily living for a period of roughly 24 hours, amounting to a total of almost 4,000 hours. Vicon Autograph wearable cameras and Whitehall II sleep diaries were used to obtain the ground truth activities performed during the period (e.g. sitting watching TV, walking the dog, washing dishes, sleeping), resulting in more than 2,500 hours of labelled data. Accompanying code to analyse this data is available at https://github.com/activityMonitoring/capture24. The following papers describe the data collection protocol in full: i.) Gershuny J, Harms T, Doherty A, Thomas E, Milton K, Kelly P, Foster C (2020) Testing self-report time-use diaries against objective instruments in real time. Sociological Methodology doi: 10.1177/0081175019884591; ii.) Willetts M, Hollowell S, Aslett L, Holmes C, Doherty A. (2018) Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Scientific Reports. 8(1):7961. Regarding Data Protection, the Clinical Data Set will not include any direct subject identifiers. However, it is possible that the Data Set may contain certain information that could be used in combination with other information to identify a specific individual, such as a combination of activities specific to that individual ("Personal Data"). Accordingly, in the conduct of the Analysis, users will comply with all applicable laws and regulations relating to information privacy. Further, the user agrees to preserve the confidentiality of, and not attempt to identify, individuals in the Data Set.
License: http://opendatacommons.org/licenses/dbcl/1.0/
Analyst: Alexandra Loop Date: 12/02/2024
Business Task:
Questions to be Answered:
- What are trends in non-Bellabeat smart device usage?
- What do these trends suggest for Bellabeat customers?
- How could these trends help influence Bellabeat marketing strategy?
Description of Data Sources:
Data Set to be studied: FitBit Fitness Tracker Data: Pattern Recognition with tracker data: Improve Your Overall Health
Data privacy: Data was sourced from a public dataset available on Kaggle. Information has been anonymized prior to being posted online.
Bias: Due to the degree of anonymity in this study, the only demographic data available in this study is weight, and other cultural differences or lifestyle requirements cannot be accounted for. The sample size is quite small. The time period of the study is only a month so the observer effect could conceivably still be influencing the sample groups. We also have no information on the weather in the region studied. April and May are very variable months in terms of accessible outdoor activities.
Process:
Cleaning Process: After going through the data to find duplicates, whitespace, and nulls, I have determined that this set of data has been well-cleaned and already aggregated into several reasonably sized spreadsheets.
Trim: No issues found
Consistent length ID: No issues found
Irrelevant columns: In WLI_M, the fat column is not consistently filled in, so it is not productive to use it in analysis. Sedentary_active_distance was mostly filled with nulls and could confuse the data. I have removed these columns.
Irrelevant Rows: 77 rows in daily_Activity_merged had 0s across the board. As there is little chance that someone would take zero steps, I decided to interpret these days as ones where people did not put on the fitbit. As such they are irrelevant rows. Removed 77 rows. 85 rows in daily_intensities_merged registered 0 minutes of sedentary activity, which I do not believe to be possible. Row 241 logged 2 minutes of sedentary activity; I have determined it to be unusable. Row 322 likewise does not add up to a day’s minutes and has been deleted. Removed 85 rows. 7 rows had 1440 sedentary minutes, which I have determined to be time on but not used. Implication of their presence noted.
Scientifically debunked information: BMI as a measurement has been determined to be problematic on many fronts: it misrepresents non-white people who have different healthy body types, does not account for muscle mass or scoliosis, has been known to change definitions in accordance with business interests rather than health data, and was never meant to be used as a measure of individual health. I have removed the BMI column from the Weight Log Info chart.
Cleaning Process 1:
I have elected to see what can be found in the data as it was organized by the providers first.
Cleaning Process 2:
I calculated and removed rows where the participants did not put on the fitbit. These rows were removed, and the implications of their presence have been noted.
Found Averages, Minimum, and Maximum Values of Steps, distance, types of active minutes, and calories.
Found the sum of all kinds of minutes documented to check for inconsistencies.
Found the difference between total minutes and a full 1440 minutes.
I tried to make a pie chart to convey the average minutes of activity, and so created a duplicate dataset to trim down and remove misleading data caused by different inputs.
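The cleaning steps listed above (dropping unworn days, summarising steps, summing logged minutes, and comparing against a 1440-minute day) can be sketched as follows. The records are invented, and the field names are assumptions modelled on the public Fitbit export; they may not match the analyst's spreadsheets, which were cleaned by other means.

```python
MINUTES_PER_DAY = 1440

# Invented daily records; field names are assumptions, not the analyst's own.
days = [
    {"TotalSteps": 9500, "VeryActiveMinutes": 30, "FairlyActiveMinutes": 15,
     "LightlyActiveMinutes": 200, "SedentaryMinutes": 700},
    {"TotalSteps": 0, "VeryActiveMinutes": 0, "FairlyActiveMinutes": 0,
     "LightlyActiveMinutes": 0, "SedentaryMinutes": 1440},
    {"TotalSteps": 4200, "VeryActiveMinutes": 5, "FairlyActiveMinutes": 10,
     "LightlyActiveMinutes": 185, "SedentaryMinutes": 1240},
]

MINUTE_FIELDS = ["VeryActiveMinutes", "FairlyActiveMinutes",
                 "LightlyActiveMinutes", "SedentaryMinutes"]

# Treat zero-step days as days the tracker was not worn and drop them.
worn = [d for d in days if d["TotalSteps"] > 0]

# Sum of logged minutes and the shortfall from a full 1440-minute day.
for d in worn:
    d["TotalMinutes"] = sum(d[f] for f in MINUTE_FIELDS)
    d["UnloggedMinutes"] = MINUTES_PER_DAY - d["TotalMinutes"]

# Average, minimum, and maximum of daily steps over the worn days.
steps = [d["TotalSteps"] for d in worn]
summary = {"mean": sum(steps) / len(steps), "min": min(steps), "max": max(steps)}
```

A non-zero `UnloggedMinutes` marks a day where the device was off for part of the day, which is the inconsistency the minute totals were meant to reveal.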
Analysis:
Observations: On average, the participants do not seem interested in moderate physical activity, as it was the category with the fewest active minutes. Perhaps advertise the effectiveness of low impact workouts. Very few participants volunteered their weights, but none of them lost weight. The person with the highest weight volunteered it only once near the beginning. Given evidence from the Health At Every Size movement, we cannot deny the possibility that having to be weight conscious could have had negative effects on this individual. I would suggest that weight would be a counterproductive focus for our marketing campaign, as it would make heavier people less likely to want to participate, and any claims of weight loss would be statistically unfounded and open us up to false advertising lawsuits. Fully half of the participants had days where they did not put on their fitbit at all, for a total of 77-84 lost days of data, meaning that on average participants who did not wear their fitbit daily lost 5 days of data, though of course some lost significantly more. I would suggest focusing on creating a biometric tracker that is comfortable and rarely needs to be charged so that people will gain more reliable resources from it. 400 full days of data are recorded, meaning that the participants did not take the device off to sleep, shower, or swim. 280 more have 16...
The displayed data on wearables usage shows a country comparison from the Statista Consumer Insights. As of **********, some ** percent of respondents in China stated that they personally use wearables (e.g. smart watch, health / fitness tracker). The survey was conducted in 2024.
License: https://spdx.org/licenses/CC0-1.0.html
Advances in wearable technologies provide the opportunity to monitor many physiological variables continuously. Stress detection has gained increased attention in recent years, especially because early stress detection can help individuals better manage health to minimize the negative impacts of long-term stress exposure. This paper provides a unique stress detection dataset created in a natural working environment in a hospital. This dataset is a collection of biometric data of nurses during the COVID-19 outbreak. Studying stress in a work environment is complex due to the influence of many social, cultural, and psychological factors in dealing with stressful conditions. In order to address these concerns, we captured both the physiological data and associated context pertaining to the stress events. We monitored specific physiological variables, including electrodermal activity, heart rate, skin temperature, and accelerometer data of the nurse subjects. A periodic smartphone-administered survey also captured the contributing factors for the detected stress events. A database containing the signals, stress events, and survey responses is available upon request.
Methods: The data was gathered for approximately one week from 15 female nurses working regular shifts at a hospital. 1,250 hours' worth of data was collected in two study sessions in Apr-May and Nov-Dec of 2020. The data was collected using Empatica E4 wearable devices. A survey was administered every day to identify the type of stress.
The dataset contains wrist-accelerometer and audio data from people performing at-home tasks such as sweeping, brushing teeth, washing hands, or watching TV. These activities represent a subset of the activities needed to live independently. Detecting activities with wearable devices in real time could enable assistive technologies with applications in domains such as elderly care and mental health monitoring. By making this dataset public, researchers can test different machine learning algorithms for activity recognition, especially sensor data fusion methods and multi-view learning.
Description
The database contains three directories, one for each user. Each directory contains the raw accelerometer data. The features file contains extracted features. Features prefixed with "v1_" are Mel Frequency Cepstral Coefficients extracted from the audio data. Features prefixed with "v2_" were extracted from the accelerometer.
For more details about the dataset you can check the original paper here: https://osf.io/preprints/osf/j723c
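Because the two feature views are distinguished only by their name prefix, separating them for sensor-fusion or multi-view experiments is a matter of filtering column names. The sketch below shows one way to do that; the example header is hypothetical, and only the "v1_" and "v2_" prefixes come from the dataset description.

```python
def split_views(feature_names):
    """Group feature names by view prefix ("v1_" audio, "v2_" accelerometer)."""
    audio = [n for n in feature_names if n.startswith("v1_")]
    accel = [n for n in feature_names if n.startswith("v2_")]
    other = [n for n in feature_names if not n.startswith(("v1_", "v2_"))]
    return audio, accel, other


# Hypothetical header; only the prefixes come from the dataset description.
header = ["label", "v1_mfcc1", "v1_mfcc2", "v2_meanX", "v2_sdY"]
audio_cols, accel_cols, other_cols = split_views(header)
```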
Terms of use
The data is released fully open for research and educational purposes. The use of the dataset for purposes such as competitions and commercial purposes needs prior written permission. In all documents and papers that use or refer to the dataset or report experimental results based on the HTAD, a reference to the related article needs to be added: https://link.springer.com/chapter/10.1007/978-3-030-67835-7_17.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Results of qualitative data analysis of a field study of a smart wearable system (Grippy) aiming to help people deal with daily stress. This dataset has been anonymized.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sport DB 2.0 is a collection of 168 cardiorespiratory datasets, acquired through wearable sensors and portable devices from 130 subjects while practicing 11 different sports during training and competition. Each dataset consists of demographic data (sex, age, weight, height, smoking habit, alcohol consumption, caffeine consumption, weekly training rate, presence of diseases and dietary supplement consumption), cardiorespiratory signals (electrocardiogram, heart-rate series, RR-interval series, and/or breathing-rate series), and training note data (sport-dependent training protocol). Cardiorespiratory signals were acquired through the BioHarness 3.0 by Zephyr, the KardiaMobile by AliveCor, the Kardia 6L by AliveCor, the Polar M400 by Polar, and heart-rate sensor H7 by Polar, on the playing field or gym following a specific acquisition protocol for each sport. Sport DB 2.0 may be useful to support research aimed at investigating the cardiorespiratory pathophysiological mechanisms triggered by sport, to develop automatic algorithms for monitoring athletes’ health while practicing sports, to validate the reliability of wearable sensors and portable devices in sport, and to develop data analytics techniques and artificial intelligence applications to support sport sciences.
Burattini, Laura; Romagnoli, Sofia; Sbrollini, Agnese; Morettini, Micaela; Nocera, Antonio; Bondi, Danilo; Pietrangelo, Tiziana; Verratti, Vittore (2023), “Sport DB 2.0”, Mendeley Data, V1, doi: 10.17632/kzkjkt7mx2.1
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data collection was carried out in the context of the TOLIFE Horizon Europe project, which aims to develop and validate artificial intelligence (AI) solutions for processing patient data collected through non-intrusive wearable sensors. The ultimate goal is to enable personalized treatment, monitor health outcomes, and improve the quality of life of patients with Chronic Obstructive Pulmonary Disease (COPD).
The dataset is composed of four distinct subsets, each acquired under different conditions and with specific goals:
1. Gait Speed Dataset:
Acquisition Protocol:
Folder Content:
2. Activity Recognition Dataset:
Materials:
Acquisition Protocol:
Folder Content:
3. Validation Dataset:
Materials: Same TOLIFE wearable platform.
Acquisition protocol:
Participants: 10 healthy adults (5 M, 5 F), age: 43.8 ± 12.1 years
Protocol: free walking outdoors, with mixed activities:
- Resting, level walking, stair climbing
- 4 walking paths of known length (100–280)
- Device placement: user’s preferred side (watch), front pants pocket (phone)
Folder Content:
Data folder: continuous sensor data recordings
Timestamps folder: annotated transitions between activities
Materials: TOLIFE wearable platform (smartphone, smartwatch, smart shoes)
Acquisition Protocol:
Participants: 38 COPD patients (23 M, 15 F), age: 63.3 ± 5.8 years
Duration: continuous data collection over two weeks
Initial Reference Visit (RV):
- 2 clinical 6MWTs performed by a pulmonologist, average distance used as a reference value
- Instructions: participants used the devices freely (indoors/outdoors) in comfortable conditions
Folder Content:
Data folder: two weeks of wearable sensor recordings per patient
Clinical Six_Minute_Walking_Distance.csv: distances from RV
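The reference value described above is the average of the two clinical 6MWT distances. A minimal sketch of that arithmetic follows, together with the average walking speed it implies over the six-minute test. The distances, the assumption that they are recorded in metres, and the function names are illustrative only; the layout of Clinical Six_Minute_Walking_Distance.csv is not described here, so no file parsing is attempted.

```python
SIX_MINUTES_S = 6 * 60


def reference_distance(first_m, second_m):
    """Average of the two clinical 6MWT distances."""
    return (first_m + second_m) / 2.0


def mean_gait_speed(distance_m, duration_s=SIX_MINUTES_S):
    """Average walking speed in m/s over the test duration."""
    return distance_m / duration_s


# Invented distances in metres for one hypothetical patient.
ref = reference_distance(420.0, 444.0)
speed = mean_gait_speed(ref)
```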
Citation:
When using any portion of this dataset collection, please cite:
Zanoletti, M.; Bufano, P.; Bossi, F.; Di Rienzo, F.; Marinai, C.; Rho, G.; Vallati, C.; Carbonaro, N.; Greco, A.; Laurino, M.; et al. Combining Different Wearable Devices to Assess Gait Speed in Real-World Settings. Sensors 2024, 24, 3205. https://doi.org/10.3390/s24103205
ABSTRACT: With the popularization of low-cost mobile and wearable sensors, prior studies have utilized such sensors to track and analyze people's mental well-being, productivity, and behavioral patterns. However, there is still a lack of open datasets collected in in-the-wild contexts with affective and cognitive state labels such as emotion, stress, and attention, which limits advances in affective computing and human-computer interaction research. This work presents K-EmoPhone, an in-the-wild multi-modal dataset collected from 77 university students for seven days. This dataset contains (i) continuous probing of peripheral physiological signals and mobility data measured by commercial off-the-shelf devices; (ii) context and interaction data collected from individuals' smartphones; and (iii) 5,582 self-reported affect states, such as emotion, stress, attention, and disturbance, acquired by the experience sampling method. We anticipate that the presented dataset will contribute to the advancement of affective computing, emotion intelligence technologies, and attention management based on mobile and wearable sensor data.
Last update: Apr. 12, 2023
* Version 1.1.2 (Jun. 3, 2023)
* Version 1.1.1 (Apr. 12, 2023)
* Version 1.1.0 (Feb. 5, 2023)
* Version 1.0.0 (Aug. 3, 2022)
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction. The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
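As a dependency-free alternative to pandas.read_csv(), the sketch below reads hourly rows with the standard-library csv module and rolls them up to daily totals per participant. The column names and values are placeholders invented for illustration and are not the real LifeSnaps schema; with a real file, the `io.StringIO` wrapper would be replaced by an opened file handle.

```python
import csv
import io
from collections import defaultdict

# Invented hourly rows; column names are placeholders, not the real schema.
HOURLY_CSV = """id,date,hour,steps
p01,2021-05-24,9,412
p01,2021-05-24,10,958
p01,2021-05-25,9,120
p02,2021-05-24,9,37
"""


def daily_steps(handle):
    """Sum hourly step counts into {(participant, date): total steps}."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[(row["id"], row["date"])] += int(row["steps"])
    return dict(totals)


totals = daily_steps(io.StringIO(HOURLY_CSV))
```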
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data by importing the LifeSnaps MongoDB database. To do so, make sure MongoDB Database Tools is installed, then open the terminal/command prompt and run the following command for each collection in the DB. For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Case Study: How Wellness Technology Companies Play It Smart – Analyzing FitBit Fitness Tracker Data
Overview
In today’s fast-paced world, wellness technology companies like FitBit are revolutionizing how individuals track and improve their health. This project explores a real-world dataset from FitBit fitness trackers, focusing on understanding user behavior, activity patterns, and the role of wearable technology in promoting healthier lifestyles. By analyzing metrics such as daily steps, calories burned, sleep patterns, and active minutes, this project uncovers actionable insights that wellness companies can leverage to enhance product features, personalize user experiences, and drive customer engagement.
Key Objectives
- Understand User Behavior: Analyze daily activity metrics (e.g., steps, distance, calories burned) to identify trends and patterns among users.
- Evaluate Sleep and Activity Correlations: Explore the relationship between sleep quality and physical activity to highlight the importance of balanced wellness routines.
- Segment Users: Group users based on their activity levels and usage patterns to inform targeted marketing strategies and product personalization.
- Provide Business Insights: Translate data-driven findings into actionable recommendations for wellness technology companies to optimize their offerings and play "smart" in a competitive market.
Why This Project Matters
This project showcases my ability to:
- Clean, analyze, and interpret complex datasets.
- Use data visualization to communicate insights effectively.
- Derive actionable business recommendations from data-driven findings.
- Apply statistical and machine learning techniques to solve real-world problems.
This dataset is made available under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See LICENSE.pdf for details.
Dataset description
Parquet file, with:
The file is indexed on [participant]_[month], such that 34_12 means month 12 from participant 34. All participant IDs have been replaced with randomly generated integers and the conversion table deleted.
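The index format described above can be split back into its two parts. The following is a minimal sketch, assuming index values are plain strings of the form `34_12`; the names `RowKey` and `parse_row_key` are hypothetical and not part of the dataset.

```python
from typing import NamedTuple


class RowKey(NamedTuple):
    """Participant and month encoded in one index value (hypothetical helper type)."""
    participant: int
    month: int


def parse_row_key(key: str) -> RowKey:
    """Split an index value such as '34_12' into participant 34 and month 12."""
    participant, month = key.split("_")
    return RowKey(int(participant), int(month))


print(parse_row_key("34_12"))
```

Keeping the two parts as integers makes it straightforward to group rows by participant or to sort a participant's rows by month.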
Column names and explanations are included as a separate tab-delimited file. Detailed descriptions of feature engineering are available from the linked publications.
File contains aggregated, derived feature matrix describing person-generated health data (PGHD) captured as part of the DiSCover Project (https://clinicaltrials.gov/ct2/show/NCT03421223). This matrix focuses on individual changes in depression status over time, as measured by PHQ-9.
The DiSCover Project is a 1-year long longitudinal study consisting of 10,036 individuals in the United States, who wore consumer-grade wearable devices throughout the study and completed monthly surveys about their mental health and/or lifestyle changes, between January 2018 and January 2020.
The data subset used in this work comprises the following:
From these input sources we define a range of input features, both static (defined once, remain constant for all samples from a given participant throughout the study, e.g. demographic features) and dynamic (varying with time for a given participant, e.g. behavioral features derived from consumer-grade wearables).
The dataset contains a total of 35,694 rows, one for each month of data collection from each participant. We can generate 3-month long, non-overlapping, independent samples to capture changes in depression status over time with PGHD. We use the notation ‘SM0’ (sample month 0), ‘SM1’, ‘SM2’ and ‘SM3’ to refer to relative time points within each sample. Each 3-month sample consists of: PHQ-9 survey responses at SM0 and SM3, one set of screener survey responses, LMC survey responses at SM3 (as well as SM1, SM2, if available), and wearable PGHD for SM3 (and SM1, SM2, if available). The wearable PGHD includes data collected from 8 to 14 days prior to the PHQ-9 label generation date at SM3. Doing this generates a total of 10,866 samples from 4,036 unique participants.
https://creativecommons.org/publicdomain/zero/1.0/
The crowd-sourced Fitbit datasets (03.12.2016-05.12.2016) were generated as part of a study by Brinton J, Keating M, Ortiz A, Evenson K, and Furberg R titled: Establishing Linkages Between Distributed Survey Responses and Consumer Wearable Device Datasets: A Pilot Protocol. Their goal was to create methods for remotely collecting data and establishing a link between self-reported survey responses and biometric data from consumer wearable devices, resulting in a de-identified and linked dataset.
As described by the research team, a total of 30* participants who use a Fitbit were recruited on Mechanical Turk Prime and asked to complete a short online self-administered questionnaire.
The participants were asked to connect their personal Fitbit activity tracker to an online third-party software system, called Fitabase, which allowed the research team to access 1 month of retrospective data and 1 month of prospective data, both counted from the date of consent.
The original complete data was uploaded to Zenodo by Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa on 31/05/2016 and was divided into two folders, each containing roughly one month's worth of user-submitted data:
The dataset uploaded to Kaggle by Möbius is only the first folder named (mturkfitbitexport4.12.16-5.12.16.zip), which contains data from 12/04/2016 to 12/05/2016, whereas this dataset also contains the retrospective data (mturkfitbit_export_3.12.16-4.11.16.zip).
This notebook contains a cleaned, merged dataset that spans the dates 12/03/2016 to 12/05/2016. If you prefer not to use the notebook output and want to clean the data yourself, keep in mind that the file dailyActivity_merged.csv was not properly merged and does not contain all of the data available in the minute CSV files. For more information on this, see section 4.1.3 in this notebook.
If you're looking for more recent fitness activity data, check PMDATA and this notebook.
*Although 35 people signed up for the study, only 30 were chosen by the research team.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Bitbrain Open Access Sleep (BOAS) dataset.
This project aimed at bridging the gap between gold-standard clinical sleep monitoring and emerging wearable EEG technologies. The dataset contains data from 128 nights in which participants were simultaneously monitored with two technologies: a Brain Quick Plus Evolution PSG system by Micromed and a wearable EEG headband by Bitbrain. The Micromed PSG system records a comprehensive and clinically validated set of physiological sleep parameters, while the Bitbrain wearable EEG headband offers a user-friendly, self-administered alternative, limited to forehead EEG electrodes, movement sensors, and photo-plethysmography. Data from both systems were acquired simultaneously, allowing for direct comparison and validation of the wearable EEG device against the established PSG standard. This dual-recording approach provides a rich resource for evaluating the performance and potential of wearable EEG technology in sleep studies.
Human sleep scoring: To ensure robust and reliable sleep staging, we followed a rigorous labeling process. Three expert sleep scorers independently annotated the PSG recordings following criteria developed by the American Academy of Sleep Medicine (AASM) (Berry et al., 2015). From the resulting three scorings, a consensus label was derived: each epoch of sleep data received the label scored by at least two of the scorers. In cases where all three scorers had given different labels, a fourth scorer made the final decision. This consensus labeling approach addresses the inherent variability in human-derived sleep scoring, with an estimated inter-scorer agreement of approximately 85% (Danker-Hopfe et al., 2009; Rosenberg and Van Hout, 2013).
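The consensus rule described above can be sketched in a few lines. This is an illustration only, assuming each epoch's labels arrive as a tuple of three integer stage codes and that the fourth scorer's decision is supplied separately when needed; the function name `consensus_label` is hypothetical and is not the authors' code.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(scores: Sequence[int], tie_breaker: Optional[int] = None) -> int:
    """Return the label given by at least two of three scorers.

    If all three scorers disagree, fall back to the fourth scorer's decision.
    """
    if len(scores) != 3:
        raise ValueError("expected exactly three scorer labels")
    label, count = Counter(scores).most_common(1)[0]
    if count >= 2:
        return label
    if tie_breaker is None:
        raise ValueError("all three scorers disagree; a fourth decision is required")
    return tie_breaker


# Illustrative epochs (made-up labels): two agree, all differ, all agree.
epochs = [(2, 2, 3), (0, 1, 2), (4, 4, 4)]
fourth = [None, 1, None]
print([consensus_label(e, f) for e, f in zip(epochs, fourth)])  # [2, 1, 4]
```

Raising an error when no tie-breaker is given makes a missing fourth decision visible instead of silently picking one of the three conflicting labels.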
Automatic scoring: We used the human expert consensus labels to train a deep learning model (Esparza-Iaizzo et al., 2024). By implementing a cross-validation procedure, we trained and validated the model separately on the PSG and wearable EEG datasets. The model achieved an 87.08% match between human-consensus and network-provided labels for the PSG data, and an 86.64% match for the wearable EEG data.
Our dataset includes:
Participants were members of the general population, provided written informed consent, and received economic compensation of 50€ per night.
In order to represent the general population, we recruited a broad spectrum of participants along the dimensions of age, sex, and body mass index. We did not recruit patients with particular health conditions but only excluded severe conditions that could have affected the feasibility or safety of the study. In detail, inclusion and exclusion criteria were as follows.
Inclusion criteria
- Age > 18 years,
- Sufficient knowledge of Spanish to understand the explanatory text, the consent form and study-related instructions.
Exclusion criteria
- Current severe medical interventions or medication,
- History of severe neurological or psychiatric disorders,
- Severe health problems in the last 12 months (especially neurological or cardiac disorders),
- Current pregnancy or nursing,
- Use of psychotropic medication, benzodiazepines, gamma-hydroxybutyric acid, and similar drugs before or during the study.
The dataset is formatted according to the Brain Imaging Data Structure (BIDS). Please note that while the recordings are named from sub-1 up to sub-128, some come from the same participants. 108 unique individuals participated in the recordings, data of which can be matched using the pid (= unique participant ID) property in the file "*participants.tsv*"
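Recordings from the same person can be grouped by reading the participants file. The sketch below assumes the file has a `participant_id` column (the usual BIDS name, not confirmed by the description) next to the `pid` column mentioned above; the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for participants.tsv; the column name participant_id is an assumption.
SAMPLE_TSV = "participant_id\tpid\nsub-1\t17\nsub-2\t42\nsub-3\t17\n"


def recordings_by_pid(handle):
    """Map each unique participant ID (pid) to the recordings that belong to it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["pid"]].append(row["participant_id"])
    return dict(groups)


groups = recordings_by_pid(io.StringIO(SAMPLE_TSV))
# Keep only participants who contributed more than one recording.
repeated = {pid: recs for pid, recs in groups.items() if len(recs) > 1}
print(repeated)  # {'17': ['sub-1', 'sub-3']}
```

Grouping by `pid` matters when splitting data for model evaluation, since recordings from one person should normally stay on the same side of a train/test split.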
The folder of each recording contains the data recorded with the PSG ("*sub-xx_task-Sleep_acq-psg_eeg.edf*") and with the wearable EEG headband ("*sub-xx_task-Sleep_acq-headband_eeg.edf*").
Channel groups
Not all recordings contain data from all available sensors. The full list of available sensors for each recording can be obtained from the "*channels.tsv*" file. Channels in this file are coded in groups:
- PSG_EEG: Electroencephalography recorded with the PSG system. Channels available are F3, F4, C3, C4, O1, O2 (PSG_F3, PSG_F4, PSG_C3, PSG_C4, PSG_O1, PSG_O2).
- PSG_EOG: Electrooculography signals recorded with the PSG system. The EOG electrodes were placed lateral to the eyes: one slightly lower than the participant's left eye and one slightly higher than the participant's right eye (according to AASM guidelines). For recordings containing only one EOG channel (PSG_EOG), the electrodes were recorded as a bipolar derivation. If two EOG channels are present (PSG_EOGR, PSG_EOGL), both electrodes were referenced against the left mastoid.
- PSG_EMG: Electromyography signals recorded with the PSG system. Data contain a single EMG channel (PSG_EMG), which is the result of a bipolar derivation of two chin electrodes.
- PSG_BELTS: Breathing activity recorded by the PSG system using abdominal and thoracic breathing belts (PSG_ABD, PSG_THOR).
- PSG_THER: Respiratory airflow recorded with the PSG system using a thermistor (PSG_THER).
- PSG_CAN: Respiratory airflow recorded with the PSG system using a nasal cannula (PSG_CAN).
- PSG_PPG: Photoplethysmographic (PPG) activity recorded with the PSG system. Channels available are pulse (PSG_PULSE), heart beat (PSG_BEAT) and oxygen saturation (PSG_SPO2).
- HB_EEG: Electroencephalography recorded with the wearable EEG headband. Headband channels are approximately located at AF7 and AF8 (HB_1, HB_2).
- HB_IMU: Movement activity recorded by an Inertial Measurement Unit (IMU) in the headband. Signals are derived from an accelerometer (HB_IMU_1, HB_IMU_2, HB_IMU_3) and gyroscope (HB_IMU_4, HB_IMU_5, HB_IMU_6), both recording signals for all three spatial dimensions.
- HB_PULSE: Pulse activity recorded with the wearable EEG headband using a PPG sensor (HB_PULSE).
Sleep staging labels
The sleep stage labels for each recording are coded as events in corresponding event files (stage_hum and stage_ai; see above). Stages are coded as follows:
- 0: Wake
- 1: NonREM sleep stage 1 (N1)
- 2: NonREM sleep stage 2 (N2)
- 3: NonREM sleep stage 3 (N3)
- 4: REM sleep
- 8: PSG disconnections (e.g., due to bathroom breaks; human-scored only)
- -2: Artifacts and missing data (AI-scored only)
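The stage codes listed above translate directly into a lookup table. The sketch below assumes the codes have already been read from an event file into a list of integers; `decode_stages` and `sleep_fraction` are hypothetical helpers, and the sleep-fraction summary is only one possible use, not something the dataset description prescribes.

```python
# Stage codes as listed in the dataset description.
STAGE_NAMES = {
    0: "Wake",
    1: "N1",
    2: "N2",
    3: "N3",
    4: "REM",
    8: "PSG disconnection",   # human-scored files only
    -2: "Artifact/missing",   # AI-scored files only
}

SLEEP_CODES = {1, 2, 3, 4}


def decode_stages(codes):
    """Translate numeric stage codes into readable stage names."""
    return [STAGE_NAMES[code] for code in codes]


def sleep_fraction(codes):
    """Fraction of validly scored epochs (codes 0 to 4) that are sleep stages."""
    valid = [code for code in codes if code == 0 or code in SLEEP_CODES]
    if not valid:
        return 0.0
    return sum(code in SLEEP_CODES for code in valid) / len(valid)


night = [0, 1, 2, 2, 3, 4, 8, 0]  # made-up example sequence
print(decode_stages(night))
print(round(sleep_fraction(night), 3))
```

Codes 8 and -2 are excluded from the denominator here so that disconnections and artifacts do not count as either wake or sleep.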
Berry, R. B., Brooks, R., Gamaldo, C. E., Harding, S. M., Lloyd, R. M., Marcus, C. L., et al. (2015). The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications, Version 2.2. Darien, Illinois.
Danker-Hopfe, H., Anderer, P., Zeitlhofer, J., Boeck, M., Dorn, H., Gruber, G., et al. (2009). Interrater reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM standard. J. Sleep Res. 18, 74–84. doi: 10.1111/j.1365-2869.2008.00700.x.
Esparza-Iaizzo, M., Sierra-Torralba, M., Klinzing, J. G., Minguez, J., Montesano, L., and López-Larraz, E. (2024). Automatic sleep scoring for real-time monitoring and stimulation in individuals with and without sleep apnea. bioRxiv, 2024.06.12.597764. doi: 10.1101/2024.06.12.597764.
Rosenberg, R. S., and Van Hout, S. (2013). The American Academy of Sleep Medicine inter-scorer reliability program: sleep stage scoring. J. Clin. sleep Med. 9, 81–87. doi: 10.5664/jcsm.2350.
If you have any questions or comments, please contact:
Eduardo López-Larraz: eduardo.lopez@bitbrain.com
Jens G. Klinzing: jens.klinzing@bitbrain.com
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Structure
Columns:
1. User ID: Unique identifier for each user
2. Date: Date of the activity
3. Step Count: Number of steps taken
4. Distance (km): Distance covered in kilometers
5. Calories Burned: Total calories burned
6. Active Minutes: Total minutes of physical activity
7. Workout Type: Type of workout (e.g., Running, Walking, Cycling, Swimming)
8. Duration (min): Duration of the workout in minutes
9. Heart Rate (bpm): Average heart rate during the activity
10. Sleep Duration (hours): Total hours of sleep
11. Sleep Quality: Quality of sleep (e.g., Good, Fair, Poor)
12. Water Intake (liters): Amount of water consumed
13. Calories Intake: Total calories consumed
14. Weight (kg): Weight of the user
15. Mood: Self-reported mood (e.g., Happy, Stressed, Tired)
16. Notes: Any additional notes about the day or workout
Example Entry:
| User ID | Date | Step Count | Distance (km) | Calories Burned | Active Minutes | Workout Type | Duration (min) | Heart Rate (bpm) | Sleep Duration (hours) | Sleep Quality | Water Intake (liters) | Calories Intake | Weight (kg) | Mood | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2024-05-01 | 12000 | 9.6 | 500 | 60 | Running | 30 | 140 | 7 | Good | 2.5 | 2000 | 70 | Happy | Felt great during run |
| 2 | 2024-05-01 | 8000 | 6.4 | 350 | 45 | Walking | 45 | 110 | 8 | Fair | 3.0 | 1800 | 65 | Tired | Tired in the afternoon |
Data Collection Methods:
1. Wearable Devices: Smartwatches or fitness trackers can provide step count, distance, calories burned, heart rate, and active minutes.
2. Mobile Apps: Health apps can log workout types, durations, and track water and calorie intake.
3. Manual Entry: Users can manually enter sleep quality, mood, weight, and notes.
4. Integrations: Integrate with other health apps and devices for comprehensive data collection.
Usage:
- Personal Fitness Tracking: Individuals can monitor their progress and adjust their routines.
- Research: Anonymized datasets can be used for studies on physical activity and health outcomes.
- Health Monitoring: Healthcare providers can use the data for monitoring patient health and recommending interventions.
The unpredictability of epileptic seizures exposes people with epilepsy to potential physical harm, restricts day-to-day activities, and impacts mental well-being. Accurate seizure forecasters would reduce the uncertainty associated with seizures but need to be feasible and accessible in the long term. Wearable devices are well suited to developing non-invasive, accessible forecasts but are yet to be investigated in long-term studies. We hypothesized that machine learning models could utilize heart rate as a biomarker for well-established cycles of seizures and epileptic activity, in addition to other wearable signals, to forecast high and low risk seizure periods. This feasibility study tracked participants' (n = 11) heart rates, sleep, and step counts using wearable smartwatches and seizure occurrence using smartphone seizure diaries for at least 6 months (mean = 14.6 months, SD = 3.8 months). Eligible participants had a diagnosis of refractory epilepsy and reported at least 20 seizures (mean = 135, SD = 123) during the recording period. An ensembled machine learning and neural network model estimated seizure risk either daily or hourly, with retraining occurring on a weekly basis as additional data was collected. Performance was evaluated retrospectively against a rate-matched random forecast using the area under the receiver operating characteristic curve. A pseudo-prospective evaluation was also conducted on a held-out dataset. Of the 11 participants, seizures were predicted above chance in all (100%) participants using an hourly forecast and in ten (91%) participants using a daily forecast. The average time spent in high risk (prediction time) before a seizure occurred was 37 min in the hourly forecast and 3 days in the daily forecast. Cyclic features added the most predictive value to the forecasts, particularly circadian and multiday heart rate cycles.
Wearable devices can be used to produce patient-specific seizure forecasts, particularly when biomarkers of seizure and epileptic activity cycles are utilized.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
The accelerometer case study dataset refers to a collection of data generated by accelerometers, which are sensors commonly used to measure and record acceleration forces. This dataset is specifically designed for studying human motion and activity patterns.
The dataset captures various physical activities and movements performed by individuals while wearing accelerometer devices. It records acceleration data in multiple axes, such as x, y, and z, providing a detailed representation of the forces experienced during different activities.
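A common first step with three-axis acceleration data is to combine the axes into a single magnitude per sample. The sketch below is an illustration under stated assumptions: the description does not give the file layout or units, so the samples are made-up `(x, y, z)` tuples and `magnitude` is a hypothetical helper.

```python
import math


def magnitude(x: float, y: float, z: float) -> float:
    """Euclidean norm of one three-axis acceleration sample."""
    return math.sqrt(x * x + y * y + z * z)


# Made-up samples; real values and units depend on the recording device.
samples = [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0)]
print([magnitude(*sample) for sample in samples])  # [1.0, 5.0]
```

The magnitude does not depend on how the device is oriented, which is why it is often used as an input feature for activity recognition.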
Researchers and data scientists can utilize the accelerometer case study dataset to analyze and understand human motion and activity recognition. By examining the dataset, they can explore patterns, correlations, and trends related to specific activities, such as walking, running, sitting, or even more complex movements like jumping or cycling.
The accelerometer dataset serves as a valuable resource for developing and evaluating algorithms and models for activity recognition, gait analysis, fitness tracking, and other applications in the fields of healthcare, sports science, and wearable technology. Researchers can also use this dataset to investigate the impact of various factors, such as age, gender, or environmental conditions, on human movement patterns.
By leveraging the accelerometer case study dataset, researchers can gain insights into human behavior, identify abnormal movement patterns, monitor physical activity levels, and design personalized interventions for promoting healthy lifestyles. Additionally, the dataset can aid in the development of innovative solutions for activity tracking, fall detection, and other applications aimed at improving human well-being and quality of life.
http://opendatacommons.org/licenses/dbcl/1.0/
Dataset for Continuous Stress Monitoring of Hospital Nurses
The growing accessibility of wearable tech has opened doors to continuously monitor various physiological factors. Detecting stress early has become pivotal, aiding individuals in proactively managing their health against the detrimental effects of prolonged stress exposure. This paper presents an exclusive stress detection dataset cultivated within the natural environment of a hospital. Compiled during the COVID-19 outbreak, this dataset encompasses the biometric data of nurses. Analyzing stress in a workplace setting is intricate due to the multifaceted social, cultural, and psychological elements inherent in dealing with stressful circumstances. Hence, our dataset not only encompasses physiological data but also contextual information surrounding stress events. Key physiological metrics such as electrodermal activity, heart rate, and skin temperature of the nurse subjects were continuously monitored. Additionally, a periodic survey administered via smartphones captured contributing factors linked to detected stress events. The database housing these signals, stress occurrences, and survey responses is publicly accessible on Dryad.
Project Overview This project delves into leveraging wearable device-derived physiological signals to gauge stress levels among nurses operating within a hospital environment. The dataset comprises details acquired from nurses wearing watches that tracked their heart rate, skin temperature, and electrodermal activity (EDA) while simultaneously reporting their stress levels.
The primary goal revolves around evaluating various machine learning models to forecast stress levels based on recorded physiological signals. Additionally, the project investigates the most pertinent physiological indicators for stress detection and offers insights to enhance the accuracy and dependability of stress detection via wearable tech.
Dataset Description:
Data Collection Context:
- Period: Data gathered over one week from 15 female nurses aged 30 to 55 years, during regular shifts at a hospital.
- Collection Phases: Two phases - Phase-I (April 15, 2020, to August 6, 2020) and Phase-II (October 8, 2020, to December 11, 2020).
- Exclusion Criteria: Pregnancy, heavy smoking, mental disorders, chronic or cardiovascular diseases.
Data Captured:
- Physiological Variables Monitored: Electrodermal activity, heart rate, and skin temperature of the nurse subjects.
- Survey Responses: Periodic smartphone-administered surveys capturing contributing factors to detected stress events.
- Measurement Technologies: Utilized Empatica E4 for data collection, specifically focusing on Galvanic Skin Response and Blood Volume Pulse (BVP) readings.
Study Procedure:
- Approval: University's Institutional Review Board approved the study protocol (FA19–50 INFOR).
- Consent and Enrollment: Nurse subjects were enrolled after expressing interest and obtaining hospital compliance.
- Study Design: Conducted in three phases, each including 7 nurses. No incentives were provided, and anonymization of data was ensured.
Data Availability:
- Public Release: A database containing signals, stress events, and survey responses is publicly available on Dryad.
- Anonymization: Unique identifiers assigned to subjects to maintain anonymity.
Merged CSV File Information: This dataset comprises approximately 11.5 million entries across nine columns:
- X, Y, Z: Orientation data (256 unique entries each).
- EDA, HR, TEMP: Physiological measurements (EDA: 274,452 unique, HR: 6,268 unique, TEMP: 599 unique).
- id: 18 categorical identifiers.
- datetime: Date and time entries (10.6 million unique).
- label: Categorical states or classes (three unique entries).
The dataset offers a wide array of continuous physiological measurements alongside orientation data, facilitating stress detection, health monitoring, and related research endeavours.
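With roughly 11.5 million rows, it can be convenient to stream the merged file row by row rather than load it at once. The sketch below is a minimal illustration that uses only the standard library and the nine column names listed above; the sample rows, the id values, and the label values are made up, and `mean_hr_by_label` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """X,Y,Z,EDA,HR,TEMP,id,datetime,label
-13,-61,5,6.77,99.4,31.2,A1,2020-07-08 14:03:00,2
-20,-69,-3,6.78,99.1,31.2,A1,2020-07-08 14:03:01,2
4,50,40,0.31,72.0,33.0,B2,2020-07-09 09:15:00,0
"""


def mean_hr_by_label(handle):
    """Average the HR column separately for each value of the label column."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["label"]].append(float(row["HR"]))
    return {label: mean(values) for label, values in groups.items()}


result = mean_hr_by_label(io.StringIO(SAMPLE))
print(result)
```

To run this on the real file, the `io.StringIO(SAMPLE)` argument would be replaced with an open file handle; the same loop could also be expressed with pandas, which the project lists among its requirements.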
Requirements: Python 3.7 or higher and Jupyter Notebook are prerequisites. The necessary Python packages are listed in the requirements.txt file. Running the code requires the following libraries: pandas, numpy, scikit-learn, and matplotlib.