8 datasets found
  1. RUNNING"calorie:heartrate

    • kaggle.com
    zip
    Updated Jan 6, 2022
    Cite
    romechris34 (2022). RUNNING"calorie:heartrate [Dataset]. https://www.kaggle.com/datasets/romechris34/wellness
    Explore at:
    Available download formats: zip (25272804 bytes)
    Dataset updated
    Jan 6, 2022
    Authors
    romechris34
    Description

    ---
    title: 'BellaBeat Fitbit'
    author: 'C Romero'
    date: '`r Sys.Date()`'
    output:
      html_document:
        number_sections: true
        toc: true
    ---

    ## Install the packages used as data analysis tools
    (base ships with R, so it does not need to be installed)

    ```{r}
    install.packages("ggplot2")    # grammar-of-graphics plotting
    install.packages("lubridate")  # easier work with dates and times
    install.packages("tidyverse")  # metapackage of all tidyverse packages
    install.packages("dplyr")      # data manipulation
    install.packages("readr")      # reading rectangular text data
    install.packages("tidyr")      # tidying data
    ```
    

    Importing packages

    ```{r}
    # base is loaded automatically, so no library(base) call is needed
    library(lubridate)  # makes dealing with dates a little easier
    library(ggplot2)    # elegant data visualisations using the grammar of graphics
    library(dplyr)      # a grammar of data manipulation
    library(readr)      # read rectangular text data
    library(tidyr)      # tidy messy data
    ```

    
    ## Running code
    
    In a notebook, you can run a single code cell by clicking in the cell and then hitting 
    the blue arrow to the left, or by clicking in the cell and pressing Shift+Enter. In a script, 
    you can run code by highlighting the code you want to run and then clicking the blue arrow
    at the bottom of this window.
    
    ## Reading in files
    
    
    ```{r}
    list.files(path = "../input")

    # load the activity and sleep data sets
    dailyActivity <- read_csv("../input/wellness/dailyActivity_merge.csv")
    sleepDay <- read_csv("../input/wellness/sleepDay_merged.csv")
    ```
    

    Check for duplicates and NAs:

    ```{r}
    sum(duplicated(dailyActivity))
    sum(duplicated(sleepDay))
    sum(is.na(dailyActivity))
    sum(is.na(sleepDay))
    ```

    Now we remove the duplicates from the sleep data and create a new data frame:

    ```{r}
    sleepy <- sleepDay %>% distinct()
    head(sleepy)
    head(dailyActivity)
    ```

    Count the number of distinct ids in the sleepy and dailyActivity frames:

    ```{r}
    n_distinct(dailyActivity$Id)
    n_distinct(sleepy$Id)
    ```

    Get the total steps and total distance for each member id:

    ```{r}
    dailyActivity %>%
      group_by(Id) %>%
      summarise(freq = sum(TotalSteps)) %>%
      arrange(-freq)

    Tot_dist <- dailyActivity %>%
      mutate(Id = as.character(dailyActivity$Id)) %>%
      group_by(Id) %>%
      summarise(dizzy = sum(TotalDistance)) %>%
      arrange(-dizzy)
    ```

    Now get the total minutes asleep and total time in bed:

    ```{r}
    sleepy %>% group_by(Id) %>% summarise(Msleep = sum(TotalMinutesAsleep)) %>% arrange(Msleep)
    sleepy %>% group_by(Id) %>% summarise(inBed = sum(TotalTimeInBed)) %>% arrange(inBed)
    ```

    Plot the total distance per member id:

    ```{r}
    ggplot(Tot_dist) +
      geom_count(mapping = aes(y = dizzy, x = Id, color = Id, fill = Id, size = 2)) +
      labs(x = "member id's", title = "distance miles") +
      theme(axis.text.x = element_text(angle = 90))
    ```
    
  2. Results of the P2C2M.Skyline on empirical datasets.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 10, 2023
    Cite
    Emanuel M. Fonseca; Drew J. Duckett; Filipe G. Almeida; Megan L. Smith; Maria Tereza C. Thomé; Bryan C. Carstens (2023). Results of the P2C2M.Skyline on empirical datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0269438.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Emanuel M. Fonseca; Drew J. Duckett; Filipe G. Almeida; Megan L. Smith; Maria Tereza C. Thomé; Bryan C. Carstens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Asterisk indicates datasets with p-value < 0.05.

  3. Dataset (Covid-Bacterial-Viral-Normal-Emphysema)

    • kaggle.com
    zip
    Updated Jun 13, 2024
    Cite
    Nhật Nguyễn Minh (2024). Dataset (Covid-Bacterial-Viral-Normal-Emphysema) [Dataset]. https://www.kaggle.com/datasets/minhnhat232/dataset-covid-bacterial-viral-normal-emphysema
    Explore at:
    Available download formats: zip (7209133657 bytes)
    Dataset updated
    Jun 13, 2024
    Authors
    Nhật Nguyễn Minh
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains lung X-ray images in the following classes:

    1. Normal - 3,270 images
    2. Covid-19 - 3,017 images
    3. Viral-pneumonia - 3,013 images
    4. Bacterial-pneumonia - 3,000 images
    5. Emphysema - 2,550 images

    [Image: sample X-ray images for each class]

    The dataset we use is compiled from several reputable sources:

    • Dataset 1 [1]: includes four classes: COVID-19, viral pneumonia, bacterial pneumonia, and normal. It has multiple versions; we currently use the latest (version 4). Previous studies, such as those by Hariri et al. [18] and Ahmad et al. [20], have also used earlier versions of this dataset.
    • Dataset 2 [2]: the National Institutes of Health (NIH) Chest X-Ray Dataset, which contains over 100,000 chest X-ray images from over 30,000 patients and covers 14 disease classes, including conditions like atelectasis, consolidation, and infiltration. For this study, we selected 2,550 chest X-ray images from the Emphysema class.
    • Dataset 3 [3]: the COVQU dataset, which we extended with two additional classes: COVID-19 and viral pneumonia. It has been widely used in previous studies by M.E.H. Chowdhury et al. [4] and Rahman T. et al. [5], establishing its reputation as a reliable resource.

    In addition, we also publish a modified dataset that aims to remove image regions that do not contain lungs (abdomen, arms, etc.).

    References:
    [1] U. Sait, K. G. Lal, S. P. Prajapati, R. Bhaumik, T. Kumar, S. Shivakumar, K. Bhalla, Curated dataset for COVID-19 posterior-anterior chest radiography images (X-rays), Mendeley Data V4 (2022). doi:10.17632/9xkhgts2s6.4
    [2] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R. M. Summers, ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases (2017) 3462–3471. doi:10.1109/CVPR.2017.369
    [3] A. M. Tahir, M. E. Chowdhury, A. Khandakar, T. Rahman, Y. Qiblawey, U. Khurshid, S. Kiranyaz, N. Ibtehaz, M. S. Rahman, S. Al-Maadeed, S. Mahmud, M. Ezeddin, K. Hameed, T. Hamid, COVID-19 infection localization and severity grading from chest X-ray images, Computers in Biology and Medicine 139 (2021) 105002. URL: https://www.sciencedirect.com/science/article/pii/S0010482521007964. doi:10.1016/j.compbiomed.2021.105002
    [4] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir, Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. A. Emadi, M. B. I. Reaz, M. T. Islam, Can AI help in screening viral and COVID-19 pneumonia?, IEEE Access 8 (2020) 132665–132676. doi:10.1109/ACCESS.2020.3010287
    [5] T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S. B. A. Kashem, M. T. Islam, S. A. Maadeed, S. M. Zughaier, M. S. Khan, M. E. Chowdhury, Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images, Computers in Biology and Medicine 132 (2021) 104319. doi:10.1016/j.compbiomed.2021.104319

  4. Example of how to manually extract incubation bouts from interactive plots...

    • figshare.com
    txt
    Updated Jan 22, 2016
    Cite
    Martin Bulla (2016). Example of how to manually extract incubation bouts from interactive plots of raw data - R-CODE and DATA [Dataset]. http://doi.org/10.6084/m9.figshare.2066784.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 22, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Martin Bulla
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # General information
    The script runs with R (Version 3.1.1; 2014-07-10) and the packages plyr (1.8.1), XLConnect (0.2-9), utilsMPIO (0.0.25), sp (1.0-15), rgdal (0.8-16), tools (3.1.1) and lattice (0.20-29).
    Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)
    Data collection and the derivation of the individual variables are described in:
    - Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016.
    - Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015.
    Data are available as an RData file. Missing values are NA.
    For better readability the subsections of the script can be collapsed.

    # Description of the method
    1 - Data are visualized in an interactive actogram with time of day on the x-axis and one panel for each day of data.
    2 - A red rectangle indicates the active field; clicking with the mouse in that field on the depicted light signal generates a data point that is automatically saved (via a custom-made function) in the csv file. For this data extraction I recommend always clicking on the bottom line of the red rectangle, as data are always available there thanks to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if a greenish vertical bar appears and a new line of data appears in the R console.
    3 - To extract incubation bouts, the first click in the new plot has to be the start of incubation, the next click marks the end of incubation, and a click on the same spot marks the start of incubation for the other sex. If the end and start of incubation are at different times, the data will still be extracted, but the sex, logger and bird_ID will be wrong; these need to be changed manually in the csv file. Similarly, the first bout of a given plot is always assigned to the male (if no data are present in the csv file) or based on previous data. Hence, whenever data from a new plot are extracted, it is worth checking at the first mouse click whether the sex, logger and bird_ID information is correct, and adjusting it manually if not.
    4 - When all information from one day (panel) has been extracted, right-click on the plot and choose "stop". This activates the following day (panel) for extraction.
    5 - To end extraction before going through all the rectangles, just press "escape".

    # Annotations of data files from turnstone_2009_Barrow_nest-t401_transmitter.RData
    dfr -- raw data on signal strength from the radio tags attached to the rumps of the female and male, plus information about when the birds were captured and the incubation stage of the nest:
    1. who: identifies whether the recording refers to female, male, capture or start of hatching
    2. datetime_: date and time of each recording
    3. logger: unique identity of the radio tag
    4. signal_: signal strength of the radio tag
    5. sex: sex of the bird (f = female, m = male)
    6. nest: unique identity of the nest
    7. day: datetime_ variable truncated to year-month-day format
    8. time: time of day in hours
    9. datetime_utc: date and time of each recording, in UTC
    10. cols: colors assigned to "who"

    m -- metadata for a given nest:
    1. sp: identifies the species (RUTU = Ruddy turnstone)
    2. nest: unique identity of the nest
    3. year_: year of observation
    4. IDfemale: unique identity of the female
    5. IDmale: unique identity of the male
    6. lat: latitude coordinate of the nest
    7. lon: longitude coordinate of the nest
    8. hatch_start: date and time when hatching of the eggs started
    9. scinam: scientific name of the species
    10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)
    11. logger: type of device used to record incubation (IT - radio tag)
    12. sampling: mean incubation sampling interval in seconds

    s -- metadata for the incubating parents:
    1. year_: year of capture
    2. species: identifies the species (RUTU = Ruddy turnstone)
    3. author: identifies the author who measured the bird
    4. nest: unique identity of the nest
    5. caught_date_time: date and time when the bird was captured
    6. recapture: was the bird captured before? (0 - no, 1 - yes)
    7. sex: sex of the bird (f = female, m = male)
    8. bird_ID: unique identity of the bird
    9. logger: unique identity of the radio tag

  5. Google Data Analytics Case Study Cyclistic

    • kaggle.com
    zip
    Updated Sep 27, 2022
    Cite
    Udayakumar19 (2022). Google Data Analytics Case Study Cyclistic [Dataset]. https://www.kaggle.com/datasets/udayakumar19/google-data-analytics-case-study-cyclistic/suggestions
    Explore at:
    Available download formats: zip (1299 bytes)
    Dataset updated
    Sep 27, 2022
    Authors
    Udayakumar19
    Description

    Introduction

    Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path.

    Scenario

    You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

    Ask

    How do annual members and casual riders use Cyclistic bikes differently?

    Guiding Question:

    What is the problem you are trying to solve?
      How do annual members and casual riders use Cyclistic bikes differently?
    How can your insights drive business decisions?
      The insight will help the marketing team to make a strategy for casual riders
    

    Prepare

    Guiding Question:

    Where is your data located?
      The data is located in Cyclistic's own organizational data.

    How is the data organized?
      The datasets are CSV files, one per month, from financial year 2022.

    Are there issues with bias or credibility in this data? Does your data ROCCC?
      It is good: the data ROCCCs because it was collected by the Cyclistic organization itself.

    How are you addressing licensing, privacy, security, and accessibility?
      The company has its own license over the dataset, and the dataset does not contain any personal information about the riders.

    How did you verify the data’s integrity?
      All the files have consistent columns, and each column has the correct type of data.

    How does it help you answer your questions?
      Insights are always hidden in the data; we have to interpret the data to find them.

    Are there any problems with the data?
      Yes, the starting and ending station name columns have null values.
    

    Process

    Guiding Question:

    What tools are you choosing and why?
      I used RStudio to clean and transform the data for the analysis phase, because the dataset is large and I wanted to gain experience with the language.

    Have you ensured the data’s integrity?
      Yes, the data is consistent throughout the columns.

    What steps have you taken to ensure that your data is clean?
      First, duplicates and null values were removed; then new columns were added for analysis.

    How can you verify that your data is clean and ready to analyze?
      Made sure the column names are consistent throughout all datasets before combining them with the bind_rows() function.
      Made sure the column data types are consistent throughout all datasets by using compare_df_cols() from the janitor package.
      Combined all the datasets into a single data frame for consistency throughout the analysis.
      Removed the start_lat, start_lng, end_lat, end_lng columns from the data frame because they are not required for the analysis.
      Created new columns (day, date, month, year) from the started_at column; this provides additional opportunities to aggregate the data.
      Created the ride_length column from started_at and ended_at to find the average ride duration.
      Removed the null rows from the dataset using the na.omit() function.

    Have you documented your cleaning process so you can review and share those results?
      Yes, the cleaning process is documented clearly.
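The cleaning steps described above can be sketched in base R. This is a minimal illustration on invented toy data, not the author's actual script; the column names (started_at, ended_at, start_station_name, ride_length) follow the description.

```r
# Toy sample mirroring the Cyclistic columns named above (hypothetical values).
trips <- data.frame(
  ride_id    = c("a", "a", "b", "c"),
  started_at = as.POSIXct(c("2022-01-01 08:00:00", "2022-01-01 08:00:00",
                            "2022-01-02 09:00:00", "2022-01-03 10:00:00")),
  ended_at   = as.POSIXct(c("2022-01-01 08:25:00", "2022-01-01 08:25:00",
                            "2022-01-02 09:40:00", "2022-01-03 10:05:00")),
  start_station_name = c("Clark St", "Clark St", NA, "State St"),
  stringsAsFactors = FALSE
)

trips <- trips[!duplicated(trips), ]   # remove duplicate rows
trips <- na.omit(trips)                # drop rows with null values
trips$ride_length <- as.numeric(difftime(trips$ended_at, trips$started_at,
                                         units = "mins"))
trips$day   <- weekdays(trips$started_at)  # new columns for aggregation
trips$month <- format(trips$started_at, "%m")
trips$year  <- format(trips$started_at, "%Y")
mean(trips$ride_length)                # average ride duration in minutes
```

The actual case study does the same with dplyr verbs (bind_rows, distinct, mutate) on the monthly CSVs.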
    

    Analyze Phase:

    Guiding Questions:

    How should you organize your data to perform analysis on it?
      The data has been organized into one single data frame by using the read_csv function in R.
    Has your data been properly formatted?
      Yes, all the columns have their correct data type.

    What surprises did you discover in the data?
      Casual members' ride durations are longer than annual members'.
      Casual members use docked bikes far more than annual members.
    What trends or relationships did you find in the data?
      Annual members mainly ride for commuting.
      Casual members prefer docked bikes.
      Annual members prefer electric or classic bikes.
    How will these insights help answer your business questions?
      These insights help build a profile for each member type.
    

    Share

    Guiding Questions:

    Were you able to answer the question of how ...
    
  6. FacialRecognition

    • kaggle.com
    zip
    Updated Dec 1, 2016
    Cite
    TheNicelander (2016). FacialRecognition [Dataset]. https://www.kaggle.com/petein/facialrecognition
    Explore at:
    Available download formats: zip (121674455 bytes)
    Dataset updated
    Dec 1, 2016
    Authors
    TheNicelander
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    # https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r

    ### Variables for downloaded files
    data.dir <- ' '
    train.file <- paste0(data.dir, 'training.csv')
    test.file <- paste0(data.dir, 'test.csv')

    ### Load csv -- creates a data.frame where each column can have a different type
    d.train <- read.csv(train.file, stringsAsFactors = F)
    d.test <- read.csv(test.file, stringsAsFactors = F)

    ### In training.csv we have 7049 rows, each with 31 columns.
    ### The first 30 columns are keypoint locations, which R correctly identified as numbers.
    ### The last one is a string representation of the image.

    ### To look at samples of the data, uncomment this line:
    head(d.train)

    ### Save the Image column as another variable, and remove it from d.train:
    ### d.train is our dataframe, and we want the column called Image.
    ### Assigning NULL to a column removes it from the dataframe.
    im.train <- d.train$Image
    d.train$Image <- NULL  # removes 'Image' from the dataframe
    im.test <- d.test$Image
    d.test$Image <- NULL   # removes 'Image' from the dataframe

    ### The image is represented as a series of numbers stored as a string.
    ### Convert these strings to integers by splitting them:
    ### strsplit splits the string, unlist simplifies its output to a vector of
    ### strings, and as.integer converts it to a vector of integers.
    as.integer(unlist(strsplit(im.train[1], " ")))
    as.integer(unlist(strsplit(im.test[1], " ")))

    ### Install and activate the appropriate libraries.
    ### The tutorial is meant for Linux and OS X, where a different backend is used,
    ### so replace all instances of %dopar% with %do%.
    install.packages('foreach')
    library("foreach", lib.loc="~/R/win-library/3.3")

    ### Convert every image string into a row of integers
    im.train <- foreach(im = im.train, .combine=rbind) %do% {
      as.integer(unlist(strsplit(im, " ")))
    }
    im.test <- foreach(im = im.test, .combine=rbind) %do% {
      as.integer(unlist(strsplit(im, " ")))
    }
    # The foreach loop evaluates the inner expression for each element of im.train
    # and combines the results with rbind (combine by rows).
    # Note: %do% evaluates sequentially; %dopar% (with a registered parallel
    # backend) would run the evaluations in parallel.
    # im.train is now a matrix with 7049 rows (one per image) and 9216 columns (one per pixel).
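The tutorial swaps %dopar% for %do%, which runs the loop sequentially. To actually run the conversion in parallel, a backend must be registered first. A minimal sketch on toy strings, assuming the doParallel package (not used in the original tutorial) is installed:

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)      # spin up two worker processes
registerDoParallel(cl)    # register them as the %dopar% backend

# Same shape as the image conversion above, on toy pixel strings.
pixels <- c("1 2 3", "4 5 6")
mat <- foreach(s = pixels, .combine = rbind) %dopar% {
  as.integer(unlist(strsplit(s, " ")))
}
stopCluster(cl)           # always release the workers when done
```

With a registered backend, each string is parsed on a worker process; without one, %dopar% falls back to sequential execution with a warning.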

    ### Save all four variables in a data.Rd file;
    ### they can be reloaded at any time with load('data.Rd')
    save(d.train, im.train, d.test, im.test, file='data.Rd')
    load('data.Rd')

    # Each image is a vector of 96*96 = 9216 pixels.
    # Convert these 9216 integers into a 96x96 matrix:
    im <- matrix(data=rev(im.train[1,]), nrow=96, ncol=96)

    # im.train[1,] returns the first row of im.train, i.e. the first training image.
    # rev reverses the resulting vector to match the interpretation of R's image
    # function (which expects the origin to be in the lower left corner).

    # To visualize the image we use R's image function:
    image(1:96, 1:96, im, col=gray((0:255)/255))

    # Let's color the coordinates for the eyes and nose:
    points(96-d.train$nose_tip_x[1], 96-d.train$nose_tip_y[1], col="red")
    points(96-d.train$left_eye_center_x[1], 96-d.train$left_eye_center_y[1], col="blue")
    points(96-d.train$right_eye_center_x[1], 96-d.train$right_eye_center_y[1], col="green")

    # Another good check is to see how variable our data is.
    # For example, where are the centers of the noses in the 7049 images?
    # (this takes a while to run):
    for (i in 1:nrow(d.train)) {
      points(96-d.train$nose_tip_x[i], 96-d.train$nose_tip_y[i], col="red")
    }

    # There are quite a few outliers -- they could be labeling errors.
    # Looking at one extreme example: in this case there is no labeling error,
    # but it shows that not all faces are centered.
    idx <- which.max(d.train$nose_tip_x)
    im <- matrix(data=rev(im.train[idx,]), nrow=96, ncol=96)
    image(1:96, 1:96, im, col=gray((0:255)/255))
    points(96-d.train$nose_tip_x[idx], 96-d.train$nose_tip_y[idx], col="red")

    # One of the simplest baselines: compute the mean of the coordinates of each
    # keypoint in the training set and use that as the prediction for all images.
    colMeans(d.train, na.rm=T)

    # To build a submission file we apply these computed coordinates to the test instances:
    p <- matrix(data=colMeans(d.train, na.rm=T), nrow=nrow(d.test), ncol=ncol(d.train), byrow=T)
    colnames(p) <- names(d.train)
    predictions <- data.frame(ImageId = 1:nrow(d.test), p)
    head(predictions)

    # The expected submission format has one keypoint per row, which we can
    # easily get with the help of the reshape2 library:
    install.packages('reshape2')

    library(...

  7. Energy Expenditure of Human Physical Activity

    • kaggle.com
    zip
    Updated Oct 15, 2023
    Cite
    David Desquens (2023). Energy Expenditure of Human Physical Activity [Dataset]. https://www.kaggle.com/datasets/anonymousds/energy-expenditure-of-human-physical-activity
    Explore at:
    Available download formats: zip (5061744 bytes)
    Dataset updated
    Oct 15, 2023
    Authors
    David Desquens
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    🗂️ Data source

    This dataset is built from the data underlying two scientific articles (1)(2).

    The individuals were selected via a paper advertisement and had to meet the following criteria:
    1. Be older than 60 years of age;
    2. Have a BMI between 23 and 35 kg/m2;
    3. Not be restricted in their movements by health conditions;
    4. Bring their own bicycle.

    The selected participants received €50 for their contribution to the study and agreed to the use of the recorded data for scientific purposes, in an anonymised manner.
    A video example of the data collection can be found on YouTube.

    ⚙️ Data processing

    • I personally consolidated and merged the resulting CSV output with an R script.
    • Of the 35 participants, only the 31 who used calorimetry were kept, with their associated Energy Expenditure measurements.

    💡 Inspiration

    I am passionate about physical activity in general, and I was very curious to gather and explore data from this field quantifying the Energy Expenditure of different indoor and outdoor activities of daily living, at low (lying down, sitting), mid (standing, household activities) and high (walking and cycling) levels of intensity.

    🔍 Data overview

    • The dataset encompasses ~40K records with the participants' attributes and the physical activity features.
    • Each observation represents a physical activity performed by one of the 31 individuals who used calorimetry.
    • The first twelve columns (ID:cosmed) are attributes related to the participant, so they are consistent across observations.
    • The remaining columns (EEm:predicted_activity_label) are features related to a single physical activity.

    🔢 Columns

    Name — Description
    ID — participant's ID
    trial_date — date and time when data collection started, at ID level
    gender — sex = male or female
    age — in years
    weight — in kg
    height — in cm
    bmi — body mass index in kg/m2
    gaAnkle — TRUE if data from GENEActiv on the ankle exist, FALSE otherwise
    gaChest — TRUE if data from GENEActiv on the chest exist, FALSE otherwise
    gaWrist — TRUE if data from GENEActiv on the wrist exist, FALSE otherwise
    equivital — TRUE if data from Equivital exist, FALSE otherwise
    cosmed — TRUE if data from COSMED exist, FALSE otherwise
    EEm — Energy Expenditure per minute, in kcal
    COSMEDset_row — the original indexes of COSMED data (used for merging)
    EEh — Energy Expenditure per hour, in kcal
    EEtot — total kcal spent (reset between indoor and outdoor measurements)
    METS — metabolic equivalent per minute
    Rf — respiratory frequency (litre/min)
    BR — breath rate
    VT — tidal volume, in litres
    VE — expiratory minute ventilation (litre/min)
    VO2 — oxygen uptake (ml/min)
    VCO2 — carbon dioxide production (ml/min)
    O2exp — volume of O2 expired (ml/min)
    CO2exp — volume of CO2 expired (ml/min)
    FeO2 — averaged expiratory concentration of O2 (%)
    FeCO2 — averaged expiratory concentration of CO2 (%)
    FiO2 — fraction of inspired O2 (%)
    FiCO2 — fraction of inspired CO2 (%)
    VE.VO2 — ventilatory equivalent for O2
    VE.VCO2 — ventilatory equivalent for CO2
    R — respiratory quotient
    Ti — duration of inspiration (seconds)
    Te — duration of expiration (seconds)
    Ttot — duration of total breathing cycle (seconds)
    VO2.HR — oxygen pulse (ml/beat)
    HR — heart rate
    Qt — cardiac output (litre)
    SV — stroke volume (litre/min)
    original_activity_labels — true activity label as noted from the study protocol, NA if unknown
    predicted_activity_label — activity label predicted by the model from [1], NA if unknown

    🔀 Data usage

    • Exploratory Data Analysis: which insights can we extract from the data?
    • Classification: can you classify the kind of activity better than the original model?
    • Prediction: can you improve on the accuracy of the original model?
    • Inference: which variables explain the Energy Expenditure?
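As a starting point for the EDA question, one could compare mean per-minute energy expenditure across activity labels. A toy sketch using the EEm and predicted_activity_label columns described above (the values here are invented, not from the dataset):

```r
# Toy frame mirroring the described columns (invented values).
toy <- data.frame(
  ID  = c(1, 1, 2, 2),
  EEm = c(1.25, 5.0, 1.0, 4.5),  # kcal per minute
  predicted_activity_label = c("sitting", "walking", "sitting", "walking")
)

# Mean energy expenditure per activity label: high-intensity labels such as
# walking should average well above sedentary ones such as sitting.
res <- aggregate(EEm ~ predicted_activity_label, data = toy, FUN = mean)
res
```

On the real file the same aggregate() call (or a dplyr group_by/summarise) would rank the low-, mid- and high-intensity activities described in the Inspiration section.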

    🖲️ Study devices and their body location

    [Image: study devices and their body locations]

  8. Supplement 1. A summary of data and a list of references.

    • figshare.com
    • wiley.figshare.com
    html
    Updated Jun 2, 2023
    Cite
    Bastien Castagneyrol; Hervé Jactel (2023). Supplement 1. A summary of data and a list of references. [Dataset]. http://doi.org/10.6084/m9.figshare.3554256.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Wiley
    Authors
    Bastien Castagneyrol; Hervé Jactel
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File List
    MA_Data.txt (md5: 0e737a7606064d123a53ae961663aa9b)

    Description
    MA_data.txt is a tab-delimited text file, with headers, containing a summary of the data included in the meta-analysis.

    nr: sample size used to calculate the variance of the effect size, for correlations
    nb: sample size used to calculate the variance of the effect size, for slopes
    r: correlation coefficient
    b: regression slope
    V(b): variance of b
    NA: not available data
    Difference in n: indicates whether the sample size used in the meta-analysis was different from or equal to the number of replicates reported by the authors. n values were sometimes "Different" when r and b had to be re-calculated from tables or digitized from figures.
    Type of r: whether r values were directly extracted from the original publications or re-calculated from tables or figures
    Source of r, b, extent: text, figures, tables or appendices used to extract r, b or spatial extent values

    In the last five columns, the marks indicate the case studies used for a specific meta-analysis. E.g. Case 1 was used in the tests of the spatial extent, habitat, taxon and diversity metric effects, but not in the test of the trophic level effect on plant-animal correlations.

