---
title: 'BellaBeat Fitbit'
author: 'C Romero'
date: '`r Sys.Date()`'
output:
  html_document:
    number_sections: true
---
```{r}
## Install the packages used in this analysis (only needed once per environment).
## The base package ships with R, so it does not need to be installed or loaded.
## ggplot2: data visualisation
install.packages("ggplot2")
## lubridate: makes it easier to work with dates and times
install.packages("lubridate")
## tidyverse: collection of data analysis packages
install.packages("tidyverse")
## dplyr: data manipulation
install.packages("dplyr")
## readr: reading rectangular text data
install.packages("readr")
## tidyr: tidying data
install.packages("tidyr")

library(lubridate) # make dealing with dates a little easier
library(ggplot2)   # create elegant data visualisations using the grammar of graphics
library(dplyr)     # a grammar of data manipulation
library(readr)     # read rectangular text data
library(tidyr)     # tidy messy data
```
## Running code
In a notebook, you can run a single code cell by clicking in the cell and then hitting
the blue arrow to the left, or by clicking in the cell and pressing Shift+Enter. In a script,
you can run code by highlighting the code you want to run and then clicking the blue arrow
at the bottom of this window.
## Reading in files
```{r}
list.files(path = "../input")
```

```{r}
# load the activity and sleep data sets
dailyActivity <- read_csv("../input/wellness/dailyActivity_merge.csv")
sleepDay <- read_csv("../input/wellness/sleepDay_merged.csv")
# count duplicated rows and missing values in each data set
sum(duplicated(dailyActivity))
sum(duplicated(sleepDay))
sum(is.na(dailyActivity))
sum(is.na(sleepDay))
# drop duplicated sleep records, then preview both data sets
sleepy <- sleepDay %>% distinct()
head(sleepy)
head(dailyActivity)
# number of distinct participants in each data set
n_distinct(dailyActivity$Id)
n_distinct(sleepy$Id)
# total steps per participant, highest first
dailyActivity %>% group_by(Id) %>% summarise(freq = sum(TotalSteps)) %>% arrange(desc(freq))
# total distance per participant, highest first (Id as character so it plots as a category)
Tot_dist <- dailyActivity %>%
  mutate(Id = as.character(Id)) %>%
  group_by(Id) %>%
  summarise(dizzy = sum(TotalDistance)) %>%
  arrange(desc(dizzy))
# total minutes asleep and total time in bed per participant, lowest first
sleepy %>% group_by(Id) %>% summarise(Msleep = sum(TotalMinutesAsleep)) %>% arrange(Msleep)
sleepy %>% group_by(Id) %>% summarise(inBed = sum(TotalTimeInBed)) %>% arrange(inBed)
# plot total distance per participant; size is a fixed value, so it sits outside aes()
ggplot(Tot_dist) +
  geom_count(mapping = aes(y = dizzy, x = Id, color = Id, fill = Id), size = 2) +
  labs(x = "member id's", title = "distance miles") +
  theme(axis.text.x = element_text(angle = 90))
```
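The duplicate and missing-value checks in the chunk above can be tried on a small made-up table before running them on the real files. This is only a sketch: the `toy_sleep` data frame and its values are invented for illustration, and base R's `unique()` and `aggregate()` stand in for `distinct()` and `group_by()`/`summarise()` so that it runs without any packages.

```r
# Toy data (hypothetical values): the second and third rows are exact duplicates
toy_sleep <- data.frame(
  Id = c(1, 2, 2, 3),
  TotalMinutesAsleep = c(400, 380, 380, NA),
  TotalTimeInBed = c(430, 410, 410, 450)
)

n_duplicates <- sum(duplicated(toy_sleep))  # rows that repeat an earlier row
n_missing <- sum(is.na(toy_sleep))          # missing cells across all columns

# unique() drops repeated rows, as distinct() does in the chunk above
toy_clean <- unique(toy_sleep)

# Total minutes asleep per Id; rows with a missing value are dropped
# by aggregate()'s default na.action, so Id 3 does not appear
sleep_per_id <- aggregate(TotalMinutesAsleep ~ Id, data = toy_clean, FUN = sum)
```

Checking `n_duplicates` and `n_missing` before and after cleaning is a quick way to confirm that the cleaning step did what was intended.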
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Asterisk indicates datasets with p-value < 0.05.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains lung X-ray images, including the classes shown in this sample image:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15315323%2F8041ddd2485bfe9cdf2ba1f9d96bd7e5%2F6_Class_Img.jpg?generation=1741951756137022&alt=media
The dataset we use is compiled from several reputable sources:
Dataset 1 [1]: This dataset includes four classes: COVID-19, viral pneumonia, bacterial pneumonia, and normal. It has multiple versions, and we are currently using the latest version (version 4). Previous studies, such as those by Hariri et al. [18] and Ahmad et al. [20], have also used earlier versions of this dataset.
Dataset 2 [2]: This is the National Institutes of Health (NIH) Chest X-Ray Dataset, which contains over 100,000 chest X-ray images from over 30,000 patients. It includes 14 disease classes, including conditions like atelectasis, consolidation, and infiltration. For this study, we have selected 2,550 chest X-ray images specifically from the Emphysema class.
Dataset 3 [3]: This is the COVQU dataset, which we have extended to include two additional classes: COVID-19 and viral pneumonia. This dataset has been widely used in previous studies by M.E.H. Chowdhury et al. [4] and Rahman T et al. [5], establishing its reputation as a reliable resource.
In addition, we also publish a modified dataset that aims to remove image regions that do not contain lungs (abdomen, arms, etc.).
References:
[1] U. Sait, K. G. Lal, S. P. Prajapati, R. Bhaumik, T. Kumar, S. Shivakumar, K. Bhalla, Curated dataset for covid-19 posterior-anterior chest radiography images (x-rays), Mendeley Data V4 (2022). doi:10.17632/9xkhgts2s6.4.
[2] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R. M. Summers, Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases (2017) 3462–3471. doi:10.1109/CVPR.2017.369.
[3] A. M. Tahir, M. E. Chowdhury, A. Khandakar, T. Rahman, Y. Qiblawey, U. Khurshid, S. Kiranyaz, N. Ibtehaz, M. S. Rahman, S. Al-Maadeed, S. Mahmud, M. Ezeddin, K. Hameed, T. Hamid, Covid-19 infection localization and severity grading from chest x-ray images, Computers in Biology and Medicine 139 (2021) 105002. URL: https://www.sciencedirect.com/science/article/pii/S0010482521007964. doi:10.1016/j.compbiomed.2021.105002.
[4] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir, Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. A. Emadi, M. B. I. Reaz, M. T. Islam, Can ai help in screening viral and covid-19 pneumonia?, IEEE Access 8 (2020) 132665–132676. doi:10.1109/ACCESS.2020.3010287.
[5] T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S. B. A. Kashem, M. T. Islam, S. A. Maadeed, S. M. Zughaier, M. S. Khan, M. E. Chowdhury, Exploring the effect of image enhancement techniques on covid-19 detection using chest x-ray images, Computers in Biology and Medicine 132 (2021). doi:10.1016/j.compbiomed.2021.104319.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
{# General information
# The script runs with R (Version 3.1.1; 2014-07-10) and packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1) and lattice (Version 0.20-29)
# --------------------------------------------------------------------------------------------------------
# Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)
# --------------------------------------------------------------------------------------------------------
# Data collection and how the individual variables were derived is described in:
# Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016-20131016.
# Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015.
# Data are available as Rdata file
# Missing values are NA.
# --------------------------------------------------------------------------------------------------------
# For better readability the subsections of the script can be collapsed
# --------------------------------------------------------------------------------------------------------
}
{# Description of the method
# 1 - data are visualized in an interactive actogram with time of day on the x-axis and one panel for each day of data
# 2 - the red rectangle indicates the active field; clicking with the mouse in that field on the depicted light signal generates a data point that is automatically (via a custom-made function) saved in the csv file. For this data extraction I recommend always clicking on the bottom line of the red rectangle, as there is always data available there due to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if a greenish vertical bar appears and a new line of data appears in the R console.
# 3 - to extract incubation bouts, the first click in a new plot has to be the start of incubation, the next click marks the end of incubation, and a click on the same spot marks the start of incubation for the other sex. If the end and start of incubation are at different times, the data will still be extracted, but the sex, logger and bird_ID will be wrong. These need to be changed manually in the csv file. Similarly, the first bout for a given plot will always be assigned to the male (if no data are present in the csv file) or based on previous data. Hence, whenever data from a new plot are extracted, it is worth checking at the first mouse click whether the sex, logger and bird_ID information is correct and, if not, adjusting it manually.
# 4 - if all information from one day (panel) is extracted, right-click on the plot and choose "stop". This will activate the following day (panel) for extraction.
# 5 - if you wish to end extraction before going through all the rectangles, just press "escape".
}
{# Annotations of data-files from turnstone_2009_Barrow_nest-t401_transmitter.RData
dfr -- contains raw data on signal strength from the radio tag attached to the rump of female and male, and information about when the birds were captured and the incubation stage of the nest
1. who: identifies whether the recording refers to female, male, capture or start of hatching
2. datetime_: date and time of each recording
3. logger: unique identity of the radio tag
4. signal_: signal strength of the radio tag
5. sex: sex of the bird (f = female, m = male)
6. nest: unique identity of the nest
7. day: datetime_ variable truncated to year-month-day format
8. time: time of day in hours
9. datetime_utc: date and time of each recording, but in UTC time
10. cols: colors assigned to "who"
--------------------------------------------------------------------------------------------------------
m -- contains metadata for a given nest
1. sp: identifies species (RUTU = Ruddy turnstone)
2. nest: unique identity of the nest
3. year_: year of observation
4. IDfemale: unique identity of the female
5. IDmale: unique identity of the male
6. lat: latitude coordinate of the nest
7. lon: longitude coordinate of the nest
8. hatch_start: date and time when the hatching of the eggs started
9. scinam: scientific name of the species
10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)
11. logger: type of device used to record incubation (IT - radio tag)
12. sampling: mean incubation sampling interval in seconds
--------------------------------------------------------------------------------------------------------
s -- contains metadata for the incubating parents
1. year_: year of capture
2. species: identifies species (RUTU = Ruddy turnstone)
3. author: identifies the author who measured the bird
4. nest: unique identity of the nest
5. caught_date_time: date and time when the bird was captured
6. recapture: was the bird captured before? (0 - no, 1 - yes)
7. sex: sex of the bird (f = female, m = male)
8. bird_ID: unique identity of the bird
9. logger: unique identity of the radio tag
--------------------------------------------------------------------------------------------------------
}
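The annotation describes `day` as `datetime_` truncated to year-month-day and `time` as the time of day in hours. A minimal sketch of how such columns could be derived in base R follows; the two recordings are invented, the UTC time zone is an assumption made only to keep the example reproducible, and this is not necessarily how the original script computed them.

```r
# Hypothetical recordings; only the column names follow the annotation above
dfr <- data.frame(
  datetime_ = as.POSIXct(c("2009-06-20 06:30:00", "2009-06-20 18:45:00"), tz = "UTC"),
  signal_ = c(120, 95)
)

# day: datetime_ truncated to year-month-day
dfr$day <- as.Date(dfr$datetime_)

# time: time of day in decimal hours (hours + minutes/60 + seconds/3600)
dfr$time <- as.numeric(format(dfr$datetime_, "%H")) +
  as.numeric(format(dfr$datetime_, "%M")) / 60 +
  as.numeric(format(dfr$datetime_, "%S")) / 3600
```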
Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables (including guiding questions and key tasks) will help you stay on the right path.
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
How do annual members and casual riders use Cyclistic bikes differently?
What is the problem you are trying to solve?
How do annual members and casual riders use Cyclistic bikes differently?
How can your insights drive business decisions?
The insights will help the marketing team build a strategy aimed at casual riders.
Where is your data located?
The data is held by the Cyclistic organization.
How is data organized?
The datasets are in CSV format, one file per month, covering financial year 22.
Are there issues with bias or credibility in this data? Does your data ROCCC?
The data is good and meets ROCCC (reliable, original, comprehensive, current, cited) because it was collected by the Cyclistic organization itself.
How are you addressing licensing, privacy, security, and accessibility?
The company holds its own license over the dataset. The dataset does not contain any personal information about the riders.
How did you verify the data’s integrity?
All the files have consistent columns and each column has the correct type of data.
How does it help you answer your questions?
Insights are always hidden in the data; we have to interpret the data to find them.
Are there any problems with the data?
Yes, the starting and ending station names contain null values.
What tools are you choosing and why?
I used RStudio to clean and transform the data for the analysis phase, because the dataset is large and because I wanted to gain experience with the language.
Have you ensured the data’s integrity?
Yes, the data is consistent throughout the columns.
What steps have you taken to ensure that your data is clean?
First, duplicates and null values were removed; then new columns were added for analysis.
How can you verify that your data is clean and ready to analyze?
Make sure the column names are consistent throughout all datasets by using the “bind_rows” function.
Make sure column data types are consistent throughout all the datasets by using “compare_df_cols” from the “janitor” package.
Combine all the datasets into a single data frame so the analysis is consistent throughout.
Remove the columns start_lat, start_lng, end_lat, and end_lng from the data frame because they are not required for the analysis.
Create new columns day, date, month, and year from the started_at column; these provide additional opportunities to aggregate the data.
Create the “ride_length” column from the started_at and ended_at columns to find the average ride duration.
Remove the rows with null values from the dataset by using the “na.omit” function.
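The cleaning steps above can be sketched on two tiny, made-up monthly extracts. This is an illustration under stated assumptions, not the original script: the rows are invented, columns such as `ride_id`, `start_station_name` and `member_casual` are assumed names, and base R's `rbind()` and `identical(names(...))` stand in for `bind_rows` and `compare_df_cols` so the example runs without extra packages.

```r
# Two hypothetical monthly extracts with the same columns
jan <- data.frame(
  ride_id = c("a1", "a2"),
  started_at = c("2022-01-03 08:00:00", "2022-01-04 17:30:00"),
  ended_at = c("2022-01-03 08:25:00", "2022-01-04 18:00:00"),
  start_station_name = c("Station A", NA),
  start_lat = c(41.9, 41.8), start_lng = c(-87.6, -87.7),
  end_lat = c(41.9, 41.8), end_lng = c(-87.6, -87.7),
  member_casual = c("member", "casual"),
  stringsAsFactors = FALSE
)
feb <- jan
feb$ride_id <- c("b1", "b2")
feb$started_at <- c("2022-02-01 09:00:00", "2022-02-02 12:00:00")
feb$ended_at <- c("2022-02-01 09:10:00", "2022-02-02 12:45:00")
feb$start_station_name <- c("Station B", "Station C")

# Check that the column names match before combining
stopifnot(identical(names(jan), names(feb)))

# Combine into a single data frame
rides <- rbind(jan, feb)

# Drop the coordinate columns that the analysis does not need
rides <- rides[, !(names(rides) %in% c("start_lat", "start_lng", "end_lat", "end_lng"))]

# Parse the timestamps (UTC assumed here) and derive the date parts
rides$started_at <- as.POSIXct(rides$started_at, tz = "UTC")
rides$ended_at <- as.POSIXct(rides$ended_at, tz = "UTC")
rides$date <- as.Date(rides$started_at)
rides$day <- format(rides$started_at, "%A")
rides$month <- format(rides$started_at, "%m")
rides$year <- format(rides$started_at, "%Y")

# Ride length in minutes
rides$ride_length <- as.numeric(difftime(rides$ended_at, rides$started_at, units = "mins"))

# Remove rows that contain missing values (here: the ride with no station name)
rides <- na.omit(rides)
```

Running `str(rides)` afterwards is a simple way to confirm that every column has the expected type before moving on to the analysis.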
Have you documented your cleaning process so you can review and share those results?
Yes, the cleaning process is documented clearly.
How should you organize your data to perform analysis on it?
The data has been organized in one single data frame by using the read csv function in R.
Has your data been properly formatted?
Yes, all the columns have their correct data type.
What surprises did you discover in the data?
Casual riders' ride duration is higher than that of annual members.
Casual riders use docked bikes more widely than annual members do.
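Comparisons like these come down to a grouped summary. The sketch below shows one way to compute them in base R; every value in it is invented purely to make the code runnable and is not a result from the Cyclistic data, and the column names `member_casual` and `rideable_type` are assumptions.

```r
# Invented rides, for illustration only (not results from the case study)
rides <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  rideable_type = c("docked_bike", "classic_bike", "electric_bike", "classic_bike", "classic_bike"),
  ride_length = c(40, 30, 12, 15, 9)  # minutes
)

# Mean ride length per rider type
mean_length <- aggregate(ride_length ~ member_casual, data = rides, FUN = mean)

# Number of rides per rider type and bike type
bike_counts <- table(rides$member_casual, rides$rideable_type)
```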
What trends or relationships did you find in the data?
Annual members ride mainly for commuting.
Casual riders prefer docked bikes.
Annual members prefer electric or classic bikes.
How will these insights help answer your business questions?
These insights help to build a profile for members.
Were you able to answer the question of how ...
http://opendatacommons.org/licenses/dbcl/1.0/
#https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r
#################################
###Variables for downloaded files
###data.dir is the folder that holds the downloaded files
data.dir <- ' '
train.file <- paste0(data.dir, 'training.csv')
test.file <- paste0(data.dir, 'test.csv')
#################################
###Load csv -- creates a data.frame where each column can have a different type.
d.train <- read.csv(train.file, stringsAsFactors = FALSE)
d.test <- read.csv(test.file, stringsAsFactors = FALSE)
###In training.csv, we have 7049 rows, each one with 31 columns.
###The first 30 columns are keypoint locations, which R correctly identified as numbers.
###The last one is a string representation of the image, identified as a string.
###To look at samples of the data, uncomment this line:
###Let's save the Image column as another variable, and remove it from d.train:
###d.train is our dataframe, and we want the column called Image.
###Assigning NULL to a column removes it from the dataframe
im.train <- d.train$Image
d.train$Image <- NULL #removes 'Image' from the dataframe
im.test <- d.test$Image
d.test$Image <- NULL #removes 'Image' from the dataframe
#################################
#The image is represented as a series of numbers, stored as a string
#Convert these strings to integers by splitting them and converting the result to integer
#strsplit splits the string
#unlist simplifies its output to a vector of strings
#as.integer converts it to a vector of integers.
as.integer(unlist(strsplit(im.train[1], " ")))
as.integer(unlist(strsplit(im.test[1], " ")))
###Install and activate appropriate libraries
###The tutorial is meant for Linux and OS X, where they use a different library, so:
###Replace all instances of %dopar% with %do%.
library("foreach", lib.loc="~/R/win-library/3.3")
###implement the conversion loop (parallel in the original tutorial, sequential here)
im.train <- foreach(im = im.train, .combine=rbind) %do% {
    as.integer(unlist(strsplit(im, " ")))
}
im.test <- foreach(im = im.test, .combine=rbind) %do% {
    as.integer(unlist(strsplit(im, " ")))
}
#The foreach loop will evaluate the inner command for each element of im.train, and combine the results with rbind (combine by rows).
#%do% evaluates the iterations sequentially; %dopar% with a registered parallel backend would run them in parallel.
#im.train is now a matrix with 7049 rows (one for each image) and 9216 columns (one for each pixel):
###Save all four variables in data.Rd file
###Can reload them at anytime with load('data.Rd')
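The save call itself is not shown in this listing. A minimal sketch of the save/reload round trip follows, assuming the four variables are `d.train`, `im.train`, `d.test` and `im.test`; it uses small stand-in objects and a temporary file rather than the real data and `data.Rd`.

```r
# Small stand-in objects (hypothetical values) in place of the real data
d.train <- data.frame(nose_tip_x = c(44.2, 48.1))
d.test <- data.frame(ImageId = 1:2)
im.train <- matrix(1:8, nrow = 2)
im.test <- matrix(8:1, nrow = 2)

# Write all four objects to one file; a temporary path stands in for 'data.Rd'
data.file <- file.path(tempdir(), "data.Rd")
save(d.train, im.train, d.test, im.test, file = data.file)

# Remove them from the workspace, then restore them by name with load()
rm(d.train, im.train, d.test, im.test)
restored <- load(data.file)  # load() returns the names of the restored objects
```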
#each image is a vector of 96*96 pixels (96*96 = 9216).
#convert these 9216 integers into a 96x96 matrix:
im <- matrix(data=rev(im.train[1,]), nrow=96, ncol=96)
#im.train[1,] returns the first row of im.train, which corresponds to the first training image.
#rev reverses the resulting vector to match the interpretation of R's image function
#(which expects the origin to be in the lower left corner).
#To visualize the image we use R's image function:
image(1:96, 1:96, im, col=gray((0:255)/255))
#Let's color the coordinates for the eyes and nose
points(96-d.train$nose_tip_x[1], 96-d.train$nose_tip_y[1], col="red")
points(96-d.train$left_eye_center_x[1], 96-d.train$left_eye_center_y[1], col="blue")
points(96-d.train$right_eye_center_x[1], 96-d.train$right_eye_center_y[1], col="green")
#Another good check is to see how variable our data is.
#For example, where are the centers of each nose in the 7049 images? (this takes a while to run):
for(i in 1:nrow(d.train)) {
    points(96-d.train$nose_tip_x[i], 96-d.train$nose_tip_y[i], col="red")
}
#there are quite a few outliers -- they could be labeling errors. Looking at one extreme example we get this:
#In this case there's no labeling error, but this shows that not all faces are centralized
idx <- which.max(d.train$nose_tip_x)
im <- matrix(data=rev(im.train[idx,]), nrow=96, ncol=96)
image(1:96, 1:96, im, col=gray((0:255)/255))
points(96-d.train$nose_tip_x[idx], 96-d.train$nose_tip_y[idx], col="red")
#One of the simplest things to try is to compute the mean of the coordinates of each keypoint in the training set and use that as a prediction for all images
colMeans(d.train, na.rm=T)
#To build a submission file we need to apply these computed coordinates to the test instances:
p <- matrix(data=colMeans(d.train, na.rm=T), nrow=nrow(d.test), ncol=ncol(d.train), byrow=T)
colnames(p) <- names(d.train)
predictions <- data.frame(ImageId = 1:nrow(d.test), p)
head(predictions)
#The expected submission format has one keypoint per row, but we can easily get that with the help of the reshape2 library:
library(...
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is built from the data underlying two scientific articles (1)(2).
The individuals were selected via a paper advertisement and they had to meet the following criteria:
1. Be older than 60 years of age;
2. Have a BMI between 23 and 35 kg/m²;
3. Not be restricted in their movements by health conditions;
4. Bring their own bicycle.
The selected participants received €50 for their contribution to the study and agreed to the use of recorded data for scientific purposes, in an anonymised manner.
A video example of the data collection can be found on YouTube.
Energy Expenditure measurements.
I am passionate about physical activity in general, and I was very curious to gather and explore data from this field that quantify the Energy Expenditure of different indoor and outdoor activities of daily living with low (lying down, sitting), mid (standing, household activities) and high (walking and cycling) levels of intensity.
The columns (ID:cosmed) are attributes related to the participant, so they are consistent across observations.
The columns (EEm:predicted_activity_label) are features related to a single physical activity.

| Name | Description |
|---|---|
| ID | participant's ID |
| trial_date | date and time when data collection started at ID level |
| gender | sex = male or female |
| age | in years |
| weight | in kg |
| height | in cm |
| bmi | Body mass index in kg/m² |
| gaAnkle | TRUE if data from GENEActiv on the ankle exist, FALSE otherwise |
| gaChest | TRUE if data from GENEActiv on the chest exist, FALSE otherwise |
| gaWrist | TRUE if data from GENEActiv on the wrist exist, FALSE otherwise |
| equivital | TRUE if data from Equivital exist, FALSE otherwise |
| cosmed | TRUE if data from COSMED exist, FALSE otherwise |
| EEm | Energy Expenditure per minute, in Kcal |
| COSMEDset_row | the original indexes of COSMED data (used for merging) |
| EEh | Energy Expenditure per hour, in Kcal |
| EEtot | Total Kcal spent (it is reset between indoor and outdoor measurements) |
| METS | Metabolic Equivalent per minute |
| Rf | Respiratory Frequency (litre/min) |
| BR | Breath Rate |
| VT | Tidal Volume in litre |
| VE | Expiratory Minute Ventilation (litre/min) |
| VO2 | Oxygen Uptake (ml/min) |
| VCO2 | Carbon Dioxide production (ml/min) |
| O2exp | Volume of O2 expired (ml/min) |
| CO2exp | Volume of CO2 expired (ml/min) |
| FeO2 | Averaged expiratory concentration of O2 (%) |
| FeCO2 | Averaged expiratory concentration of CO2 (%) |
| FiO2 | Fraction of inspired O2 (%) |
| FiCO2 | Fraction of inspired CO2 (%) |
| VE.VO2 | Ventilatory equivalent for O2 |
| VE.VCO2 | Ventilatory equivalent for CO2 |
| R | Respiratory Quotient |
| Ti | Duration of Inspiration (seconds) |
| Te | Duration of Expiration (seconds) |
| Ttot | Duration of Total breathing cycle (seconds) |
| VO2.HR | Oxygen pulse (ml/beat) |
| HR | Heart Rate |
| Qt | Cardiac output (litre) |
| SV | Stroke volume (litre/min) |
| original_activity_labels | True activity label as noted from study protocol, NA if is unknown |
| predicted_activity_label | Predicted activity label by model from [1], NA if is unknown |
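
As a worked example of how the participant-level columns relate, BMI is weight in kilograms divided by the square of height in metres. The sketch below recomputes it from `weight` (kg) and `height` (cm) and flags whether it falls inside the 23 to 35 kg/m² inclusion range; the three participants are invented and the `bmi_in_range` column is a hypothetical helper, not part of the dataset.

```r
# Invented participants; only the column names follow the table above
participants <- data.frame(
  ID = c("P01", "P02", "P03"),
  weight = c(70, 95, 60),    # kg
  height = c(175, 168, 172)  # cm
)

# BMI = weight (kg) / height (m)^2; height is stored in cm, so convert first
participants$bmi <- participants$weight / (participants$height / 100)^2

# Hypothetical helper column: is the BMI inside the 23 to 35 kg/m^2 range?
participants$bmi_in_range <- participants$bmi >= 23 & participants$bmi <= 35
```

A check like this is a quick way to spot rows where the stored `bmi` disagrees with `weight` and `height`.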
Energy Expenditure?
https://media.springernature.com/full/springer-static/image/art%3A10.1007%2Fs11257-020-09268-2/MediaObjects/11257_2020_9268_Fig3_HTML.png?as=webp
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List
MA_Data.txt (md5: 0e737a7606064d123a53ae961663aa9b)

Description
MA_data.txt is a text file (tab delimited, with headers) containing a summary of data included in the meta-analysis.
nr: sample size used to calculate variance of effect size, for correlations
nb: sample size used to calculate variance of effect size, for slopes
r: correlation coefficient
b: regression slope
V(b): variance of b
NA: Not Available data.
Difference in n: indicates whether the sample size used in the meta-analysis was different or equal to the number of replicates reported by authors. n values were sometimes "Different" when r and b had to be re-calculated from tables or digitized from figures.
Type of r: r values directly extracted from original publications or re-calculated from tables or figures
Source of r, b, extent: Text, figures, tables or appendices used to extract r, b or spatial extent values
In the last five columns, the marks indicate the case studies used for a specific meta-analysis. E.g. Case 1 was used in the test of the spatial extent, the habitat, the taxon and the diversity metric effects but not in the test of the trophic level effect on plant-animals correlations.
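
A minimal sketch of how a tab-delimited file like this could be checked and read in R. The small file written here is a made-up stand-in with only a few of the described columns (the `Case` column name is an assumption); for the real file, the checksum would be compared against the md5 listed above.

```r
# Write a tiny tab-delimited file with headers to stand in for MA_Data.txt
# (hypothetical rows; the real file has more columns)
ma.file <- file.path(tempdir(), "MA_Data_example.txt")
writeLines(
  c("Case\tnr\tr\tnb\tb",
    "1\t24\t0.41\tNA\tNA",
    "2\tNA\tNA\t30\t0.12"),
  ma.file
)

# md5 checksum of the file, as a 32-character hexadecimal string
checksum <- unname(tools::md5sum(ma.file))

# Read the table; "NA" marks data that are not available
ma <- read.delim(ma.file, header = TRUE, na.strings = "NA", stringsAsFactors = FALSE)
```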