3 datasets found
  1. RUNNING"calorie:heartrate

    • kaggle.com
    zip
    Updated Jan 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    romechris34 (2022). RUNNING"calorie:heartrate [Dataset]. https://www.kaggle.com/datasets/romechris34/wellness
    Explore at:
    zip(25272804 bytes)Available download formats
    Dataset updated
    Jan 6, 2022
    Authors
    romechris34
    Description

    title: 'BellaBeat Fitbit' author: 'C Romero' date: 'r Sys.Date()' output: html_document: number_sections: true

    toc: true

    ##Installation of the base package for data analysis tool
    install.packages("base")
    
    ##Installation of the ggplot2 package for data analysis tool
    install.packages("ggplot2")
    
    ##install Lubridate is an R package that makes it easier to work with dates and times.
    install.packages("lubridate")
    ```{r}
    
    ##Installation of the tidyverse package for data analysis tool
    install.packages("tidyverse")
    
    ##Installation of the tidyr package for data analysis tool
    install.packages("dplyr")
    
    ##Installation of the readr package for data analysis tool
    install.packages("readr")
    
    ##Installation of the tidyr package for data analysis tool
    install.packages("tidyr")
    

    Importing packages

    metapackage of all tidyverse packages

    library(base) library(lubridate)# make dealing with dates a little easier library(ggplot2)# create elegant data visialtions using the grammar of graphics library(dplyr)# a grammar of data manpulation library(readr)# read rectangular data text library(tidyr)

    
    ## Running code
    
    In a notebook, you can run a single code cell by clicking in the cell and then hitting 
    the blue arrow to the left, or by clicking in the cell and pressing Shift+Enter. In a script, 
    you can run code by highlighting the code you want to run and then clicking the blue arrow
    at the bottom of this window.
    
    ## Reading in files
    
    
    ```{r}
    list.files(path = "../input")
    
    # load the activity and sleep data set
    ```{r}
    dailyActivity <- read_csv("../input/wellness/dailyActivity_merge.csv")
    sleepDay <- read_csv("../input/wellness/sleepDay_merged.csv")
    
    

    check for duplicates and na

    sum(duplicated(dailyActivity)) sum(duplicated(sleepDay)) sum(is.na(dailyActivity)) sum(is.na(sleepDay))

    now we will remove duplicate from sleep & create new dataframe

    sleepy <- sleepDay %>% distinct() head(sleepy) head(dailyActivity)

    count number of id's total sleepy & dailyActivity frames

    n_distinct(dailyActivity$Id) n_distinct(sleepy$Id)

    get total sum steps for each member id

    dailyActivity %>% group_by(Id) %>% summarise(freq = sum(TotalSteps)) %>% arrange(-freq) Tot_dist <- dailyActivity %>% mutate(Id = as.character(dailyActivity$Id)) %>% group_by(Id) %>% summarise(dizzy = sum(TotalDistance)) %>% arrange(-dizzy)

    now get total min sleep & lie in bed

    sleepy %>% group_by(Id) %>% summarise(Msleep = sum(TotalMinutesAsleep)) %>% arrange(Msleep) sleepy %>% group_by(Id) %>% summarise(inBed = sum(TotalTimeInBed)) %>% arrange(inBed)

    plot graph for "inbed and sleep data" & "total steps and distance"

    ggplot(Tot_dist) + 
     geom_count(mapping = aes(y= dizzy, x= Id, color = Id, fill = Id, size = 2)) +
     labs(x = "member id's", title = "distance miles" ) +
     theme(axis.text.x = element_text(angle = 90)) 
     ```
    
  2. s

    Surface soil moisture for Europe 2014-2024 at 1 km annual and quarterly...

    • repository.soilwise-he.eu
    Updated Feb 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Surface soil moisture for Europe 2014-2024 at 1 km annual and quarterly aggregates [Dataset]. http://doi.org/10.5281/zenodo.14833053
    Explore at:
    Dataset updated
    Feb 7, 2025
    Description

    Copernicus Land Monitoring Services provides Surface Soil Moisture 2014-present (raster 1 km), Europe, daily – version 1. Each day covers only 5 to 10% of European land mask and shows lines of scenes (obvious artifacts). This is the long-term aggregates of daily images of soil moisture (0–100%) based on two types of aggregation:

    • Long-term quarterly (qr.1 - winter, qr.2 - spring, qr.3 - summer and qr.4 - autumn),
    • Annual quantiles P.05, P.50 and P.95,

    The soil moisture rasters are based on Sentinel 1 and described in detail in:

    • Bauer-Marschallinger, B. ; Freeman, V. ; Cao, S. ; Paulik, C. ; Schaufler, S. ; Stachl, T. ; Modanesi, S. ; Massari, C. ; Ciabatta, L. ; Brocca, L. ; Wagner, W. Toward Global Soil Moisture Monitoring With Sentinel-1: Harnessing Assets and Overcoming Obstacles. IEEE Transactions on Geoscience and Remote Sensing 2019, 1 - 20. DOI 10.1109/TGRS.2018.2858004

    You can access and download the original data as .nc files from: https://globalland.vito.be/download/manifest/ssm_1km_v1_daily_netcdf/.

    Aggregation has been generated using the terra package in R in combination with the matrixStats::rowQuantiles function. Tiling system and land mask for pan-EU is also available.

    library(terra)library(matrixStats)g1 = terra::vect('/mnt/inca/EU_landmask/tilling_filter/eu_ard2_final_status.gpkg')## 1254 tilestile = g1[534]nc.lst = list.files('/mnt/landmark/SM1km/ssm_1km_v1_daily_netcdf/', pattern = glob2rx('*.nc$'), full.names=TRUE)## 3726## test it#r = terra::rast(nc.lst[100:210])agg_tile = function(r, tile, pv=c(0.05,0.5,0.95), out.year='2015.annual'){ bb = paste(as.vector(ext(tile)), collapse = '.') out.tif = paste0('./eu_tmp/', out.year, '/sm1km_', pv, '_', out.year, '_', bb, '.tif') if(any(!file.exists(out.tif))){  r.t = terra::crop(r, ext(tile))  ## each tile is 100x100 pixels 365 days  r.t = as.data.frame(r.t, xy=TRUE, na.rm=FALSE)  sel.c = grep(glob2rx('ssm$'), colnames(r.t))  ## remove everything outside the range  t1s = cbind(data.frame(matrixStats::rowQuantiles(as.matrix(r.t[,sel.c]), probs = pv, na.rm=TRUE)), data.frame(x=r.t$x, y=r.t$y))  #str(t1s)  ## write to GeoTIFFs  r.o = terra::rast(t1s[,c('x','y','X5.','X50.','X95.')], type='xyz', crs='+proj=longlat +datum=WGS84 +no_defs')  for(k in 1:length(pv)){    terra::writeRaster(r.o[[k]], filename=out.tif[k], gdal=c('COMPRESS=DEFLATE'), datatype='INT2U', NAflag=32768, overwrite=FALSE)  }  rm(r.t); gc()  tmpFiles(remove=TRUE) }}## quarterly values:lA = data.frame(filename=nc.lst)library(lubridate)lA$Date = ymd(sapply(lA$filename, function(i){substr(strsplit(basename(i), '_')[[1]][4], 1, 8)}))#summary(is.na(lA$Date))#hist(lA$Date, breaks=60)lA$quarter = quarter(lA$Date, fiscal_start = 11)summary(as.factor(lA$quarter))for(qr in 1:4){ #qr=1 pth = paste0('A.q', qr) rs = terra::rast(lA$filename[lA$quarter==qr]) #agg_tile(rs, tile, out.year=pth) x = parallel::mclapply(sample(1:length(g1)), function(i){try( agg_tile(rs, tile=g1[i], out.year=pth) )}, mc.cores=20) for(type in c(0.05,0.5,0.95)){  x <- list.files(path=paste0('./eu_tmp/', pth), pattern=glob2rx(paste0('sm1km_', type, '_*.tif$')), full.names=TRUE)  out.tmp <- paste0(pth, '.', type, '.sm1km_eu.txt')  vrt.tmp <- paste0(pth, '.', type, '.sm1km_eu.vrt')  cat(x, sep=' n', file=out.tmp)  system(paste0('gdalbuildvrt -input_file_list ', out.tmp, ' ', vrt.tmp))  system(paste0('gdal_translate ', vrt.tmp, ' ./cogs/soil.moisture_s1.clms.qr.', qr, '.p', type, '_m_1km_20140101_20241231_eu_epsg4326_v20250206.tif -ot 'Byte' -r 'near' --config GDAL_CACHEMAX 9216 -co BIGTIFF=YES -co NUM_THREADS=80 -co COMPRESS=DEFLATE -of COG -projwin -32 72 45 27')) }}## per year ----for(year in 2015:2023){ l.lst = nc.lst[grep(year, basename(nc.lst))] r = terra::rast(l.lst) ## test it: pth = paste0(year, '.annual') x = parallel::mclapply(sample(1:length(g1)), function(i){try( agg_tile(r, tile=g1[i], out.year=pth) )}, mc.cores=40) ## Mosaics: for(type in c(0.05,0.5,0.95)){  x <- list.files(path=paste0('./eu_tmp/', pth), pattern=glob2rx(paste0('sm1km_', type, '_*.tif$')), full.names=TRUE)  out.tmp <- paste0(pth, '.', type, '.sm1km_eu.txt')  vrt.tmp <- paste0(pth, '.', type, '.sm1km_eu.vrt')  cat(x, sep=' n', file=out.tmp)  system(paste0('gdalbuildvrt -input_file_list ', out.tmp, ' ', vrt.tmp))  system(paste0('gdal_translate ', vrt.tmp, ' ./cogs/soil.moisture_s1.clms.annual.', type, '_m_1km_', year, '0101_', year, '1231_eu_epsg4326_v20250206.tif -ot 'Byte' -r 'near' --config GDAL_CACHEMAX 9216 -co BIGTIFF=YES -co NUM_THREADS=80 -co COMPRESS=DEFLATE -of COG -projwin -32 72 45 27')) }}

  3. FacialRecognition

    • kaggle.com
    zip
    Updated Dec 1, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TheNicelander (2016). FacialRecognition [Dataset]. https://www.kaggle.com/petein/facialrecognition
    Explore at:
    zip(121674455 bytes)Available download formats
    Dataset updated
    Dec 1, 2016
    Authors
    TheNicelander
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    #https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r #################################

    ###Variables for downloaded files data.dir <- ' ' train.file <- paste0(data.dir, 'training.csv') test.file <- paste0(data.dir, 'test.csv') #################################

    ###Load csv -- creates a data.frame matrix where each column can have a different type. d.train <- read.csv(train.file, stringsAsFactors = F) d.test <- read.csv(test.file, stringsAsFactors = F)

    ###In training.csv, we have 7049 rows, each one with 31 columns. ###The first 30 columns are keypoint locations, which R correctly identified as numbers. ###The last one is a string representation of the image, identified as a string.

    ###To look at samples of the data, uncomment this line:

    head(d.train)

    ###Let's save the first column as another variable, and remove it from d.train: ###d.train is our dataframe, and we want the column called Image. ###Assigning NULL to a column removes it from the dataframe

    im.train <- d.train$Image d.train$Image <- NULL #removes 'image' from the dataframe

    im.test <- d.test$Image d.test$Image <- NULL #removes 'image' from the dataframe

    ################################# #The image is represented as a series of numbers, stored as a string #Convert these strings to integers by splitting them and converting the result to integer

    #strsplit splits the string #unlist simplifies its output to a vector of strings #as.integer converts it to a vector of integers. as.integer(unlist(strsplit(im.train[1], " "))) as.integer(unlist(strsplit(im.test[1], " ")))

    ###Install and activate appropriate libraries ###The tutorial is meant for Linux and OSx, where they use a different library, so: ###Replace all instances of %dopar% with %do%.

    install.packages('foreach')

    library("foreach", lib.loc="~/R/win-library/3.3")

    ###implement parallelization im.train <- foreach(im = im.train, .combine=rbind) %do% { as.integer(unlist(strsplit(im, " "))) } im.test <- foreach(im = im.test, .combine=rbind) %do% { as.integer(unlist(strsplit(im, " "))) } #The foreach loop will evaluate the inner command for each row in im.train, and combine the results with rbind (combine by rows). #%do% instructs R to do all evaluations in parallel. #im.train is now a matrix with 7049 rows (one for each image) and 9216 columns (one for each pixel):

    ###Save all four variables in data.Rd file ###Can reload them at anytime with load('data.Rd')

    save(d.train, im.train, d.test, im.test, file='data.Rd')

    load('data.Rd')

    #each image is a vector of 96*96 pixels (96*96 = 9216). #convert these 9216 integers into a 96x96 matrix: im <- matrix(data=rev(im.train[1,]), nrow=96, ncol=96)

    #im.train[1,] returns the first row of im.train, which corresponds to the first training image. #rev reverse the resulting vector to match the interpretation of R's image function #(which expects the origin to be in the lower left corner).

    #To visualize the image we use R's image function: image(1:96, 1:96, im, col=gray((0:255)/255))

    #Let’s color the coordinates for the eyes and nose points(96-d.train$nose_tip_x[1], 96-d.train$nose_tip_y[1], col="red") points(96-d.train$left_eye_center_x[1], 96-d.train$left_eye_center_y[1], col="blue") points(96-d.train$right_eye_center_x[1], 96-d.train$right_eye_center_y[1], col="green")

    #Another good check is to see how variable is our data. #For example, where are the centers of each nose in the 7049 images? (this takes a while to run): for(i in 1:nrow(d.train)) { points(96-d.train$nose_tip_x[i], 96-d.train$nose_tip_y[i], col="red") }

    #there are quite a few outliers -- they could be labeling errors. Looking at one extreme example we get this: #In this case there's no labeling error, but this shows that not all faces are centralized idx <- which.max(d.train$nose_tip_x) im <- matrix(data=rev(im.train[idx,]), nrow=96, ncol=96) image(1:96, 1:96, im, col=gray((0:255)/255)) points(96-d.train$nose_tip_x[idx], 96-d.train$nose_tip_y[idx], col="red")

    #One of the simplest things to try is to compute the mean of the coordinates of each keypoint in the training set and use that as a prediction for all images colMeans(d.train, na.rm=T)

    #To build a submission file we need to apply these computed coordinates to the test instances: p <- matrix(data=colMeans(d.train, na.rm=T), nrow=nrow(d.test), ncol=ncol(d.train), byrow=T) colnames(p) <- names(d.train) predictions <- data.frame(ImageId = 1:nrow(d.test), p) head(predictions)

    #The expected submission format has one one keypoint per row, but we can easily get that with the help of the reshape2 library:

    install.packages('reshape2')

    library(...

  4. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
romechris34 (2022). RUNNING"calorie:heartrate [Dataset]. https://www.kaggle.com/datasets/romechris34/wellness
Organization logo

RUNNING"calorie:heartrate

Fitness~"Bellbeat"~Tracker

Explore at:
zip(25272804 bytes)Available download formats
Dataset updated
Jan 6, 2022
Authors
romechris34
Description

title: 'BellaBeat Fitbit' author: 'C Romero' date: 'r Sys.Date()' output: html_document: number_sections: true

toc: true

##Installation of the base package for data analysis tool
install.packages("base")
##Installation of the ggplot2 package for data analysis tool
install.packages("ggplot2")
##install Lubridate is an R package that makes it easier to work with dates and times.
install.packages("lubridate")
```{r}

##Installation of the tidyverse package for data analysis tool
install.packages("tidyverse")
##Installation of the tidyr package for data analysis tool
install.packages("dplyr")
##Installation of the readr package for data analysis tool
install.packages("readr")
##Installation of the tidyr package for data analysis tool
install.packages("tidyr")

Importing packages

metapackage of all tidyverse packages

library(base) library(lubridate)# make dealing with dates a little easier library(ggplot2)# create elegant data visialtions using the grammar of graphics library(dplyr)# a grammar of data manpulation library(readr)# read rectangular data text library(tidyr)


## Running code

In a notebook, you can run a single code cell by clicking in the cell and then hitting 
the blue arrow to the left, or by clicking in the cell and pressing Shift+Enter. In a script, 
you can run code by highlighting the code you want to run and then clicking the blue arrow
at the bottom of this window.

## Reading in files


```{r}
list.files(path = "../input")

# load the activity and sleep data set
```{r}
dailyActivity <- read_csv("../input/wellness/dailyActivity_merge.csv")
sleepDay <- read_csv("../input/wellness/sleepDay_merged.csv")

check for duplicates and na

sum(duplicated(dailyActivity)) sum(duplicated(sleepDay)) sum(is.na(dailyActivity)) sum(is.na(sleepDay))

now we will remove duplicate from sleep & create new dataframe

sleepy <- sleepDay %>% distinct() head(sleepy) head(dailyActivity)

count number of id's total sleepy & dailyActivity frames

n_distinct(dailyActivity$Id) n_distinct(sleepy$Id)

get total sum steps for each member id

dailyActivity %>% group_by(Id) %>% summarise(freq = sum(TotalSteps)) %>% arrange(-freq) Tot_dist <- dailyActivity %>% mutate(Id = as.character(dailyActivity$Id)) %>% group_by(Id) %>% summarise(dizzy = sum(TotalDistance)) %>% arrange(-dizzy)

now get total min sleep & lie in bed

sleepy %>% group_by(Id) %>% summarise(Msleep = sum(TotalMinutesAsleep)) %>% arrange(Msleep) sleepy %>% group_by(Id) %>% summarise(inBed = sum(TotalTimeInBed)) %>% arrange(inBed)

plot graph for "inbed and sleep data" & "total steps and distance"

ggplot(Tot_dist) + 
 geom_count(mapping = aes(y= dizzy, x= Id, color = Id, fill = Id, size = 2)) +
 labs(x = "member id's", title = "distance miles" ) +
 theme(axis.text.x = element_text(angle = 90)) 
 ```
Search
Clear search
Close search
Google apps
Main menu