Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Publication
The two data files (will_INF.txt and go_INF.txt) represent the co-occurrence frequencies of the top-200 infinitival collocates of will and be going to, respectively, across the twenty decades of the Corpus of Historical American English (COHA; 1810s–2000s).

1-script-create-input-data-raw.r preprocesses and combines the two files into a long-format data frame consisting of the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (for the frequency of the collocates with be going to) and (iv) will (for the frequency of the collocates with will); the result is available in input_data_raw.txt.

2-script-create-motion-chart-input-data.R processes input_data_raw.txt to normalise the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt); a hedged sketch of this step is shown below. The output from the second script is input_data_futurate.txt.

input_data_futurate.txt contains the input data for generating (i) the static motion chart included as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R) and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

Open the Future Constructions.Rproj file to start an RStudio session whose working directory is associated with the contents of this repository.
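The following is a minimal sketch of the per-million-words normalisation, assuming a coha_size.txt with columns decade and size and the column names listed above; the deposited script is authoritative and may differ:

# Hypothetical normalisation step; file layout and column names are assumptions
library(dplyr)

raw  <- read.delim("input_data_raw.txt")   # decade, coll, BE_going_to, will
size <- read.delim("coha_size.txt")        # decade, size (tokens per decade)

futurate <- raw %>%
  left_join(size, by = "decade") %>%
  mutate(going_to_pmw = BE_going_to / size * 1e6,   # frequency per million words
         will_pmw     = will / size * 1e6)

write.table(futurate, "input_data_futurate.txt", sep = "\t", row.names = FALSE)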
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets and R source code of the manuscript "Adding insult to injury: anthropogenic noise intensifies predation risk by an invasive freshwater fish species" by Fernandez Declerck et al. (submitted)
Project
├── README
├── data_functional_response.txt
├── data_prey_behaviour.txt
├── R_script.R
└── BINV-D-22-00447_Playback.wav
Main dataset: 'data_functional_response.txt'. Dataset for the functional response of the predator. Fish behaviour was recorded under two noise conditions (either boat noise or ambient noise). The dataset corresponds to data frame "d" in the script (an illustrative model sketch follows the variable list). Variable descriptions:
- id: identity of the fish
- condition: noise condition, either "boat noise" or "ambient noise"
- prey_number: number of chironomid larvae introduced in the tank
- prey_captured: number of chironomid larvae consumed
- fish_mass: fish body mass (g)
- swim_distance: swim distance (m)
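Since the deposited R_script.R is the authoritative analysis, the following is only a hedged sketch of how a Holling type II functional response could be fitted to these columns; the parameters a (attack rate) and h (handling time) and the starting values are assumptions:

# Hypothetical sketch: Holling type II fit for one noise condition (not the authors' code)
d <- read.table("data_functional_response.txt", header = TRUE)

fit <- nls(prey_captured ~ (a * prey_number) / (1 + a * h * prey_number),
           data  = subset(d, condition == "boat noise"),
           start = list(a = 0.5, h = 0.1))   # starting values are guesses
summary(fit)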
Secondary dataset: 'data_prey_behaviour.txt'. Dataset for the control experiment on prey behaviour. Prey behaviour was recorded under two noise conditions (either boat noise or ambient noise). The dataset corresponds to data frame "f" in the script. We used 20 replicates: 10 replicates for the ambient noise condition and 10 replicates for the boat noise condition. Two focal prey larvae were observed per replicate. Each prey larva was observed during two time periods (corresponding to two noise sequences). Variable descriptions:
- condition: noise condition, either "boat noise" or "ambient noise"
- replicate: number of the replicate
- unique_id: identity of each focal larva
- noise_sequence: number of the noise sequence (either second or third)
- prop_inactive: proportion of time spent inactive by the focal larva
- prop_active: proportion of time spent active by the focal larva
R source code: 'R_script.R'. The complete analysis as a single R script. See the comments for additional information.
The last file, `BINV-D-22-00447_Playback.wav`, is an audio file (Waveform Audio File Format); it is the soundtrack used in the playback experiments.
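To inspect the playback file in R, a package such as tuneR could be used (an assumed choice; the repository does not prescribe a tool):

# Hypothetical: inspect the playback soundtrack with tuneR (assumed package choice)
library(tuneR)
wav <- readWave("BINV-D-22-00447_Playback.wav")
wav   # prints duration, sampling rate and bit depth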
Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given the dataset of a retailer; the transaction data covers all transactions that happened over a period of time. The retailer will use the results to grow the business and to offer customers suggestions on itemsets, so that we can increase customer engagement, improve the customer experience and identify customer behaviour. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.
Association Rule mining is most useful when you are planning to find associations between different objects in a set, and it works well for finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, which allows the retailer to identify relationships between items.
Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule bought computer mouse => bought mouse mat:
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
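These three measures can be reproduced with the arules package on a toy set of transactions (the transactions below are invented purely for demonstration):

# Toy example: support, confidence and lift with arules
library(arules)

trans <- as(list(c("mouse", "mat"),
                 c("mouse", "keyboard"),
                 c("mat", "monitor"),
                 c("mouse", "mat", "keyboard")),
            "transactions")

rules <- apriori(trans, parameter = list(supp = 0.25, conf = 0.5, minlen = 2))
inspect(sort(rules, by = "lift"))   # each rule with its support, confidence and lift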
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries; each is described briefly below. A plausible reconstruction of this step follows the screenshot.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
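The exact package set is an assumption inferred from the analysis steps described here (the screenshot shows the original calls):

# Assumed package set for this analysis
library(readxl)     # read the .xlsx dataset
library(dplyr)      # data cleaning and manipulation
library(arules)     # Apriori algorithm, support/confidence/lift
library(arulesViz)  # visualising association rules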
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
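A hedged reconstruction of this loading step (the object name retail is an assumption; the screenshots show the original code):

# Read the retail transaction data from Excel (object name assumed)
retail <- readxl::read_excel("Assignment-1_Data.xlsx")
head(retail)   # preview the first rows
str(retail)    # check column types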
Next, we clean the data frame by removing missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
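A minimal sketch of this cleaning step (the screenshot carries the authors' exact code):

# Drop incomplete rows and confirm no missing values remain
retail <- retail[complete.cases(retail), ]
sum(is.na(retail))   # should now be 0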
To apply Association Rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice will be in ...
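The truncated step typically looks like the following with arules; the column names BillNo and Itemname are assumptions, and the support/confidence thresholds are illustrative:

# Hypothetical conversion: one transaction per invoice (column names assumed)
items_by_invoice <- split(retail$Itemname, retail$BillNo)
trans <- as(items_by_invoice, "transactions")
summary(trans)

# Mine association rules with the Apriori algorithm
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5, minlen = 2))
inspect(head(sort(rules, by = "lift"), 10))   # top 10 rules by lift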
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rcode – Custom code written in the R programming language that will translate an open reading frame for an existing sequence, then compare it to a data frame of nucleotide polymorphisms at specific locations, and retranslate the amino-acid changes into a new data frame.
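A rough sketch of such a workflow using Bioconductor's Biostrings (an assumed toolchain, not the deposited Rcode; the example ORF and the SNP table columns pos and alt are hypothetical):

# Hypothetical sketch with Biostrings (not the deposited Rcode itself)
library(Biostrings)

orf <- DNAString("ATGGCTGAATTTTGA")   # example open reading frame
ref_aa <- translate(orf)              # reference amino-acid sequence

# Assumed SNP table: position within the ORF and alternative base
snps <- data.frame(pos = c(4, 7), alt = c("A", "C"))

mut <- replaceLetterAt(orf, at = snps$pos,
                       letter = paste(snps$alt, collapse = ""))
mut_aa <- translate(mut)              # retranslate the mutated ORF

# Collect reference vs mutated protein into a new data frame
data.frame(ref = as.character(ref_aa), mut = as.character(mut_aa))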
Overview

This dataset contains the biologging data and R scripts used to produce the results in "A summer heatwave reduced activity, heart rate and autumn body mass in a cold-adapted ungulate", a submitted manuscript. The longitudinal data of female reindeer and calf body masses used in the paper are owned by the Finnish Reindeer Herders' Association. Natural Resources Institute Finland (Luke) updates, saves and administrates this long-term reindeer herd data.

Methods of data collection

Animals and study area

The study involved biologging (see below) of 14 adult semi-domesticated reindeer females (focal animals: Table S1) at the Kutuharju Reindeer Research Facility (Kaamanen, Northern Finland, 69° 8' N, 26° 59' E, Figure S1) during June–September 2018. Ten of these individuals had been intensively handled in June as part of another study (Trondrud, 2021). The 14 females were part of a herd of ~100 animals belonging to the Reindeer Herders' Association. The herding management included keeping reindeer in two large enclosures (~13.8 and ~15 km2) after calving until the rut, after which animals were moved to a winter enclosure (~15 km2) and then, in spring, to a calving paddock (~0.3 km2) to give birth (see the Supporting Information for further details on the study area). Kutuharju reindeer graze freely on natural pastures from May to November and are thereafter provided with silage and pellets as supplementary feed in winter. During the period from September to April, animals are weighed 5–6 times. In September, the body masses of the focal females did not differ from the rest of the herd.

Heart rate (HR) and subcutaneous body temperature (Tsc) data

In February 2018, the focal females were instrumented with a heart rate (HR) and temperature logger (DST centi-HRT, Star-Oddi, Gardabaer, Iceland). The surgical protocol is described in the Supporting Information. The DST centi-HRT sensors recorded HR and subcutaneous body temperature (Tsc) every 15 min. HR was automatically calculated from a 4-sec electrocardiogram (ECG) at 150 Hz measurement frequency, alongside an index of signal quality. Additional data processing is described in the Supporting Information.

Activity data

The animals were fitted with collar-mounted tri-axial accelerometers (Vertex Plus Activity Sensor, Vectronic Aerospace GmbH, Berlin, Germany) to monitor their activity levels. These sensors recorded acceleration (g) in three directions, representing back-forward, lateral and dorsal-ventral movements, at 8 Hz resolution. For each axis, partial dynamic body acceleration (PDBA) was calculated by subtracting the static acceleration, using a 4-sec running average, from the raw acceleration (Shepard et al., 2008). We estimated vectorial dynamic body acceleration (VeDBA) by calculating the square root of the sum of squared PDBAs (Wilson et al., 2020). We aggregated the VeDBA data into 15-min sums (hereafter "sum VeDBA") to match the HR and Tsc records. Corrections for time offsets are described in the Supporting Information. Due to logger failures, only 10 of the 14 individuals had complete data from both loggers (activity and heart rate).

Weather and climate data

In May 2018 we set up a HOBO weather station (Onset Computer Corporation, Bourne, MA, USA), mounted on a 2 m tall tripod, that measured air temperature (Ta, °C) at 15-minute intervals. The station was placed between the two summer paddocks. These measurements were matched to the nearest timestamps of the VeDBA, HR and Tsc recordings.
We also obtained weather records from the nearest public weather stations for the years 1990–2021 (Table S2). Weather station IDs and locations relative to the study area are shown in Figure S1 in the Supporting Information. The temperatures at the study site and the nearest weather station were strongly correlated (Pearson's r = 0.99), but temperatures were on average ~1.0°C higher at the study site (Figure S2).

Statistical analyses

All statistical analyses were conducted in R version 4.1.1 (R Core Team, 2021). Mean values are presented with standard deviation (SD), and parameter estimates with standard error (SE).

Environmental effects on activity states and transition probabilities

We fitted hidden Markov models (HMMs) to the 15-min sum VeDBA using the package 'momentuHMM' (McClintock & Michelot, 2018). HMMs assume that the observed pattern is driven by an underlying latent state sequence (a finite Markov chain). These states can then be used as proxies to interpret the animal's unobserved behaviour (Langrock et al., 2012). We assumed only two underlying states, thought to represent 'inactive' and 'active' (Figure S3). The 'active' state thus contains multiple forms of movement, e.g., foraging, walking and running, but reindeer spend more than 50% of their time foraging in summer (Skogland, 1980). We fitted several HMMs to evaluate both external (temperature and time of day) and individual-level (calf status) effects on the probability of occupying each state (stationary state probabilities). The combination of explanatory variables in each HMM is listed in Table S5. Ta was fitted as a continuous variable with a piecewise polynomial spline with 8 knots, determined from visual inspection of the model outputs. We included sine and cosine terms for time of day to account for cyclicity. In addition, to assess the impact of Ta on activity patterns, we fitted five temperature-day categories in interaction with time of day. These categories were based on 20% intervals of the distribution of temperature data from our local weather station in the period 19 June to 19 August 2018, with ranges of < 10°C (cold), 10–13°C (cool), 13–16°C (intermediate), 16–20°C (warm) and ≥ 20°C (hot). We evaluated the significance of each variable on the transition probabilities from the confidence intervals of each estimate, and the goodness of fit of each model using the Akaike information criterion (AIC) (Burnham & Anderson, 2002), retaining models within ΔAIC < 5. We extracted the most likely state occupied by an individual using the viterbi function, which returns the optimal state pathway, i.e., a two-level categorical variable indicating whether the individual was most likely resting or active. We used this output to calculate daily activity budgets (% time spent active).

Drivers of heart rate (HR) and subcutaneous body temperature (Tsc)

We matched the activity states derived from the HMM to the HR and Tsc data. We opted to investigate the drivers of variation in HR and Tsc only within the inactive state. HR and Tsc were fitted as response variables in separate generalised additive mixed-effects models (GAMMs), which included the following smooth terms: calendar day as a thin-plate regression spline, time of day (ToD, in hours; knots [k] = 10) as a cyclic cubic regression spline, and individual as a random intercept. All models were fitted using restricted maximum likelihood, a penalisation value (λ) of 1.4 (Wood, 2017), and an autoregressive structure (AR1) to account for temporal autocorrelation.
We used the ‘gam.check’ function from the ‘mgcv’ package to select k. The sum of VeDBA in the past 15 minutes was included as a predictor in all models. All models were fitted with the same set of explanatory variables: sum VeDBA, age, body mass (BM), lactation status, Ta, as well as the interaction between lactation status and Ta. Description of files 1. Data: "kutuharju_weather.csv" weather data recorded from local weather station during study period "Inari_Ivalo_lentoasema.csv" public weather data from weather station ID 102033, owned and managed by the Finnish Meterorological Institute "activitydata.Rdata" dataset used in analyses of activity patterns in reindeer "HR_temp_data.Rdata" dataset used in analyses of heart rate and body temperature responses in reindeer "HRfigureData.Rdata" and "TempFigureData.Rdata" are data files (lists) with model outputs generated in "heartrate_bodytemp_analyses.R" and used in "figures_in_paper.R" "HMM_df_withStates.Rdata" data frame used in HMM models including output from viterbi function "plotdf_m16.Rdata" dataframe for plotting output from model 16 "plotdf_m22.Rdata" dataframe for plotting output from model 22 2. Scripts "activitydata_HMMs.R" R script for data prep and hidden markov models to analyse activity patterns in reindeer "heartrate_bodytemp_analyses.R" R script for data prep and generalized additive mixed models to analyse heart rate and body temperature responses in reindeer "figures_in_paper.R" R script for generating figures 1-3 in the manuscript 3. HMM_model "modelList.Rdata" list containing 2 items: string of all 25 HMM models created, and dataframe with model number and formula "m16.Rdata" and "m22.Rdata" direct acces to two best-fit models
The Copernicus Land Monitoring Service provides Surface Soil Moisture 2014–present (raster 1 km), Europe, daily – version 1. Each day covers only 5 to 10% of the European land mask and shows lines of scenes (obvious artifacts). This dataset contains long-term aggregates of the daily soil moisture images (0–100%) based on two types of aggregation, quarterly and annual (see the script below).
The soil moisture rasters are based on Sentinel-1 and are described in detail in:
You can access and download the original data as .nc files from: https://globalland.vito.be/download/manifest/ssm_1km_v1_daily_netcdf/.
The aggregation has been generated using the terra package in R in combination with the matrixStats::rowQuantiles function; the full script follows below. A tiling system and land mask for pan-EU are also available.
library(terra)
library(matrixStats)

## Tiling system (1254 tiles) and one test tile
g1 = terra::vect('/mnt/inca/EU_landmask/tilling_filter/eu_ard2_final_status.gpkg')
tile = g1[534]

## 3726 daily NetCDF files
nc.lst = list.files('/mnt/landmark/SM1km/ssm_1km_v1_daily_netcdf/',
                    pattern = glob2rx('*.nc'), full.names = TRUE)
## test it:
#r = terra::rast(nc.lst[100:210])

## Crop the daily stack to one tile, derive per-pixel quantiles and
## write one GeoTIFF per probability level:
agg_tile = function(r, tile, pv = c(0.05, 0.5, 0.95), out.year = '2015.annual'){
  bb = paste(as.vector(ext(tile)), collapse = '.')
  out.tif = paste0('./eu_tmp/', out.year, '/sm1km_', pv, '_', out.year, '_', bb, '.tif')
  if(any(!file.exists(out.tif))){
    r.t = terra::crop(r, ext(tile))   ## each tile is 100x100 pixels x 365 days
    r.t = as.data.frame(r.t, xy = TRUE, na.rm = FALSE)
    sel.c = grep('ssm', colnames(r.t))
    ## remove everything outside the range:
    t1s = cbind(data.frame(matrixStats::rowQuantiles(as.matrix(r.t[, sel.c]),
                                                     probs = pv, na.rm = TRUE)),
                data.frame(x = r.t$x, y = r.t$y))
    #str(t1s)
    ## write to GeoTIFFs:
    r.o = terra::rast(t1s[, c('x','y','X5.','X50.','X95.')], type = 'xyz',
                      crs = '+proj=longlat +datum=WGS84 +no_defs')
    for(k in 1:length(pv)){
      terra::writeRaster(r.o[[k]], filename = out.tif[k], gdal = c('COMPRESS=DEFLATE'),
                         datatype = 'INT2U', NAflag = 32768, overwrite = FALSE)
    }
    rm(r.t); gc()
    tmpFiles(remove = TRUE)
  }
}

## quarterly values:
library(lubridate)
lA = data.frame(filename = nc.lst)
lA$Date = ymd(sapply(lA$filename, function(i){
  substr(strsplit(basename(i), '_')[[1]][4], 1, 8) }))
#summary(is.na(lA$Date))
#hist(lA$Date, breaks=60)
lA$quarter = quarter(lA$Date, fiscal_start = 11)
summary(as.factor(lA$quarter))
for(qr in 1:4){
  #qr=1
  pth = paste0('A.q', qr)
  rs = terra::rast(lA$filename[lA$quarter == qr])
  #agg_tile(rs, tile, out.year=pth)
  x = parallel::mclapply(sample(1:length(g1)), function(i){
    try( agg_tile(rs, tile = g1[i], out.year = pth) ) }, mc.cores = 20)
  ## Mosaics:
  for(type in c(0.05, 0.5, 0.95)){
    x <- list.files(path = paste0('./eu_tmp/', pth),
                    pattern = glob2rx(paste0('sm1km_', type, '_*.tif')), full.names = TRUE)
    out.tmp <- paste0(pth, '.', type, '.sm1km_eu.txt')
    vrt.tmp <- paste0(pth, '.', type, '.sm1km_eu.vrt')
    cat(x, sep = '\n', file = out.tmp)
    system(paste0('gdalbuildvrt -input_file_list ', out.tmp, ' ', vrt.tmp))
    system(paste0('gdal_translate ', vrt.tmp,
                  ' ./cogs/soil.moisture_s1.clms.qr.', qr, '.p', type,
                  '_m_1km_20140101_20241231_eu_epsg4326_v20250206.tif',
                  ' -ot "Byte" -r "near" --config GDAL_CACHEMAX 9216 -co BIGTIFF=YES',
                  ' -co NUM_THREADS=80 -co COMPRESS=DEFLATE -of COG -projwin -32 72 45 27'))
  }
}

## per year ----
for(year in 2015:2023){
  l.lst = nc.lst[grep(year, basename(nc.lst))]
  r = terra::rast(l.lst)
  pth = paste0(year, '.annual')
  x = parallel::mclapply(sample(1:length(g1)), function(i){
    try( agg_tile(r, tile = g1[i], out.year = pth) ) }, mc.cores = 40)
  ## Mosaics:
  for(type in c(0.05, 0.5, 0.95)){
    x <- list.files(path = paste0('./eu_tmp/', pth),
                    pattern = glob2rx(paste0('sm1km_', type, '_*.tif')), full.names = TRUE)
    out.tmp <- paste0(pth, '.', type, '.sm1km_eu.txt')
    vrt.tmp <- paste0(pth, '.', type, '.sm1km_eu.vrt')
    cat(x, sep = '\n', file = out.tmp)
    system(paste0('gdalbuildvrt -input_file_list ', out.tmp, ' ', vrt.tmp))
    system(paste0('gdal_translate ', vrt.tmp,
                  ' ./cogs/soil.moisture_s1.clms.annual.', type,
                  '_m_1km_', year, '0101_', year, '1231_eu_epsg4326_v20250206.tif',
                  ' -ot "Byte" -r "near" --config GDAL_CACHEMAX 9216 -co BIGTIFF=YES',
                  ' -co NUM_THREADS=80 -co COMPRESS=DEFLATE -of COG -projwin -32 72 45 27'))
  }
}