29 datasets found

Children's questionnaire (data frame) - Dataset - B2FIND
b2find.eudat.eu
Updated Jun 23, 2024
Cite
(2024). Children's questionnaire (data frame) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/93fcb56f-065d-502e-8941-49dd99608fb5
Dataset updated
Jun 23, 2024
Description
"WeAreHere!" Children's questionnaire. This dataset includes:
(1) The WaH children's questionnaire (20 questions including 5-point Likert scale questions, dichotomous questions and an open space for comments). The Catalan version (original) and the Spanish and English versions of the questionnaire are included in this dataset in PDF format.
(2) The data frame in xlsx format, with the children's answers to the questionnaire (a total of 3664 answers), and a reduced version of it for the regression (with the 5-point Likert scale variable "ask for help" transformed into a dichotomous variable).
(3) The data frame in xlsx format, with the children's answers to the questionnaire and the categorization of their comments (sheet 1), the data frame with only the MCA variables selected (sheet 2), and the categories and subcategories table (sheet 3).
(4) The data analysis procedure for the regression, the component and multiple component analysis (R script).
R codes and dataset for Visualisation of Diachronic Constructional Change...
researchdata.edu.au
Updated Apr 1, 2019
Cite
Gede Primahadi Wijaya Rajeg (2019). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
Unique identifier
https://doi.org/10.26180/5c844c7a81768
Dataset updated
Apr 1, 2019
Dataset provided by
Monash University
Authors
Gede Primahadi Wijaya Rajeg
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Publication

Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

Description of R codes and data files in the repository

This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Release. So, check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

The raw input data consists of two files (i.e. will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of top-200 infinitival collocates for will and be going to respectively across the twenty decades of Corpus of Historical American English (from the 1810s to the 2000s).

These two input files are used in the R code file 1-script-create-input-data-raw.r. The codes preprocess and combine the two files into a long format data frame consisting of the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (for frequency of the collocates with be going to) and (iv) will (for frequency of the collocates with will); it is available in the input_data_raw.txt.

Then, the script 2-script-create-motion-chart-input-data.R processes the input_data_raw.txt for normalising the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.
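
The per-million-words normalisation described above can be illustrated with a minimal Python sketch. It is not the repository's R code: the function name `per_million`, the decade sizes and the raw counts are all hypothetical values chosen for illustration, not figures taken from coha_size.txt or the input files.

```python
def per_million(raw_frequency: float, corpus_size: int) -> float:
    """Normalise a raw co-occurrence count to a rate per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_frequency / corpus_size * 1_000_000


# Hypothetical sub-corpus sizes (in words) and raw collocate counts per decade.
# These numbers are assumptions for illustration only.
decade_sizes = {"1810s": 1_200_000, "2000s": 29_500_000}
raw_counts = {"1810s": 6, "2000s": 590}

# One normalised frequency per decade, comparable across decades of unequal size.
normalised = {
    decade: per_million(raw_counts[decade], decade_sizes[decade])
    for decade in raw_counts
}
```

Dividing by the size of each decade's sub-corpus before scaling is what makes frequencies from small early decades comparable with those from much larger recent ones.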

Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
CORESIDENCE_GLAD: The Global Living Arrangements Database, 1960-2021
zenodo.org
bin, csv
Updated Mar 17, 2025
Cite
Juan Galeano; Albert Esteve (2025). CORESIDENCE_GLAD: The Global Living Arrangements Database, 1960-2021 [Dataset]. http://doi.org/10.5281/zenodo.15038210
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.15038210
Dataset updated
Mar 17, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juan Galeano; Albert Esteve
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Global Living Arrangements Database (GLAD) is a global resource designed to fill a critical gap in the availability of statistical information for examining patterns and changes in living arrangements by age, sex, marital status and educational attainment. Utilizing comprehensive census microdata from IPUMS International and the European Labour Force Survey (EU-LFS), GLAD summarizes over 740 million individual records across 107 countries, covering the period from 1960 to 2021. This database has been constructed using an innovative algorithm that reconstructs kinship relationships among all household members, providing a robust and scalable methodology for studying living arrangements. GLAD is expected to be a valuable resource for both researchers and policymakers, supporting evidence-based decision-making in areas such as housing, social services, and healthcare, as well as offering insights into long-term transformations in family structures. The open-source R code used in this project is publicly available, promoting transparency and enabling the creation of new ego-centred typologies based on interfamily relationships.

The repository is composed of the following elements: an Rda file named CORESIDENCE_GLAD_2025.Rda in the form of a List. In R, a List object is a versatile data structure that can contain a collection of different data types, including vectors, matrices, data frames, other lists, spatial objects or even functions. It allows heterogeneous data elements to be stored and organized within a single object. The CORESIDENCE_GLAD_2025 R-list object is composed of six elements:

SINGLE AGES: a data frame where data is aggregated by single ages, marital status, educational attainment and living arrangement types. Source of the original data: IPUMS-I

AGE GROUPS IPUMS: a data frame where data is aggregated by five-year age groups, marital status, educational attainment and living arrangement types. Source of the original data: IPUMS-I.

AGE GROUPS LFS: a data frame where data is aggregated by five-year age groups, marital status, educational attainment and living arrangement types. Source of the original data: EU-LFS.

HARMONIZED: a data frame where data is aggregated by five-year age groups, marital status, educational attainment and living arrangement types. The categories of marital status and educational attainment have been harmonized between the two data sources. Source of the original data: IPUMS-I and EU-LFS

CODEBOOK: a data frame with the complete list of variables included, their names, descriptions and categories.

LABELS LAT: An R function to add the qualitative labels to Living Arrangement Types (LAT).

ATLAS LIVING ARRANGEMENTS: The URL of the folder with the leaflet of living arrangements for each sample included in GLAD.
Data from: The island-mainland species turnover relationship
datadryad.org
data.niaid.nih.gov
zip
Updated Oct 10, 2012
Cite
Yoel E. Stuart; Jonathan B. Losos; Adam C. Algar (2012). The island-mainland species turnover relationship [Dataset]. http://doi.org/10.5061/dryad.gm2p8
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.gm2p8
Dataset updated
Oct 10, 2012
Dataset provided by
Dryad
Authors
Yoel E. Stuart; Jonathan B. Losos; Adam C. Algar
Time period covered
Jul 13, 2012
Area covered
Neotropics, Caribbean, Caribbean islands
Description
Many oceanic islands are notable for their high endemism, suggesting that islands may promote unique assembly processes. However, mainland assemblages sometimes harbour comparable levels of endemism, suggesting that island biotas may not be as unique as often assumed. Here, we test the uniqueness of island biotic assembly by comparing the rate of species turnover among islands and the mainland, after accounting for distance decay and environmental gradients. We modeled species turnover as a function of geographic and environmental distance for mainland (M-M) communities of Anolis lizards and Terrarana frogs, two clades that have diversified extensively on Caribbean islands and the mainland Neotropics. We compared mainland-island (M–I) and island-island (I–I) species turnover to predictions of the M–M model. If island assembly is not unique, then the M–M model should successfully predict M–I and I–I turnover, given geographic and environmental distance. We found that M–I turnover and, to...
Data for the Farewell and Herberg example of a two-phase experiment using a...
researchdata.edu.au
datasetcatalog.nlm.nih.gov
Updated Jul 1, 2021
Cite
Chris Brien (2021). Data for the Farewell and Herberg example of a two-phase experiment using a plaid design [Dataset]. http://doi.org/10.25909/13122095
Unique identifier
https://doi.org/10.25909/13122095
Dataset updated
Jul 1, 2021
Dataset provided by
The University of Adelaide
Authors
Chris Brien
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The experiment that Farewell and Herzberg (2003) describe is a pain-rating experiment that is a subset of the experiment reported by Solomon et al. (1997). It is a two-phase experiment. The first phase is a self-assessment phase in which patients self-assess for pain while moving a painful shoulder joint. The second phase of this experiment is an evaluation phase in which occupational and physical therapy students (the raters) are evaluated for rating patients in a set of videos for pain. The measured response is the difference between a student rating and the patient's rating.

The R data file plaid.dat.rda contains the data.frame plaid.dat that has a revised version of the data for the Farewell and Herzberg example downloaded from https://doi.org/10.17863/CAM.54494. The comma-delimited text file plaid.dat.csv has the same information in this more commonly accepted format, but without the metadata associated with the data.frame. The data.frame contains the factors Raters, Viewings, Trainings, Expressiveness, Patients, Occasions, and Motions and a column for the response variable Y. The two factors Viewings and Occasions are additional to those in the downloaded file and the remaining factors have been converted from integers or characters to factors and renamed to the names given above. The column Y is unchanged from the column in the original file. To load the data in R use: load("plaid.dat.rda") or plaid.dat <- read.csv(file = "plaid.dat.csv").
References
Farewell, V. T., & Herzberg, A. M. (2003). Plaid designs for the evaluation of training for medical practitioners. Journal of Applied Statistics, 30(9), 957-965. https://doi.org/10.1080/0266476032000076092
Solomon, P. E., Prkachin, K. M., & Farewell, V. (1997). Enhancing sensitivity to facial expression of pain. Pain, 71(3), 279-284. https://doi.org/10.1016/S0304-3959(97)03377-0
Data from: Constraints on trait combinations explain climatic drivers of...
datadryad.org
search.dataone.org
zip
Updated Apr 27, 2018
Cite
John M. Dwyer; Daniel C. Laughlin (2018). Constraints on trait combinations explain climatic drivers of biodiversity: the importance of trait covariance in community assembly [Dataset]. http://doi.org/10.5061/dryad.76kt8
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.76kt8
Dataset updated
Apr 27, 2018
Dataset provided by
Dryad
Authors
John M. Dwyer; Daniel C. Laughlin
Time period covered
Apr 27, 2017
Description
quadrat.scale.data: Refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.

species.in.quadrat.scale.data: Refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.

Dwyer_&_Laughlin_2017_Trait_covariance_script: This script reads in the two dataframes of "raw" data, calculates diversity and trait metrics and runs the major analyses presented in Dwyer & Laughlin 2017.
Data from: Humans exploit robust locomotion by improving the stability of...
zenodo.org
bin
Updated Jun 17, 2022
Cite
Alessandro Santuz; Leon Brüll; Antonis Ekizos; Arno Schroll; Nils Eckardt; Armin Kibele; Michael Schwenk; Adamantios Arampatzis (2022). Humans exploit robust locomotion by improving the stability of control signals [Dataset]. http://doi.org/10.5281/zenodo.2687682
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.2687682
Dataset updated
Jun 17, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alessandro Santuz; Leon Brüll; Antonis Ekizos; Arno Schroll; Nils Eckardt; Armin Kibele; Michael Schwenk; Adamantios Arampatzis
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

Is the control of movement less stable when we walk or run in challenging settings? Intuitively, one might answer that it is, given that adding constraints to locomotion (e.g. rough terrain, age-related impairments, etc.) makes movements less stable. Here, we investigated how young and old humans synergistically activate muscles during locomotion when different perturbation levels are introduced. Of these control signals, called muscle synergies, we analyzed the stability over time. Surprisingly, we found that perturbations and older age force the central nervous system to produce muscle activation patterns that are more stable. These outcomes show that robust locomotion in challenging settings is achieved by increasing the stability of control signals, whereas easier tasks allow for more unstable control.

How to use the data set

This supplementary data set contains: a) the metadata with anonymized participant information, b) the raw electromyographic (EMG) data acquired during locomotion, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via non-negative matrix factorization and f) the code written in R (R Found. for Stat. Comp.) to process the data, including the scripts to calculate the Maximum Lyapunov Exponents of motor primitives. In total, 476 trials from 86 participants are included in the supplementary data set.

The file “participant_data.dat” is available in ASCII and RData (R Found. for Stat. Comp.) format and contains:

Code: the participant’s code

Experiment: the experimental setup in which the participant was involved (E1 = walking and running, overground and treadmill; E2 = walking and running, even- and uneven-surface; E3 = unperturbed and perturbed walking, young and old)

Group: the group to which the participant was assigned (see above for the details)

Sex: the participant’s sex (M or F)

Speed: the speed at which the recordings were conducted in [m/s] (two values separated by a comma mean that recordings were done at two different speeds, i.e. walking and running)

Age: the participant’s age in years (participants were considered old if older than 65 years, but younger than 80)

Height: the participant’s height in [cm]

Mass: the participant’s body mass in [kg].

The files containing the gait cycle breakdown are available in RData (R Found. for Stat. Comp.) format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with 30 rows (one for each gait cycle) and two columns. The first column contains the touchdown incremental times in seconds. The second column contains the duration of each stance phase in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P0020,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times and the characters “P0020” indicate the participant number (in this example the 20th). Please note that the overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 only contain 22, 27 and 23 cycles, respectively.

The files containing the raw, filtered and the normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with 30000 rows (one for each recorded data point) and 14 columns. The first column contains the incremental time in seconds. The remaining thirteen columns contain the raw EMG data, named with muscle abbreviations that follow those reported in the Materials and Methods section of this Supplementary Materials file. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P0053_OG_02”, where the characters “RAW_EMG” indicate that the trial contains raw EMG data, the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0053_OG_02”.

The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided into motor primitives and motor modules and are presented as direct output of the factorization and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “Time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed above to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P0012_PW_02”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P0012” indicate the participant number (in this example the 12th), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Motor modules are data frames with 13 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported in the Materials and Methods section of this Supplementary Materials file, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. Trials are named like “SYNS_W_P0082_PW_02”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P0082” indicate the participant number (in this example the 82nd), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.

The files containing the MLE calculated from motor primitives are available in RData (R Found. for Stat. Comp.) format, in the file named “MLE.RData”. MLE results are presented in a list of lists containing, for each trial, 1) the divergences, 2) the MLE, and 3) the value of the R² between the divergence curve and its linear interpolation made using the specified amount of points. The divergences are presented as a one-dimensional vector. The MLE is a single number, as is the R² value. Trials are named like “MLE_P0081_EW_01”, where the characters “MLE” indicate that the trial contains MLE data, the characters “P0081” indicate the participant number (in this example the 81st), the characters “EW” indicate the locomotion type (see above), and the numbers “01” indicate the trial number (in this case the 1st).
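
The trial-naming convention described above (data type, participant number, locomotion code, trial number) can be unpacked programmatically. The following is a minimal Python sketch, not part of the data set's own R code; the names `parse_trial_name`, `TrialName` and `LOCOMOTION_TYPES` are hypothetical, and the pattern assumes four-digit participant numbers and two-digit trial numbers as in the examples given.

```python
import re
from typing import NamedTuple, Optional

# Locomotion codes as listed in the description of the EMG files.
LOCOMOTION_TYPES = {
    "OW": "overground walking",
    "OR": "overground running",
    "TW": "treadmill walking",
    "TR": "treadmill running",
    "EW": "even-surface walking",
    "ER": "even-surface running",
    "UW": "uneven-surface walking",
    "UR": "uneven-surface running",
    "NW": "normal walking",
    "PW": "perturbed walking",
}


class TrialName(NamedTuple):
    kind: str                  # e.g. "RAW_EMG", "SYNS_H", "MLE"
    participant: int           # e.g. 12 for "P0012"
    locomotion: Optional[str]  # two-letter code, absent for some names
    trial: Optional[int]       # trial number, absent for some names


# Assumed pattern: <KIND>_P<4 digits>[_<2 letters>_<2 digits>]
_TRIAL_RE = re.compile(
    r"^(?P<kind>[A-Z_]+?)_P(?P<participant>\d{4})"
    r"(?:_(?P<locomotion>[A-Z]{2})_(?P<trial>\d{2}))?$"
)


def parse_trial_name(name: str) -> TrialName:
    """Split a trial name into its components; raise ValueError if it does not fit."""
    match = _TRIAL_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised trial name: {name!r}")
    trial = match.group("trial")
    return TrialName(
        kind=match.group("kind"),
        participant=int(match.group("participant")),
        locomotion=match.group("locomotion"),
        trial=int(trial) if trial is not None else None,
    )


parsed = parse_trial_name("SYNS_H_P0012_PW_02")
```

The locomotion and trial parts are optional in the pattern so that shorter names such as “CYCLE_TIMES_P0020” also parse. The code is returned as-is rather than validated, so an unlisted code can be looked up with `LOCOMOTION_TYPES.get(...)` without raising.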

All the code used for the preprocessing of EMG data, the extraction of muscle synergies and the calculation of MLE is available in R (R Found. for Stat. Comp.) format. Explanatory comments are profusely present throughout the scripts (“SYNS.R”, which is the script to extract synergies, “fun_NMF.R”, which contains the NMF function, “MLE.R”, which is the script to calculate the MLE of motor primitives and “fun_MLE.R”, which contains the MLE function).
Data from: Phonotactically driven cue weighting in a sound change in...
data.europa.eu
data.ub.uni-muenchen.de
csv
Updated May 8, 2023
Cite
Universitätsbibliothek der Ludwig-Maximilians-Universität München (2023). Phonotactically driven cue weighting in a sound change in progress: Acoustic evidence from West Central Bavarian [Dataset]. https://data.europa.eu/data/datasets/https-open-bydata-de-api-hub-repo-datasets-https-data-ub-uni-muenchen-de-380-dataset?locale=da
Available download formats: csv
Dataset updated
May 8, 2023
Dataset authored and provided by
Universitätsbibliothek der Ludwig-Maximilians-Universität München
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
R script and data frames to accompany the conference paper of the same name. The R file contains the script on which the analyses in Thon, K. & Kleber, F. (2023). Phonotactically driven cue weighting in a sound change in progress: Acoustic evidence from West Central Bavarian. Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague, Czech Republic are based. The two .csv files contain the corresponding input data that need to be loaded in R prior to running the respective analysis script.
Hot hand experimental data
data.mendeley.com
Updated Jul 8, 2025
Cite
Marcleiton Morais (2025). Hot hand experimental data [Dataset]. http://doi.org/10.17632/2b6z9796wg.3
Unique identifier
https://doi.org/10.17632/2b6z9796wg.3
Dataset updated
Jul 8, 2025
Authors
Marcleiton Morais
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset corresponds to the raw data from the research study “The Hot-Hand Belief and the Loss-Averse Investor.” The experimental protocol was approved by two Research Ethics Committees: the University of Brasília (UnB), under technical decision No. 2.558.540, and the Federal University of Tocantins (UFT), under technical decision No. 2.850.741. A total of 226 undergraduate students from various majors at UFT and UnB were recruited to participate in the study. After data cleaning, the final sample comprised 208 individuals.

The original data file, RawDatahh.rds, is stored in RDS format—a binary file structure native to the R programming language. This format enables efficient storage of complex R objects, such as lists and data frames, while preserving internal structures including labels, data types, and hierarchies. Specifically, the RDS file contains: (i) the global configuration parameters that define the rules and structure of the experimental game; (ii) detailed behavioral data on participant decisions across multiple periods; and (iii) the full set of simulated price series for the financial assets used in the experiment.

For ease of use in statistical analysis and replication, a cleaned and reformatted version of the raw data is provided in CSV format as subset_RawDatahh.csv. This subset retains key variables capturing participants’ choices, asset prices, the monetary values of purchased shares, and the alpha indices estimated in the study.

The dataset subset_RawDatahh.csv is structured as panel data, with observations for multiple participants across several time periods. It includes numeric variables on the quantities of assets purchased, corresponding prices, monetary amounts paid per purchase, and the indices $\theta_1^i$ and $\theta_2^i$ tested under Hypothesis 1 of the study.

This version of the dataset also includes two key supplementary materials to support transparency and usability. First, it provides the original experimental instructions that were delivered to participants during the laboratory sessions, offering full insight into the procedures, decision tasks, and game mechanics involved in the experiment. These instructions are available in PDF format and reflect the exact wording and layout used in the sessions. Second, it includes a comprehensive user guide that outlines the structure of the dataset, defines the core variables, and provides practical instructions for importing, managing, and analyzing the data in R.
Market Basket Analysis
kaggle.com
Updated Dec 9, 2021
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Dataset updated
Dec 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions of itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to offer customers itemset suggestions, so that we can increase customer engagement, improve customer experience and identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rule mining is most often used when you want to find associations between different objects in a set. It works by finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat and 8 bought both.
- Rule: bought computer mouse => bought mouse mat
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support/P(computer mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(mouse mat) = 0.80/0.09 = 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
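
The arithmetic for the mouse and mouse-mat example can be checked with a short Python sketch. It uses the standard definitions, in which confidence divides the joint support by the probability of the rule's left-hand side (the mouse) and lift divides confidence by the probability of the right-hand side (the mat). The function name `rule_metrics` is hypothetical, and this is not the arules implementation used in the notebook.

```python
from typing import Tuple


def rule_metrics(
    n_total: int, n_antecedent: int, n_consequent: int, n_both: int
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(antecedent and consequent)
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence / P(consequent)
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(
    n_total=100, n_antecedent=10, n_consequent=9, n_both=8
)
```

A lift well above 1, as here, indicates that the two items are bought together far more often than would be expected if the purchases were independent.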

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of Rows: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

Libraries in R

First, we need to load the required libraries. Each library is briefly described below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

Next, we clean the data frame by removing missing values.

To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
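The idea of grouping items by invoice can be sketched in base R as follows. The column names are hypothetical, and the coercion to an arules transactions object is shown only as a comment because it requires the arules package to be attached.

```r
# Hypothetical cleaned invoice data: one row per purchased item.
orders <- data.frame(
  BillNo   = c("1001", "1001", "1002", "1003", "1003"),
  Itemname = c("bread", "milk", "eggs", "tea", "bread"),
  stringsAsFactors = FALSE
)

# One list element per invoice, holding every item bought on that invoice.
baskets <- split(orders$Itemname, orders$BillNo)
baskets

# With arules attached, a list like this can be coerced for rule mining:
# trans <- as(baskets, "transactions")
```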
Data and code from: Severity of charcoal rot disease in soybean genotypes...
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
+1more
Updated May 8, 2025
Cite
Agricultural Research Service (2025). Data and code from: Severity of charcoal rot disease in soybean genotypes inoculated with Macrophomina phaseolina isolates differs among growth environments [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-severity-of-charcoal-rot-disease-in-soybean-genotypes-inoculated-with-i
Explore at:
Dataset updated
May 8, 2025
Dataset provided by
Agricultural Research Service
Description
This dataset includes all the raw data and all the R statistical software code that we used to analyze the data and produce all the outputs that are in the figures, tables, and text of the associated manuscript:

Mengistu, A., Q. D. Read, C. R. Little, H. M. Kelly, P. M. Henry, and N. Bellaloui. 2025. Severity of charcoal rot disease in soybean genotypes inoculated with Macrophomina phaseolina isolates differs among growth environments. Plant Disease. DOI: 10.1094/PDIS-10-24-2230-RE.

The data included here come from a series of tests designed to evaluate methods for identifying soybean genotypes that are resistant or susceptible to charcoal rot, a widespread and economically significant disease. Four independent experiments were performed to determine the variability in disease severity by soybean genotype and by isolated variant of the charcoal rot fungus: two field tests, a greenhouse test, and a growth chamber test. The tests differed in the number of genotypes and isolates used, as well as the method of inoculation. The accuracy of identifying resistant and susceptible genotypes varied by study, and the same isolate tested across different studies often had highly variable disease severity. Our results indicate that the non-field methods are not reliable ways to identify sources of charcoal rot resistance in soybean.

The models fit in the R script archived here are Bayesian general linear mixed models with AUDPC (area under the disease progress curve) as the response variable. One-dimensional clustering is used to divide the genotypes into resistant and susceptible based on their model-predicted AUDPC values, and this result is compared with the preexisting resistance classification. Posterior distributions of the marginal means for different combinations of genotype, isolate, and other covariates are estimated and compared. Code to reproduce the tables and figures of the manuscript is also included.

The following files are included:
- README.pdf: Full description, with column metadata for the data spreadsheets and text description of each R script
- data2023-04-18.xlsx: Excel sheet with data from three of the four trials
- cleaned_data.RData: all data in analysis-ready format; generates a set of data frames when imported into an R environment
- Modified Cut-Tip Inoculation on DT974290 and LS980358 on first 32 isolates.xlsx: Excel spreadsheet with data from the fourth trial
- data_cleaning.R: Script required to format data from .xlsx files into analysis-ready format (running this script is not necessary to reproduce the analysis; instead you may begin with the following script importing the cleaned .RData object)
- AUDPC_fits.R: Script containing code for all model fitting, model predictions and comparisons, and figure and table generation
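For readers unfamiliar with the response variable, AUDPC is commonly computed with the trapezoidal rule over repeated severity ratings. The sketch below assumes that convention; the description does not say how the archived scripts compute it, and the function name `audpc` and the example values are illustrative only.

```r
# Trapezoidal-rule AUDPC: area under severity plotted against time.
audpc <- function(time, severity) {
  stopifnot(length(time) == length(severity), length(time) >= 2)
  ord <- order(time)
  t <- time[ord]
  y <- severity[ord]
  # Each interval contributes (width) * (mean of its two endpoint ratings).
  sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)
}

# Made-up ratings for one plot: days after inoculation and percent severity.
days <- c(0, 7, 14, 21)
sev  <- c(0, 10, 30, 50)
audpc(days, sev)  # 35 + 140 + 280 = 455
```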
RData file of estimate comparisons and primary MPRA data.
plos.figshare.com
application/gzip
Updated Jun 2, 2023
Cite
Andrew R. Ghazi; Xianguo Kong; Ed S. Chen; Leonard C. Edelstein; Chad A. Shaw (2023). RData file of estimate comparisons and primary MPRA data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007504.s005
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007504.s005
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS Computational Biology
Authors
Andrew R. Ghazi; Xianguo Kong; Ed S. Chen; Leonard C. Edelstein; Chad A. Shaw
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An RData file that contains three data frames: ulirsch_comparisons, primary_comparisons, and primary_mpra_data. The first two data frames are the data necessary to produce Fig 4. Each row corresponds to one variant, and each column corresponds to a given analysis method. The values in the table give the transcription shift estimates. The third data frame gives the barcode counts from our primary MPRA dataset with anonymized variant identifiers. (RDATA)
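An RData file holding several data frames can be inspected safely by loading it into a separate environment. The sketch below writes and reads a temporary file with two placeholder objects (`demo_a`, `demo_b`); these names are hypothetical and stand in for the three data frames named in the description.

```r
# Build a small stand-in .RData file so the example is self-contained.
demo_a <- data.frame(variant = c("v1", "v2"), estimate = c(0.12, -0.40))
demo_b <- data.frame(barcode = c("b1", "b2", "b3"), count = c(10L, 0L, 7L))
tmp <- tempfile(fileext = ".RData")
save(demo_a, demo_b, file = tmp)

# Load into a fresh environment so nothing in the workspace is overwritten.
e <- new.env()
loaded <- load(tmp, envir = e)   # returns the names of the restored objects

# Confirm which restored objects are data frames and how many rows each has.
is_df  <- vapply(loaded, function(n) is.data.frame(get(n, envir = e)), logical(1))
n_rows <- vapply(loaded, function(n) nrow(get(n, envir = e)), integer(1))
```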
Ld Estimated From 1K Genomes Ceu Population
explore.openaire.eu
zenodo.org
Updated Oct 17, 2018
Cite
Jean Morrison; Nicholas Knoblauch (2018). Ld Estimated From 1K Genomes Ceu Population [Dataset]. http://doi.org/10.5281/zenodo.1464356
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.1464356
Dataset updated
Oct 17, 2018
Authors
Jean Morrison; Nicholas Knoblauch
Description
These data contain estimated pairwise r^2 for variants with allele frequency greater than 0.05 in the 1000 Genomes CEU population. They were estimated using LDshrink (https://github.com/stephenslab/LDshrink). R^2 is only reported when the estimate is greater than 0.1. For each chromosome there are two files: chr
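The two thresholds mentioned above (allele frequency above 0.05, r^2 above 0.1) can be illustrated on simulated genotypes. Note that this sketch uses the plain squared sample correlation, not the LDshrink estimator used for the published files, and all data and object names are made up.

```r
set.seed(1)
n <- 200

# Simulated genotype dosages (0, 1, 2 copies of the alternate allele).
g1 <- rbinom(n, 2, 0.30)
g2 <- ifelse(runif(n) < 0.8, g1, rbinom(n, 2, 0.30))  # mostly copies g1
g3 <- rbinom(n, 2, 0.02)                              # rare variant
geno <- cbind(snp1 = g1, snp2 = g2, snp3 = g3)

# Keep variants whose allele frequency exceeds 0.05.
af   <- colMeans(geno) / 2
keep <- af > 0.05

# Pairwise r^2 as the squared correlation of dosages.
r2 <- cor(geno[, keep, drop = FALSE])^2

# Report only pairs with r^2 above 0.1 (upper triangle avoids duplicates).
reported <- which(upper.tri(r2) & r2 > 0.1, arr.ind = TRUE)
```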
Data from: Common field data limitations can substantially bias sexual...
explore.openaire.eu
datadryad.org
Updated Aug 8, 2020
Cite
Emily Cramer; Sara Ann Kaiser; Mike S Webster; T Brandt Ryder (2020). Common field data limitations can substantially bias sexual selection metrics [Dataset]. http://doi.org/10.5061/dryad.xksn02vc5
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.xksn02vc5
Dataset updated
Aug 8, 2020
Authors
Emily Cramer; Sara Ann Kaiser; Mike S Webster; T Brandt Ryder
Description
This data supplement contains 7 items, with dual goals of providing all the code needed to replicate the results in the manuscript and providing code to researchers who want to explore bias in their own study systems. Note that results for individual study systems should be viewed with caution, as assumptions about the exact mechanisms of extra-pair copulations and fertilizations may not be appropriate for a particular study system. The utility for other researchers to explore their own study systems was a secondary goal. Thus the code is not streamlined for this application. Suggestions for how to approach this are at the end of this document. We provide: 1. Simulation code used to generate populations from multiple different species, with flexibility in the number of data limitations and their degree (MethodsMSSimCode_20203027.R). Within the simulation, each population's Bateman gradients are calculated and saved, as are data needed to calculate the other metrics. Other information about the population is not stored. The simulation code outputs four dataframes that are intended to be written as .csv files, to be re-opened and analysed separately. "GrandvarianceTerms" includes some descriptive information about the study population (number of extra-pair offspring), as well as true and detected variance in reproductive and copulation success. It is one row per simulation iteration. "GrandGradientSummary" contains the model output for eight Bateman gradients per simulation iteration (true and detected gradients for each of two sexes, with both a raw and a relativized form). The different types of Bateman gradient are identified in a column named "Model" with levels: FemaleBatemanApparent, MaleBatemanApparent, FemaleBatemanTrue, MaleBatemanTrue, MaleBatemanDetRel_Terr, MaleBatemanTrue_Both, FemaleBatemanApparentRel, and FemaleBatemanTrueRel. The latter four are the relativized forms of the gradient. 
"GrandPopSUmmaryTot" summarizes, for each replicate population, the number of extra-pair males that sired offspring in each brood. Length of the output will vary. "GrandPopSummaryNEPTot" summarizes, for each replicate population, the number of extra-pair offspring for each brood size. Length of the output will vary. 2. Though they are not used in the main study, we include two additions relevant for researchers wanting to model their own systems. First, you can add random noise to your estimates of m and s by un-commenting the code at lines 90-91 and 153-154 (consider commenting out lines 102 and 103). This incorporates up to 10% variation in m and s (and can easily be adjusted to add a different amount of noise). Second, you can incorporate spatial limitations. Because this was a longer piece of code, we have supplied it as a separate file (ModificationsToIncludeSpatialImpacts.R), which can be used to replace the code at lines 149-192 in the main simulation code. 3. Code used to analyze the output from the simulations (ExaminingSims_Methods_20200327_commented.R). The code for statistical analysis and visualization starts by re-shaping the data and combining content from the GrandVarianceTerms and GrandGradientSummary dataframes in order to make a dataframe for s'max (which is then written as a file and re-read from the file for convenience). The processing includes a number of steps that may not be useful for examining one's own study population, but that were needed for our analyses. We have erred on the side of including more than necessary, and noting places where we expect most readers would prefer to comment out the code (below in specific instructions for simulating individual populations). 
For the analyses presented in the paper, there are three main dataframes for statistics (i.e., those that contain the mean values across replicate iterations of the simulation) and three main dataframes with information on the individual iterations, which were used in preparing figures. Dataframes with the word Variance in the name are optimized for opportunity for selection. Dataframes with Gradients in the name are for the Bateman gradient. Dataframes with JonesIndex in the name are for s'max, the relativized Bateman gradient, and opportunity for sexual selection. For each separate analytical question, the main dataframe is filtered to contain only one source of data limitation at a time. Starting at line 390, this code begins the analysis presented in the main paper. Analysis follows the overall structure in the paper: first looking at basic simulation conditions, then at infertility, then predation, etc. Each of these major subdivisions has a header in the code, to facilitate quick access. Within each major heading, analysis is first from the variance dataframe (i.e., opportunity for selection), then from the gradients dataframe (Bateman gradients), then from the JonesIndex dataframe (first s'max then opportunity for sexual selection). Within each re...
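To make the metrics named above concrete, here is a minimal sketch using their standard textbook definitions: the Bateman gradient as the least-squares slope of reproductive success on mating success, its relativized form after dividing each variable by its mean, and the opportunity for selection as variance divided by the squared mean. The numbers are invented, and this is not the authors' simulation code.

```r
# Invented data for six individuals of one sex.
mates     <- c(0, 1, 1, 2, 3, 3)   # mating success
offspring <- c(0, 2, 3, 4, 7, 8)   # reproductive success

# Raw Bateman gradient: slope of offspring on mates.
bateman_raw <- unname(coef(lm(offspring ~ mates))[2])

# Relativized gradient: same regression on mean-standardized variables.
rel_m <- mates / mean(mates)
rel_o <- offspring / mean(offspring)
bateman_rel <- unname(coef(lm(rel_o ~ rel_m))[2])

# Opportunity for selection: variance in success over the squared mean.
I_rs <- var(offspring) / mean(offspring)^2
```

The relativized slope equals the raw slope multiplied by `mean(mates) / mean(offspring)`, which is a quick consistency check when both forms are stored.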
Data used in "A summer heatwave reduced activity, heart rate and autumn body...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Mar 31, 2023
Cite
Evans, Alina L.; Albon, Steve; Król, Elżbieta; Trondrud, L. Monica; Kumpula, Jouko; Speakman, John; Loe, Leif Egil; Pigeon, Gabriel; Ropstad, Erik (2023). Data used in "A summer heatwave reduced activity, heart rate and autumn body mass in a cold-adapted ungulate" [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001093609
Explore at:
Dataset updated
Mar 31, 2023
Authors
Evans, Alina L.; Albon, Steve; Król, Elżbieta; Trondrud, L. Monica; Kumpula, Jouko; Speakman, John; Loe, Leif Egil; Pigeon, Gabriel; Ropstad, Erik
Description
Overview This dataset contains biologging data and R script used to produce the results in "A summer heatwave reduced activity, heart rate and autumn body mass in a cold-adapted ungulate", a submitted manuscript. The longitudinal data of female reindeer and calf body masses used in the paper is owned by the Finnish Reindeer Herders’ Association. Natural Resources Institute Finland (Luke) updates, saves and administrates this long-term reindeer herd data. Methods of data collection Animals and study area The study involved biologging (see below) 14 adult semi-domesticated reindeer females (Focal animals: Table S1) at the Kutuharju Reindeer Research Facility (Kaamanen, Northern Finland, 69° 8’ N, 26° 59’ E, Figure S1), during June–September 2018. Ten of these individuals had been intensively handled in June as part of another study (Trondrud, 2021). The 14 females were part of a herd of ~100 animals, belonging to the Reindeer Herders’ Association. The herding management included keeping reindeer in two large enclosures (~13.8 and ~15 km2) after calving until the rut, after which animals were moved to a winter enclosure (~15 km2) and then in spring to a calving paddock (~0.3 km2) to give birth (See Supporting Information for further details on the study area). Kutuharju reindeer graze freely on natural pastures from May to November and after that are provided with silage and pellets as a supplementary feed in winter. During the period from September to April animals are weighed 5–6 times. In September, body masses of the focal females did not differ from the rest of the herd. Heart rate (HR) and subcutaneous body temperature (Tsc) data In February 2018, the focal females were instrumented with a heart rate (HR) and temperature logger (DST centi-HRT, Star-Oddi, Gardabaer, Iceland). The surgical protocol is described in the Supporting Information. The DST centi-HRT sensors recorded HR and subcutaneous body temperature (Tsc) every 15 min. 
HR was automatically calculated from a 4-sec electrocardiogram (ECG) at 150 Hz measurement frequency, alongside an index for signal quality. Additional data processing is described in Supporting Information. Activity data The animals were fitted with collar-mounted tri-axial accelerometers (Vertex Plus Activity Sensor, Vectronic Aerospace GmbH, Berlin, Germany) to monitor their activity levels. These sensors recorded acceleration (g) in three directions representing back-forward, lateral, and dorsal-ventral movements at 8 Hz resolution. For each axis, partial dynamic body acceleration (PDBA) was calculated by subtracting the static acceleration using a 4 sec running average from the raw acceleration (Shepard et al., 2008). We estimated vectorial dynamic body acceleration (VeDBA) by calculating the square root of the sum of squared PDBAs (Wilson et al., 2020). We aggregated VeDBA data into 15-min sums (hereafter “sum VeDBA”) to match with HR and Tsc records. Corrections for time offsets are described in Supporting Information. Due to logger failures, only 10 of the 14 individuals had complete data from both loggers (activity and heart rate). Weather and climate data We set up a HOBO weather station (Onset Computer Corporation, Bourne, MA, USA) mounted on a 2 m tall tripod in May 2018 that measured air temperature (Ta, °C) at 15-minute intervals. The placement of the station was between the two summer paddocks. These measurements were matched to the nearest timestamps for VeDBA, HR and Tsc recordings. Also, we obtained weather records from the nearest public weather stations for the years 1990–2021 (Table S2). Weather station IDs and locations relative to the study area are shown in Figure S1 in the Supporting Information. The temperatures at the study site and the nearest weather station were strongly correlated (Pearson’s, r = 0.99), but temperatures were on average ~1.0°C higher at the study site (Figure S2). 
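The PDBA and VeDBA calculation described under Activity data can be sketched as follows. The 32-sample window follows from the stated 4 sec running average at 8 Hz; the centred window, the simulated accelerations and all object names are assumptions for illustration.

```r
hz  <- 8          # sampling rate from the description
win <- 4 * hz     # 4 sec running average = 32 samples

# Simulated raw acceleration (g) on three axes for 10 seconds.
set.seed(42)
n <- 10 * hz
acc <- data.frame(x = rnorm(n, 0, 0.1),
                  y = rnorm(n, 0, 0.1),
                  z = rnorm(n, 1, 0.1))

# Static acceleration: centred running mean (NA where the window is incomplete).
running_mean <- function(v, k) as.numeric(stats::filter(v, rep(1 / k, k), sides = 2))

# Partial dynamic body acceleration per axis: raw minus static.
pdba <- as.data.frame(lapply(acc, function(v) v - running_mean(v, win)))

# VeDBA: square root of the sum of squared PDBAs.
vedba <- sqrt(rowSums(pdba^2))
```

Summing `vedba` within 15-min bins would then give the "sum VeDBA" series that is matched to the HR and Tsc records.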
Statistical analyses All statistical analyses were conducted in R version 4.1.1 (The R Core Team, 2021). Mean values are presented with standard deviation (SD), and parameter estimates with standard error (SE). Environmental effects on activity states and transition probabilities We fitted hidden Markov models (HMM) to 15-min sum VeDBA using the package ‘momentuHMM’ (McClintock & Michelot, 2018). HMMs assume that the observed pattern is driven by an underlying latent state sequence (a finite Markov chain). These states can then be used as proxies to interpret the animal’s unobserved behaviour (Langrock et al., 2012). We assumed only two underlying states, thought to represent ‘inactive’ and ‘active’ (Figure S3). The ‘active’ state thus contains multiple forms of movement, e.g., foraging, walking, and running, but reindeer spend more than 50% of the time foraging in summer (Skogland, 1980). We fitted several HMMs to evaluate both external (temperature and time of day) and individual-level (calf status) effects on the probability to occupy each state (stationary state probabilities). The combination of the explanatory variables in each HMM is listed in Table S5. Ta was fitted as a continuous variable with piecewise polynomial spline with 8 knots, asserted from visual inspection of the model outputs. We included sine and cosine terms for time of day to account for cyclicity. In addition, to assess the impact of Ta on activity patterns, we fitted five temperature-day categories in interaction with time of day. These categories were based on 20% intervals of the distribution of temperature data from our local weather station, in the period 19 June to 19 August 2018, with ranges of < 10°C (cold), 10−13°C (cool), 13−16°C (intermediate) 16−20°C (warm) and ≥ 20°C (hot). 
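Two of the covariate constructions described above, the five temperature categories and the sine and cosine terms for time of day, can be sketched in base R. Assigning boundary values (10, 13, 16, 20) to the warmer category is an assumption, since the description leaves the interval ends ambiguous, and the example temperatures are invented.

```r
# Invented air temperatures (°C).
ta <- c(5.2, 10, 12.9, 13, 15.5, 16, 19.9, 20, 24.1)

# Left-closed intervals: [10, 13) is "cool", [20, Inf) is "hot", and so on.
temp_cat <- cut(ta,
                breaks = c(-Inf, 10, 13, 16, 20, Inf),
                labels = c("cold", "cool", "intermediate", "warm", "hot"),
                right  = FALSE)

# Cyclic encoding of time of day so that 23:45 and 00:00 are neighbours.
hour    <- c(0, 6, 12, 18)
tod_sin <- sin(2 * pi * hour / 24)
tod_cos <- cos(2 * pi * hour / 24)
```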
We evaluated the significance of each variable on the transition probabilities from the confidence intervals of each estimate, and the goodness-of-fit of each model using Akaike information criteria (AIC) (Burnham & Anderson, 2002), retaining models within ΔAIC < 5. We extracted the most likely state occupied by an individual using the viterbi function, returning the optimal state pathway, i.e., a two-level categorical variable indicating whether the individual was most likely resting or active. We used this output to calculate daily activity budgets (% time spent active). Drivers of heart rate (HR) and subcutaneous body temperature (Tsc) We matched the activity states derived from the HMM to the HR and Tsc data. We opted to investigate the drivers of variation in HR and Tsc only within the inactive state. HR and Tsc were fitted as response variables in separate generalised additive mixed-effects models (GAMM), which included the following smooth terms: calendar day as a thin-plate regression spline, time of day (ToD, in hours, knots [k] = 10) as a cubic circular regression spline and individual as random intercept. All models were fitted using restricted maximum likelihood, a penalization value (λ) of 1.4 (Wood, 2017), and an autoregressive structure (AR1) to account for temporal autocorrelation. We used the ‘gam.check’ function from the ‘mgcv’ package to select k. The sum of VeDBA in the past 15 minutes was included as a predictor in all models. All models were fitted with the same set of explanatory variables: sum VeDBA, age, body mass (BM), lactation status, Ta, as well as the interaction between lactation status and Ta. Description of files 1. 
Data:
- "kutuharju_weather.csv": weather data recorded from the local weather station during the study period
- "Inari_Ivalo_lentoasema.csv": public weather data from weather station ID 102033, owned and managed by the Finnish Meteorological Institute
- "activitydata.Rdata": dataset used in analyses of activity patterns in reindeer
- "HR_temp_data.Rdata": dataset used in analyses of heart rate and body temperature responses in reindeer
- "HRfigureData.Rdata" and "TempFigureData.Rdata": data files (lists) with model outputs generated in "heartrate_bodytemp_analyses.R" and used in "figures_in_paper.R"
- "HMM_df_withStates.Rdata": data frame used in HMM models, including output from the viterbi function
- "plotdf_m16.Rdata": dataframe for plotting output from model 16
- "plotdf_m22.Rdata": dataframe for plotting output from model 22

2. Scripts
- "activitydata_HMMs.R": R script for data prep and hidden Markov models to analyse activity patterns in reindeer
- "heartrate_bodytemp_analyses.R": R script for data prep and generalized additive mixed models to analyse heart rate and body temperature responses in reindeer
- "figures_in_paper.R": R script for generating figures 1-3 in the manuscript

3. HMM_model
- "modelList.Rdata": list containing 2 items: string of all 25 HMM models created, and dataframe with model number and formula
- "m16.Rdata" and "m22.Rdata": direct access to the two best-fit models
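The rule of retaining models within ΔAIC < 5, mentioned in the statistical analyses, can be illustrated with ordinary linear models standing in for the HMMs. The simulated data and model names are placeholders; only the ΔAIC bookkeeping is the point of the sketch.

```r
set.seed(7)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 2 * d$x1 + rnorm(100)

# Candidate models (stand-ins for the fitted HMMs).
models <- list(
  null = lm(y ~ 1, data = d),
  m1   = lm(y ~ x1, data = d),
  m2   = lm(y ~ x1 + x2, data = d)
)

# AIC per model, difference from the best model, and the retained set.
aic      <- vapply(models, AIC, numeric(1))
delta    <- aic - min(aic)
retained <- names(delta)[delta < 5]
```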
Data from: NEOWISE-R Single Exposure (L1b) Frame Metadata Table
s.cnmilf.com
data.staging.idas-ds1.appdat.jsc.nasa.gov
+2more
Updated Jun 28, 2025
Cite
NASA/IPAC Infrared Science Archive (2025). NEOWISE-R Single Exposure (L1b) Frame Metadata Table [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/neowise-r-single-exposure-l1b-frame-metadata-table
Explore at:
Dataset updated
Jun 28, 2025
Dataset provided by
NASA/IPAC Infrared Science Archive
Description
The Near-Earth Object Wide-field Infrared Survey Explorer Reactivation Mission (NEOWISE; Mainzer et al. 2014, ApJ, 792, 30) is a NASA Planetary Science Division space-based survey to detect, track and characterize asteroids and comets, and to learn more about the population of near-Earth objects that could pose an impact hazard to the Earth. NEOWISE systematically images the sky at 3.4 and 4.6 μm, obtaining multiple independent observations at each location that enable detection of previously known and new solar system small bodies by virtue of their motion. Because it is an infrared survey, NEOWISE detects asteroid thermal emission and is equally sensitive to high and low albedo objects.

The following table contains brief descriptions of all metadata information that is relevant to the processing of Single-exposure (level 1) images and the extraction of sources from the corresponding Single-exposure images. The table contains the unique scan ID and frame number for each specific single-exposure image and the reconstructed right ascension and declination of the image center. Much of the information in this table is processing-specific, and may not be of interest to general users (e.g. flags indicating whether frames have been processed or not, and the date and time for starting of the pipeline etc). The metadata table also contains some characterization and derived statistics of the Single-exposure image frames, basic parameters used for photometry and derived statistics for extracted sources and artifacts. For example, it contains the number of sources with profile-fit photometry Signal-to-Noise (SNR) greater than 3, and the total number of real sources affected by artifacts such as latent images and electronic ghosts.
Data from: Quantification of the zygotic barrier between interbreeding taxa...
datadryad.org
data.niaid.nih.gov
zip
Updated Nov 7, 2016
Cite
Ronald Bialozyt; Marc Niggemann; Birgit Ziegenhagen (2016). Quantification of the zygotic barrier between interbreeding taxa using gene flow data [Dataset]. http://doi.org/10.5061/dryad.ts544
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.ts544
Dataset updated
Nov 7, 2016
Dataset provided by
Dryad
Authors
Ronald Bialozyt; Marc Niggemann; Birgit Ziegenhagen
Time period covered
Nov 7, 2016
Area covered
Germany, Central Europe, Hesse, Middle Europe
Description
Introgression data

The file was produced using the command 'save()' from the R software. This resulted in the file “introgression.RData”, which includes two dataframes: (1) intro.rates.paper and (2) tree.data. The first dataframe comprises the empirical introgression rates recorded by Bialozyt et al. (2012). The columns are the individual tree number followed by the percentage of introgression for the years 2006 and 2007, respectively. The second dataframe (tree.data) comprises the name, coordinates, sex and taxa of all trees used in the simulation. In the column 'SEX', females are coded '0' and males are coded '1'. In the column 'SPECIES', Populus nigra are coded '0' and the hybrids (Populus × canadensis) are coded '1'.

introgression.RData
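The 0/1 codes described above can be turned into readable labels with `factor()`. The sketch uses a made-up three-row data frame; only the column names `SEX` and `SPECIES` and their codes come from the description, while `NAME` and the added label columns are hypothetical.

```r
# Made-up stand-in for tree.data with the documented coding.
tree_demo <- data.frame(
  NAME    = c("T1", "T2", "T3"),   # hypothetical identifier column
  SEX     = c(0, 1, 1),            # 0 = female, 1 = male
  SPECIES = c(0, 0, 1)             # 0 = Populus nigra, 1 = hybrid
)

# Decode the numeric codes into labelled factors.
tree_demo$SEX_label <- factor(tree_demo$SEX, levels = c(0, 1),
                              labels = c("female", "male"))
tree_demo$SPECIES_label <- factor(tree_demo$SPECIES, levels = c(0, 1),
                                  labels = c("Populus nigra", "hybrid"))

# Cross-tabulate sex by taxon.
table(tree_demo$SEX_label, tree_demo$SPECIES_label)
```

With the real file, `load("introgression.RData")` would restore `tree.data`, to which the same two `factor()` calls could be applied.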
Data_Patterns of gene flow across multiple anthropogenic infrastructures:...
figshare.com
zip
Updated Feb 11, 2020
Cite
Jonathan Remon; Sylvain Moulhérat; Jérémie H. Cornuau; Lucie Gendron; Murielle Richard; Michel Baguette; Jérôme G. Prunier (2020). Data_Patterns of gene flow across multiple anthropogenic infrastructures: insights from a multi-species approach [Dataset]. http://doi.org/10.6084/m9.figshare.11835840.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.11835840.v1
Dataset updated
Feb 11, 2020
Dataset provided by
Figshare http://figshare.com/
Authors
Jonathan Remon; Sylvain Moulhérat; Jérémie H. Cornuau; Lucie Gendron; Murielle Richard; Michel Baguette; Jérôme G. Prunier
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data attached to the manuscript "Patterns of gene flow across multiple anthropogenic infrastructures: insights from a multi-species approach". There are four data sets, one per studied species, including R scripts and original data frames. We also included additional R functions needed to run the R scripts.
SAPFLUXNET: A global database of sap flow measurements
zenodo.org
explore.openaire.eu
zip
Updated Sep 26, 2020
Cite
Rafael Poyatos; Víctor Granda; Víctor Flo; Roberto Molowny-Horas; Kathy Steppe; Maurizio Mencuccini; Jordi Martínez-Vilalta (2020). SAPFLUXNET: A global database of sap flow measurements [Dataset]. http://doi.org/10.5281/zenodo.3697807
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3697807
Dataset updated
Sep 26, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Rafael Poyatos; Víctor Granda; Víctor Flo; Roberto Molowny-Horas; Kathy Steppe; Maurizio Mencuccini; Jordi Martínez-Vilalta
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General description

SAPFLUXNET contains a global database of sap flow and environmental data, together with metadata at different levels.
SAPFLUXNET is a harmonised database, compiled from contributions from researchers worldwide. This version (0.1.4) contains more than 200 datasets from all over the world, covering a broad range of bioclimatic conditions.
More information on the coverage can be found here: http://sapfluxnet.creaf.cat/shiny/sfn_progress_dashboard/.

The SAPFLUXNET project has been developed by researchers at CREAF and other institutions (http://sapfluxnet.creaf.cat/#team), coordinated by Rafael Poyatos (CREAF, http://www.creaf.cat/staff/rafael-poyatos-lopez), and funded by two Spanish Young Researcher's Grants (SAPFLUXNET, CGL2014-55883-JIN; DATAFORUSE, RTI2018-095297-J-I00) and an Alexander von Humboldt Research Fellowship for Experienced Researchers.

Variables and units

SAPFLUXNET contains whole-plant sap flow and environmental variables at sub-daily temporal resolution. Both sap flow and environmental time series have accompanying flags in a data frame, one for sap flow and another for environmental variables. These flags store quality issues detected during the quality control process and can be used to add further quality flags.

Metadata contain relevant variables informing about site conditions, stand characteristics, tree and species attributes, sap flow methodology and details on environmental measurements. To learn more about variables, units and data flags please use the functionalities implemented in the sapfluxnetr package (https://github.com/sapfluxnet/sapfluxnetr). In particular, have a look at the package vignettes using R:

# remotes::install_github(
#   'sapfluxnet/sapfluxnetr',
#   build_opts = c("--no-resave-data", "--no-manual", "--build-vignettes")
# )
library(sapfluxnetr)

# to list all vignettes
vignette(package='sapfluxnetr')

# variables and units
vignette('metadata-and-data-units', package='sapfluxnetr')

# data flags
vignette('data-flags', package='sapfluxnetr')

Data formats

SAPFLUXNET data can be found in two formats: 1) RData files belonging to the custom-built 'sfn_data' class and 2) Text files in .csv format. We recommend using the sfn_data objects together with the sapfluxnetr package, although we also provide the text files for convenience. For each dataset, text files are structured in the same way as the slots of sfn_data objects; if working with text files, we recommend that you check the data structure of 'sfn_data' objects in the corresponding vignette.

Working with sfn_data files

To work with SAPFLUXNET data, first they have to be downloaded from Zenodo, maintaining the folder structure. A first level in the folder hierarchy corresponds to file format, either RData files or csv's. A second level corresponds to how sap flow is expressed: per plant, per sapwood area or per leaf area. Please note that interconversions among the magnitudes have been performed whenever possible. Below this level, data have been organised per dataset. In the case of RData files, each dataset is contained in a sfn_data object, which stores all data and metadata in different slots (see the vignette 'sfn-data-classes'). In the case of csv files, each dataset has 9 individual files, corresponding to metadata (5), sap flow and environmental data (2) and their corresponding data flags (2).

After downloading the entire database, the sapfluxnetr package can be used to:
- Work with data from a single site: data access, plotting and time aggregation.
- Select the subset datasets to work with.
- Work with data from multiple sites: data access, plotting and time aggregation.

Please check the following package vignettes to learn more about how to work with sfn_data files:

Quick guide

Metadata and data units

sfn_data classes

Custom aggregation

Memory and parallelization

Working with text files

We recommend working with sfn_data objects using R and the sapfluxnetr package; we do not currently provide code to work with text files.

Data issues and reporting

Please report any issue you may find in the database by sending us an email: sapfluxnet@creaf.uab.cat.

Temporary data fixes, detected but not yet included in released versions, will be published on the SAPFLUXNET main web page ('Known data errors').

Data access, use and citation

This version of the SAPFLUXNET database is open access. We are working on a data paper describing the database, but, before its publication, please cite this Zenodo entry if SAPFLUXNET is used in any publication.
Data from: Lower complexity of motor primitives ensures robust control of...
data.niaid.nih.gov
Updated Jun 18, 2022
Cite
Kunimasa, Yoko (2022). Lower complexity of motor primitives ensures robust control of high-speed human locomotion [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3764760
Explore at:
Dataset updated
Jun 18, 2022
Dataset provided by
Ekizos, Antonis
Santuz, Alessandro
Kijima, Kota
Arampatzis, Adamantios
Kunimasa, Yoko
Ishikawa, Masaki
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Walking and running are mechanically and energetically different locomotion modes. For selecting one or another, speed is a parameter of paramount importance. Yet, both are likely controlled by similar low-dimensional neuronal networks that reflect in patterned muscle activations called muscle synergies. Here, we investigated how humans synergistically activate muscles during locomotion at different submaximal and maximal speeds. We analysed the duration and complexity (or irregularity) over time of motor primitives, the temporal components of muscle synergies. We found that the challenge imposed by controlling high-speed locomotion forces the central nervous system to produce muscle activation patterns that are wider and less complex relative to the duration of the gait cycle. The motor modules, or time-independent coefficients, were redistributed as locomotion speed changed. These outcomes show that robust locomotion control at challenging speeds is achieved by modulating the relative contribution of muscle activations and producing less complex and wider control signals, whereas slow speeds allow for more irregular control.

In this supplementary data set we made available: a) the metadata with anonymized participant information, b) the raw EMG, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via NMF and f) the code to process the data, including the scripts to calculate Higuchi's fractal dimension (HFD) of motor primitives. In total, 180 trials from 30 participants are included in the supplementary data set.

The file “metadata.dat” is available in ASCII and RData format and contains:

Code: the participant’s code

Group: the experimental group in which the participant was involved (G1 = walking and submaximal running; G2 = submaximal and maximal running)

Sex: the participant’s sex (M or F)

Speeds: the type of locomotion (W for walking or R for running) and speed at which the recordings were conducted in 10*[m/s]

Age: the participant’s age in years

Height: the participant’s height in [cm]

Mass: the participant’s body mass in [kg]

PB: 100 m-personal best time (for G2).

The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the following trials include fewer than 30 gait cycles (the actual number is shown in parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28 (29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17). All the other trials consist of 30 gait cycles. Trials are named like “P20_R_20,” where the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicates the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P03_R_30”.
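The naming rule described above can be decoded mechanically. The following is a minimal Python sketch, not part of the data set's own R code; the names `Trial` and `parse_trial_name` are hypothetical, and the pattern assumes that every trial name follows the documented "optional prefix, participant, locomotion type, speed" layout.

```python
import re
from typing import NamedTuple


class Trial(NamedTuple):
    prefix: str        # e.g. "FILT_EMG"; empty string when the name has no prefix
    participant: int   # 20 for "P20"
    locomotion: str    # "walking" or "running"
    speed_m_s: float   # the name stores 10 * speed, so "20" means 2.0 m/s


# Optional upper-case prefix, then P<number>_<W|R>_<speed in 10*m/s>.
_TRIAL_RE = re.compile(
    r"^(?:(?P<prefix>[A-Z_]+)_)?P(?P<pid>\d+)_(?P<loco>[WR])_(?P<speed>\d+)$"
)


def parse_trial_name(name: str) -> Trial:
    """Decode a trial name such as "P20_R_20" or "FILT_EMG_P03_R_30"."""
    match = _TRIAL_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised trial name: {name!r}")
    return Trial(
        prefix=match.group("prefix") or "",
        participant=int(match.group("pid")),
        locomotion="walking" if match.group("loco") == "W" else "running",
        speed_m_s=int(match.group("speed")) / 10,
    )
```

For example, `parse_trial_name("P20_R_20")` yields participant 20, running, at 2.0 m/s.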

Old versions not compatible with the R package musclesyneRgies

The files containing the gait cycle breakdown are available in RData format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with as many rows as the available number of gait cycles and two columns. The first column, named “touchdown”, contains the touchdown incremental times in seconds. The second column, named “stance”, contains the duration of each stance phase of the right foot in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P20_R_20,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times, the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicates the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). Please note that the following trials include fewer than 30 gait cycles (the actual number is shown in parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28 (29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17).
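Because this older layout stores touchdown time and stance duration, while the newer "RAW_DATA.RData" layout stores touchdown and lift-off times, converting between the two is a single addition per cycle. The Python sketch below illustrates this under the assumption that a trial has already been exported as a list of (touchdown, stance) pairs in seconds; the function names and the sample values are hypothetical.

```python
def to_lift_off(cycles):
    """Convert (touchdown, stance duration) pairs to (touchdown, lift-off) pairs.

    Lift-off time is the touchdown time plus the stance duration.
    """
    return [(touchdown, touchdown + stance) for touchdown, stance in cycles]


def swing_durations(cycles):
    """Swing duration of each cycle: next touchdown minus this cycle's lift-off.

    The last cycle has no following touchdown, so it yields no swing value.
    """
    events = to_lift_off(cycles)
    return [
        next_touchdown - lift_off
        for (_, lift_off), (next_touchdown, _) in zip(events, events[1:])
    ]


# Made-up example: two cycles, one second apart, each with 0.25 s of stance.
example = [(0.5, 0.25), (1.5, 0.25)]
```

With these illustrative values, `to_lift_off(example)` gives lift-off times of 0.75 s and 1.75 s, and `swing_durations(example)` gives a single swing of 0.75 s.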

The files containing the raw and the filtered, time-normalized EMG data are available in RData format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with as many rows as the number of recorded data points and 13 columns. The first column, named “time”, contains the incremental time in seconds. The remaining 12 columns contain the raw EMG data, named with muscle abbreviations that follow those reported above. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P03_R_30”, where the characters “RAW_EMG” indicate that the trial contains raw EMG data, the characters “P03” indicate the participant number (in this example the 3rd), the character “R” indicates the locomotion type (see above), and the numbers “30” indicate the locomotion speed (see above). The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P03_R_30”.

The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided into motor primitives and motor modules and are presented as direct output of the factorisation and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed in the methods section to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P12_W_07”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P12” indicate the participant number (in this example the 12th), the character “W” indicates the locomotion type (see above), and the numbers “07” indicate the speed (see above). Motor modules are data frames with 12 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported above, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. Trials are named like “SYNS_W_P22_R_20”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P22” indicate the participant number (in this example the 22nd), the character “R” indicates the locomotion type (see above), and the numbers “20” indicate the speed (see above). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.
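The fixed layout of 200 points per gait cycle makes it straightforward to cut one motor primitive column back into individual cycles. The Python sketch below is illustrative only: it assumes a single primitive has been exported as a flat list of numbers, and that within each cycle the 100 stance points come before the 100 swing points (the description gives the counts, and the ordering is an assumption here). The name `split_cycles` is hypothetical.

```python
POINTS_PER_CYCLE = 200  # 100 stance + 100 swing points, as described above
STANCE_POINTS = 100     # assumption: stance occupies the first half of a cycle


def split_cycles(primitive):
    """Split one motor primitive into a list of (stance, swing) point lists."""
    if len(primitive) % POINTS_PER_CYCLE != 0:
        raise ValueError("length must be a multiple of 200 points")
    cycles = []
    for start in range(0, len(primitive), POINTS_PER_CYCLE):
        cycle = primitive[start:start + POINTS_PER_CYCLE]
        cycles.append((cycle[:STANCE_POINTS], cycle[STANCE_POINTS:]))
    return cycles


# Placeholder data standing in for one 6000-row primitive (30 cycles).
demo_cycles = split_cycles(list(range(6000)))
```

The sketch only requires a length that is a multiple of 200, so it would also handle a primitive covering fewer than 30 cycles.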

The files containing the HFD calculated from motor primitives are available in RData format, in the file named “HFD.RData”. HFD results are presented in a list of lists containing, for each trial, 1) the HFD, and 2) the interval time k used for the calculations. HFDs are presented as one number (mean HFD of the primitives for that trial), as are the interval times k. Trials are named like “HFD_P01_R_95”, where the characters “HFD” indicate that the trial contains HFD data, the characters “P01” indicate the participant number (in this example the 1st), the character “R” indicates the locomotion type (see above), and the numbers “95” indicate the speed (see above).
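For readers unfamiliar with HFD, the following Python sketch shows the textbook form of Higuchi's algorithm. It is not the authors' R implementation, and the function name `higuchi_fd` and the parameter `k_max` are assumptions made for illustration; the interval times k actually used for each trial are the ones stored in “HFD.RData”. The curve length L(k) is computed for each interval k, and the HFD is taken as the least-squares slope of log L(k) against log(1/k).

```python
import math


def higuchi_fd(x, k_max):
    """Higuchi's fractal dimension of a 1-D series (generic textbook version).

    Raises ValueError for series that are too short; a constant series also
    fails, because its curve length is zero and the logarithm is undefined.
    """
    n = len(x)
    if k_max < 2 or n < 2 * k_max:
        raise ValueError("need k_max >= 2 and at least 2 * k_max samples")

    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # starting offsets 0 .. k-1
            n_int = (n - 1 - m) // k  # number of steps of size k from offset m
            total = sum(
                abs(x[m + i * k] - x[m + (i - 1) * k])
                for i in range(1, n_int + 1)
            )
            # Normalise for the number of steps, then divide by k.
            lengths.append(total * (n - 1) / (n_int * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(sum(lengths) / k))

    # Least-squares slope of log L(k) versus log(1/k).
    mean_x = sum(log_inv_k) / k_max
    mean_y = sum(log_length) / k_max
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_length))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den
```

As a sanity check, a straight line has curve length (N - 1)/k at every k, so its fractal dimension under this definition is 1; more irregular signals give values closer to 2.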

All the code used for the pre-processing of EMG data, the extraction of muscle synergies and the calculation of HFD is available as R scripts. Extensive explanatory comments are provided throughout the script “muscle_synergies.R”.
