"WeAreHere!" Children's questionnaire. This dataset includes: (1) the WaH children's questionnaire (20 questions including 5-point Likert scale questions, dichotomous questions and an open space for comments). The Catalan version (original), and the Spanish and English versions of the questionnaire can be found in this dataset in pdf format. (2) The data frame in xlsx format, with the children's answers to the questionnaire (a total of 3664 answers) and a reduced version of it for doing the regression (with the 5-point likert scale variable "ask for help" transformed into a dichotomous variable). (3) The data frame in xlsx format, with the children's answers to the questionnaire and the categorization of their comments (sheet 1), the data frame with only the MCA variables selected (sheet 2), and the categories and subcategories table (sheet 3). (4) The data analysis procedure for the regression, the component and multiple component analysis (R script).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Walking and running are mechanically and energetically different locomotion modes. For selecting one or another, speed is a parameter of paramount importance. Yet, both are likely controlled by similar low-dimensional neuronal networks that reflect in patterned muscle activations called muscle synergies. Here, we investigated how humans synergistically activate muscles during locomotion at different submaximal and maximal speeds. We analysed the duration and complexity (or irregularity) over time of motor primitives, the temporal components of muscle synergies. We found that the challenge imposed by controlling high-speed locomotion forces the central nervous system to produce muscle activation patterns that are wider and less complex relative to the duration of the gait cycle. The motor modules, or time-independent coefficients, were redistributed as locomotion speed changed. These outcomes show that robust locomotion control at challenging speeds is achieved by modulating the relative contribution of muscle activations and producing less complex and wider control signals, whereas slow speeds allow for more irregular control.
In this supplementary data set we made available: a) the metadata with anonymized participant information, b) the raw EMG, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via NMF and f) the code to process the data, including the scripts to calculate the Higuchi's fractal dimension (HFD) of motor primitives. In total, 180 trials from 30 participants are included in the supplementary data set.
The file “metadata.dat” is available in ASCII and RData format and contains:
Code: the participant’s code
Group: the experimental group in which the participant was involved (G1 = walking and submaximal running; G2 = submaximal and maximal running)
Sex: the participant’s sex (M or F)
Speeds: the type of locomotion (W for walking or R for running) and speed at which the recordings were conducted in 10*[m/s]
Age: the participant’s age in years
Height: the participant’s height in [cm]
Mass: the participant’s body mass in [kg]
PB: 100 m-personal best time (for G2).
The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the following trials include fewer than 30 gait cycles (the actual number is shown in parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28 (29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17). All the other trials consist of 30 gait cycles. Trials are named like “P20_R_20”, where the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicates the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P03_R_30”.
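The naming rule above can be decoded programmatically. The following is a minimal Python sketch, not part of the data set's own R code; the function name `parse_trial_name` and the regular expression are assumptions derived only from the rule described here (participant, W or R, speed in 10*m/s).

```python
import re

# Pattern assumed from the naming rule: participant, locomotion type, speed code.
TRIAL_NAME = re.compile(r"^P(\d+)_([WR])_(\d+)$")

LOCOMOTION = {"W": "walking", "R": "running"}


def parse_trial_name(name):
    """Split a trial name such as 'P20_R_20' into its three parts."""
    match = TRIAL_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected trial name: {name!r}")
    participant, kind, speed_code = match.groups()
    return {
        "participant": int(participant),
        "locomotion": LOCOMOTION[kind],
        # The speed code is 10 * [m/s], so divide by ten.
        "speed_m_per_s": int(speed_code) / 10,
    }


print(parse_trial_name("P20_R_20"))
# {'participant': 20, 'locomotion': 'running', 'speed_m_per_s': 2.0}
```

Names carrying a prefix such as “FILT_EMG_” would need the prefix removed before being passed to this hypothetical helper.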
Old versions not compatible with the R package musclesyneRgies
The files containing the gait cycle breakdown are available in RData format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with as many rows as the available number of gait cycles and two columns. The first column, named “touchdown”, contains the touchdown incremental times in seconds. The second column, named “stance”, contains the duration of each stance phase of the right foot in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P20_R_20”, where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times, the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicates the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). Please note that the following trials include fewer than 30 gait cycles (the actual number is shown in parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28 (29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17).
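In this older layout the second column holds the stance duration rather than the lift-off time. Assuming that lift-off equals touchdown plus stance duration, the two layouts can be related as in the sketch below; the function name and the sample values are illustrative and not taken from the data set.

```python
def lift_off_times(cycles):
    """Return (touchdown, lift_off) pairs from (touchdown, stance) pairs in seconds."""
    return [(touchdown, touchdown + stance) for touchdown, stance in cycles]


# Illustrative values only, not taken from the data set.
example = [(0.50, 0.25), (1.25, 0.25), (2.00, 0.25)]
print(lift_off_times(example))
# [(0.5, 0.75), (1.25, 1.5), (2.0, 2.25)]
```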
The files containing the raw and the filtered, time-normalized EMG data are available in RData format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with as many rows as the number of recorded data points and 13 columns. The first column, named “time”, contains the incremental time in seconds. The remaining 12 columns contain the raw EMG data, named with muscle abbreviations that follow those reported above. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P03_R_30”, where the characters “RAW_EMG” indicate that the trial contains raw EMG data, the characters “P03” indicate the participant number (in this example the 3rd), the character “R” indicates the locomotion type (see above), and the numbers “30” indicate the locomotion speed (see above). The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P03_R_30”.
The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided into motor primitives and motor modules and are presented as direct output of the factorisation and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed in the methods section to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P12_W_07”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P12” indicate the participant number (in this example the 12th), the character “W” indicates the locomotion type (see above), and the numbers “07” indicate the speed (see above). Motor modules are data frames with 12 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported above, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. 
Trials are named like “SYNS_W_P22_R_20”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P22” indicate the participant number (in this example the 22nd), the character “R” indicates the locomotion type (see above), and the numbers “20” indicate the speed (see above). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.
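The row layout of the motor primitives (200 points per cycle, 100 for stance followed by 100 for swing) means a primitive column can be cut back into individual gait cycles. The Python sketch below illustrates this under the assumption that a column has already been exported as a plain list of numbers; `split_cycles` is a hypothetical helper and the zero-filled signal is a placeholder, not real data.

```python
POINTS_PER_CYCLE = 200  # 100 stance + 100 swing points, as described above
STANCE_POINTS = 100


def split_cycles(primitive):
    """Split one motor primitive into (stance, swing) pairs, one per gait cycle."""
    if len(primitive) % POINTS_PER_CYCLE != 0:
        raise ValueError("length is not a multiple of 200 points")
    cycles = []
    for start in range(0, len(primitive), POINTS_PER_CYCLE):
        cycle = primitive[start:start + POINTS_PER_CYCLE]
        cycles.append((cycle[:STANCE_POINTS], cycle[STANCE_POINTS:]))
    return cycles


# Placeholder signal with the documented 6000 rows (30 cycles x 200 points).
primitive = [0.0] * 6000
cycles = split_cycles(primitive)
print(len(cycles), len(cycles[0][0]), len(cycles[0][1]))
# 30 100 100
```

Because the function only requires a multiple of 200 points, it does not assume exactly 30 cycles per trial.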
The files containing the HFD calculated from motor primitives are available in RData format, in the file named “HFD.RData”. HFD results are presented in a list of lists containing, for each trial, 1) the HFD, and 2) the interval time k used for the calculations. HFDs are presented as one number (mean HFD of the primitives for that trial), as are the interval times k. Trials are named like “HFD_P01_R_95”, where the characters “HFD” indicate that the trial contains HFD data, the characters “P01” indicate the participant number (in this example the 1st), the character “R” indicates the locomotion type (see above), and the numbers “95” indicate the speed (see above).
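For readers unfamiliar with the measure, the sketch below implements the textbook form of Higuchi's algorithm in plain Python. It is not the authors' implementation from “muscle_synergies.R”, which may choose the interval times or the fitted range differently; the function name `higuchi_fd` and the parameter `k_max` are assumptions made for illustration. Constant signals (zero curve length) are not handled.

```python
import math


def higuchi_fd(x, k_max):
    """Estimate Higuchi's fractal dimension of the sequence x (textbook form)."""
    n = len(x)
    if not 2 <= k_max < n:
        raise ValueError("k_max must satisfy 2 <= k_max < len(x)")
    log_inv_k, log_length = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # zero-based start offsets 0..k-1
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            total = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                        for i in range(1, n_steps + 1))
            # Normalise the curve length for this offset and interval.
            lengths.append(total * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) against log(1/k).
    mean_x = sum(log_inv_k) / len(log_inv_k)
    mean_y = sum(log_length) / len(log_length)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_k, log_length))
    den = sum((a - mean_x) ** 2 for a in log_inv_k)
    return num / den


# A straight line is a smooth curve, so the estimate should be close to 1.
line = [float(i) for i in range(100)]
print(round(higuchi_fd(line, 8), 6))
```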
All the code used for the pre-processing of EMG data, the extraction of muscle synergies and the calculation of HFD is available in R format. Explanatory comments are profusely present throughout the script “muscle_synergies.R”.
Many oceanic islands are notable for their high endemism, suggesting that islands may promote unique assembly processes. However, mainland assemblages sometimes harbour comparable levels of endemism, suggesting that island biotas may not be as unique as often assumed. Here, we test the uniqueness of island biotic assembly by comparing the rate of species turnover among islands and the mainland, after accounting for distance decay and environmental gradients. We modeled species turnover as a function of geographic and environmental distance for mainland (M–M) communities of Anolis lizards and Terrarana frogs, two clades that have diversified extensively on Caribbean islands and the mainland Neotropics. We compared mainland-island (M–I) and island-island (I–I) species turnover to predictions of the M–M model. If island assembly is not unique, then the M–M model should successfully predict M–I and I–I turnover, given geographic and environmental distance. We found that M–I turnover and, to...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Background
Is the control of movement less stable when we walk or run in challenging settings? Intuitively, one might answer that it is, given that challenging locomotion externally (e.g. rough terrain) or internally (e.g. age-related impairments) makes our movements more unstable. Here, we investigated how young and old humans synergistically activate muscles during locomotion when different perturbation levels are introduced. Of these control signals, called muscle synergies, we analyzed the stability over time and the complexity (or irregularity). Surprisingly, we found that perturbations force the central nervous system to produce muscle activation patterns that are less unstable and less complex. These outcomes show that robust locomotion in challenging settings is achieved by producing less complex control signals which are more stable over time, whereas easier tasks allow for more unstable and irregular control.
How to use the data set
This supplementary data set contains: a) the metadata with anonymized participant information, b) the raw electromyographic (EMG) data acquired during locomotion, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via non-negative matrix factorization and f) the code written in R (R Found. for Stat. Comp.) to process the data, including the scripts to calculate the short-term Maximum Lyapunov Exponents (sMLE) and Higuchi's fractal dimension (HFD) of motor primitives. In total, 476 trials from 86 participants are included in the supplementary data set.
The file “participant_data.dat” is available in ASCII and RData (R Found. for Stat. Comp.) format and contains:
Code: the participant’s code
Experiment: the experimental setup in which the participant was involved (E1 = walking and running, overground and treadmill; E2 = walking and running, even- and uneven-surface; E3 = unperturbed and perturbed walking, young and old)
Group: the group to which the participant was assigned (see methods for the details)
Sex: the participant’s sex (M or F)
Speed: the speed at which the recordings were conducted in m/s
Age: the participant’s age in years (participants were considered old if older than 65 years, but younger than 80)
Height: the participant’s height in [cm]
Mass: the participant’s body mass in [kg].
The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the running overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 consist of 22, 27 and 23 cycles, respectively. All the other trials consist of 30 gait cycles. Trials are named like “P0053_OW_02”, where the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0053_OG_02”.
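The locomotion codes listed above lend themselves to a simple lookup table. The Python sketch below decodes an unprefixed trial name; `describe_trial` and `LOCOMOTION_CODES` are hypothetical names introduced here, and the table contains only the codes spelled out in the description.

```python
# Codes as listed in the description above (E1, E2 and E3 experiments).
LOCOMOTION_CODES = {
    "OW": "overground walking", "OR": "overground running",
    "TW": "treadmill walking", "TR": "treadmill running",
    "EW": "even-surface walking", "ER": "even-surface running",
    "UW": "uneven-surface walking", "UR": "uneven-surface running",
    "NW": "normal walking", "PW": "perturbed walking",
}


def describe_trial(name):
    """Decode a trial name such as 'P0053_OW_02'."""
    participant, code, trial = name.split("_")
    return {
        "participant": int(participant.lstrip("P")),
        "locomotion": LOCOMOTION_CODES[code],
        "trial": int(trial),
    }


print(describe_trial("P0053_OW_02"))
# {'participant': 53, 'locomotion': 'overground walking', 'trial': 2}
```

A code that is not in the table raises a `KeyError`, which makes unexpected names easy to spot.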
Old versions not compatible with the R package musclesyneRgies
The files containing the gait cycle breakdown are available in RData (R Found. for Stat. Comp.) format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with 30 rows (one for each gait cycle) and two columns. The first column contains the touchdown incremental times in seconds. The second column contains the duration of each stance phase in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P0020,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times and the characters “P0020” indicate the participant number (in this example the 20th). Please note that the overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 only contain 22, 27 and 23 cycles, respectively.
The files containing the raw and the filtered, time-normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with 30000 rows (one for each recorded data point) and 14 columns. The first column contains the incremental time in seconds. The remaining thirteen columns contain the raw EMG data, named with muscle abbreviations that follow those reported in the methods section. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P0053_OW_02”, where the characters “RAW_EMG” indicate that the trial contains raw EMG data, the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0053_OG_02”.
The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided into motor primitives and motor modules and are presented as direct output of the factorization and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “Time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed above to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P0012_PW_02”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P0012” indicate the participant number (in this example the 12th), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Motor modules are data frames with 13 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported in the methods section, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. 
Trials are named like “SYNS_W_P0082_PW_02”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P0082” indicate the participant number (in this example the 82nd), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.
The files containing the sMLE calculated from motor primitives are available in RData (R Found. for Stat. Comp.) format, in the file named “sMLE.RData”. sMLE results are presented in a list of lists containing, for each trial, 1) the divergences, 2) the sMLE, and 3) the value of the R2 between the divergence curve and its linear interpolation made using the specified number of points. The divergences are presented as a one-dimensional vector. The sMLE is a single number, as is the R2 value. Trials are named like “MLE_P0081_EW_01”, where the characters “sMLE” indicate that the trial contains sMLE data, the characters “P0081” indicate the participant number (in this example the
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The human body is an outstandingly complex machine including around 1000 muscles and joints acting synergistically. Yet, the coordination of the enormous number of degrees of freedom needed for movement is mastered by our one brain and spinal cord. The idea that some synergistic neural components of movement exist was already suggested at the beginning of the 20th century. Since then, it has been widely accepted that the central nervous system might simplify the production of movement by avoiding the control of each muscle individually. Instead, it might be controlling muscles in common patterns that have been called muscle synergies. Only with the advent of modern computational methods and hardware has it been possible to numerically extract synergies from electromyography (EMG) signals. However, typical experimental setups do not include a large number of individuals, with common sample sizes of five to 20 participants. With this study, we make publicly available a set of EMG activities recorded during treadmill running from the right lower limb of 135 healthy and young adults (78 males, 57 females). Moreover, we include in this open access data set the code used to extract synergies from EMG data using non-negative matrix factorization and the relative outcomes. Muscle synergies, containing the time-invariant muscle weightings (motor modules) and the time-dependent activation coefficients (motor primitives), were extracted from 13 ipsilateral EMG activities using non-negative matrix factorization. Four synergies were enough to describe as many gait cycle phases during running: weight acceptance, propulsion, early swing and late swing. We foresee many possible applications of our data, which we summarize in three key points. First, it can be a prime source for broadening the representation of human motor control due to the large sample size. 
Second, it could serve as a benchmark for scientists from multiple disciplines such as musculoskeletal modelling, robotics, clinical neuroscience, sport science, etc. Third, the data set could be used both to train students and to support established scientists in perfecting current muscle synergy extraction methods.
The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus.
The file "dataset.rar" contains data in older format, not compatible with the R package musclesyneRgies.
https://www.gnu.org/licenses/agpl.txt
# Archived outputs from models of alternative reproductive tactics
This repository contains outputs from models of the evolution of alternative reproductive tactics, with accompanying code available on github (https://github.com/spflanagan/ARTs).
The repository supports a manuscript submitted to Proceedings of the Royal Society B, which investigates the effects of explicit genetic architecture on the evolutionary dynamics of alternative reproductive tactics.
The analysis used two separate programs: a baseline analytical model (written in R) and a simulation-based model (written in C++). Outputs from both of these models are archived here.
## Baseline model
- `morph_results_Ns.RDS`: An R data file containing a data.frame with the results of the baseline analytical model. It contains 11 columns:
  - initial_CP = frequency of the courter/parent morph in the initial generation of the model
  - initial_CN = frequency of the courter/non-parent morph in the initial generation of the model
  - initial_NP = frequency of the non-courter/parent morph in the initial generation of the model
  - initial_NN = frequency of the non-courter/non-parent morph in the initial generation of the model
  - CP = frequency of the courter/parent morph in the final generation of the model
  - CN = frequency of the courter/non-parent morph in the final generation of the model
  - NP = frequency of the non-courter/parent morph in the final generation of the model
  - NN = frequency of the non-courter/non-parent morph in the final generation of the model
  - r = the relative reproductive investment parameter
  - c = the sperm competition coefficient
  - num_sneak = the number of males allowed to sneak fertilisations within a single clutch
- `morph_results_10000_equalStart.RDS`: An R data file containing a data.frame with the results of the baseline analytical model after it had been run for 10,000 generations. It contains the same 11 columns as `morph_results_Ns.RDS`.
## Simulation model
The simulation model was run with a variety of parameter combinations. The results have been summarised in various files, and the archived raw files are provided as well.
### Raw outputs
These results were generated by running the scripts `scripts/101_model-informed-single-locus.sh`, `101_model-informed-single-locus-tradeoffs.sh`, and `102_model-informed-genetics.sh`. Each parameter combination was run multiple times and generated the same sets of files:
- `*_parameters.txt`: Outputs the parameter settings for that run in a text file with each parameter on its own line
- `*_log.txt`: A text file with the log outputs from the model - will note whether any errors occurred in that run.
- `*_traits.txt`: A tab-delimited file containing the trait data for every individual in generation 0 and generation 12000. The columns are:
  - Gen: generation
  - Pop: population ID within simulations that started with identical starting conditions
  - Individual: A numerical index for the individual whose information is output.
  - Sex: whether the individual is MALE or FEMALE
  - Courter: A boolean value for whether the individual has the courter trait (1) or the non-courter trait (0). Note that females can carry the courter trait (i.e., have a value of 1 in this column) but do not express the courter trait.
  - CourtTrait: The actual trait value for the individual, which is the sum of allelic effects at courter QTLs.
  - Parent: A boolean value for whether the individual has the parent trait (1) or the non-parent trait (0). Note that females can carry the parent trait (i.e., have a value of 1 in this column) but do not express the parent trait.
  - ParentTrait: The actual trait value for the individual, which is the sum of allelic effects at parent QTLs.
  - Preference: A boolean value for whether the individual prefers courting males (1) or non-courting males (0). For the simulation runs here, all individuals have a preference for courters.
  - PrefTrait: The trait value for the preference trait if the trait has a genetic basis. In all iterations of the model shared here, the preference trait was not genetically inherited.
  - MateFound: A count of how many mates the individual was able to obtain.
  - PotRS: The potential reproductive success of the individual, based on their fecundity (which is set by the parameter settings in the model).
  - LifetimeRS: The realised reproductive success of the individual based on the number of matings and the number of offspring produced.
  - Alive: A boolean value tracking whether the individual died or survived to mate.
- `*_summary.txt`: A tab-delimited file summarising the final frequencies of various morphs and other demographic parameters for each generation of the model, and for each population (when populations were initiated with identical starting parameters). It contains the following columns:
  - Generation: Generation number (an integer)
  - Pop: Numerical population ID. All populations were initialised with identical conditions within a single file.
  - PopSize: The population size (i.e., number of adults)
  - NumMal: Number of adult males in the population
  - NumFem: Number of adult females in the population
  - NumProgeny: The number of progeny produced
  - ParentThresh: The population-level threshold for the parent trait to switch from parent to non-parent (this is the mean allelic effects in Gen 0).
  - ParentFreq: Frequency of the parent trait in the population
  - ParentAEmean: Mean allelic effects of the Parent QTLs
  - ParentAEsd: Standard deviation in allelic effects of the Parent QTLs
  - ParentW: Relative fitness of parent males
  - NonParentW: Relative fitness of non-parent males
  - CourterThresh: The population-level threshold for the courter trait to switch from courter to non-courter (this is the mean allelic effects in Gen 0).
  - CourterFreq: Frequency of the courter trait in the population
  - CourterAEmean: Mean allelic effects of courter QTLs
  - CourterAEsd: Standard deviation of allelic effects of courter QTLs
  - CourterW: Relative fitness of courting males
  - NonCourterW: Relative fitness of non-courting males
  - FreqNcNp: Frequency of non-courting/non-parent (NN) morph
  - FreqCNp: Frequency of courting/non-parent (CN) morph
  - FreqNcP: Frequency of non-courting/parent (NP) morph
  - Freq CP: Frequency of courting/parent (CP) morph
  - PrefThresh: The population-level threshold for the preference trait to switch from preferring the courting to non-courting males (not relevant to these simulations)
  - PrefFreq: Frequency of the preference for the courting male in the population (not relevant to these simulations)
  - NumRandMate: Number of females that randomly mated (i.e., did not find a partner with the preferred trait)
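Because `*_summary.txt` holds one row per generation and population, a common first step is to pull out the rows of the final generation. The following Python sketch shows one way to do that with the standard library; `final_generation_rows` is a hypothetical helper, and the miniature file is made up for illustration and uses only a subset of the documented columns.

```python
import csv
import io


def final_generation_rows(handle):
    """Return the rows of the last generation from a *_summary.txt file."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    last = max(int(row["Generation"]) for row in rows)
    return [row for row in rows if int(row["Generation"]) == last]


# Made-up miniature file with a subset of the documented columns.
sample = io.StringIO(
    "Generation\tPop\tCourterFreq\tParentFreq\n"
    "0\t0\t0.50\t0.50\n"
    "1\t0\t0.55\t0.45\n"
    "1\t1\t0.60\t0.40\n"
)
for row in final_generation_rows(sample):
    print(row["Pop"], float(row["CourterFreq"]), float(row["ParentFreq"]))
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle, for example `open(path, newline="")`.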
The runs with explicit genetic architectures also have the following files:
- `*_qtlinfo.txt`: A tab-delimited file summarising the location of each type of QTL. Each column is a different QTL and each row is a different population. If they are initialised to be identical, the QTL information will be the same for each population. The QTL location is given as the chromosome number as an integer (starting at 0), followed by a decimal point, and the digits after it are the location among the marker loci. So, 0.850 refers to a QTL on chromosome 0 at marker location 850 (out of 1000).
- `*_allelic-effects.txt`: A tab-delimited file containing the allelic effects for each QTL. These are the additive contributions each QTL makes towards the trait, and these mutate if a mutation occurs at the location of the QTL (which is recorded in the corresponding `*_qtlinfo.txt` file).
- `*_markers.txt`: A tab-delimited file summarising the allele frequency at each marker locus.
- `*vcf`: A variant call format file for the population in the final generation of the simulations. See standard formats for this type of file online (e.g., https://samtools.github.io/hts-specs/VCFv4.2.pdf)
- `*Tajima.D`: The vcftools output format containing Tajima's D statistics, which was generated using the vcf file. See the vcftools manual for more details (https://vcftools.github.io/man_latest.html)
- `*LD.geno.ld`: The vcftools output format containing pairwise linkage disequilibrium statistics, which was generated using the vcf file. See the vcftools manual for more details (https://vcftools.github.io/man_latest.html)
- `*_gt.csv`: A comma-separated file summarising the genotype information for each individual at all marker loci. It is similar to vcf file format, with the following columns:
- Marker: marker ID
- Chrom: Chromosome ID or number (starting at 0)
- Position: Location on the chromosome (starting at 0)
- REF: Reference allele
- ALT: Alternative allele
- The remaining columns are each individual's genotype
- `*_pheno.csv`: A comma-separated file summarising the phenotypes for a population with the following columns:
- ID: individual ID
- CourtTrait: Whether the individual is a courter (2) or a non-courter (1)
- ParentTrait: Whether the individual is a parent (2) or a non-parent (1)
- Sex: Whether the individual is a male (MAL) or a female (FEM)
- Morph: The morph of the individual (CP = courter/parent, C = courter, P = non-courter/parent, N = non-courter/non-parent; females can have preferences as well but this is not relevant to these datasets)
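The QTL location format used in `*_qtlinfo.txt` (for example, 0.850 for chromosome 0, marker 850) is easiest to handle as text. The following is a minimal Python sketch, not part of the original code: the function names are hypothetical, and it assumes that every field in a row is a location and that the digits after the decimal point map directly to the marker index.

```python
from typing import NamedTuple


class QtlLocation(NamedTuple):
    chromosome: int  # chromosome number, starting at 0
    marker: int      # location among the marker loci


def parse_qtl_location(field: str) -> QtlLocation:
    """Split a location such as '0.850' into chromosome 0 and marker 850."""
    # Work on the text rather than float(field): a float would turn
    # "0.850" into 0.85 and lose the trailing zero of the marker index.
    chromosome, _, marker = field.strip().partition(".")
    if not chromosome.isdigit() or not marker.isdigit():
        raise ValueError(f"unexpected QTL location: {field!r}")
    return QtlLocation(int(chromosome), int(marker))


def parse_qtl_row(line: str) -> list[QtlLocation]:
    """Parse one tab-delimited row (one population) of QTL locations.

    Assumption: the row holds only location fields, with no label column.
    """
    fields = line.rstrip("\n").split("\t")
    return [parse_qtl_location(f) for f in fields if f.strip()]


if __name__ == "__main__":
    print(parse_qtl_row("0.850\t3.007\n"))
```

If the real files carry a population label in the first column, that column would need to be dropped before calling `parse_qtl_row`.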
The outputs above, run with various parameter settings, are found in the following `tar.gz` files:
- stochasticity.tar.gz
: Contains the
This repository provides access to five pre-computed reconstruction files as well as the static polygons and rotation files used to generate them. This set of palaeogeographic reconstruction files provides palaeocoordinates for three global grids at H3 resolutions 2, 3, and 4, which have an average cell spacing of ~316 km, ~119 km, and ~45 km, respectively. Grids were reconstructed at a temporal resolution of one million years throughout the entire Phanerozoic (540–0 Ma). The reconstruction files are stored as comma-separated values (CSV) files, which can be read by almost any spreadsheet program (e.g. Microsoft Excel and Google Sheets) or programming language (e.g. Python, Julia, and R). In addition, R Data Serialization (RDS) files, a common format for saving R objects, are provided as lighter (and compressed) alternatives to the CSV files. The reconstruction files follow a wide-form data frame structure to ease indexing. Each file consists of three initial index columns: the H3 cell index (i.e. the 'H3 address'), the present-day longitude of the cell centroid, and the present-day latitude of the cell centroid. The subsequent columns provide the reconstructed longitude and latitude coordinate pairs for each age of reconstruction in ascending order, with the age indicated by a numerical suffix. Each row contains a unique spatial point on the Earth's continental surface reconstructed through time. NA values within the reconstruction files indicate points which are not defined in deeper time (i.e. either the static polygon does not exist at that time, or it is outside the temporal coverage as defined by the rotation file).
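The wide-form layout described above can be reshaped into one record per cell and age with the Python standard library alone. This is a minimal sketch, not the script shipped in the code folder: the column names `h3`, `lng_<age>`, and `lat_<age>` are assumptions made for illustration and should be checked against the actual file header.

```python
import csv
import io
import re

# Assumed naming of the reconstructed columns: "lng_<age>" and "lat_<age>".
AGE_COLUMN = re.compile(r"^(lng|lat)_(\d+)$")


def wide_to_long(handle):
    """Yield (h3, age_ma, lng, lat) tuples from a wide-form CSV.

    Ages whose longitude or latitude is NA (or empty) are skipped.
    """
    for row in csv.DictReader(handle):
        coords = {}
        for name, value in row.items():
            match = AGE_COLUMN.match(name or "")
            if match is None or value in ("", "NA", None):
                continue
            axis, age = match.group(1), int(match.group(2))
            coords.setdefault(age, {})[axis] = float(value)
        for age in sorted(coords):
            point = coords[age]
            if "lng" in point and "lat" in point:
                yield row["h3"], age, point["lng"], point["lat"]


if __name__ == "__main__":
    # Tiny in-memory stand-in for a reconstruction file (made-up values).
    sample = (
        "h3,lng,lat,lng_1,lat_1,lng_2,lat_2\n"
        "cellA,10.0,50.0,10.5,49.5,NA,NA\n"
    )
    for record in wide_to_long(io.StringIO(sample)):
        print(record)
```

Skipping NA pairs keeps only the ages at which a point is defined, which matches the meaning of NA given above.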
The following five Global Plate Models are provided (abbreviation, temporal coverage, reference) within the GPMs folder:
WR13, 0–550 Ma, (Wright et al., 2013)
MA16, 0–410 Ma, (Matthews et al., 2016)
TC16, 0–540 Ma, (Torsvik and Cocks, 2016)
SC16, 0–1100 Ma, (Scotese, 2016)
ME21, 0–1000 Ma, (Merdith et al., 2021)
In addition, the H3 grids for resolutions 2, 3, and 4 are provided within the grids folder. Finally, we also provide two scripts (python and R) within the code folder which can be used to generate reconstructed coordinates for user data from the reconstruction files.
For access to the code used to generate these files:
https://github.com/LewisAJones/PhanGrids
For more information, please refer to the article describing the data:
Jones, L.A. and Domeier, M.M. 2024. A Phanerozoic gridded dataset for palaeogeographic reconstructions.
For any additional queries, contact:
Lewis A. Jones (lewisa.jones@outlook.com) or Mathew M. Domeier (mathewd@uio.no)
If you use these files, please cite:
Jones, L.A. and Domeier, M.M. 2024. A Phanerozoic gridded dataset for palaeogeographic reconstructions. DOI: 10.5281/zenodo.10069221
References
Matthews, K. J., Maloney, K. T., Zahirovic, S., Williams, S. E., Seton, M., & Müller, R. D. (2016). Global plate boundary evolution and kinematics since the late Paleozoic. Global and Planetary Change, 146, 226–250. https://doi.org/10.1016/j.gloplacha.2016.10.002.
Merdith, A. S., Williams, S. E., Collins, A. S., Tetley, M. G., Mulder, J. A., Blades, M. L., Young, A., Armistead, S. E., Cannon, J., Zahirovic, S., & Müller, R. D. (2021). Extending full-plate tectonic models into deep time: Linking the Neoproterozoic and the Phanerozoic. Earth-Science Reviews, 214, 103477. https://doi.org/10.1016/j.earscirev.2020.103477.
Scotese, C. R. (2016). Tutorial: PALEOMAP paleoAtlas for GPlates and the paleoData plotter program: PALEOMAP Project, Technical Report.
Torsvik, T. H., & Cocks, L. R. M. (2017). Earth history and palaeogeography. Cambridge University Press. https://doi.org/10.1017/9781316225523.
Wright, N., Zahirovic, S., Müller, R. D., & Seton, M. (2013). Towards community-driven paleogeographic reconstructions: Integrating open-access paleogeographic and paleobiology data with plate tectonics. Biogeosciences, 10, 1529–1541. https://doi.org/10.5194/bg-10-1529-2013.
This R code uses joint time series of concentration and discharge to (1) separate discharge events and store them in a data frame (events_h) and (2) analyse C-Q relationships, including hysteresis, derive metrics describing them, and store these in a data frame (eve.des). The R code is provided as a TXT file and as an R file. The code was written by Qing Zhan, Rémi Dupas, Camille Minaudo and Andreas Musolff.
This code is used and further described in this paper: A. Musolff, Q. Zhan, R. Dupas, C. Minaudo, J. H. Fleckenstein, M. Rode, J. Dehaspe & K. Rinke (2021) Spatial and Temporal Variability in Concentration-Discharge Relationships at the Event Scale. Water Resources Research, Volume 57, Issue 10. https://doi.org/10.1029/2020WR029442
The Near-Earth Object Wide-field Infrared Survey Explorer Reactivation Mission (NEOWISE; Mainzer et al. 2014, ApJ, 792, 30) is a NASA Planetary Science Division space-based survey to detect, track and characterize asteroids and comets, and to learn more about the population of near-Earth objects that could pose an impact hazard to the Earth. NEOWISE systematically images the sky at 3.4 and 4.6 μm, obtaining multiple independent observations of each location that enable detection of previously known and new solar system small bodies by virtue of their motion. Because it is an infrared survey, NEOWISE detects asteroid thermal emission and is equally sensitive to high and low albedo objects. The following table contains brief descriptions of all metadata relevant to the processing of Single-exposure (level 1) images and the extraction of sources from the corresponding Single-exposure images. The table contains the unique scan ID and frame number for each Single-exposure image and the reconstructed right ascension and declination of the image center. Much of the information in this table is processing-specific and may not be of interest to general users (e.g. flags indicating whether frames have been processed, and the date and time at which the pipeline started). The metadata table also contains some characterization and derived statistics of the Single-exposure image frames, basic parameters used for photometry, and derived statistics for extracted sources and artifacts. For example, it contains the number of sources with a profile-fit photometry signal-to-noise ratio (SNR) greater than 3, and the total number of real sources affected by artifacts such as latent images and electronic ghosts.