Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This publication contains the raw data as well as the evaluation scripts (written in R) of the paper "Formation of Study Groups: Exploring Students' Needs and Practical Challenges".
The evaluation data was collected using the software that is published in https://doi.org/10.5281/zenodo.10678081.
https://spdx.org/licenses/CC0-1.0.html
Biological processes exhibit complex temporal dependencies due to the sequential nature of allocation decisions in organisms’ life-cycles, feedback loops, and two-way causality. Consequently, longitudinal data often contain cross-lags: the predictor variable depends on the response variable of the previous time-step. Although statisticians have warned that regression models that ignore such covariate endogeneity in time series are likely to be inappropriate, this has received relatively little attention in biology. Furthermore, the resulting degree of estimation bias remains largely unexplored.
We use a graphical model and numerical simulations to understand why and how regression models that ignore cross-lags can be biased, and how this bias depends on the length and number of time series. Ecological and evolutionary examples are provided to illustrate that cross-lags may be more common than is typically appreciated and that they occur in functionally different ways.
We show that routinely used regression models that ignore cross-lags are asymptotically unbiased. However, this offers little relief, as conventional methods are biased for most realistically feasible lengths of time series. Furthermore, collecting time series on multiple subjects, such as populations, groups or individuals, does not help to overcome this bias when the analysis focusses on within-subject patterns (often the pattern of interest). Simulations (R tutorials 1 & 2), a literature search and a real-world empirical example on fairy wrens (data archived here with analyses presented in R tutorial 3) together suggest that approaches that ignore cross-lags are likely biased in the direction opposite to the sign of the cross-lag (e.g. towards detecting density-dependence of vital rates and against detecting life history trade-offs and benefits of group living). Next, we show that multivariate (e.g. structural equation) models can dynamically account for cross-lags, and simultaneously address additional bias induced by measurement error, but only if the analysis considers multiple time series.
We provide guidance on how to identify a cross-lag and subsequently specify it in a multivariate model, which can be far from trivial. Our tutorials with data and R code of the worked examples provide step‐by‐step instructions on how to perform such analyses.
Our study offers insights into situations in which cross-lags can bias analysis of ecological and evolutionary time series and suggests that adopting dynamical models can be important, as this directly affects our understanding of population regulation, the evolution of life histories and cooperation, and possibly many other topics. Determining how strong estimation bias due to ignoring covariate endogeneity has been in the ecological literature requires further study, also because it may interact with other sources of bias.
Methods: The data was part of a long-term study on red-winged fairy wrens (Malurus elegans) in south-west Australia (Pemberton) from 2008 to 2016. In each year, data was collected on group size, offspring production and survival of all group members. See the description in Box 4 of the associated paper, and references therein.
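The cross-lag bias described in the abstract above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the authors' R tutorials: the function name, the parameter values and the data-generating model (a true effect of zero, a positive cross-lag, standard normal noise) are all assumptions chosen for illustration.

```python
import random


def within_subject_slope(n_subjects, n_steps, cross_lag, seed=0):
    """Estimate the within-subject slope of y on x when x depends on lagged y.

    Assumed model (illustrative only):
        y[t]     = 0 * x[t] + noise           (the true effect of x on y is zero)
        x[t + 1] = cross_lag * y[t] + noise   (the cross-lag)
    The slope is a pooled regression on subject-mean-centred data.
    """
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_subjects):
        xs, ys = [], []
        x = rng.gauss(0.0, 1.0)
        for _ in range(n_steps):
            y = rng.gauss(0.0, 1.0)  # y does not depend on x
            xs.append(x)
            ys.append(y)
            x = cross_lag * y + rng.gauss(0.0, 1.0)  # next x depends on current y
        mean_x = sum(xs) / n_steps
        mean_y = sum(ys) / n_steps
        numerator += sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        denominator += sum((a - mean_x) ** 2 for a in xs)
    return numerator / denominator


# Many subjects with short series versus fewer subjects with long series.
short_series = within_subject_slope(n_subjects=2000, n_steps=5, cross_lag=0.5, seed=1)
long_series = within_subject_slope(n_subjects=200, n_steps=200, cross_lag=0.5, seed=1)
print(f"slope with 5 time steps:   {short_series:.3f}")
print(f"slope with 200 time steps: {long_series:.3f}")
```

Under these assumptions the short-series estimate is negative although the true effect is zero, that is, biased in the direction opposite to the positive cross-lag, and adding subjects does not remove it. The long-series estimate is much closer to zero, in line with the asymptotic unbiasedness noted in the abstract.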
Fo Usa R Group Company Export Import Records. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pooled within-group correlations (r) between functions and variables.
Data licence Germany – Attribution – Version 2.0 https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
Table 1.7.2: R & D staff by gender, sectors and staff groups (full-time equivalent)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. We use some of this code for underwater research in applied ecology, analysing threatened endemic fishes and their natural habitat. For this, we developed scripts in the RStudio® environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamuta, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
It is important to run all the scripts in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the scripts used to run GLM and DOMAIN:
First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend the use of 10,000 background points when using regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we also considered factors such as the extent of the area and the type of study species when choosing the number of points (Pers. Obs.). Then, we extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans and Elith, 2017).
Subsequently, we subdivided both the presence and the background point groups into 75% training data and 25% test data each, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, we selected the 10-fold cross-validation method, with the response variable (presence) assigned as a factor. If any other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
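The split and the cross-validation folds described above can be sketched as follows. This is an illustrative Python sketch, not the contents of the R scripts named in this description; the point identifiers, the random seed and the helper names are hypothetical.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=42):
    """Shuffle a copy of the records and split it into training and test lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_folds(records, k=10):
    """Yield (fit, holdout) pairs so that every record is held out exactly once."""
    for i in range(k):
        holdout = records[i::k]
        fit = [r for j, r in enumerate(records) if j % k != i]
        yield fit, holdout


# Hypothetical point identifiers standing in for real coordinates.
presence = [f"presence_{i}" for i in range(40)]
background = [f"background_{i}" for i in range(400)]

# Each group is split separately into 75% training and 25% test data.
presence_train, presence_test = train_test_split(presence)
background_train, background_test = train_test_split(background)

# Label 1 marks presence and 0 marks background (the categorical response).
training = [(p, 1) for p in presence_train] + [(b, 0) for b in background_train]
folds = list(k_folds(training, k=10))
print(len(presence_train), len(presence_test), len(folds))
```

Splitting each group separately keeps the presence-to-background ratio the same in the training and test sets.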
After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), from which we obtained the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and 5,000 cross-validated iterations (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (Personal test). The resulting plots were partial dependence plots, one for each predictor variable.
Subsequently, we computed the correlation between variables with Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity (Guisan & Hofer, 2003). A bivariate correlation threshold of ±0.70 is recommended for discarding highly correlated variables (e.g., Awan et al., 2021).
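As a rough illustration of this screening step, the sketch below computes Pearson's r for every pair of variables and flags pairs whose absolute correlation reaches 0.70. It is written in Python rather than R, and the variable names and values are invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlated_pairs(variables, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical predictor values extracted at six points.
variables = {
    "temperature": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "elevation": [900.0, 800.0, 720.0, 590.0, 500.0, 410.0],
    "depth": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
flagged = correlated_pairs(variables)
print(flagged)
```

In this made-up example only the temperature and elevation pair is flagged, so one of those two variables would be a candidate for removal before model fitting.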
Once the above scripts had been run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran one GLM per variable to obtain the p-value of each variable (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The models include polynomial terms so that both linear and quadratic responses can be obtained (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, where the resulting plots show the probability of occurrence against values for continuous variables or categories for discrete variables. The points of the presence and background training groups are also included.
In addition, a global GLM was run and evaluated by means of a 2 x 2 contingency matrix that includes both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and to avoid a high percentage of type I (commission) or type II (omission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).
Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).
| Model | Validation set: True | Validation set: False |
|---|---|---|
| Presence | A | B |
| Background | C | D |
We then calculated the overall accuracy and the True Skill Statistic (TSS). The first assesses the proportion of correctly predicted cases, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002). The TSS also gives equal importance to the prevalence of presence prediction and to the random performance correction (Fielding and Bell, 1997; Allouche et al., 2006).
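To make the two metrics concrete, the sketch below builds the Table 1 counts from observed labels and predicted probabilities using the 0.5 threshold, then computes overall accuracy and TSS. It uses the standard definition TSS = sensitivity + specificity - 1 and the cell labels of Table 1 (A true positives, B false positives, C true negatives, D false negatives). The Python function names and the example data are assumptions, not the output of Code7_GLM_model.R.

```python
def contingency_counts(observed, predicted_prob, threshold=0.5):
    """Return (A, B, C, D) using the cell labels of Table 1."""
    a = b = c = d = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and obs == 1:
            a += 1  # true presence (true positive)
        elif predicted == 1 and obs == 0:
            b += 1  # false presence (commission error)
        elif predicted == 0 and obs == 0:
            c += 1  # true background (true negative)
        else:
            d += 1  # false background (omission error)
    return a, b, c, d


def overall_accuracy(a, b, c, d):
    """Proportion of all records that were predicted correctly."""
    return (a + c) / (a + b + c + d)


def true_skill_statistic(a, b, c, d):
    """TSS = sensitivity + specificity - 1."""
    sensitivity = a / (a + d)
    specificity = c / (c + b)
    return sensitivity + specificity - 1


# Hypothetical test records: 1 = presence, 0 = background.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.3, 0.2]

counts = contingency_counts(observed, predicted_prob)
print(counts)
print(overall_accuracy(*counts), true_skill_statistic(*counts))
```

For these invented records the counts are A = 3, B = 1, C = 5 and D = 1, which gives an overall accuracy of 0.8 and a TSS of about 0.58.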
The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background group subdivided into 75% training and 25% test, each. We only included the presence training subset and the predictor variables stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
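The DOMAIN calculation can be sketched as follows, assuming the usual formulation in which the suitability of a cell is the largest Gower similarity between that cell and any presence training point, with each variable scaled by its range at the presence points. This is an illustrative Python sketch and not the code of Code8_DOMAIN_SuitHab_model.R; the two predictors and their values are hypothetical.

```python
def domain_similarity(cell, presence_points, ranges):
    """Maximum Gower similarity between one cell and the presence points."""
    best = float("-inf")
    for point in presence_points:
        # Gower distance: mean of range-standardised absolute differences.
        distance = sum(
            abs(c - p) / r for c, p, r in zip(cell, point, ranges)
        ) / len(cell)
        best = max(best, 1.0 - distance)
    return best


# Hypothetical presence training points: (temperature, depth).
presence_train = [(20.0, 5.0), (22.0, 7.0), (24.0, 6.0)]

# Range of each predictor across the presence points (assumed scaling).
ranges = tuple(max(col) - min(col) for col in zip(*presence_train))

on_presence = domain_similarity((22.0, 7.0), presence_train, ranges)
outside = domain_similarity((26.0, 6.0), presence_train, ranges)
print(on_presence, outside)
```

A cell that matches a presence point scores 1.0, and the score falls as the cell moves away from the environmental conditions seen at the presence points. The second cell above scores 0.75.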
Regarding the model evaluation and estimation, we selected the following estimators:
1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model predicts the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) ROC/AUC curve for model validation, where an optimal performance threshold is estimated to have an expected confidence of 75% to 99% probability (De Long et al., 1988).
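As an illustration of the second estimator, the area under the ROC curve can be computed directly as the probability that a randomly chosen presence record receives a higher score than a randomly chosen background record, with ties counting one half. The sketch below covers only the full AUC, not the partial ROC of item 1, and its labels and scores are invented.

```python
def roc_auc(labels, scores):
    """AUC as the share of presence/background pairs ranked correctly."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical test records: 1 = presence, 0 = background.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Here 8 of the 9 presence/background pairs are ranked correctly, so the AUC is about 0.889. A value of 0.5 would correspond to random ranking.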
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘R & D personnel by personnel groups and sectors (full-time equivalent)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/https-www-datenportal-bmbf-de-portal-1-7-1 on 16 January 2022.
--- Dataset description provided by original source is as follows ---
Table 1.7.1: R & D personnel by personnel groups and sectors (full-time equivalent)
--- Original source retains full ownership of the source dataset ---
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
T8 2019 R 9813 age classes - Art.16, c. 1, Legislative Decree no. 33/2013 Update frequency “Never” = the dataset published on the deadline set by the law does not undergo changes, except for errors and corrections
For the surveys Dop P 6-01 and Dop 5-04. In rail transport, the annex to Dop P 6-01 uses the code list CIS0239. Its successor from 1 January 2008 is the code list CIS0235.
ChemTax based chl-a of algal groups are reported from four cruises in the Sargasso Sea during 2011 and 2012.
The Geostationary Lightning Mapper Level 2 Lightning Detection product contains a list of lightning flashes and their constituent groups and events. The definitions of, and relationships among, flashes, groups, and events are governed by the following spatial and temporal characteristics: an event represents the signal detected from the cloud top associated with a lightning emission in an individual sensor pixel for a 2 ms integration period; a group represents the events detected in adjacent sensor pixels for the same integration period as an event; a flash represents a series of measurements constrained by temporal and spatial extent thresholds that are associated with one or more groups. The parent-child relationship among specific flashes, groups, and events is stored in the product. Data for each flash includes an energy-weighted centroid latitude and longitude location, time span of occurrence, amount of radiant energy, and coverage area. Data for each group includes an energy-weighted centroid latitude and longitude location, mean time of occurrence, amount of radiant energy, and coverage area. Data for each event includes a latitude and longitude location, time of occurrence, and amount of radiant energy. The product includes data quality information for each flash and group. A Lightning Detection product file contains a set of flashes and their constituent groups and events for a 20 second period. The unit of measure for the flash, group, and event radiant energy values is joules. The unit of measure for the flash and group coverage areas is square meters.
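The flash, group and event hierarchy described above can be modelled with three small classes. The sketch below is illustrative only: the class and field names are hypothetical and are not the variable names used in the product files, and the centroid is a plain energy-weighted mean of latitude and longitude that ignores longitude wrap-around.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    lat: float
    lon: float
    time_s: float
    energy_j: float  # radiant energy in joules


def energy_weighted_centroid(events: List[Event]) -> Tuple[float, float]:
    """Energy-weighted mean latitude and longitude of a list of events."""
    total = sum(e.energy_j for e in events)
    lat = sum(e.lat * e.energy_j for e in events) / total
    lon = sum(e.lon * e.energy_j for e in events) / total
    return lat, lon


@dataclass
class Group:
    events: List[Event] = field(default_factory=list)  # child events

    def energy_j(self) -> float:
        return sum(e.energy_j for e in self.events)

    def centroid(self) -> Tuple[float, float]:
        return energy_weighted_centroid(self.events)


@dataclass
class Flash:
    groups: List[Group] = field(default_factory=list)  # child groups

    def events(self) -> List[Event]:
        return [e for g in self.groups for e in g.events]

    def energy_j(self) -> float:
        return sum(g.energy_j() for g in self.groups)

    def centroid(self) -> Tuple[float, float]:
        return energy_weighted_centroid(self.events())

    def time_span(self) -> Tuple[float, float]:
        times = [e.time_s for e in self.events()]
        return min(times), max(times)


# Invented values; real radiant energies are many orders of magnitude smaller.
group_1 = Group([Event(10.0, 20.0, 0.000, 1.0), Event(10.0, 22.0, 0.000, 3.0)])
group_2 = Group([Event(12.0, 20.0, 0.002, 4.0)])
flash = Flash([group_1, group_2])
print(group_1.centroid(), flash.centroid(), flash.energy_j(), flash.time_span())
```

Storing events inside groups and groups inside flashes mirrors the parent-child relationship in the description, and the flash-level values are derived from the children rather than stored twice.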
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This map provides an estimation of Hydrologic Groups of soils in NSW according to the four class system (A-D):

* A: soils having high infiltration rates, even when thoroughly wetted, and consisting chiefly of deep, well to excessively-drained sands or gravels. These soils have a high rate of water transmission and have low water run-off potential.
* B: soils having moderate infiltration rates when thoroughly wetted and consisting chiefly of moderately deep to deep soils with moderately fine to moderately coarse textures. These soils have a moderate rate of water transmission.
* C: soils having slow infiltration rates when thoroughly wetted and consisting chiefly of soils with a layer that impedes downward movement of water, or soils with moderately fine to fine texture. These soils have a slow rate of water transmission.
* D: soils having very slow infiltration rates when thoroughly wetted and consisting chiefly of clay soils with a high swelling potential, soils with a permanent high water table, soils with a claypan or clay layer at or near the surface, and shallow soils over nearly impervious material. These soils have a very slow rate of water transmission.

The map uses the best available soils mapping coverage and was derived from a lookup table system linking a Hydrologic Group class to a particular soil type using the Great Soil Group (GSG) classification. Each dominant GSG has been assigned a Hydrologic Soil Group.

The classification is based on the United States' Hydrologic Soil Group system published within the National Engineering Handbook (2007).

Online Maps: This dataset can be viewed using eSPADE (NSW’s soil spatial viewer), which contains a suite of soil and landscape information including soil profile data. Many of these datasets have hot-linked soil reports. An alternative viewer is the SEED Map, an ideal way to see what other natural resources datasets (e.g. vegetation) are available for this map area.

Reference: Department of Planning, Industry and Environment, 2021, Hydrologic Soil Groups of NSW, Version 4.5, NSW Department of Planning, Industry and Environment, Parramatta.
This dataset is designed to accompany the paper submitted to Data Science Journal: O'Brien et al, "Earth Science Data Repositories: Implementing the CARE Principles". This dataset shows examples of activities that data repositories are likely to undertake as they implement the CARE principles. These examples were constructed as part of a discussion about the challenges faced by data repositories when acquiring, curating, and disseminating data and other information about Indigenous Peoples, communities, and lands. For clarity, individual repository activities were very specific. However, in practice, repository activities are not carried out singly, but are more likely to be performed in groups or in sequence. This dataset shows examples of how activities are likely to be combined in response to certain triggers. See related dataset O'Brien, M., R. Duerr, R. Taitingfong, A. Martinez, L. Vera, L. Jennings, R. Downs, E. Antognoli, T. ten Brink, N. Halmai, S.R. Carroll, D. David-Chavez, M. Hudson, and P. Buttigieg. 2024. Alignment between CARE Principles and Data Repository Activities. Environmental Data Initiative. https://doi.org/10.6073/pasta/23e699ad00f74a178031904129e78e93 (Accessed 2024-03-13), and the paper for more information about development of the activities and their categorization, raw data of relationships between specific activities and a discussion of the implementation of CARE Principles by data repositories.
Data in this table are organized into groups delineated by a triggering event in the
first column. For example, the first group consists of 9 rows, while the second group has 7
rows. The first row of each group contains the event that triggers the set of actions
described in the last 4 columns of the spreadsheet. Within each group, the associated rows
in each column are given in numerical not temporal order, since activities will likely vary
widely from repository to repository.
For example, the first group of rows is about what likely needs to happen if a
repository discovers that it holds Indigenous data (O6). Clearly, it will need to develop
processes to identify communities to engage (R6) as well as processes for contacting those
communities (R7) (if it doesn't already have them). It will also probably need to review and
possibly update its data management policies to ensure that they are justifiable (R2). Based
on these actions, it is likely that the repository's outreach group needs to prepare for
working with more communities (O3) including ensuring that the repository's governance
protocols are up-to-date and publicized (O5) and that the repository practices are
transparent (O4). If initial contacts go well, it is likely that the repository will need
ongoing engagement with the community or communities (S1). This may include adding
representation to the repository's advisory board (O2); clarifying data usage with the
communities (O9), facilitating relationships between data providers and communities (O1);
working with the community to identify educational opportunities (O10); and sharing data
with them (O8). It may also become necessary to liaise with whoever is maintaining the
vocabularies in use at the repository (O7).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data include interest group membership in coalitions, contributions to coalitions, and characteristics of interest groups and coalitions. R Script and Output are contained within. Data are in CSV file format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Customs records are available for CBS MANUFACTURING GROUP FO USA R. GROUP INC. Learn about its importer, supply capabilities and the countries to which it supplies goods.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
While groups have been central to thinking about partisan identity and choices, there has been surprisingly little attention paid to the role of perceptions of the group composition of the parties. We explore this critical linking information in the context of religious groups, some of the chief pivots around which the parties have been sorting. Using three national samples, we show that perceptions of the religious group composition of the parties are often biased – evangelicals overestimate the presence of evangelicals within the Republican Party and the irreligious within the Democratic Party. The key finding is that individuals are far more likely to identify with the party in which they believe their group is well represented – a finding which clarifies the role of party image shifts in constructing partisanship, the limits of the culture war motif, and the importance of social perception in shaping beliefs about party representation.
Financial overview and grant giving statistics of Triple R Sports Group
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Customs records are available for R. AND K. GROUP CO LIMITED. Learn about its importer, supply capabilities and the countries to which it supplies goods.