Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Negative childhood events and potential covariates associated with internalizing symptoms trajectories (French TEMPO study, 1991–2009, n = 1503, age and sex-adjusted ORs, 95% CI). †Multinomial regression models were adjusted for sex and age at baseline. ‖Compared with a trajectory of individuals who reported persistently low internalizing symptoms (n = 1119).
Functions to perform posterior predictive model fitting: a series of functions written in the R language for performing posterior predictive simulation under Brownian motion and early burst models of trait evolution (Slater_Pennell_Post.Pred_functions.R).
Cetacean analyses: R code to perform posterior predictive and maximum likelihood model fitting to a cetacean body length dataset.
fitContinuous_modified_cetaceans: a modified version of the fitContinuous function from the geiger library that allows a model to be fitted to a comparative dataset in which the Brownian rate is permitted to vary along one pre-specified internal edge of the phylogeny.
Cetacean phylogeny: time-calibrated phylogeny of cetaceans from Slater et al. (Proc Roy Soc B 2010), used for posterior predictive model fitting (whales_final.phy).
Cetacean body length data: cetacean body length data from Slater et al. (Proc Roy Soc B 2010), used for posterior predictive model fitting (Cetlengthdata.csv).
Slater_Pennel_supplement: Supplementary results...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Meal and rumination patterns, chewing behaviors (min/8 h), and times of resting, drinking, non-nutritive oral behaviors (NNOB), standing, and lying as influenced by feeding pasteurized waste milk (WM) and a blend of WM with milk replacer powder (WM+MR) or transition milk (WM+TM) as liquid feeds to cold-stressed newborn Holstein calves (n = 17 per treatment).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. We use some of them for underwater research in applied ecology, analyzing threatened endemic fishes and their natural habitat. For this purpose, we developed scripts in the RStudio® environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008; Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), ModelMetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamuta, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
It is important to run all the scripts to obtain results from the ecological response and spatial distribution models. For the ecological scenario, we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the scripts involved in running GLM and DOMAIN:
First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend 10,000 background points when using regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we also took into account factors such as the extent of the area and the type of study species when selecting the number of points (Pers. Obs.). Then, we extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans and Elith, 2017).
Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data each, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, we selected 10-fold cross-validation, with the response variable (presence) assigned as a factor. If any other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
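The split described above can be sketched as follows. This is a minimal Python illustration, not the authors' R scripts: the function names, the random seed, and the example point counts are all hypothetical, and the original work performs this step in R.

```python
import random


def split_train_test(points, train_fraction=0.75, seed=42):
    """Shuffle a list of points and split it into training and test subsets."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def k_fold_labels(n, k=10, seed=42):
    """Assign each of n training records to one of k folds of near-equal size."""
    rng = random.Random(seed)
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)
    return labels


# Hypothetical example data: 40 presence points and 400 background points.
presence = [("presence", i) for i in range(40)]
background = [("background", i) for i in range(400)]

# Each group is split separately, so both keep the 75/25 proportion.
pres_train, pres_test = split_train_test(presence)
back_train, back_test = split_train_test(background)

# Fold labels for 10-fold cross-validation over the combined training set.
folds = k_fold_labels(len(pres_train) + len(back_train), k=10)
```

Splitting presence and background points separately keeps the ratio between the two groups the same in the training and test subsets.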
After that, we ran the scripts for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), from which we obtained the relative contribution of the variables used in the model. We parameterized the model with a Gaussian distribution and 5,000 iterations (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (Personal test). The resulting plots were the partial dependence plots for each predictor variable.
Next, we computed Pearson correlations between the variables (Code5_Pearson_Correlation.R) to evaluate multicollinearity (Guisan & Hofer, 2003). A bivariate correlation threshold of ±0.70 is recommended for discarding highly correlated variables (e.g., Awan et al., 2021).
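As a rough illustration of this screening step, the Python sketch below computes Pearson's r for each pair of predictors and keeps a variable only if its absolute correlation with every variable already kept is below 0.70. The greedy keep-the-first rule, the function names, and the example values are assumptions made for illustration; in practice, which variable of a correlated pair to discard is an ecological judgment.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(variables, threshold=0.70):
    """Keep a variable only if |r| < threshold against every variable kept so far."""
    kept = []
    for name in variables:
        if all(abs(pearson(variables[name], variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictor values extracted at five points.
predictors = {
    "depth": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [10.1, 8.2, 6.1, 3.9, 2.0],  # strongly (negatively) tied to depth
    "slope": [3.0, 1.0, 4.0, 1.0, 5.0],
}

kept = drop_correlated(predictors)
```

Here `temperature` is discarded because its correlation with `depth` is close to -1, while `slope` is retained.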
Once the above scripts had been run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM script (Code7_GLM_model.R). Here, we first ran one GLM per variable to obtain the significance (p-value) of each variable (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The models were fitted with polynomial terms to obtain linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we produced ecological response curves, where the resulting plots show the probability of occurrence against values for continuous variables or categories for discrete variables. The points of the presence and background training groups are also included.
In addition, a global GLM was run and evaluated by means of a 2 x 2 contingency matrix of observed and predicted records. A representation is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and to avoid a high rate of type I (commission) or type II (omission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).
Table 1. Example of a 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives, errors of commission), C represents true background points (true negatives) and D represents false background points (false negatives, errors of omission).

| Model | Validation set: True | Validation set: False |
|---|---|---|
| Presence | A | B |
| Background | C | D |
We then calculated the overall accuracy and True Skill Statistic (TSS) metrics. The first assesses the proportion of correctly predicted cases, while the second (sensitivity + specificity − 1) assesses correctly predicted cases with a correction for random performance (Olden and Jackson, 2002). TSS gives equal weight to correctly predicted presences and to correctly predicted background points (Fielding and Bell, 1997; Allouche et al., 2006).
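Both metrics can be computed directly from the four cells defined in the Table 1 caption (A true positives, B false positives, C true negatives, D false negatives). The Python sketch below uses the standard definition TSS = sensitivity + specificity − 1 from Allouche et al. (2006); the function names and the example counts are hypothetical.

```python
def overall_accuracy(a, b, c, d):
    """Proportion of all records (presence and background) predicted correctly."""
    return (a + c) / (a + b + c + d)


def true_skill_statistic(a, b, c, d):
    """TSS = sensitivity + specificity - 1, using the cell letters of Table 1."""
    sensitivity = a / (a + d)  # true positives / all observed presences
    specificity = c / (c + b)  # true negatives / all observed background points
    return sensitivity + specificity - 1


# Hypothetical counts: A = 40, B = 10, C = 80, D = 20.
a, b, c, d = 40, 10, 80, 20
accuracy = overall_accuracy(a, b, c, d)
tss = true_skill_statistic(a, b, c, d)
```

TSS ranges from -1 to +1, where +1 indicates perfect agreement and values of zero or less indicate performance no better than random.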
The last script (Code8_DOMAIN_SuitHab_model.R) performs species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background groups, each subdivided into 75% training and 25% test data. We included only the presence training subset and the predictor variable stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
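To make the DOMAIN metric concrete, the Python sketch below follows the general idea of Carpenter et al. (1993): the Gower distance between a candidate site and a training presence point is the mean range-standardized absolute difference across predictors, similarity is one minus that distance, and the DOMAIN score of the site is its maximum similarity to any training presence. It is an illustration rather than the implementation used in the scripts above; the function names and values are hypothetical, and taking each predictor's range from the training presences is an assumption.

```python
def gower_similarity(site, presence_point, ranges):
    """1 minus the mean range-standardized absolute difference across predictors."""
    n_predictors = len(site)
    distance = sum(
        abs(s - t) / r for s, t, r in zip(site, presence_point, ranges)
    ) / n_predictors
    return 1.0 - distance


def domain_score(site, training_presences):
    """Maximum Gower similarity between a site and any training presence point.

    Ranges are taken from the training presences (an assumption); a predictor
    that is constant across them would give a zero range and must be excluded.
    """
    ranges = [max(col) - min(col) for col in zip(*training_presences)]
    return max(gower_similarity(site, t, ranges) for t in training_presences)


# Hypothetical training presences described by two predictors.
training = [(10.0, 100.0), (20.0, 300.0), (15.0, 200.0)]

score_same = domain_score((15.0, 200.0), training)   # identical to a training point
score_other = domain_score((12.0, 250.0), training)  # closest to (15.0, 200.0)
```

Only presence records enter the calculation, which is consistent with the description above of using the presence training subset and the predictor stack.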
Regarding the model evaluation and estimation, we selected the following estimators:
1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model predicts the spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) ROC/AUC curve for model validation, where an optimal performance threshold is estimated to have an expected confidence of 75% to 99% probability (DeLong et al., 1988).
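For reference, the full AUC (not the partial ROC) can be computed without plotting the curve, because it equals the probability that a randomly chosen presence point receives a higher model score than a randomly chosen background point, with ties counted as one half. The Python sketch below uses that rank-based definition; the function name and the scores are hypothetical, and the partial ROC and confidence estimates mentioned above are not covered.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0   # presence ranked above background
            elif p == b:
                wins += 0.5   # ties count as half
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model scores for test presence and background points.
presence_scores = [0.9, 0.8, 0.6, 0.55]
background_scores = [0.7, 0.5, 0.4, 0.3, 0.55]

auc_value = auc(presence_scores, background_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presence from background scores.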
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset includes SAS code and associated data files (.csv and .xlsx) containing data from Nigerian catfish farmers. The .xlsx file includes the main variables and the formulas for the other derived variables. The SAS code uses PROC GLM to produce Type III Sum of Squares, effect size measures (Partial Eta Squared, Semi-Partial Eta Squared, Partial Omega Squared, and Semi-Partial Omega Squared), and the linear regression estimates.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The aim of this work was to evaluate the fit of the logistic and Gompertz models with a first-order autoregressive error structure in a study of ‘Dwarf green’ coconut fruit growth based on longitudinal and cross-sectional internal cavity diameter data (DLCI and DTCI). The model fits showed positive residual autocorrelation according to the Durbin-Watson test, and for both variables, DLCI and DTCI, the residuals were modeled as a first-order autoregressive process (AR1). The analysis was performed using the least squares method in PROC MODEL of the SAS software. The results indicated that, for both characteristics under study, the logistic model was the most appropriate for describing fruit growth and that, according to this model, fully developed ‘Dwarf green’ coconut fruits have longitudinal and cross-sectional internal cavity diameters of approximately 7.39 cm and 7.60 cm, respectively.
Library of Wroclaw University of Science and Technology scientific output (DONA database)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We often rely on the likelihood to obtain estimates of regression parameters, but it is not readily available for generalized linear mixed models (GLMMs). Inferences for the regression coefficients and the covariance parameters are key in these models. We presented alternative approaches for analyzing binary data from a hierarchical structure that do not rely on any distributional assumptions: a generalized quasi-likelihood (GQL) approach and a generalized method of moments (GMM) approach. These are alternatives to the typical maximum-likelihood approximation approaches in the Statistical Analysis System (SAS), such as the Laplace approximation (LAP). We examined and compared the performance of the GQL and GMM approaches with multiple random effects to the LAP approach as used in PROC GLIMMIX in SAS. The GQL approach tends to produce unbiased estimates, whereas the LAP approach can lead to highly biased estimates in certain scenarios. The GQL approach produces more accurate estimates of both the regression coefficients and the covariance parameters, with smaller standard errors, than the GMM approach. We found that both the GQL and GMM approaches are less likely to result in non-convergence than the LAP approach. A simulation study was conducted and a numerical example was presented for illustrative purposes.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ingredients, chemical composition (% of DM unless otherwise noted), and particle size distribution of the basal starter feed.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study aimed to evaluate the influence of initial weight, initial age, average daily gain in initial weight, average daily gain in total weight and genetic group on the probability of pregnancy in primiparous females of the Nellore, 1/2 Simmental + 1/2 Nellore, and 3/4 Nellore + 1/4 Simmental genetic groups. Data were collected from the livestock file of the Farpal Farm, located in the municipality of Jaíba, Minas Gerais State, Brazil. The pregnancy diagnosis results (success = 1 and failure = 0) were used to determine the probability of pregnancy, which was modeled using logistic regression with PROC LOGISTIC in SAS (Statistical..., 2004) software, from the regressor variables initial weight, average daily gain in initial weight, average daily gain in total weight, and genetic group. Initial weight (IW) was the most important variable in the probability of pregnancy in heifers, and 1-kg increments in IW allowed for increases of 5.8, 9.8 and 3.4% in the probability of pregnancy in Nellore, 1/2 Simmental + 1/2 Nellore, and 3/4 Nellore + 1/4 Simmental heifers, respectively. The initial age influenced the probability of pregnancy in Nellore heifers. From the estimates of the effects of each variable it was possible to determine the minimum initial weights for each genetic group. This information can be used to monitor the development of heifers until the breeding season and increase the pregnancy rate.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of the multivariate multiple regression analysis with forward selection (DISTLM forward) to explain the variability in viral abundance throughout the water column (total) and in specific depth layers in different geographic regions. The response variable was log-transformed and the resulting data were converted into Euclidean distance similarity matrices. The pseudo-F and P-values were obtained by permutation (n = 999). Abbreviations: Proc, Prochlorococcus; Syn, Synechococcus; Euk, picoeukaryotes; HetProk, heterotrophic prokaryotes.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Volume overload in peritoneal dialysis patients is a common issue that can lead to poor prognosis. We employed a group trajectory model to categorize volume load trajectories and examined the factors associated with each trajectory class to explore the impact of different trajectory groups on clinical prognosis and residual renal function (RRF). This single-center prospective cohort study included 214 patients on maintenance peritoneal dialysis within a tertiary hospital. The ratio of extracellular water to total body water was measured using Bioimpedance analysis. The SAS 9.4 PROC Traj procedure was used to examine the group-based trajectory of the patients. A multivariate logistic regression model was used to calculate the adjusted odds ratios (aOR) of the associated factors to predict the trajectory class of participants. The average age of the included patients was 53.56 (SD: 11.77) years, with a male proportion of 46.7% and a median follow-up time of 6 months. The normal stable group accounted for 35.05% of the total population and maintained a normal and stable level, the moderate stable group accounted for 52.8% of the total population and showed a slightly higher and stable level, and the high fluctuation group accounted for 12.15% of the total population and showed a high and fluctuating level. A multivariate logistic regression analysis revealed that age, diabetes, and albumin levels are significant factors influencing the categorization of volume load trajectories. There were statistically significant differences in both the technical survival rate and the loss of residual renal function among the three trajectory groups.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Longitudinal data together with recurrent events are commonly encountered in clinical trials. In many applications, these two processes are highly correlated. When a large portion of subjects do not experience the recurrent events of interest, it is possible that some of these subjects are unsusceptible to the events. Therefore, we assume the underlying population is composed of two subpopulations: one susceptible to the recurrent events, and the other unsusceptible. In this article, we propose a joint model of longitudinal outcomes and zero-inflated recurrent event data. Our model consists of three submodels: (1) a generalized linear mixed model for the longitudinal process; (2) a proportional intensities model for the recurrent event process in the susceptible subpopulation; and (3) a logistic regression model for the probability that a subject belongs to the unsusceptible subpopulation. We consider associations (1) between longitudinal outcomes and the zero-inflation rate; and (2) between longitudinal outcomes and the intensity rate of recurrent events in the susceptible subpopulation. Estimation is carried out by maximizing the log-likelihood function using Gaussian quadrature techniques, which can be conveniently implemented in SAS PROC NLMIXED. Simulation studies demonstrate that the proposed method performs well. We apply the method to a clinical trial.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: For each categorical variable, one category was chosen as a reference category (RC, e.g., RC = Social Sciences for the categorical variable discipline). For categorical variables, the effect for each predictor variable (a dummy variable representing one of the categories) is a regression coefficient (Coeff) that should be interpreted in relation to its standard error (SE) and the effect of the reference category. Variance components for level 1 are derived from the data, but variance components at level 2 and level 3 indicate the amount of variance that can be explained by differences between studies (level 3) and differences between single reliability coefficients nested within studies (level 2). The log-likelihood test provided by SAS PROC MIXED (−2LL) can be used to compare different models, as can the Bayesian Information Criterion (BIC). The smaller the BIC, the better the model. *p