100+ datasets found
  1. Codes in R for spatial statistics analysis, ecological response models and...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Apr 24, 2025
    Cite
    D. W. Rössel-Ramírez; J. Palacio-Núñez; S. Espinosa; J. F. Martínez-Montoya (2025). Codes in R for spatial statistics analysis, ecological response models and spatial distribution models [Dataset]. http://doi.org/10.5281/zenodo.7603557
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    D. W. Rössel-Ramírez; J. Palacio-Núñez; S. Espinosa; J. F. Martínez-Montoya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the last decade, a plethora of algorithms has been developed for spatial ecology studies. In our case, we used some of these algorithms for underwater research in the applied ecology of threatened endemic fishes and their natural habitat. For this, we developed scripts in the RStudio® environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbeuttel & Balamura, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).

    It is important to run all the codes in order to obtain results from the ecological response and spatial distribution models. For the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance-based similarity metric because of their adequacy and robustness for studies of endemic or threatened species (e.g., Naoki et al., 2006). Below, we explain the statistical parameterization of the codes used to run GLM and DOMAIN:

    First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend using 10,000 background points with regression methods (e.g., the Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we considered factors such as the extent of the area and the type of study species important for correctly selecting the number of points (pers. obs.). We then extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans and Elith, 2017).
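The background-point step can be illustrated with a minimal sketch. This is Python rather than the authors' R scripts, and the extent, point count, and toy predictor surface are illustrative assumptions, not their data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical study-area extent (x/y bounds) -- illustrative values only.
xmin, xmax, ymin, ymax = -101.0, -100.0, 22.0, 23.0

# Draw the 10,000 background points recommended by Barbet-Massin et al. (2012)
# for regression and distance-based models.
n_background = 10_000
bg_x = rng.uniform(xmin, xmax, n_background)
bg_y = rng.uniform(ymin, ymax, n_background)

# Stand-in for raster extraction: evaluate a toy "predictor surface" at each
# point; a real workflow would sample bioclimatic/topographic raster layers.
def extract_value(x, y):
    return np.sin(x) + np.cos(y)

bg_values = extract_value(bg_x, bg_y)
```

In practice the number of points would be adjusted to the area extent and the species, as the authors note.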

    Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data, following Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, we selected the 10-fold cross-validation method, with the response variable presence assigned as a factor. If any other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
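The 75%/25% subdivision described above can be sketched as a random index split (a Python illustration of the concept, not the authors' R code; the point count is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 200  # e.g., presence records; illustrative count
indices = rng.permutation(n_points)

# 75% of the shuffled indices go to training, the remaining 25% to testing.
n_train = int(round(0.75 * n_points))
train_idx, test_idx = indices[:n_train], indices[n_train:]
```

The same split would be applied separately to the presence and background groups, with 10-fold cross-validation then run inside the training portion.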

    After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), which yields the relative contribution of each variable used in the model. We parameterized the code with a Gaussian distribution and 5,000 cross-validation iterations (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (personal test). The resulting plots show partial dependence as a function of each predictor variable.

    Subsequently, we evaluated multicollinearity among the variables using Pearson's correlation method (Code5_Pearson_Correlation.R; Guisan & Hofer, 2003). A bivariate correlation threshold of ±0.70 is recommended for discarding highly correlated variables (e.g., Awan et al., 2021).
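The |r| > 0.70 screening rule can be sketched as follows (Python, with simulated variables standing in for the predictors; which variable of a correlated pair is dropped is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # strongly collinear with x1
x3 = rng.normal(size=n)                     # roughly independent
X = np.column_stack([x1, x2, x3])

# Pearson correlation matrix across predictor columns.
corr = np.corrcoef(X, rowvar=False)

# Flag pairs exceeding |r| > 0.70 and drop one variable per offending pair.
threshold = 0.70
to_drop = set()
p = corr.shape[0]
for i in range(p):
    for j in range(i + 1, p):
        if abs(corr[i, j]) > threshold and j not in to_drop:
            to_drop.add(j)
```

Here the second column (the near-duplicate of x1) is the one discarded.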

    Once the above codes were run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing; Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran a GLM per variable to obtain each variable's p-value (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The models include polynomial terms to capture linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, whose plots show the probability of occurrence against the values of continuous variables or the categories of discrete variables. The presence and background training points are also included.
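The per-variable step above (a binomial GLM with linear and quadratic terms) can be sketched with a minimal Newton-Raphson logistic fit. This is a Python illustration on simulated data, not the authors' R code; the hump-shaped signal is an assumption chosen to show the quadratic term at work:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x = rng.uniform(-2, 2, n)

# Simulated presence (1) / background (0) response with a hump-shaped signal.
p_true = 1 / (1 + np.exp(-(1.0 - 2.0 * x**2)))
y = rng.binomial(1, p_true)

# Design matrix with intercept, linear, and quadratic terms, matching the
# linear + quadratic response described above.
X = np.column_stack([np.ones(n), x, x**2])

# Logistic GLM fitted by Newton-Raphson (iteratively reweighted least squares).
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
```

A clearly negative quadratic coefficient recovers the unimodal (niche-like) response; in R this corresponds to `glm(y ~ x + I(x^2), family = binomial)`.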

    A global GLM was also run, and the generalized model was evaluated by means of a 2 x 2 contingency matrix including both observed and predicted records. A representation is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and avoid a high percentage of bias from type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).

    Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).

                   Validation set
    Model          True      False
    Presence       A         B
    Background     C         D

    We then calculated the Overall accuracy and True Skill Statistic (TSS) metrics. The first assesses the proportion of correctly predicted cases (Olden and Jackson, 2002), while the TSS corrects this proportion for random performance, giving equal weight to the correct prediction of presences and backgrounds (Fielding and Bell, 1997; Allouche et al., 2006).
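Both metrics follow directly from the A, B, C, D cells of Table 1. A minimal sketch in Python (the probabilities and labels are illustrative; TSS = sensitivity + specificity − 1, per Allouche et al., 2006):

```python
import numpy as np

# Predicted probabilities and true labels (1 = presence, 0 = background);
# illustrative values only.
probs = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4, 0.1])
truth = np.array([1,   1,   1,   0,   0,   1,   0,   0])

pred = (probs >= 0.5).astype(int)   # the 0.5 boundary from the text

A = int(np.sum((pred == 1) & (truth == 1)))  # true presences
B = int(np.sum((pred == 1) & (truth == 0)))  # false presences (commission)
C = int(np.sum((pred == 0) & (truth == 0)))  # true backgrounds
D = int(np.sum((pred == 0) & (truth == 1)))  # false backgrounds (omission)

overall = (A + C) / (A + B + C + D)      # proportion correctly predicted
sensitivity = A / (A + D)
specificity = C / (B + C)
tss = sensitivity + specificity - 1      # True Skill Statistic
```

Unlike Overall accuracy, TSS is insensitive to the prevalence of presences in the validation set.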

    The last code (Code8_DOMAIN_SuitHab_model.R) performs species distribution modelling with the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background groups, each subdivided into 75% training and 25% test data. Only the presence training subset and the predictor variable stack enter the calculation of the DOMAIN metric, as well as the evaluation and validation of the model.
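The core of DOMAIN is a Gower-style, range-scaled similarity between a candidate cell and its most similar training presence point. A minimal sketch of that idea (Python; the environmental values are illustrative, and this simplifies the full Carpenter et al. (1993) formulation):

```python
import numpy as np

# Training presence points in environmental space (rows = sites, cols = variables);
# illustrative values only.
train = np.array([[10.0, 200.0],
                  [12.0, 220.0],
                  [11.0, 210.0]])

# Range of each variable over the training data, used to scale differences.
ranges = train.max(axis=0) - train.min(axis=0)

def domain_similarity(cell, train, ranges):
    """Gower-style similarity of one candidate cell to its closest training point."""
    # Mean range-scaled absolute difference to every training point ...
    d = np.mean(np.abs(cell - train) / ranges, axis=1)
    # ... converted to similarity; DOMAIN keeps the maximum (nearest point).
    return float(np.max(1.0 - d))
```

A cell environmentally identical to a training point scores 1.0; suitability maps come from evaluating this score over the whole predictor stack.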

    Regarding the model evaluation and estimation, we selected the following estimators:

    1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model's prediction of the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).

    2) the ROC/AUC curve for model validation, where an optimal performance threshold is estimated to achieve an expected confidence of 75% to 99% probability (DeLong et al., 1988).
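The AUC behind estimator 2 has a simple rank interpretation: the probability that a randomly chosen presence scores higher than a randomly chosen background point (the Mann-Whitney formulation underlying DeLong et al., 1988). A sketch with illustrative scores:

```python
import numpy as np

# Model scores for presence (positive) and background (negative) validation
# points; illustrative values only.
pos = np.array([0.9, 0.8, 0.6])
neg = np.array([0.7, 0.2, 0.4, 0.1])

# Count correctly ordered (presence, background) pairs; ties count half.
wins = sum(float(p > q) + 0.5 * float(p == q) for p in pos for q in neg)
auc = wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random discrimination and 1.0 to perfect separation of the two curves described in estimator 1.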

  2. Appendix A. Equivalence of the F test and t test in Example 1.

    • wiley.figshare.com
    html
    Updated Jun 2, 2023
    Cite
    Paul A. Murtaugh (2023). Appendix A. Equivalence of the F test and t test in Example 1. [Dataset]. http://doi.org/10.6084/m9.figshare.3527090.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Wiley
    Authors
    Paul A. Murtaugh
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Equivalence of the F test and t test in Example 1.

  3. Simulation Data & R scripts for: "Introducing recurrent events analyses to...

    • data.niaid.nih.gov
    Updated Apr 29, 2024
    Cite
    Ferry, Nicolas (2024). Simulation Data & R scripts for: "Introducing recurrent events analyses to assess species interactions based on camera trap data: a comparison with time-to-first-event approaches" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11085005
    Explore at:
    Dataset updated
    Apr 29, 2024
    Dataset authored and provided by
    Ferry, Nicolas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File descriptions:

    All csv files contain the results of the different models (PAMM, AARs, linear models, MRPPs) for each iteration of the simulation; one row is one iteration. "results_perfect_detection.csv" holds the results from the first simulation part with all observations. "results_imperfect_detection.csv" holds the results from the first simulation part with observations randomly thinned to mimic imperfect detection.

    ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
    PAMM30: p-value of the PAMM run on the 30-day survey.
    PAMM7: p-value of the PAMM run on the 7-day survey.
    AAR1: Avoidance-Attraction Ratio calculated as AB/BA.
    AAR2: Avoidance-Attraction Ratio calculated as BAB/BB.
    Harmsen_P: p-value from the linear model with the Species1*Species2 interaction from Harmsen et al. (2009).
    Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
    Karanth_permA: rank of the observed median interval duration (AB and BA undifferentiated) against the randomized median distribution, permuting on species A (Karanth et al. 2017).
    MurphyAB_permA: rank of the observed AB median interval duration against the randomized median distribution, permuting on species A (Murphy et al. 2021).
    MurphyBA_permA: rank of the observed BA median interval duration against the randomized median distribution, permuting on species A (Murphy et al. 2021).
    Karanth_permB: as Karanth_permA, permuting on species B (Karanth et al. 2017).
    MurphyAB_permB: as MurphyAB_permA, permuting on species B (Murphy et al. 2021).
    MurphyBA_permB: as MurphyBA_permA, permuting on species B (Murphy et al. 2021).

    "results_int_dir_perf_det.csv" holds the results from the second simulation part with all observations. "results_int_dir_imperf_det.csv" holds the results from the second simulation part with observations randomly thinned to mimic imperfect detection.

    ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
    p_pamm7_AB: p-value of the PAMM run on the 7-day survey testing for the effect of A on B.
    p_pamm7_BA: p-value of the PAMM run on the 7-day survey testing for the effect of B on A.
    AAR1: Avoidance-Attraction Ratio calculated as AB/BA.
    AAR2_BAB: Avoidance-Attraction Ratio calculated as BAB/BB.
    AAR2_ABA: Avoidance-Attraction Ratio calculated as ABA/AA.
    Harmsen_P, Niedballa_P, Karanth_permA, MurphyAB_permA, MurphyBA_permA, Karanth_permB, MurphyAB_permB, MurphyBA_permB: as defined above for the first simulation part (Harmsen et al. 2009; Niedballa et al. 2021; Karanth et al. 2017; Murphy et al. 2021).

    Script file descriptions:
    1_Functions: R script containing the functions: the MRPP from Karanth et al. (2017), adapted here for time efficiency; the MRPP from Murphy et al. (2021), adapted here for time efficiency; a version of the ct_to_recurrent() function from the recurrent package adapted to run in parallel on the simulation datasets; and the simulation() function used to simulate observations of two species with reciprocal effects on each other.
    2_Simulations: R script containing the parameter definitions for all iterations (for both simulation parts), the simulation parallelization, and the random thinning mimicking imperfect detection.
    3_Approaches comparison: R script fitting the different models tested on the simulated data.
    3_1_Real data comparison: R script fitting the different models tested on the real data example from Murphy et al. (2021).
    4_Graphs: R script containing the code for plotting the results of the simulation part and the appendices.
    5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the Karanth et al. (2017) and Murphy et al. (2021) code, the version adapted for time efficiency, and a comparison verifying similarity of results.
    5_2_Appendix - Multi-response procedure permutation difference: R script testing for differences between the MRPP approaches according to the species on which permutations are done.

  4. SWECO25: Topographic (topo)

    • zenodo.org
    bin, csv, xml, zip
    Updated Feb 8, 2024
    + more versions
    Cite
    Nathan Külling; Antoine Adde (2024). SWECO25: Topographic (topo) [Dataset]. http://doi.org/10.5281/zenodo.10635539
    Explore at:
    Available download formats: xml, zip, bin, csv
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nathan Külling; Antoine Adde
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The topographic category contains the "alti3d" dataset.

    The alti3D dataset (topographic category) describes the topography of Switzerland. After resampling the “SwissALTI3D” source data (swisstopo, 2016) to the SWECO25 grid with 4 resampling schemes (mean, median, minimum, and maximum value), we generated individual layers for four variables (elevation, aspect, hillshade, and slope). For each variable and resampling scheme, we computed 13 focal statistics layers by applying a cell-level function calculating the mean value in a circular moving window, at 13 radii ranging from 25 m to 5 km. This dataset includes a total of 224 layers. Final values were rounded and multiplied by 100.
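The focal-statistics step can be sketched as a circular-window mean over a raster. This Python sketch uses a toy grid and a 1-cell radius purely for illustration (the real layers use radii from 25 m to 5 km on the SWECO25 grid):

```python
import numpy as np

# Toy 2-D raster; illustrative values only.
raster = np.arange(49, dtype=float).reshape(7, 7)

def focal_mean(arr, radius_cells):
    """Mean value inside a circular moving window centred on each cell."""
    r = radius_cells
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    mask = (yy**2 + xx**2) <= r**2          # circular (not square) window
    out = np.full_like(arr, np.nan)         # border cells left undefined
    for i in range(r, arr.shape[0] - r):
        for j in range(r, arr.shape[1] - r):
            win = arr[i - r:i + r + 1, j - r:j + r + 1]
            out[i, j] = win[mask].mean()
    return out

smoothed = focal_mean(raster, radius_cells=1)
# Values are stored after multiplying by 100 and rounding, as described above.
stored = np.round(smoothed * 100)
```

Production pipelines would typically use an optimized focal filter (e.g., a convolution) rather than this explicit loop.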

    The detailed list of layers available is provided in SWECO25_datalayers_details_topo.csv and includes information on the category, dataset, variable name (long), variable name (short), period, sub-period, start year, end year, attribute, radii, unit, and path.

    References:

    Swiss Federal Office of Topography [swisstopo]. The high precision digital elevation model of Switzerland swissALTI3D (2m). (Wabern, Switzerland, 2016).

    Külling, N., Adde, A., Fopp, F., Schweiger, A. K., Broennimann, O., Rey, P.-L., Giuliani, G., Goicolea, T., Petitpierre, B., Zimmermann, N. E., Pellissier, L., Altermatt, F., Lehmann, A., & Guisan, A. (2024). SWECO25: A cross-thematic raster database for ecological research in Switzerland. Scientific Data, 11(1), Article 1. https://doi.org/10.1038/s41597-023-02899-1

    V2: metadata update

  5. Appendix B. A description of how data were simulated for the evaluation of...

    • wiley.figshare.com
    html
    Updated Jun 5, 2023
    Cite
    Paul A. Murtaugh (2023). Appendix B. A description of how data were simulated for the evaluation of the different analysis techniques for Example 2. [Dataset]. http://doi.org/10.6084/m9.figshare.3527087.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Wiley
    Authors
    Paul A. Murtaugh
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A description of how data were simulated for the evaluation of the different analysis techniques for Example 2.

  6. Data and plot scripts for "Rising complexity and falling explanatory power...

    • zenodo.org
    zip
    Updated Jan 21, 2020
    Cite
    Etienne Low-Decarie; Corey Chivers; Monica Granados (2020). Data and plot scripts for "Rising complexity and falling explanatory power in ecology" [Dataset]. http://doi.org/10.5281/zenodo.11621
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Etienne Low-Decarie; Corey Chivers; Monica Granados
    Description

    Analyses of published research can provide a realistic perspective on the progress of science. By analyzing more than 18 000 articles published by the preeminent ecological societies, we found that (1) ecological research is becoming increasingly statistically complex, reporting a growing number of P values per article and (2) the value of reported coefficient of determination (R2) has been falling steadily, suggesting a decrease in the marginal explanatory power of ecology. These trends may be due to changes in the way ecology is studied or in the way the findings of investigations are reported. Determining the reason for increasing complexity and declining marginal explanatory power would require a critical review of the scientific process in ecology, from research design to dissemination, and could influence the public interpretation and policy implications of ecological findings.


    Read More: http://www.esajournals.org/doi/abs/10.1890/130230

  7. Data from: Comparing traditional and Bayesian approaches to ecological...

    • osti.gov
    • search.dataone.org
    • +1 more
    Updated Jul 14, 2020
    Cite
    USDOE Office of Science (SC), Biological and Environmental Research (BER) (2020). Data from: Comparing traditional and Bayesian approaches to ecological meta-analysis [Dataset]. http://doi.org/10.5061/dryad.zw3r22863
    Explore at:
    Dataset updated
    Jul 14, 2020
    Dataset provided by
    Department of Energy Biological and Environmental Research Program
    Office of Science (http://www.er.doe.gov/)
    Northern Arizona Univ., Flagstaff, AZ (United States)
    Description

    Despite the wide application of meta-analysis in ecology, some of the traditional methods used for meta-analysis may not perform well given the type of data characteristic of ecological meta-analyses. We reviewed published meta-analyses on the ecological impacts of global climate change, evaluating the number of replicates used in the primary studies (ni) and the number of studies or records (k) that were aggregated to calculate a mean effect size. We used the results of the review in a simulation experiment to assess the performance of conventional frequentist and Bayesian meta-analysis methods for estimating a mean effect size and its uncertainty interval. Our literature review showed that ni and k were highly variable, right-skewed in distribution, and generally small (median ni = 5, median k = 44). Our simulations show that the choice of method for calculating uncertainty intervals was critical for obtaining appropriate coverage (close to the nominal value of 0.95). When k was low (<40), 95% coverage was achieved by a confidence interval based on the t-distribution that uses an adjusted standard error (the Hartung-Knapp-Sidik-Jonkman, HKSJ), or by a Bayesian credible interval, whereas bootstrap or z-distribution confidence intervals had lower coverage. Despite the importance of the method used to calculate the uncertainty interval, 39% of the meta-analyses reviewed did not report the method used, and of the 61% that did, 94% used a potentially problematic method, which may be a consequence of software defaults. In general, for a simple random-effects meta-analysis, the performance of the best frequentist and Bayesian methods was similar for the same combinations of factors (k and mean replication), though the Bayesian approaches had higher than nominal (>95%) coverage for the mean effect when k was very low (k<15). Our literature review suggests that many meta-analyses that used z-distribution or bootstrapping confidence intervals may have over-estimated the statistical significance of their results when the number of studies was low; more appropriate methods need to be adopted in ecological meta-analyses.
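The contrast between the conventional z interval and the HKSJ-adjusted t interval can be sketched for a small fixed-effect-weighted example. All effect sizes, variances, and the tabulated t quantile below are illustrative assumptions, not the study's data:

```python
import math

# Illustrative effect sizes and within-study variances for k = 5 studies.
y = [0.30, 0.10, 0.50, 0.20, 0.40]
v = [0.04, 0.05, 0.03, 0.06, 0.04]
k = len(y)

w = [1.0 / vi for vi in v]                        # inverse-variance weights
mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# Conventional (z / Wald) standard error of the weighted mean:
se_z = math.sqrt(1.0 / sum(w))

# Hartung-Knapp-Sidik-Jonkman adjusted variance, based on weighted residuals:
q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / ((k - 1) * sum(w))
se_hksj = math.sqrt(q)

t_crit = 2.776   # 0.975 quantile of t with k-1 = 4 df (from standard tables)
z_crit = 1.960   # 0.975 quantile of the normal distribution

ci_z = (mu - z_crit * se_z, mu + z_crit * se_z)
ci_hksj = (mu - t_crit * se_hksj, mu + t_crit * se_hksj)
```

With few, heterogeneous studies the HKSJ interval is typically wider than the z interval, which is exactly the under-coverage of z intervals at small k that the abstract describes.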

  8. Data set of ecological adjustment value of Arctic permafrost change on...

    • data.tpdc.ac.cn
    • tpdc.ac.cn
    zip
    Updated Oct 27, 2022
    Cite
    Shijin WANG (2022). Data set of ecological adjustment value of Arctic permafrost change on ecology system from 1982 to 2015 [Dataset]. http://doi.org/10.11888/Cryos.tpdc.272859
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    TPDC
    Authors
    Shijin WANG
    Area covered
    Description

    The data set of the ecological adjustment value of Arctic permafrost change from 1982 to 2015 covers the entire Arctic tundra area at a spatial resolution of 8 km, with layers for 1982, 2015, and the rate of change between the two periods. Based on multi-source remote sensing, simulation, statistical, and measured data, combined with GIS and ecological methods, it quantifies the adjustment service value of Arctic permafrost to the ecosystem. The unit price is based on the correlation (0.35) between active-layer thickness and NDVI changes after excluding precipitation and snow water equivalent, and on the grassland ecosystem service value (the unit price of tundra ecosystem services is taken as 1/3 of the grassland ecosystem service value).

  9. The fractured lab notebook: undergraduate and ecological data management...

    • search.dataone.org
    Updated Nov 14, 2013
    Cite
    National Center for Ecological Analysis and Synthesis; Carly Strasser (2013). The fractured lab notebook: undergraduate and ecological data management training in the United States [Dataset]. https://search.dataone.org/view/knb.300.9
    Explore at:
    Dataset updated
    Nov 14, 2013
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    National Center for Ecological Analysis and Synthesis; Carly Strasser
    Time period covered
    Mar 29, 2011 - May 25, 2011
    Area covered
    Variables measured
    Answer, Coding, EndDate, Question, R script, StartDate, First Name, Param name, Description, RespondentID, and 157 more
    Description

    Data presented here are those collected from a survey of Ecology professors at 48 undergraduate institutions to assess the current state of data management education. The following files have been uploaded:

    Scripts (2):
    1. DataCleaning_20120105.R is an R script for cleaning the data prior to analysis. It removes spaces, substitutes text for codes, removes duplicate schools, and converts questions and answers from the survey into simpler parameter names without numbers, spaces, or symbols. The script is heavily annotated to help the user understand what is being done to the data files. It produces the file cleandata_[date].Rdata, which is called by DataTrimming_20120105.R.
    2. DataTrimming_20120105.R is an R script for trimming extraneous variables not used in the final analyses. Some variables are combined as needed and NAs (no answers) are removed. The file is heavily annotated. It produces trimdata_[date].Rdata, which was imported into Excel for summary statistics.

    Data files (3):
    3. AdvancedSpreadsheet_20110526.csv is the output file from the SurveyMonkey online survey tool used for this project. It is a .csv sheet with the complete set of survey data, although some data (e.g., open-ended responses, institution names) are removed to prevent schools and/or instructors from being identifiable. This file is read into DataCleaning_20120105.R for cleaning and editing.
    4. VariableRenaming_20110711.csv is called by the DataCleaning_20120105.R script to convert the questions and answers from the survey into simple parameter names without numbers, spaces, or symbols.
    5. ParamTable.csv is a list of the parameter names used for analysis and the value codes. It can be used to understand the outputs of the scripts above (cleandata_[date].Rdata and trimdata_[date].Rdata).

  10. Data for: Functional trait variability supports the use of mean trait values...

    • search.dataone.org
    • datadryad.org
    Updated Apr 7, 2025
    Cite
    Emily Ryznar; Lauren Smith; Benjamin HÃ; Shalanda Grier; Peggy Fong (2025). Data for: Functional trait variability supports the use of mean trait values and identifies tradeoffs for marine macroalgae [Dataset]. http://doi.org/10.5068/D1H98V
    Explore at:
    Dataset updated
    Apr 7, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Emily Ryznar; Lauren Smith; Benjamin HÃ; Shalanda Grier; Peggy Fong
    Time period covered
    Jan 1, 2023
    Description

    Trait-based ecology (TBE) has proven useful in the terrestrial realm and beyond for collapsing ecological complexity into traits that can be compared and generalized across species and scales. However, TBE for marine macroalgae is still in its infancy, motivating research to build the foundation of macroalgal TBE by leveraging lessons learned from other systems. Our objectives were to evaluate the utility of mean trait values (MTVs) across species, to explore the potential for intraspecific trait variability, and to identify macroalgal ecological strategies by clustering species with similar traits and testing for bivariate relationships between traits. To accomplish this, we measured thallus toughness, a trait associated with resistance to herbivory, and tensile strength, a trait associated with resistance to physical disturbance, in eight tropical macroalgal species across up to seven sites where they were found around Moorea, French Polynesia. We found interspecific trait variation g...

  11. Data from: Restoration and replication: a case study on the value of...

    • search.dataone.org
    • data.niaid.nih.gov
    • +3 more
    Updated Nov 29, 2023
    Cite
    Tristan Campbell; Kingsley Dixon; Rebecca Handcock (2023). Restoration and replication: a case study on the value of computational reproducibility assessment [Dataset]. http://doi.org/10.5061/dryad.m905qfv6m
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Tristan Campbell; Kingsley Dixon; Rebecca Handcock
    Time period covered
    Jan 1, 2023
    Description

    Open science is vital to the interdisciplinary field of ecology due to its integrative nature and use of longitudinal datasets that build upon earlier data collections. To highlight the importance of open science in the rapidly growing discipline of restoration ecology, we conducted a 'computational reproducibility’ assessment of a publication on a mining restoration program spanning several decades and over 250 km2 in a global biodiversity hotspot. Open data and code provided alongside the original publication were assessed for consistency with the results and conclusions of the original publication, as were potential limitations in findings due to the methodology. The impacts of inconsistencies and limitations were qualitatively assessed against the key findings from the publication and data were re-analysed where impacts were potentially significant. Of the six inconsistencies and limitations identified, two had a significant impact on five of the 11 key findings of the original publ...

  12. High Ecological Value Waterways and Water Dependent Ecosystems - COASTAL...

    • data.nsw.gov.au
    pdf, txt
    Updated Feb 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NSW Department of Climate Change, Energy, the Environment and Water (2024). High Ecological Value Waterways and Water Dependent Ecosystems - COASTAL CATCHMENTS of NSW [Dataset]. https://data.nsw.gov.au/data/dataset/coastal-catchments-nsw
    Explore at:
    Available download formats: txt, pdf
    Dataset updated
    Feb 26, 2024
    Dataset provided by
    Department of Climate Change, Energy, the Environment and Water of New South Wales (https://www.nsw.gov.au/departments-and-agencies/dcceew)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New South Wales
    Description

    This dataset identifies high ecological value waterways and water dependent ecosystems for the 184 coastal catchments in NSW. The purpose of the dataset is to identify strategic priorities for protecting and improving the health of high value waterways and water dependent ecosystems in the catchment.

    The dataset map shows areas where waterways and water dependent ecosystems are defined as high ecological value, based on the definitions, guidelines and policies under the Environment Protection and Biodiversity Conservation Act 1999, Biodiversity Conservation Act 2016, Fisheries Management Act 1994 and/or Water Management Act 2000.

    The water dependent ecosystems consist of wetlands, and flora and fauna that rely on water sources (including groundwater). The dataset does not include vegetation dependent on surface waters. The dataset integrates up to 28 data layers/indicators being used by the State Government to define high value. The individual indicators have not been ground-truthed and it is recommended that field assessments and/or a comparison to local mapping be undertaken prior to any decisions being made.

    The dataset was created by first placing a 1-hectare hexagon grid over the relevant boundary area and attributing each hexagon as 'Absent' or 'Present'. This represents only the occurrence of high-value water dependent ecosystem layers within the hexagon. The 'HEVlyr' field provides a count of the number of layers within the hexagon: a value of zero (0) represents Absent, and values greater than or equal to one (1) represent the presence of high ecological value assets. The values DO NOT indicate the significance of the area; i.e., a value of 1 is no less significant than a higher value. Where layers are Present, please contact the data steward for details of the underpinning datasets.
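    The per-hexagon attribution rule described above can be sketched in a few lines. This is a minimal Python illustration, not the agency's actual GIS workflow; the function name and toy layer list are assumptions.

    ```python
    # Sketch of the hexagon attribution logic described above (hypothetical data).
    # Each 1-ha hexagon gets a count of overlapping high-value layers ("HEVlyr");
    # 0 -> "Absent", >= 1 -> "Present"; the count does not rank significance.

    def attribute_hexagon(layer_hits):
        """layer_hits: booleans, one per indicator layer (the dataset uses up to 28)."""
        hev_count = sum(bool(hit) for hit in layer_hits)
        status = "Present" if hev_count >= 1 else "Absent"
        return {"HEVlyr": hev_count, "status": status}

    # Example: a hexagon intersected by 3 of 28 indicator layers
    print(attribute_hexagon([True, False, True, True] + [False] * 24))
    ```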

  13. Invertebrate Datasets for Evaluation and Review of Ecology-Focused Stream...

    • data.usgs.gov
    • s.cnmilf.com
    • +2more
    Updated Jun 25, 2024
    Cite
    Charles Wahl; Robert Zuellig; Erin Hennessy (2024). Invertebrate Datasets for Evaluation and Review of Ecology-Focused Stream Studies, Fountain Creek Basin, Colorado [Dataset]. http://doi.org/10.5066/P91QQ6GT
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Authors
    Charles Wahl; Robert Zuellig; Erin Hennessy
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1985 - 2022
    Area covered
    Colorado, Fountain Creek
    Description

    These data from Evaluation and Review of Ecology-Focused Stream Studies to Support Cooperative Monitoring, Fountain Creek Basin, Colorado were used to describe temporal trends in invertebrate communities in the basin. Invertebrate data were collected at U.S. Geological Survey (USGS) sites between 1985 and 2022. Datasets include invertebrate frequency of occurrence, invertebrate tolerance index values, an invertebrate multi-metric index, New Zealand mudsnail counts, and a list of invertebrate species collected.

  14. Data from: An ecosystem values framework to support decision makers in the...

    • pacific-data.sprep.org
    • png-data.sprep.org
    pdf
    Updated Apr 8, 2025
    Cite
    PNG Conservation and Environment Protection Authority (2025). An ecosystem values framework to support decision makers in the Coral Triangle [Dataset]. https://pacific-data.sprep.org/dataset/ecosystem-values-framework-support-decision-makers-coral-triangle
    Explore at:
    pdf(1611954)Available download formats
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    PNG Conservation and Environment Protection Authority
    License

    Public Domain Mark 1.0https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Papua New Guinea
    Description

    A Final Report for Department of the Environment and Energy (October 2017)

  15. SWECO25: Edaphic (edaph)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 12, 2024
    + more versions
    Cite
    Antoine Adde (2024). SWECO25: Edaphic (edaph) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7981142
    Explore at:
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Nathan Külling
    Antoine Adde
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The edaphic category contains the "eiv" and "modiffus" datasets.

    The eiv dataset includes variables representing local soil properties and climate conditions. After resampling the source data (Descombes et al., 2020) to the SWECO25 grid, we generated individual layers for the 8 available variables (soil pH, nutrients, moisture, moisture variability, aeration, humus, climate continentality, and light). For each variable, we provided a layer with the raw values and 13 focal statistics layers by applying a cell-level function calculating the mean value in a circular moving window of 13 radii ranging from 25m to 5km. This dataset includes a total of 112 layers. Final values were rounded and multiplied by 100.

    The modiffus dataset describes the nitrogen (n) and phosphorus (p) loads in Swiss soils. After resampling the source data (Hürdler et al., 2015) to the SWECO25 grid for these two variables, we provided the output maps and computed 13 focal statistics layers by applying a cell-level function calculating the average value in a circular moving window of 13 radii ranging from 25m to 5km. This dataset includes a total of 28 layers. Final values were rounded and multiplied by 100.
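    The focal-statistics step used for both datasets above (a cell-level mean within a circular moving window) can be sketched as follows. This is an illustrative Python version on a toy grid; the SWECO25 processing itself is raster-based and the grid, cell size, and radius here are assumptions.

    ```python
    import math

    # Sketch of the focal-statistics step described above: for each cell, average
    # all cell values whose centres fall within a circular window of a given
    # radius. Grid, cell size, and radius here are illustrative, not SWECO25 data.

    def focal_mean(grid, cell_size, radius):
        rows, cols = len(grid), len(grid[0])
        reach = int(radius // cell_size)
        out = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                vals = []
                for dr in range(-reach, reach + 1):
                    for dc in range(-reach, reach + 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and \
                           math.hypot(dr, dc) * cell_size <= radius:
                            vals.append(grid[rr][cc])
                out[r][c] = sum(vals) / len(vals)
        return out

    grid = [[1, 2], [3, 4]]
    # 25 m radius on a 25 m grid: each cell averages itself and rook neighbours
    print(focal_mean(grid, cell_size=25, radius=25))
    ```

    In the real datasets this window is applied at 13 radii (25 m to 5 km), producing one layer per radius.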

    The detailed list of layers available is provided in SWECO25_datalayers_details_edaph.csv and includes information on the category, dataset, variable name (long), variable name (short), period, sub-period, start year, end year, attribute, radii, unit, and path.

    References:

    Descombes, P. et al. Spatial modelling of ecological indicator values improves predictions of plant distributions in complex landscapes. Ecography 43, 1448-1463 (2020).

    Hürdler, J., Prasuhn, V. & Spiess, E. Abschätzung diffuser Stickstoff- und Phosphoreinträge in die Gewässer der Schweiz MODIFFUS 3.0: Bericht im Auftrag des Bundesamtes für Umwelt (BAFU). (Zürich, Switzerland, 2015).

    Külling, N., Adde, A., Fopp, F., Schweiger, A. K., Broennimann, O., Rey, P.-L., Giuliani, G., Goicolea, T., Petitpierre, B., Zimmermann, N. E., Pellissier, L., Altermatt, F., Lehmann, A., & Guisan, A. (2024). SWECO25: A cross-thematic raster database for ecological research in Switzerland. Scientific Data, 11(1), Article 1. https://doi.org/10.1038/s41597-023-02899-1

    V2: metadata update

  16. Innovating the Data Ecosystem: An Update of the Federal Big Data Research...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated May 14, 2025
    Cite
    NCO NITRD (2025). Innovating the Data Ecosystem: An Update of the Federal Big Data Research and Development Strategic Plan [Dataset]. https://catalog.data.gov/dataset/innovating-the-data-ecosystem-an-update-of-the-federal-big-data-research-and-development-s
    Explore at:
    Dataset updated
    May 14, 2025
    Dataset provided by
    NCO NITRD
    Description

    This document, Innovating the Data Ecosystem: An Update of The Federal Big Data Research and Development Strategic Plan, updates the 2016 Federal Big Data Research and Development Strategic Plan. It revises the vision and strategies for big data research and development laid out in the 2016 plan through six strategy areas: enhance the reusability and integrity of data; enable innovative, user-driven data science; develop and enhance the robustness of the federated ecosystem; prioritize privacy, ethics, and security; develop necessary expertise and diverse talent; and enhance U.S. leadership in the international context. Together these aim to enhance data value and reusability and responsiveness to federal policies on data sharing and management.

  17. Data from: Centennial recovery of recent human-disturbed forests

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Jul 30, 2024
    Cite
    Asun Rodríguez-Uña; Verónica Cruz-Alonso; José A. López-López; David Moreno-Mateos (2024). Centennial recovery of recent human-disturbed forests [Dataset]. http://doi.org/10.5061/dryad.rv15dv4h8
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    University of Oxford
    Universidad de Murcia
    University of Cambridge
    Universidad Complutense de Madrid
    Authors
    Asun Rodríguez-Uña; Verónica Cruz-Alonso; José A. López-López; David Moreno-Mateos
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    International commitments are challenging countries to restore their degraded lands, particularly forests. These commitments require global assessments of recovery timescales and trajectories of different forest attributes to inform restoration strategies. We used a meta-chronosequence approach including 125 forest chronosequences to reconstruct the past (c. 300 years) and model future recovery trajectories of forests recovering from agriculture and logging impacts. We found that recovering forests still differed significantly from undisturbed ones after 150 years, and projected that difference to remain for up to 218 or 494 years for ecosystem attributes such as nitrogen stocks or species similarity, respectively. These conservative estimates, however, do not capture the complexity of forest ecosystems. A centennial recovery of forests requires strategic, unprecedented planning to deliver a restored world.

    Methods

    Database construction. We collected data from 16,873 plots in 125 chronosequences of recovering forest ecosystems, drawn from 110 published primary studies. From these chronosequences we extracted 641 recovery trajectories, i.e., field-based quantitative measurements of ecosystem integrity repeated through time, reported in the tables, figures, and text of the selected studies. The trajectories cover six recovery metrics (organism abundance, species diversity, species similarity, carbon cycling, nitrogen stock, and phosphorus stock), two restoration strategies (passive and active), three disturbance types [agriculture (including abandoned croplands and pastures), logging, and mining], and a climatic metric (the aridity index). Each trajectory included at least two data points, defined as the value of the ecosystem metric at different times since recovery started (hereafter, recovery time).

    Average values were used for data points with the same recovery time (n = 72, in 21 studies). We used response ratios (RRs) to estimate recovery completeness, i.e., the effect sizes between reference and recovering systems. We computed the RR for each data point along the trajectory as ln(Xres/Xref), where Xres is the value of the ecosystem metric at a given recovery time and Xref is the value of the same metric in the reference forest. Effect sizes in the meta-analysis were weighted by study precision, estimated as the product of the number of subplots and their area, assuming that higher sampling effort implies higher precision. For abundance, diversity, and similarity we fitted fixed-effects models, with weights accounting only for within-study variability, whereas for biogeochemical functions we used random-effects meta-analytic models accounting for both between- and within-study variability.

    Statistical analysis. To estimate the trajectory of forest recovery over time, we fitted a separate linear mixed model (LMM) for the RR of each recovery metric. We included recovery time as a fixed factor and random slope, and trajectory identity as a random intercept, allowing a different slope and intercept for each trajectory. Because recovery over time can follow a wide range of trajectories, from linear to more saturating shapes, we considered three functions of the recovery time variable, one linear and two decelerating trends [ln(recovery time + 1) and √recovery time], and selected for each recovery metric the one that best fit the data according to the minimum AICc. The models for the recovery of similarity were fitted using the Morisita-Horn index, as a Pearson correlation test showed it was correlated with the Jaccard and Bray-Curtis indices. Absolute values were square-root transformed to meet the assumptions of general linear models and then multiplied by -1 to facilitate interpretation.

    Using the resulting LMMs, we predicted the RR after 73, 146, and 219 years of recovery (one, two, and three times the global life expectancy in 2019). We then predicted the time needed for forest ecosystems to recover to 90% of reference values for each trajectory and recovery metric, and calculated the median by metric. Also using the resulting LMMs, we predicted the RR after 50 and 100 years of recovery for each metric and trajectory (1) to test whether recovery completeness depends on the metric and (2) to identify the main explanatory variables underlying the recovery process for each metric. We fitted linear models (LMs) to analyse differences in the RR after 50 and after 100 years of recovery among recovery metrics, with the recovery metric and the LMM intercepts for each trajectory as fixed factors; the latter accounts for the initial state of degradation when recovery started. We then fitted a separate LM for the effect of each explanatory variable (aridity, disturbance category, restoration strategy, or life form) on the RR predictions after 50 and 100 years, first for all recovery metrics together and then for each metric individually, again including the LMM intercept for each trajectory as a fixed factor. For the models fitted for disturbance category and life form, we excluded categories with <1% of the values ("mining" for disturbance and "bird" for life form) or categories mixing information from others ("agriculture and logging" for disturbance and "woody and non-woody" for life form).
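    The response-ratio and precision-weight calculations described above can be sketched as follows. This is an illustrative Python version with made-up numbers, not data or code from the study.

    ```python
    import math

    # Sketch of the response-ratio step described above: RR = ln(Xres/Xref),
    # with effect sizes weighted by sampling effort (subplot count x subplot
    # area). All numbers are illustrative, not values from the study.

    def response_ratio(x_restored, x_reference):
        """RR < 0 means the recovering forest is below the reference value."""
        return math.log(x_restored / x_reference)

    def precision_weight(n_subplots, subplot_area):
        """Higher sampling effort is assumed to imply higher precision."""
        return n_subplots * subplot_area

    rr = response_ratio(60.0, 100.0)                       # metric at 60% of reference
    w = precision_weight(n_subplots=5, subplot_area=0.04)  # five 400-m2 subplots (ha)
    print(round(rr, 3), w)
    ```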

  18. Data from: Benefits and costs of ecological restoration: rapid assessment of...

    • eprints.soton.ac.uk
    • data.niaid.nih.gov
    • +2more
    Updated May 6, 2023
    Cite
    Hughes, Francine M. R.; Peh, Kelvin S. H.; Balmford, Andrew; Field, Rob H.; Lamb, Anthony; Birch, Jennifer C.; Bradbury, Richard B.; Brown, Claire; Butchart, Stuart H. M.; Lester, Martin; Morrison, Ross; Sedgwick, Isabel; Soans, Chris; Stattersfield, Alison J.; Stroh, Peter A.; Swetnam, Ruth D.; Thomas, David H. L.; Walpole, Matt; Warrington, Stuart; Peh, Kelvin S. H. (2023). Data from: Benefits and costs of ecological restoration: rapid assessment of changing ecosystem service values at a UK wetland [Dataset]. http://doi.org/10.5061/dryad.669h5
    Explore at:
    Dataset updated
    May 6, 2023
    Dataset provided by
    DRYAD
    Authors
    Hughes, Francine M. R.; Peh, Kelvin S. H.; Balmford, Andrew; Field, Rob H.; Lamb, Anthony; Birch, Jennifer C.; Bradbury, Richard B.; Brown, Claire; Butchart, Stuart H. M.; Lester, Martin; Morrison, Ross; Sedgwick, Isabel; Soans, Chris; Stattersfield, Alison J.; Stroh, Peter A.; Swetnam, Ruth D.; Thomas, David H. L.; Walpole, Matt; Warrington, Stuart; Peh, Kelvin S. H.
    Area covered
    United Kingdom
    Description

    Data used to value nature-based recreation: this is a short data file containing data collected in 2011 questionnaire surveys of visitors to a wetland restoration project at Wicken Fen. It is divided into short sections listing the different categories of data from the questionnaire, and also lists data received from the National Trust about visitor numbers in 2010 (Ecology and Evolution data for ECE31248.pdf).

    Restoration of degraded land is recognized by the international community as an important way of enhancing both biodiversity and ecosystem services, but more information is needed about its costs and benefits. In Cambridgeshire, U.K., a long-term initiative to convert drained, intensively farmed arable land to a wetland habitat mosaic is driven by a desire both to prevent biodiversity loss from the nationally important Wicken Fen National Nature Reserve (Wicken Fen NNR) and to increase the provision of ecosystem services. We evaluated the changes in ecosystem service delivery resulting from this land conversion, using a new Toolkit for Ecosystem Service Site-based Assessment (TESSA) to estimate biophysical and monetary values of ecosystem services provided by the restored wetland mosaic compared with the former arable land. Overall results suggest that restoration is associated with a net gain to society as a whole of $199 ha−1 y−1, for a one-off investment in restoration of $2320 ha−1. Restoration has led to an estimated loss of arable production of $2040 ha−1 y−1, but estimated gains of $671 ha−1 y−1 in nature-based recreation, $120 ha−1 y−1 from grazing, and $48 ha−1 y−1 from flood protection, and a reduction in greenhouse gas (GHG) emissions worth an estimated $72 ha−1 y−1. Management costs have also declined by an estimated $1325 ha−1 y−1. Despite uncertainties associated with all measured values and the conservative assumptions used, we conclude that there was a substantial gain to society as a whole from this land-use conversion. The beneficiaries also changed from local arable farmers under arable production to graziers, countryside users from towns and villages, and the global community, under restoration. We emphasize that the values reported here are not necessarily transferable to other sites.

  19. Data from: Habitat selection and the value of information in heterogenous...

    • search.dataone.org
    • datadryad.org
    Updated Apr 2, 2025
    Cite
    Kenneth A. Schmidt; Francois Massol (2025). Habitat selection and the value of information in heterogenous landscapes [Dataset]. http://doi.org/10.5061/dryad.1d73qk4
    Explore at:
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Kenneth A. Schmidt; Francois Massol
    Time period covered
    Jan 1, 2018
    Description

    Despite the wide usage of the term information in evolutionary ecology, there is no general treatise between fitness (i.e., density-dependent population growth) and selection of the environment sensu lato. Here we (1) initiate the building of a quantitative framework with which to examine the relationship between information use in spatially heterogeneous landscapes and density-dependent population growth, and (2) illustrate its utility by applying the framework to an existing model of breeding habitat selection. We begin by linking information, as a process of narrowing choice, to population growth/fitness. Second, we define a measure of a population’s penalty of ignorance based on the Kullback-Leibler index that combines the contributions of resource selection (i.e., biased use of breeding sites) and density-dependent depletion. Third, we quantify the extent to which environmental heterogeneity (i.e., mean and variance within a landscape) constrains sustainable population growth of un...
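    The Kullback-Leibler penalty-of-ignorance idea described above can be illustrated with a minimal sketch. This is a generic KL-divergence computation in Python with made-up distributions; it is not the paper's model, and the variable names are assumptions.

    ```python
    import math

    # Sketch of a Kullback-Leibler index like the one described above: the
    # divergence between a distribution of ideal habitat use matching site
    # quality (p) and the observed, partially informed use (q). Distributions
    # are illustrative, not from the paper.

    def kl_divergence(p, q):
        """KL(p || q) in nats; 0 when use exactly matches quality."""
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    quality = [0.5, 0.3, 0.2]   # proportional site quality (ideal-free use)
    use = [0.4, 0.4, 0.2]       # observed use under imperfect information
    print(round(kl_divergence(quality, use), 4))  # the "penalty of ignorance"
    ```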

  20. Swan Canning Riparian Ecology Foreshores 2019 (DBCA-066)

    • data.gov.au
    docx +9
    Updated Jan 28, 2022
    + more versions
    Cite
    Department of Biodiversity, Conservation and Attractions (2022). Swan Canning Riparian Ecology Foreshores 2019 (DBCA-066) [Dataset]. https://data.gov.au/dataset/ds-wa-245fd77d-0f88-4db0-8f76-834aa6444b84
    Explore at:
    wms, wfs, shp, fgdb, geojson, esri featureserver, docx, zip, geopackage, esri mapserverAvailable download formats
    Dataset updated
    Jan 28, 2022
    Dataset provided by
    Department of Biodiversity, Conservation and Attractionshttps://www.dbca.wa.gov.au/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Foreshores
    Description

    This data set was derived from the Riverbank dataset, which aims to identify those foreshore areas most in need of works. The Riverbank dataset assesses Built Foreshores, Natural Foreshores, Riparian Ecological Value and Useability of Foreshores to identify priorities. This data set represents the most up-to-date assessment of the riparian ecology of foreshores. The riparian ecology value of foreshores was identified and analysed by GIS in 2016, and the data valuation hierarchies, based on the Perth Biodiversity Project, were finalised in 2019. The data provides a Swan Canning Riverpark-wide assessment of the riparian ecological values of foreshores using conservation estate status, proximity to high-value flora and fauna (Class 1, 2 and 3), TEC and/or Bush Forever sites, Heddle vegetation complex rarity on the Swan Coastal Plain, vegetation condition and vegetation variability. Feedback on the accuracy and recency of the data is welcome.

Cite
D. W. Rössel-Ramírez; D. W. Rössel-Ramírez; J. Palacio-Núñez; J. Palacio-Núñez; S. Espinosa; S. Espinosa; J. F. Martínez-Montoya; J. F. Martínez-Montoya (2025). Codes in R for spatial statistics analysis, ecological response models and spatial distribution models [Dataset]. http://doi.org/10.5281/zenodo.7603557

Codes in R for spatial statistics analysis, ecological response models and spatial distribution models

Explore at:
binAvailable download formats
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
D. W. Rössel-Ramírez; D. W. Rössel-Ramírez; J. Palacio-Núñez; J. Palacio-Núñez; S. Espinosa; S. Espinosa; J. F. Martínez-Montoya; J. F. Martínez-Montoya
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

In the last decade, a plethora of algorithms has been developed for spatial ecology studies. In our case, we use some of these codes for underwater research in applied ecology analyses of threatened endemic fishes and their natural habitat. For this, we developed codes in the RStudio® script environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), ModelMetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamuta, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araújo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).

It is important to run all the codes in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We chose this regression method and this distance-based similarity metric because of their adequacy and robustness for studies of endemic or threatened species (e.g., Naoki et al., 2006). Below we explain the statistical parameterization of the codes used in the GLM and DOMAIN runs:

First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend using 10,000 background points with regression methods (e.g., Generalized Linear Models) or distance-based models (e.g., DOMAIN). However, we consider factors such as the extent of the area and the type of study species important for correctly selecting the number of points (pers. obs.). We then extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans and Elith, 2017).
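This background-point step can be sketched in Python (the original code, Code2_Extract_values_DWp_SC.R, is in R; the extent, toy raster, and reduced point count here are illustrative assumptions):

```python
import random

# Sketch of the background-point step described above: draw random points
# within the study extent and look up predictor values at the containing
# raster cell. Extent, raster, and point count (100, not 10,000) are toys.

random.seed(42)
xmin, xmax, ymin, ymax = 0.0, 10.0, 0.0, 10.0
raster = [[(r + c) / 18.0 for c in range(10)] for r in range(10)]  # toy predictor

def extract_value(x, y):
    """Nearest-cell lookup on a 10 x 10 grid with 1-unit cells."""
    col = min(int(x), 9)
    row = min(int(y), 9)
    return raster[row][col]

background = [(random.uniform(xmin, xmax), random.uniform(ymin, ymax))
              for _ in range(100)]
values = [extract_value(x, y) for x, y in background]
print(len(values), min(values), max(values))
```

In practice the same extraction is run for both presence and background points, one column per predictor variable.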

Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data, following Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control we selected 10-fold cross-validation, with the response variable (presence) assigned as a factor. If another variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
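The 75/25 subdivision, applied separately to each group, can be sketched as follows (a Python illustration with toy point IDs, not the study data or the original R code):

```python
import random

# Sketch of the 75/25 split described above, applied separately to the
# presence and background groups (toy point IDs, illustrative counts).

def split_75_25(points, seed=1):
    pts = list(points)
    random.Random(seed).shuffle(pts)      # shuffle before splitting
    cut = int(round(len(pts) * 0.75))
    return pts[:cut], pts[cut:]           # (training, test)

presence = list(range(40))                # e.g., 40 presence records
background = list(range(10000))           # 10,000 background points

pres_train, pres_test = split_75_25(presence)
bg_train, bg_test = split_75_25(background)
print(len(pres_train), len(pres_test), len(bg_train), len(bg_test))
```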

After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R) to obtain the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and 5,000 cross-validated iterations (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017), and selected a validation interval of 4 random training points (personal test). The resulting plots were partial dependence plots, one for each predictor variable.

Subsequently, we assessed correlations among the variables with Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity (Guisan & Hofer, 2003). A bivariate correlation threshold of ±0.70 is recommended for discarding highly correlated variables (e.g., Awan et al., 2021).
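The collinearity screen can be sketched as follows (a Python illustration with toy variables; the original code is in R, and the variable names are assumptions):

```python
import math

# Sketch of the collinearity screen described above: compute pairwise Pearson
# correlations and flag any pair with |r| >= 0.70 (toy data, illustrative).

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

variables = {
    "elevation": [100, 200, 300, 400, 500],
    "temperature": [20, 18, 16, 14, 12],   # strongly tied to elevation
    "rainfall": [30, 10, 50, 20, 40],
}
names = list(variables)
flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
           if abs(pearson(variables[a], variables[b])) >= 0.70]
print(flagged)   # pairs above the 0.70 threshold; drop one variable of each
```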

Once the above codes had been run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% test data) (Code6_Presence&backgrounds.R) for the GLM code (Code7_GLM_model.R). Here we first ran a GLM per variable to obtain each variable's p-value (alpha ≤ 0.05), with the value one (i.e., presence) as the likelihood factor. The generated models include polynomial terms to capture both linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results we ran ecological response curve models, whose plots show the probability of occurrence against the values of continuous variables or the categories of discrete variables, together with the points of the presence and background training groups.

We also ran a global GLM, which we evaluated by means of a 2 x 2 contingency matrix of observed versus predicted records; a representation is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected a threshold of 0.5 to improve modelling performance and avoid a high percentage of type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).

Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).

                Validation set
Model           True          False
Presence        A             B
Background      C             D

We then calculated the overall accuracy and True Skill Statistic (TSS) metrics. The first assesses the proportion of correctly predicted cases overall, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002), giving equal importance to the prevalence of presence predictions and to the correction for random performance (Fielding and Bell, 1997; Allouche et al., 2006).
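These two metrics follow directly from the cells of Table 1. A minimal Python sketch (the counts are illustrative, not results from the models):

```python
# Sketch of the performance metrics described above, computed from the 2 x 2
# contingency matrix of Table 1 (A = true presences, B = false presences,
# C = true backgrounds, D = false backgrounds; counts are illustrative).

def overall_accuracy(a, b, c, d):
    return (a + c) / (a + b + c + d)

def true_skill_statistic(a, b, c, d):
    sensitivity = a / (a + d)   # share of actual presences predicted correctly
    specificity = c / (c + b)   # share of actual backgrounds predicted correctly
    return sensitivity + specificity - 1

a, b, c, d = 40, 10, 80, 20
print(overall_accuracy(a, b, c, d), round(true_skill_statistic(a, b, c, d), 3))
```

TSS ranges from -1 to +1, with 0 indicating no better than random performance.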

The last code (Code8_DOMAIN_SuitHab_model.R) performs species distribution modelling with the DOMAIN algorithm (Carpenter et al., 1993). Here we loaded the variable stack and the presence and background groups, each subdivided into 75% training and 25% test data. Only the presence training subset and the predictor variable stack were used in calculating the DOMAIN metric, as well as in evaluating and validating the model.
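The core of DOMAIN is Gower similarity: a candidate cell's suitability is its maximum similarity to any presence training point, with each variable scaled by its range. A minimal Python sketch under those assumptions (toy environmental values, not the study's variables):

```python
# Sketch of the DOMAIN (Gower similarity) idea described above: suitability of
# a candidate cell is its maximum range-scaled similarity to any presence
# training point (toy two-variable data, illustrative).

def gower_similarity(a, b, ranges):
    """Mean over variables of 1 - |difference| / range."""
    terms = [1 - abs(ai - bi) / r for ai, bi, r in zip(a, b, ranges)]
    return sum(terms) / len(terms)

def domain_suitability(cell, presences, ranges):
    return max(gower_similarity(cell, p, ranges) for p in presences)

presences = [(12.0, 800.0), (14.0, 900.0)]   # e.g., (temperature, rainfall)
ranges = (10.0, 500.0)                        # per-variable range over the region
print(round(domain_suitability((13.0, 850.0), presences, ranges), 3))
```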

Regarding the model evaluation and estimation, we selected the following estimators:

1) Partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model predicts the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).

2) The ROC/AUC curve for model validation, where an optimal performance threshold is estimated to give an expected confidence of 75% to 99% probability (DeLong et al., 1988).
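AUC has a simple interpretation that can be sketched without any plotting: it equals the probability that a randomly chosen presence scores higher than a randomly chosen background point (the Mann-Whitney formulation). A Python illustration with made-up suitability scores:

```python
# Sketch of the ROC/AUC validation described above via the Mann-Whitney
# formulation: AUC = P(random presence score > random background score),
# with ties counted as half (scores are illustrative).

def auc(presence_scores, background_scores):
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

presence = [0.9, 0.8, 0.6]
background = [0.7, 0.4, 0.3, 0.2]
print(round(auc(presence, background), 3))  # 1.0 = perfect, 0.5 = random
```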
