Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. In our case, we use some of these codes for underwater research in applied ecology, analysing threatened endemic fishes and their natural habitat. For this, we developed scripts in the RStudio® environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), maptools (Hijmans & Elith, 2017), ModelMetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamuta, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
All scripts must be run in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance-based similarity metric because of their adequacy and robustness for studies of endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the scripts used to run GLM and DOMAIN:
First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend using 10,000 background points with regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we also considered factors such as the extent of the study area and the type of study species when choosing the number of points (pers. obs.). We then extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans & Elith, 2017).
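A minimal sketch of this step, following the dismo workflow the authors cite (Hijmans & Elith, 2017); the raster stack "predictors", the presence table "presences.csv" and its lon/lat columns are hypothetical names, and Code2_Extract_values_DWp_SC.R may differ in its details:

library(dismo)   # loads raster and sp as dependencies

# predictor layers and presence records (hypothetical file names)
predictors <- raster::stack(list.files("variables", pattern = "\\.tif$", full.names = TRUE))
occ <- read.csv("presences.csv")                 # columns: lon, lat

set.seed(1)
bg <- randomPoints(predictors, n = 10000)        # background points (Barbet-Massin et al., 2012)

presvals <- raster::extract(predictors, occ[, c("lon", "lat")])  # values at presences
bgvals   <- raster::extract(predictors, bg)                      # values at backgrounds

pb <- c(rep(1, nrow(presvals)), rep(0, nrow(bgvals)))            # 1 = presence, 0 = background
sdmdata <- data.frame(pb, rbind(presvals, bgvals))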
Subsequently, we subdivided both the presence and the background points into 75% training data and 25% test data, following Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, we selected the 10-fold cross-validation method, with the response variable (presence) assigned as a factor. If any other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
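A hedged sketch of the 75/25 split and the 10-fold training control using caret, continuing from the hypothetical sdmdata table above:

library(caret)

# response as a factor, with "presence" as the modelled class
sdmdata$pb <- factor(sdmdata$pb, levels = c(0, 1), labels = c("background", "presence"))

set.seed(1)
idx   <- createDataPartition(sdmdata$pb, p = 0.75, list = FALSE)  # stratified 75% sample
train <- sdmdata[idx, ]                                           # training data
test  <- sdmdata[-idx, ]                                          # test data

ctrl <- trainControl(method = "cv", number = 10)                  # 10-fold cross-validation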
After that, we ran the scripts for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), which return the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and 5,000 boosting iterations (e.g., Friedman, 2002; Kim, 2009; Hijmans & Elith, 2017). In addition, we selected a validation interval of 4 random training points (personal test). The resulting plots show the partial dependence on each predictor variable.
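A sketch of the GBM step under the settings described above (Gaussian distribution, 5,000 trees); gbm() needs a numeric 0/1 response for a Gaussian fit, so a numeric copy of the training response is assumed, and the authors' Code3/Code4 scripts may parameterize it differently:

library(gbm)

train_gbm    <- train
train_gbm$pb <- as.numeric(train_gbm$pb == "presence")  # back to 0/1 for gbm()

set.seed(1)
gbm_fit <- gbm(pb ~ ., data = train_gbm,
               distribution = "gaussian",
               n.trees = 5000,        # 5,000 boosting iterations
               cv.folds = 10)

summary(gbm_fit)                      # relative contribution (%) of each variable
plot(gbm_fit, i.var = 1)              # partial dependence on the first predictor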
Subsequently, the correlation between variables is computed with Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity (Guisan & Hofer, 2003). It is recommended to use a bivariate correlation threshold of ±0.70 to discard highly correlated variables (e.g., Awan et al., 2021).
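A sketch of the Pearson screening with corrplot, using caret::findCorrelation() as one convenient way to flag pairs above the ±0.70 threshold (the authors' Code5 script may implement the filtering differently); the first column of train is assumed to be the response:

library(corrplot)
library(caret)

cors <- cor(train[, -1], method = "pearson", use = "pairwise.complete.obs")
corrplot(cors, method = "number", type = "lower")   # visual check of the matrix

drop_idx <- findCorrelation(cors, cutoff = 0.70)    # indices of variables to discard
colnames(cors)[drop_idx]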
Once the above scripts were run, we loaded the same subgroups (i.e., presence and background groups, 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM script (Code7_GLM_model.R). Here, we first ran a GLM per variable to obtain the significance (p) value of each variable (alpha ≤ 0.05); we set the value one (i.e., presence) as the modelled level. The models are of polynomial degree, yielding linear and quadratic responses (e.g., Fielding & Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, in which the resulting plots show the probability of occurrence against the values of continuous variables or the categories of discrete variables. The presence and background training points are also included.
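A minimal sketch of the per-variable polynomial GLM screening and one response curve; bio1 is a placeholder predictor name:

# linear + quadratic terms; "presence" is the modelled level of the factor pb
glm1 <- glm(pb ~ poly(bio1, 2), data = train, family = binomial(link = "logit"))
summary(glm1)$coefficients                       # keep variables with p <= 0.05

# ecological response curve: probability of occurrence along the variable range
newdat <- data.frame(bio1 = seq(min(train$bio1), max(train$bio1), length.out = 100))
pred   <- predict(glm1, newdata = newdat, type = "response")
plot(newdat$bio1, pred, type = "l",
     xlab = "bio1", ylab = "Probability of occurrence")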
In addition, a global GLM was run, and the generalized model was evaluated by means of a 2 x 2 contingency matrix including both observed and predicted records; a representation is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better model performance and to avoid a high percentage of type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding & Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans & Elith, 2017).
Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).
Model / Validation set | True | False
Presence               | A    | B
Background             | C    | D
We then calculated the overall accuracy and the True Skill Statistic (TSS). The first assesses the proportion of correctly predicted cases (Olden & Jackson, 2002), while the second gives equal weight to correctly predicted presences and absences and corrects for random performance (Fielding & Bell, 1997; Allouche et al., 2006).
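Written out with the labels of Table 1 (A = true positives, B = false positives, C = true negatives, D = false negatives), the two metrics reduce to simple formulas; a small sketch:

# overall accuracy: proportion of correctly predicted cases
overall <- function(A, B, C, D) (A + C) / (A + B + C + D)

# TSS = sensitivity + specificity - 1 (Allouche et al., 2006)
tss <- function(A, B, C, D) A / (A + D) + C / (B + C) - 1

overall(40, 5, 45, 10)   # 0.85
tss(40, 5, 45, 10)       # 0.8 + 0.9 - 1 = 0.70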
The last script (Code8_DOMAIN_SuitHab_model.R) performs species distribution modelling with the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background groups, each subdivided into 75% training and 25% test data. We included only the presence training subset and the predictor variable stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
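A sketch of this step with dismo::domain(), assuming predictors is the variable stack and pres_train the 75% presence training coordinates (hypothetical names):

library(dismo)

dm   <- domain(predictors, pres_train)      # Gower-distance (DOMAIN) model
suit <- predict(predictors, dm)             # habitat suitability map
plot(suit)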
Regarding the model evaluation and estimation, we selected the following estimators:
1) Partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model predicts the spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) The ROC/AUC curve for model validation, where an optimal performance threshold is estimated to achieve an expected confidence of 75% to 99% probability (DeLong et al., 1988).
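A hedged sketch of the validation step: dismo::evaluate() returns the AUC and a threshold table, and pROC provides a DeLong-based confidence interval for the AUC; pres_test and bg_test are the hypothetical 25% test subsets:

library(dismo)
library(pROC)

e <- evaluate(p = pres_test, a = bg_test, model = dm, x = predictors)
e@auc                                  # area under the ROC curve
threshold(e, stat = "spec_sens")       # threshold maximizing sensitivity + specificity

# DeLong et al. (1988) confidence interval for the AUC
scores <- c(e@presence, e@absence)     # model scores for test presences/backgrounds
labels <- c(rep(1, length(e@presence)), rep(0, length(e@absence)))
ci.auc(roc(labels, scores), method = "delong")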
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Equivalence of the F test and t test in Example 1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
File descriptions:
All csv files contain results from the different models (PAMM, AARs, linear models, MRPPs) at each iteration of the simulation, one row per iteration.
"results_perfect_detection.csv" contains the results from the first simulation part, with all the observations.
"results_imperfect_detection.csv" contains the results from the first simulation part, with randomly thinned observations to mimic imperfect detection.
ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
PAMM30: p-value of the PAMM run on the 30-day survey.
PAMM7: p-value of the PAMM run on the 7-day survey.
AAR1: Avoidance-Attraction Ratio value calculated as AB/BA.
AAR2: Avoidance-Attraction Ratio value calculated as BAB/BB.
Harmsen_P: p-value of the linear model with the Species1*Species2 interaction from Harmsen et al. (2009).
Niedballa_P: p-value of the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) against the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median against the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median against the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) against the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median against the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median against the randomized median distribution, when permuting on species B (Murphy et al. 2021).
"results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations."results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimick imperfect detection.ID_run: identified of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of A on B.p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of B on A.AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).Karanth_permA: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). Karanth_permB: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021). MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
Script files description:
1_Functions: R script containing the functions:
- MRPP from Karanth et al. (2017), adapted here for time efficiency.
- MRPP from Murphy et al. (2021), adapted here for time efficiency.
- A version of the ct_to_recurrent() function from the recurrent package, adapted for parallelized processing of the simulation datasets.
- The simulation() function used to simulate observations of two species with reciprocal effects on each other.
2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization and the random thinning mimicking imperfect detection.
3_Approaches comparison: R script containing the fits of the different models tested on the simulated data.
3_1_Real data comparison: R script containing the fits of the different models tested on the real data example from Murphy et al. (2021).
4_Graphs: R script containing the code for plotting the results of the simulation part and the appendices.
5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the Karanth et al. (2017) and Murphy et al. (2021) code lines, the version adapted for time efficiency, and a comparison verifying the similarity of results.
5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test for differences between the MRPP approaches according to the species on which permutations are done.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The topographic category contains the "alti3d" dataset.
The alti3D dataset (topographic category) describes the topography of Switzerland. After resampling the “SwissALTI3D” source data (swisstopo, 2016) to the SWECO25 grid with 4 resampling schemes (mean, median, minimum, and maximum value), we generated individual layers for four variables (elevation, aspect, hillshade, and slope). For each variable and resampling scheme, we computed 13 focal statistics layers by applying a cell-level function calculating the mean value in a circular moving window of 13 radii ranging from 25m to 5km. This dataset includes a total of 224 layers. Final values were rounded and multiplied by 100.
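For illustration only, the circular moving-window idea can be reproduced with terra (the SWECO25 layers themselves were generated with the database's own pipeline); the file name and the 200 m radius below are hypothetical:

library(terra)

r <- rast("elevation_25m.tif")               # hypothetical 25 m input layer
w <- focalMat(r, d = 200, type = "circle")   # circular weights (sum to 1), 200 m radius
r_focal <- focal(r, w = w, fun = "sum")      # weighted sum of normalized weights = circular mean

writeRaster(r_focal, "elevation_focal_200m.tif")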
The detailed list of layers available is provided in SWECO25_datalayers_details_topo.csv and includes information on the category, dataset, variable name (long), variable name (short), period, sub-period, start year, end year, attribute, radii, unit, and path.
References:
Swiss Federal Office of Topography [swisstopo]. The high precision digital elevation model of Switzerland swissALTI3D (2m). (Wabern, Switzerland, 2016).
Külling, N., Adde, A., Fopp, F., Schweiger, A. K., Broennimann, O., Rey, P.-L., Giuliani, G., Goicolea, T., Petitpierre, B., Zimmermann, N. E., Pellissier, L., Altermatt, F., Lehmann, A., & Guisan, A. (2024). SWECO25: A cross-thematic raster database for ecological research in Switzerland. Scientific Data, 11(1), Article 1. https://doi.org/10.1038/s41597-023-02899-1
V2: metadata update
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
A description of how data were simulated for the evaluation of the different analysis techniques for Example 2.
Analyses of published research can provide a realistic perspective on the progress of science. By analyzing more than 18,000 articles published by the preeminent ecological societies, we found that (1) ecological research is becoming increasingly statistically complex, reporting a growing number of P values per article, and (2) the value of the reported coefficient of determination (R2) has been falling steadily, suggesting a decrease in the marginal explanatory power of ecology. These trends may be due to changes in the way ecology is studied or in the way the findings of investigations are reported. Determining the reason for increasing complexity and declining marginal explanatory power would require a critical review of the scientific process in ecology, from research design to dissemination, and could influence the public interpretation and policy implications of ecological findings.
Read More: http://www.esajournals.org/doi/abs/10.1890/130230
Despite the wide application of meta-analysis in ecology, some of the traditional methods used for meta-analysis may not perform well given the type of data characteristic of ecological meta-analyses. We reviewed published meta-analyses on the ecological impacts of global climate change, evaluating the number of replicates used in the primary studies (ni) and the number of studies or records (k) that were aggregated to calculate a mean effect size. We used the results of the review in a simulation experiment to assess the performance of conventional frequentist and Bayesian meta-analysis methods for estimating a mean effect size and its uncertainty interval. Our literature review showed that ni and k were highly variable, had right-skewed distributions, and were generally small (median ni = 5, median k = 44). Our simulations show that the choice of method for calculating uncertainty intervals was critical for obtaining appropriate coverage (close to the nominal value of 0.95). When k was low (< 40), 95% coverage was achieved by a confidence interval based on the t-distribution that uses an adjusted standard error (the Hartung-Knapp-Sidik-Jonkman, HKSJ), or by a Bayesian credible interval, whereas bootstrap or z-distribution confidence intervals had lower coverage. Despite the importance of the method used to calculate the uncertainty interval, 39% of the meta-analyses reviewed did not report the method used, and of the 61% that did, 94% used a potentially problematic method, which may be a consequence of software defaults. In general, for a simple random-effects meta-analysis, the performance of the best frequentist and Bayesian methods was similar for the same combinations of factors (k and mean replication), though the Bayesian approaches had higher than nominal (> 95%) coverage for the mean effect when k was very low (k < 15). Our literature review suggests that many meta-analyses that used z-distribution or bootstrapping confidence intervals may have overestimated the statistical significance of their results when the number of studies was low; more appropriate methods need to be adopted in ecological meta-analyses.
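As an illustration of the interval methods compared above, metafor fits the same random-effects model with either the default z-based Wald interval or the HKSJ adjustment; yi and vi denote hypothetical effect sizes and their sampling variances:

library(metafor)

m_z    <- rma(yi, vi, data = dat)                  # z-distribution CI (common default)
m_hksj <- rma(yi, vi, data = dat, test = "knha")   # Hartung-Knapp-Sidik-Jonkman t-based CI

summary(m_z)
summary(m_hksj)    # typically wider intervals when k is small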
This data set of the ecological adjustment service value of Arctic permafrost change from 1982 to 2015 covers the entire Arctic tundra area at a spatial resolution of 8 km, with values for 1982, for 2015, and for the rate of change between the two periods. Based on multi-source remote sensing, simulation, statistical and measured data, combined with GIS and ecological methods, it quantifies the adjustment service value of Arctic permafrost to the ecosystem. The unit price is based on the correlation (0.35) between active layer thickness and NDVI changes after excluding precipitation and snow water equivalent, and on the grassland ecosystem service value (the unit price of the tundra ecosystem service is taken as 1/3 of the grassland ecosystem service value).
Data presented here are those collected from a survey of Ecology professors at 48 undergraduate institutions to assess the current state of data management education. The following files have been uploaded:
Scripts (2):
1. DataCleaning_20120105.R is an R script for cleaning up data prior to analysis. This script removes spaces, substitutes text for codes, removes duplicate schools, and converts questions and answers from the survey into simpler parameter names, without any numbers, spaces, or symbols. The script is heavily annotated to help the user understand what is being done to the data files. It produces the file cleandata_[date].Rdata, which is called in DataTrimming_20120105.R.
2. DataTrimming_20120105.R is an R script for trimming extraneous variables not used in the final analyses. Some variables are combined as needed and NAs (no answers) are removed. The file is heavily annotated. It produces trimdata_[date].Rdata, which was imported into Excel for summary statistics.
Data files (3):
3. AdvancedSpreadsheet_20110526.csv is the output file from the SurveyMonkey online survey tool used for this project. It is a .csv sheet with the complete set of survey data, although some data (e.g., open-ended responses, institution names) are removed to prevent schools and/or instructors from being identifiable. This file is read into DataCleaning_20120105.R for cleaning and editing.
4. VariableRenaming_20110711.csv is called into the DataCleaning_20120105.R script to convert the questions and answers from the survey into simple parameter names, without any numbers, spaces, or symbols.
5. ParamTable.csv is a list of the parameter names used for analysis and the value codes. It can be used to understand outputs from the scripts above (cleandata_[date].Rdata and trimdata_[date].Rdata).
Trait-based ecology (TBE) has proven useful in the terrestrial realm and beyond for collapsing ecological complexity into traits that can be compared and generalized across species and scales. However, TBE for marine macroalgae is still in its infancy, motivating research to build the foundation of macroalgal TBE by leveraging lessons learned from other systems. Our objectives were to evaluate the utility of mean trait values (MTVs) across species, to explore the potential for intraspecific trait variability, and to identify macroalgal ecological strategies by clustering species with similar traits and testing for bivariate relationships between traits. To accomplish this, we measured thallus toughness, a trait associated with resistance to herbivory, and tensile strength, a trait associated with resistance to physical disturbance, in eight tropical macroalgal species across up to seven sites where they were found around Moorea, French Polynesia. We found interspecific trait variation g...
Open science is vital to the interdisciplinary field of ecology due to its integrative nature and use of longitudinal datasets that build upon earlier data collections. To highlight the importance of open science in the rapidly growing discipline of restoration ecology, we conducted a 'computational reproducibility’ assessment of a publication on a mining restoration program spanning several decades and over 250 km2 in a global biodiversity hotspot. Open data and code provided alongside the original publication were assessed for consistency with the results and conclusions of the original publication, as were potential limitations in findings due to the methodology. The impacts of inconsistencies and limitations were qualitatively assessed against the key findings from the publication and data were re-analysed where impacts were potentially significant. Of the six inconsistencies and limitations identified, two had a significant impact on five of the 11 key findings of the original publ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset identifies high ecological value waterways and water dependent ecosystems for the 184 coastal catchments in NSW. The purpose of the dataset is to identify strategic priorities for protecting and improving the health of high value waterways and water dependent ecosystems in the catchment.
The dataset map shows areas where waterways and water dependent ecosystems are defined as high ecological value, based on the definitions, guidelines and policies under the Environment Protection and Biodiversity Conservation Act 1999, Biodiversity Conservation Act 2016, Fisheries Management Act 1994 and/or Water Management Act 2000.
The water dependent ecosystems consist of wetlands, and flora and fauna that rely on water sources (including groundwater). The dataset does not include vegetation dependent on surface waters. The dataset integrates up to 28 data layers/indicators being used by the State Government to define high value. The individual indicators have not been ground-truthed and it is recommended that field assessments and/or a comparison to local mapping be undertaken prior to any decisions being made.
The dataset was created by initially placing a 1-hectare hexagon grid over the relevant boundary area and attributing each grid cell as either 'Absent' or 'Present'. This represents an occurrence only of high value water dependent ecosystem layers within the hexagon grid. The 'HEVlyr' field provides a count of the number of layers within the hexagon: a value of zero (0) represents Absent, and values greater than or equal to one (1) represent the Presence of high ecological value assets. The values DO NOT indicate the significance of the area, i.e., a value of 1 is no less significant than a higher value. Where layers are Present, please contact the data steward for details relating to the underpinning datasets.
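A hedged sketch of such a hexagon overlay with sf; catchment and hev_layers (a hypothetical sf object with one feature per high ecological value layer footprint) are assumed to be in a metric CRS, and the cell size is chosen so each hexagon is roughly 1 hectare:

library(sf)

hex <- st_make_grid(catchment, cellsize = 107, square = FALSE)  # hexagonal grid, ~1 ha cells
hex <- st_sf(geometry = hex)

hex$HEVlyr <- lengths(st_intersects(hex, hev_layers))           # count of HEV layers per hexagon
hex$status <- ifelse(hex$HEVlyr >= 1, "Present", "Absent")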
U.S. Government Works: https://www.usa.gov/government-works
These data from Evaluation and Review of Ecology-Focused Stream Studies to Support Cooperative Monitoring, Fountain Creek Basin, Colorado, were used to describe temporal trends in invertebrate communities in the basin. Invertebrate data were collected at U.S. Geological Survey (USGS) sites between 1985 and 2022. Datasets include invertebrate frequency of occurrence, invertebrate tolerance index values, an invertebrate multi-metric index, New Zealand mudsnail counts, and a list of invertebrate species collected.
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
A Final Report for Department of the Environment and Energy (October 2017)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The edaphic category contains the "eiv" and "modiffus" datasets.
The eiv dataset includes variables representing local soil properties and climate conditions. After resampling the source data (Descombes et al., 2020) to the SWECO25 grid, we generated individual layers for the 8 available variables (soil pH, nutrients, moisture, moisture variability, aeration, humus, climate continentality, and light). For each variable, we provided a layer with the raw values and 13 focal statistics layers by applying a cell-level function calculating the mean value in a circular moving window of 13 radii ranging from 25m to 5km. This dataset includes a total of 112 layers. Final values were rounded and multiplied by 100.
The modiffus dataset describes the nitrogen (n) and phosphorus (p) loads in Swiss soils. After resampling the source data (Hürdler et al., 2015) to the SWECO25 grid for these two variables, we provided the output maps and computed 13 focal statistics layers by applying a cell-level function calculating the average value in a circular moving window of 13 radii ranging from 25m to 5km. This dataset includes a total of 28 layers. Final values were rounded and multiplied by 100.
The detailed list of layers available is provided in SWECO25_datalayers_details_edaph.csv and includes information on the category, dataset, variable name (long), variable name (short), period, sub-period, start year, end year, attribute, radii, unit, and path.
References:
Descombes, P. et al. Spatial modelling of ecological indicator values improves predictions of plant distributions in complex landscapes. Ecography 43, 1448-1463 (2020).
Hürdler, J., Prasuhn, V. & Spiess, E. Abschätzung diffuser Stickstoff- und Phosphoreinträge in die Gewässer der Schweiz MODIFFUS 3.0: Bericht im Auftrag des Bundesamtes für Umwelt (BAFU). (Zürich, Switzerland, 2015).
Külling, N., Adde, A., Fopp, F., Schweiger, A. K., Broennimann, O., Rey, P.-L., Giuliani, G., Goicolea, T., Petitpierre, B., Zimmermann, N. E., Pellissier, L., Altermatt, F., Lehmann, A., & Guisan, A. (2024). SWECO25: A cross-thematic raster database for ecological research in Switzerland. Scientific Data, 11(1), Article 1. https://doi.org/10.1038/s41597-023-02899-1
V2: metadata update
This document, Innovating the Data Ecosystem: An Update of The Federal Big Data Research and Development Strategic Plan, updates the 2016 Federal Big Data Research and Development Strategic Plan. It revises the vision and strategies for big data research and development laid out in the 2016 plan through six strategy areas (enhance the reusability and integrity of data; enable innovative, user-driven data science; develop and enhance the robustness of the federated ecosystem; prioritize privacy, ethics, and security; develop necessary expertise and diverse talent; and enhance U.S. leadership in the international context), with the aim of enhancing data value and reusability and responsiveness to federal policies on data sharing and management.
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
International commitments are challenging countries to restore their degraded lands, particularly forests. These commitments require global assessments of recovery timescales and trajectories of different forest attributes to inform restoration strategies. We use a meta-chronosequence approach including 125 forest chronosequences to reconstruct the past (c. 300 years) and model future recovery trajectories of forests recovering from agriculture and logging impacts. We found recovering forests significantly differed from undisturbed ones after 150 years and projected that difference to remain for up to 218 or 494 years for ecosystem attributes like nitrogen stocks or species similarity, respectively. These conservative estimates, however, do not capture the complexity of forest ecosystems. A centennial recovery of forests requires strategic, unprecedented planning to deliver a restored world.

Methods

Database construction
We collected data from 16,873 plots from 125 chronosequences of recovering forest ecosystems in 110 published primary studies. From these chronosequences, we extracted 641 recovery trajectories of quantitative measures of ecosystem attributes along time, related to six recovery metrics (organism abundance, species diversity, species similarity, carbon cycling, nitrogen stock, and phosphorus stock), two restoration strategies (passive and active), three disturbance types [agriculture (including abandoned croplands and pastures), logging and mining], and a climatic metric (i.e., aridity index). From the selected chronosequences, we extracted 641 recovery trajectories, i.e., field-based quantitative measurements of ecosystem integrity repeated through time, reported in tables, figures, and text of the selected studies. Each trajectory included at least two data points, defined as the value of the ecosystem metric at different times since recovery started (hereafter, recovery time). Average values were considered for data points with the same recovery time (n = 72, in 21 studies). We used response ratios (RRs) to estimate recovery completeness, i.e., the effect sizes between reference and recovering systems. We computed the RR for each data point along the trajectory as ln(Xres/Xref), where Xres is the value of the ecosystem metric at a certain recovery time and Xref is the reference value of the same metric in the reference forest. Effect sizes of the meta-analysis were weighted by study precision, which was estimated as the product of the number of subplots and their area, assuming that a higher sampling effort implies higher precision. For abundance, diversity and similarity, we fitted fixed-effects models, with weights only accounting for within-study variability; whereas for biogeochemical functions, we assumed random-effects meta-analytic models, accounting for both between- and within-study variability.

Statistical analysis
To estimate the trajectory of forest recovery over time, we fitted a separate linear mixed model (LMM) for the RR of each recovery metric. We included the recovery time as a fixed factor and as a random slope, and the trajectory identity as a random intercept, enabling a different slope and intercept for each trajectory. As the recovery process along time may result in a wide range of trajectories from linear to more saturating shapes, we considered three functions for the recovery time variable: one linear and two decelerating trends [ln(recovery time + 1) and √recovery time].
We then selected among the three options the one that best fit the data of each recovery metric according to the minimum AICc. The models for the recovery of similarity were fitted using the Morisita-Horn index, as the Pearson correlation test informed that it was correlated to Jaccard and Bray-Curtis indices. Their absolute values were square root transformed to meet the assumptions of general linear models and then multiplied by -1 to facilitate interpretation. Using the resulting LMMs, we predicted the RR after 73, 146, and 219 years of recovery [i.e., one, two, and three times the global life expectancy in 2019]. We then predicted the time needed for forest ecosystems to recover to 90% of reference values for each trajectory and recovery metric and calculated the median by metric. Also using the resulting LMMs, we predicted the RR after 50 and 100 years of recovery for each metric and trajectory (1) to know if the recovery completeness is dependent on the metric and (2) to understand the main explanatory variables underlying the recovery process for each metric. We fitted linear models (LM) to analyse the difference in the RR after 50 years and after 100 years of recovery among recovery metrics. The models had the recovery metric and the intercepts of the LMMs for each trajectory as fixed factors. The latter was included to account for the effect of the initial state of degradation when recovery started. We then fitted a separate LM for the effect of each explanatory variable studied (i.e., aridity, disturbance category, restoration strategy or life form) on the RR predictions after 50 and 100 years of all recovery metrics together, and then for each recovery metric individually. In all the cases, the intercept of the LMMs for each trajectory was also included as a fixed factor to account for the effect of the initial state of degradation when recovery started. For the models fitted for the disturbance category and the life form, we excluded the categories with <1% of the values (i.e., “mining” for disturbance and “bird” for life form) or those including data with mixing information from other categories (i.e., “agriculture and logging” for disturbance and “woody and non-woody” for life form).
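A minimal sketch of the core computation under these methods, with lme4 and one of the three candidate time trends; dat is a hypothetical table with columns x_res (recovering value), x_ref (reference value), time (years), and trajectory (trajectory identity):

library(lme4)

dat$RR <- log(dat$x_res / dat$x_ref)    # response ratio ln(Xres / Xref)

# log-time trend, with a random slope and intercept per trajectory
m_log <- lmer(RR ~ log(time + 1) + (log(time + 1) | trajectory), data = dat)

# the linear and sqrt(time) variants would be fitted analogously and the
# three candidate models compared by AICc (e.g., MuMIn::AICc)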
Data used to value nature-based recreation
This is a short data file containing data collected in questionnaire surveys in 2011 of visitors to a wetland restoration project at Wicken Fen. It is divided into short sections that list different categories of data from the questionnaire. It also lists data received from the National Trust about visitor numbers in 2010.
Ecology and Evolution data for ECE31248.pdf
Restoration of degraded land is recognized by the international community as an important way of enhancing both biodiversity and ecosystem services, but more information is needed about its costs and benefits. In Cambridgeshire, U.K., a long-term initiative to convert drained, intensively farmed arable land to a wetland habitat mosaic is driven by a desire both to prevent biodiversity loss from the nationally important Wicken Fen National Nature Reserve (Wicken Fen NNR) and to increase the provision of ecosystem services. We evaluated the changes in ecosystem service delivery resulting from this land conversion, using a new Toolkit for Ecosystem Service Site-based Assessment (TESSA) to estimate biophysical and monetary values of ecosystem services provided by the restored wetland mosaic compared with the former arable land. Overall results suggest that restoration is associated with a net gain to society as a whole of $199 ha−1 y−1, for a one-off investment in restoration of $2320 ha−1. Restoration has led to an estimated loss of arable production of $2040 ha−1 y−1, but estimated gains of $671 ha−1 y−1 in nature-based recreation, $120 ha−1 y−1 from grazing, $48 ha−1 y−1 from flood protection, and a reduction in greenhouse gas (GHG) emissions worth an estimated $72 ha−1 y−1. Management costs have also declined by an estimated $1325 ha−1 y−1. Despite uncertainties associated with all measured values and the conservative assumptions used, we conclude that there was a substantial gain to society as a whole from this land-use conversion. The beneficiaries also changed from local arable farmers under arable production to graziers, countryside users from towns and villages, and the global community, under restoration. We emphasize that the values reported here are not necessarily transferable to other sites.
Despite the wide usage of the term information in evolutionary ecology, there is no general treatise between fitness (i.e., density-dependent population growth) and selection of the environment sensu lato. Here we (1) initiate the building of a quantitative framework with which to examine the relationship between information use in spatially heterogeneous landscapes and density-dependent population growth, and (2) illustrate its utility by applying the framework to an existing model of breeding habitat selection. We begin by linking information, as a process of narrowing choice, to population growth/fitness. Second, we define a measure of a population’s penalty of ignorance based on the Kullback-Leibler index that combines the contributions of resource selection (i.e., biased use of breeding sites) and density-dependent depletion. Third, we quantify the extent to which environmental heterogeneity (i.e., mean and variance within a landscape) constrains sustainable population growth of un...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This data set was derived from the Riverbank dataset, which aims to identify those foreshore areas most in need of works. The Riverbank dataset assesses Built Foreshores, Natural Foreshores, Riparian Ecological Value and Useability of Foreshores to identify priorities. This data set represents the most up-to-date assessment of the riparian ecology of foreshores. The riparian ecology value of foreshores was identified and analysed by GIS in 2016. Data valuation hierarchies, based on the Perth Biodiversity Project, were finalised in 2019. The data provides a Swan Canning Riverpark-wide assessment of the riparian ecological values of foreshores using conservation estate status, proximity to high value flora and fauna (Class 1, 2 and 3), TEC and/or Bush Forever sites, Heddle Vegetation Complex rarity on the Swan Coastal Plain, vegetation condition and vegetation variability. Feedback on the accuracy and recency of the data is welcome.