Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column, ‘Replicate’, indicates the biological replicates; in the example, the month and year during which each replicate was performed is indicated. The second column, ‘Condition’, indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column, ‘Value’, contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console, then execute it (a minimal sketch of such a script is given after this protocol). In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
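The script itself is distributed on the PowerPoint slide rather than reproduced here; the following is only a minimal sketch of the approach described in Steps 1 and 2 (choose the .csv file interactively, then draw boxplots with jittered points coloured by replicate). The object names replicates and graph follow the command shown in Note 2; everything else is an assumption.
  # minimal sketch, not the published script
  library(ggplot2)                          # see Note 1 for installation
  replicates <- read.csv(file.choose())     # select the .csv file from Step 1
  graph <- ggplot(replicates, aes(x = Condition, y = Value))
  graph + geom_boxplot(outlier.colour = 'black', colour = 'black') +
    geom_jitter(aes(col = Replicate)) + theme_bw()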
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search field and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well (a one-line console alternative is given after these notes).
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
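As a console alternative to the point-and-click installation described in Note 1 (standard R, not part of the original protocol):
  install.packages('ggplot2', dependencies = TRUE)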
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes the *.csv data and the scripts used to reproduce the plots of the three different scenarios presented in S. Kiemle, K. Heck, E. Coltman, R. Helmig (2022) Stable water isotopologue fractionation during soil-water evaporation: Analysis using a coupled soil-atmosphere model. (Under review) Water Resources Research.
*.csv files
The isotope distribution has been analyzed in the vertical and in the horizontal direction of a soil column for all scenarios. Therefore, we provide *.csv files generated using the ParaView tools "plot over line" or "plot over time". Each *.csv file contains information about the saturation, temperature, and component composition for each phase, in mole fraction or in the isotope-specific delta notation. Additionally, information about the evaporation rate is given in a separate *.txt file.
python scripts
For each scenario, we provide scripts to reproduce the presented plots.
Scenarios
We used different free-flow conditions to analyze the fractionation processes inside the porous medium. Scenario 1: laminar flow. Scenario 2: laminar flow, but with isolation of the parameters affecting the fractionation process. Scenario 3: turbulent flow. Please find below a detailed description of the data labeling and the scripts needed to reproduce a certain plot for each scenario.
Scenario: The spatial distribution of stable water isotopologues in the horizontal (-0.01 m depth) and vertical (at 0.05 m width) direction inside a soil column at selected days (DoE, Day of Experiment): use the python scripts plot_concentration_horizontal_all.py (horizontal direction) and plot_concentration_spatial_all.py (vertical direction) to create the specific plots. The corresponding *.csv files can be found in the folders IsotopeProfile_Horizontal and IsotopeProfile_Vertical. The *.csv files are named after the selected day (e.g. DoE_80 refers to day 80 of the virtual experiment). The influence of the evaporation rate on isotopic fractionation processes at various depths (-0.001, -0.005, -0.009, and -0.018 m) during the whole virtual experiment time: use the python script plot_evap_isotopes_v2.py to create the plots. The data for the isotopologue distribution and the saturation can be found in the folder PlotOverTime. All data are named PlotOverTime_xxxxm, with xxxx representing the respective depth (e.g. PlotOverTime_0.001m refers to -0.001 m depth). The data for the evaporation rate can be found in the folder EvaporationRate. Note that the evaporation rate data are available as .txt files because we extract the information about evaporation directly during the simulation and do not derive it through any post-processing.
Scenario: Process behavior of isolated parameters that influence the isotopic fractionation: use plot_concentration.py to reproduce the plots, represented either in the isotope-specific delta notation or in mole fraction. The corresponding data can be found in the folder IsotopeProfile_Vertical. The data labeling refers to the single cases (1 - no fractionation; 2 - only equilibrium fractionation; 3 - only kinetic fractionation; 4 - only liquid diffusion; 5 - Reference).
Scenario: Evaporation rate during the virtual experiment for different flow cases: with plot_evap.py and the .txt files in the folder EvaporationRate, the evaporation progression can be plotted. The labeling of the .txt files refers to the different flow cases (1 - 0.1 m/s (laminar); 2 - 0.13 m/s (laminar); 3 - 0.5 m/s (turbulent); 4 - 1 m/s (turbulent); 5 - 3 m/s (turbulent)).
The isotope profiles in the vertical and horizontal direction of the soil column (similar to Scenario 1) for selected days: with plot_concentration_horizontal_all.py and plot_concentration_spatial_all.py, the plots for the horizontal and vertical distribution of isotopologues can be generated. The corresponding data can be found in the folders IsotopeProfile_Horizontal and IsotopeProfile_Vertical. These folders are structured with subfolders containing the data of selected days of the virtual experiment (DoE - Day of Experiment), in this case days 2, 10, and 35. The data labeling remains similar to Scenario 3a).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General information
The script runs with R (Version 3.1.1; 2014-07-10) and the packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1) and lattice (Version 0.20-29). Questions can be directed to: Martin Bulla (bulla.mar@gmail.com).
Data collection and how the individual variables were derived is described in:
Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016-20131016.
Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015.
Data are available as an RData file. Missing values are NA. For better readability, the subsections of the script can be collapsed.
Description of the method
1 - Data are visualized in an interactive actogram with time of day on the x-axis and one panel for each day of data.
2 - A red rectangle indicates the active field; clicking with the mouse in that field on the depicted light signal generates a data point that is automatically (via a custom-made function) saved in the csv file. For this data extraction I recommend always clicking on the bottom line of the red rectangle, as there is always data available there due to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if a greenish vertical bar appears and a new line of data appears in the R console.
3 - To extract incubation bouts, the first click in the new plot has to be the start of incubation, the next click depicts the end of incubation, and the click on the same spot starts the incubation for the other sex. If the end and start of incubation are at different times, the data will still be extracted, but the sex, logger and bird_ID will be wrong; these need to be changed manually in the csv file. Similarly, the first bout for a given plot will always be assigned to the male (if no data are present in the csv file) or based on previous data. Hence, whenever data from a new plot are extracted, it is worth checking at the first mouse click whether the sex, logger and bird_ID information is correct and, if not, adjusting it manually.
4 - If all information from one day (panel) is extracted, right-click on the plot and choose "stop". This will activate the following day (panel) for extraction.
5 - If you wish to end extraction before going through all the rectangles, just press "escape".
Annotations of data-files from turnstone_2009_Barrow_nest-t401_transmitter.RData
dfr -- contains raw data on signal strength from radio tags attached to the rump of the female and male, and information about when the birds were captured and the incubation stage of the nest:
1. who: identifies whether the recording refers to female, male, capture or start of hatching
2. datetime_: date and time of each recording
3. logger: unique identity of the radio tag
4. signal_: signal strength of the radio tag
5. sex: sex of the bird (f = female, m = male)
6. nest: unique identity of the nest
7. day: datetime_ variable truncated to year-month-day format
8. time: time of day in hours
9. datetime_utc: date and time of each recording, but in UTC time
10. cols: colors assigned to "who"
m -- contains metadata for a given nest:
1. sp: identifies species (RUTU = Ruddy turnstone)
2. nest: unique identity of the nest
3. year_: year of observation
4. IDfemale: unique identity of the female
5. IDmale: unique identity of the male
6. lat: latitude coordinate of the nest
7. lon: longitude coordinate of the nest
8. hatch_start: date and time when the hatching of the eggs started
9. scinam: scientific name of the species
10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)
11. logger: type of device used to record incubation (IT - radio tag)
12. sampling: mean incubation sampling interval in seconds
s -- contains metadata for the incubating parents:
1. year_: year of capture
2. species: identifies species (RUTU = Ruddy turnstone)
3. author: identifies the author who measured the bird
4. nest: unique identity of the nest
5. caught_date_time: date and time when the bird was captured
6. recapture: was the bird captured before? (0 - no, 1 - yes)
7. sex: sex of the bird (f = female, m = male)
8. bird_ID: unique identity of the bird
9. logger: unique identity of the radio tag
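The custom click-capture function ships with the script and RData files and is not reproduced here. Purely to illustrate the mechanism described in points 2 and 3 (a mouse click on the active panel becomes a row appended to a csv file), a minimal sketch using base R's locator(); the output file name and column set are assumptions:
  # minimal sketch of click-to-csv capture, not the authors' function
  capture_click <- function(outfile = "extracted_bouts.csv") {
    p <- locator(1)                             # wait for one mouse click on the active plot
    row <- data.frame(time = p$x, signal = p$y)
    new_file <- !file.exists(outfile)
    write.table(row, outfile, append = !new_file, sep = ",",
                col.names = new_file, row.names = FALSE)
    print(row)                                  # echoes the new line in the R console
  }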
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Data points present in this dataset were obtained following these steps: to assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker, set to 150 rpm, under constant illumination (50 μmol photons/m2s). Then, 100 μL of sample were transferred to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland) at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning the deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R version 3.3.3 was used to perform one-way ANOVA (with Tukey's test); to test statistical hypotheses, the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
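The deposited scripts (ANOVA_Turkey_Sub.R, barplot_R.R, boxplotv2.R) are the authoritative code; the following is only a minimal sketch of a one-way ANOVA with Tukey's test at a 0.05 significance level as described above. The column names construct and fluorescence are placeholders, not the actual headers of the deposited .csv files.
  # minimal sketch, assuming a long-format table with one row per colony
  dat <- read.csv("sup_raw.csv")          # e.g., one of the deposited files
  fit <- aov(fluorescence ~ construct, data = dat)
  summary(fit)                            # one-way ANOVA table
  TukeyHSD(fit, conf.level = 0.95)        # pairwise comparisons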
Info
ANOVA_Turkey_Sub.R -> code for the ANOVA analysis in R 3.3.3
barplot_R.R -> code to generate the bar plot in R 3.3.3
boxplotv2.R -> code to generate the boxplot in R 3.3.3
pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.
who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.
who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.
Anova_Output_Summary_Guide.pdf -> explains the content of the ANOVA files
ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.
Consider citing our work.
Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433
[Note 2023-08-14 - Supersedes version 1, https://doi.org/10.15482/USDA.ADC/1528086]
This dataset contains all code and data necessary to reproduce the analyses in the manuscript: Mengistu, A., Read, Q. D., Sykes, V. R., Kelly, H. M., Kharel, T., & Bellaloui, N. (2023). Cover crop and crop rotation effects on tissue and soil population dynamics of Macrophomina phaseolina and yield under no-till system. Plant Disease. https://doi.org/10.1094/pdis-03-23-0443-re
The .zip archive cropping-systems-1.0.zip contains data and code files.
Data
- stem_soil_CFU_by_plant.csv: Soil disease load (SoilCFUg) and stem tissue disease load (StemCFUg) for individual plants in CFU per gram, with columns indicating year, plot ID, replicate, row, plant ID, previous crop treatment, cover crop treatment, and comments. Missing data are indicated with .
- yield_CFU_by_plot.csv: Yield data (YldKgHa) at the plot level in units of kg/ha, with columns indicating year, plot ID, replicate, and treatments, as well as means of soil and stem disease load at the plot level.
Code
- cropping_system_analysis_v3.0.Rmd: RMarkdown notebook with all data processing, analysis, and visualization code
- equations.Rmd: RMarkdown notebook with formatted equations
- formatted_figs_revision.R: R script to produce figures formatted exactly as they appear in the manuscript
The R project file cropping-systems.Rproj is used to organize the RStudio project. Scripts and notebooks used in older versions of the analysis are found in the testing/ subdirectory. Excel spreadsheets containing raw data from which the cleaned CSV files were created are found in the raw_data subdirectory.
File List
e001_arssnlvl0.csv (MD5: 75f21b9949b87c018f3499b5d2a093e7)
e001_arssnlvl3.csv (MD5: 02cefd1cdb16fc25968f4e7d96a43378)
e026_aslit.csv (MD5: eea09f91f81dc7e5801d8f67ccded5c1)
e054_arssprecip.csv (MD5: 1d62c9ab92a6a1fc4caa53787d2c0cf1)
e120_bmins.csv (MD5: e8699cfd2d2c9a11d43abd572b6f3cd8)
e120_invnit1_2.csv (MD5: 6c023c438eb4c6e2625f526fafd9c17d)
e120_invnit4_8.csv (MD5: a304786a833b5c6e53063472b97ef93d)
e120_invnit16.csv (MD5: bda824cec8f8fdc2992c8732e03aa109)
e120_nitbm.csv (MD5: 212637b7ab08d0cc2146f26602061b64)
find_number_of_observations.R (MD5: a6a097d6633ee66b9c3531676320b929)
multispatialCCM.zip (MD5: 64647cc7d14d2df5398a76af9ec73e2a)
Description
e001_arssnlvl0.csv is a comma-separated text file containing the data for Agropyron (Elymus) repens and Schizachyrium scoparium dynamics in unfertilized plots for experiment 001 at Cedar Creek. Column definitions are: 1. "index": concatenated text including the plot, field, and year sampled 2. "Exp": Cedar Creek experiment number 3. "Year": year data was sampled 4. "Field": ID for field that was sampled 5. "Plot": plot number for sample 6. "Ntrt": categorical fertilization treatment 7. "Nadd": g nitrogen added per square meter per year for each treatment 8. "NitrAdd": g nitrate added per square meter per year for each treatment 9. "Natm.Nadd": g nitrogen added per square meter per year for each treatment, including 1 g/m2/year atmospheric deposition 10. "fg": plant functional group: C3/C4 for grasses with C3/C4 photosynthetic pathway, F for non-legume forb, L for legume 11. "isspecies": binary indicator describing whether or not a row had plant species found in it that year (should be 1 for all rows) 12. "richness": species richness for all species found in the sample 13. "Agropyron repens": g dry aboveground biomass per square meter of A. repens 14. "Schizachyrium scoparium": g dry aboveground biomass per square meter of S. scoparium 15. "Miscellaneous litter": g dry aboveground biomass per square meter of leaf litter 16. "Ncat": fertilization intensity category, with 1 being the lowest and 3 being the highest 17. "FieldPlot": concatenated text including the field and plot
e001_arssnlvl3.csv is a comma-separated text file containing the data for Agropyron (Elymus) repens and Schizachyrium scoparium dynamics in heavily fertilized plots for experiment 001 at Cedar Creek. Column definitions are as described for e001_arssnlvl0.csv.
e026_aslit.csv is a comma-separated text file containing the data for Agrostis scabra and leaf litter dynamics in plots with varying soil fertility in experiment 026 at Cedar Creek. Column definitions are: 1. "monoculture": plant species grown in subplot (should always be A. scabra) 2. "litbiomass": g dry aboveground biomass per square meter of leaf litter 3. "year": year of sampling 4. "plot": plot sampled (soil N treatments vary among plots) 5. "subplot": subplot sampled 6. "abvbiomass": g dry aboveground biomass per square meter of A. scabra 7. "totaln": total soil nitrogen (in percent of soil by mass) 8. "exp": Cedar Creek experiment number 9. "yearest": year in which the experiment was established 10. "nlevel": categorical level for total soil nitrogen treatment 11. "plotsubplot": concatenated text including the plot and subplot 12. "Field": ID for field that was sampled 13. "FieldPlot": concatenated text including the field and plot
e054_arssprecip.csv is a comma-separated text file containing the data for A. repens, S. scoparium, leaf litter, and precipitation dynamics for experiment 054 at Cedar Creek. Column definitions are: 1. "index": concatenated text including the year, field, plot, and transect sampled 2. "Exp": experiment number 3. "Year": year of sample 4. "OldField": old field ID 5. "Plot": plot number for sample 6. "Transect": transect ID 7. "YearAb": year that the field was abandoned from agricultural use 8. "Agropyron repens": g dry aboveground biomass per square meter of A. repens 9. "Schizachyrium scoparium": g dry aboveground biomass per square meter of S. scoparium 10. "Miscellaneous litter": g dry aboveground biomass per square meter of leaf litter 11. "precipmm": total summer annual precipitation (June-August) in mm 12. "FieldPlot": concatenated text including the field and plot
e120_bmins.csv is a comma-separated text file describing plant biomass and insect dynamics for Cedar Creek experiment 120. Column definitions are: 1. "Exp": Cedar Creek experiment number 2. "Year": sampling year 3. "Month": sampling month 4. "Plot": plot sampled 5. "NumSp": number of species in treatment 6. "SpNum": number of species maintained in plot 7. "AbvBioAnnProd": g plant aboveground biomass harvested per square meter per year 8. "noh020tot": mg soil nitrate per kg soil, sampled in top 20 cm of soil 9. "insectcount": number of insect individuals in sweep net sample 10. "insectsp": number of insect species in sweep net sample 11. "Field": field ID 12. "FieldPlot": concatenated text including the field and plot
e120_invnit1_2.csv is a comma-separated text file describing invading plant species dynamics and soil nitrate dynamics in monoculture plots for experiment 120 at Cedar Creek. Column definitions are as described for e120_bmins.csv, except for: 9. “invrichness”: number of non-planted “invading” plant species
e120_invnit4_8.csv is a comma-separated text file describing invading plant species dynamics and soil nitrate dynamics in 4 and 8 species mixture plots for experiment 120 at Cedar Creek. Column definitions are as described for e120_invnit1_2.csv.
e120_invnit16.csv is a comma-separated text file describing invading plant species dynamics and soil nitrate dynamics in 16 species mixture plots for experiment 120 at Cedar Creek. Column definitions are as described for e120_invnit1_2.csv.
e120_nitbm.csv is a comma-separated text file describing soil nitrate and aboveground plant biomass dynamics. Column definitions are as described for e120_bmins.csv.
find_number_of_observations.R is an R source code file that can be used to determine the number of sequential observations in subplots for all of the data sets listed above. The data should be in the working directory of R when the R code is run.
multispatialCCM.zip...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Supplementary data and code associated with the Biogeosciences paper published by Cecilia Prada et al., "Soil and Biomass Carbon Storage is Much Higher in Central American than Andean Montane Forests". There are 16 files associated with this paper:
(1) AGB.csv: provides the site, plot, treeID, mnemn, family, agb, and AGcarbon for each tree in the dataset. Column headings are described in the file AGB_metadata.csv
(2) AGB_metadata.csv: Metadata (column descriptions) for AGB.csv
(3) CWD_D.csv: Complete information on the downed coarse woody debris (CWD) measured in each plot
(4) CWD_D_metadata.csv: Metadata (column descriptions) for CWD_D.csv
(5) CWD_S.csv: Complete information on the standing coarse woody debris measured in each plot
(6) CWD_S_metadata.csv: Metadata (column descriptions) for CWD_S.csv
(7) SoilC.csv: Estimated soil carbon storage (Mg C) at each sampling location in each plot
(8) SoilC_metadata.csv: Metadata (column descriptions) for SoilC.csv
(9) Table.csv: Data source, soil carbon value (Mg C) and elevation from published data sources
(10) Table_metadata.csv: Metadata (column descriptions) for Table.csv
(11) TableS1.csv: Data source, above ground carbon value (Mg C) and elevation from published data sources
(12) TableS1_metadata.csv: Metadata (column descriptions) for TableS1.csv
(13) RScript.R: Annotated code for data analysis and figures
(14) Full_dataset.csv: Full set of environmental data and carbon data by plot
(15) Full_dataset_metadata.csv: Metadata (column descriptions) for Full_dataset.csv
(16) Species list and species codes.csv: Full family, genus and species names for the species codes (column mnemn in AGB.csv)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is the repository for the following paper submitted to Data in Brief:
Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).
The Data in Brief article contains the supplement information and is the related data paper to:
Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).
Description/abstract
The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.
Folder structure
The main folder after download contains all data; the following subfolders are stored as zipped files:
“code” stores the nine code chunks described below (see Code structure) to read, extract, process, analyse, and visualize the data.
“MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.
“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).
“yield_productivity” contains .csv files of yield information for all countries listed above.
“population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).
“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.
“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.
Code structure
1_MODIS_NDVI_hdf_file_extraction.R
This is the first code chunk and refers to the extraction of MODIS data from the .hdf file format. The following packages must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download the MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatially separate) time series and merge them later. Note that the time series are temporally consistent.
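A minimal sketch of this chunk (not the repository code): list the downloaded .hdf files, pull out the NDVI subdataset with terra, and write one .tif per file. The folder name and the pattern used to pick the NDVI subdataset are assumptions, and GDAL with HDF4 support is required.
  library(terra)
  hdf_files <- list.files("your_directory_MODIS", pattern = "\\.hdf$", full.names = TRUE)
  for (f in hdf_files) {
    s <- sds(f)                                 # all subdatasets in the HDF file
    ndvi <- s[[grep("NDVI", names(s))[1]]]      # the 16-day NDVI subdataset
    writeRaster(ndvi, sub("\\.hdf$", "_NDVI.tif", basename(f)), overwrite = TRUE)
  }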
2_MERGE_MODIS_tiles.R
In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in numerical order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, of which we merge the first two (stack 1, stack 2) and store the result. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
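Assuming the per-tile NDVI .tif files carry a consecutive number in their names, the merge step could look roughly like this (the tile folder names are placeholders; terra::merge mosaics the three tiles for each date):
  library(terra)
  library(gtools)                             # mixedsort() gives natural file ordering
  tile1 <- mixedsort(list.files("tiles_h20v05", pattern = "\\.tif$", full.names = TRUE))
  tile2 <- mixedsort(list.files("tiles_h21v05", pattern = "\\.tif$", full.names = TRUE))
  tile3 <- mixedsort(list.files("tiles_h21v06", pattern = "\\.tif$", full.names = TRUE))
  dir.create("merged", showWarnings = FALSE)
  for (i in seq_along(tile1)) {
    m <- merge(merge(rast(tile1[i]), rast(tile2[i])), rast(tile3[i]))
    writeRaster(m, file.path("merged", paste0("NDVI_final_", i, ".tif")), overwrite = TRUE)
  }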
3_CROP_MODIS_merged_tiles.R
Now we want to crop the derived MODIS tiles to our study area. We use a mask, provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series datasets from MODIS. The repository provides the already clipped and merged NDVI datasets.
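Under the same assumptions as above, the cropping step is essentially crop-then-mask with the shapefile:
  library(terra)
  mask_vec <- vect("MERGED_LEVANT.shp")
  merged_files <- gtools::mixedsort(list.files("merged", pattern = "\\.tif$", full.names = TRUE))
  for (i in seq_along(merged_files)) {
    r <- rast(merged_files[i])
    r_clip <- mask(crop(r, mask_vec), mask_vec)    # crop to extent, then mask to the outline
    writeRaster(r_clip, paste0("NDVI_merged_clip_", i, ".tif"), overwrite = TRUE)
  }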
4_TREND_analysis_NDVI.R
Now we want to perform a trend analysis on the derived data. The data we load are tricky, as they contain a 16-day return period across each year for a period of 22 years. Growing-season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing-season sums are generated and the slope is calculated. We can then extract the p-values of the trend and characterize all values with a high confidence level (0.05). Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3. To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
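The per-pixel slope and p-value extraction can be sketched generically with terra::app; the seasonal subsetting, reclassification, and ggplot2/reshape2 plotting of the actual chunk are omitted, and the input file name is a placeholder:
  library(terra)
  season_sums <- rast("NDVI_season_sums.tif")   # placeholder: one growing-season sum per year
  trend_fun <- function(v) {
    if (all(is.na(v))) return(c(NA_real_, NA_real_))
    m <- lm(v ~ seq_along(v))
    c(coef(m)[2], summary(m)$coefficients[2, 4])  # slope and p-value
  }
  trend <- app(season_sums, trend_fun)            # layer 1 = slope, layer 2 = p-value
  trend_sig <- ifel(trend[[2]] < 0.05, trend[[1]], NA)   # keep slopes at the 0.05 level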
5_BUILT_UP_change_raster.R
Let us look at the landcover changes now. We are working with the terra package and get the raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023, 100 m resolution, global coverage). One can download the temporal coverage that is aimed for and reclassify it using the code, after cropping to the individual study area. Here, I summed up the different rasters to characterize the built-up change in continuous values between 1975 and 2022.
6_POPULATION_numbers_plot.R
For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.
7_YIELD_plot.R
In this section, we use the country productivity data from the supplement in the repository folder “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single-country yield datasets is plotted in a ggplot and the plots are combined using the patchwork package in R.
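A sketch of the combination step, assuming each country file has one column for the year and one for the yield value (the column names Year and Value and the second file name are placeholders):
  library(ggplot2)
  library(patchwork)
  jordan  <- read.csv("yield_productivity/Jordan_yield.csv")
  lebanon <- read.csv("yield_productivity/Lebanon_yield.csv")
  p1 <- ggplot(jordan,  aes(x = Year, y = Value)) + geom_line() + ggtitle("Jordan")
  p2 <- ggplot(lebanon, aes(x = Year, y = Value)) + geom_line() + ggtitle("Lebanon")
  p1 + p2                                   # patchwork combines the single-country plots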
8_GLDAS_read_extract_trend
The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data come in .nc file format, and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to meet the requirements for the variables (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the spatraster collection. Having chosen one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area. From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g. March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year). From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95% confidence-level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe due to the availability of the GLDAS variables.
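A minimal sketch of reading one GLDAS .nc file with terra, selecting a variable by the "^variable name" pattern, and clipping it with the MERGED_LEVANT.shp mask; the file name and the variable shown (Rainf) are examples only, and the trend and z-score steps follow the NDVI chunk above:
  library(terra)
  gldas <- rast("GLDAS_example.nc")                    # placeholder file name
  print(names(gldas))                                  # inspect the available variable names
  rain <- gldas[[grep("^Rainf", names(gldas))]]        # select one variable by pattern
  mask_vec <- vect("MERGED_LEVANT.shp")
  rain_levant <- mask(crop(rain, mask_vec), mask_vec)  # clip to the study area outline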
9_workflow_diagramme
This simple code can be used to plot a workflow diagram and is detached from the actual analysis.
Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration, and Funding acquisition: Michael
Plot-level field data were collected in the summer of 2014 to estimate aboveground and belowground biomass in the Great Dismal Swamp National Wildlife Refuge and Dismal Swamp State Park in North Carolina and Virginia. Data were collected at 85 plots. The location of the center of each plot was recorded with a Trimble ProXH global positioning system (GPS) and differentially corrected. Data files included 1. GDS_plots.csv, 2. GDS_FWD.csv, 3. GDS_LWD.csv, 4. GDS_Shrubs.csv, 5. GDS_Trees.csv, and 6. GDS_plot_summaries.csv. The data contained in GDS_plot_summaries.csv were calculated from the GDS_plots.csv, GDS_FWD.csv, GDS_LWD.csv, GDS_Shrubs.csv, and GDS_Trees.csv files using the R statistical software environment (R Core Team, 2019) and code in GDS_AGB_Summaries.R. R Core Team, 2019, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete datasets and R scripts related to the article published in Applied Vegetation Science: "Soil phosphorous availability determines the contribution of small, individual grassland remnants to the conservation of landscape-scale biodiversity".
1. plot_metadata.csv - for all 162 vegetation plots, the dataset provides metadata on the geographical and environmental characteristics of the plot.
2. soil.csv - for all 162 vegetation plots, the dataset provides the raw data on each of the four measured soil characteristics: Phosphorous, pH, Carbon and Nitrogen.
3. vegdat.csv - for all 162 vegetation plots, this dataset provides the recorded abundance (ACFOR scale) of all 174 recorded species.
4. planttraits.csv - for all 174 species, this dataset lists the level of habitat specialisation.
5. PlueBaeten_analyses.nb.html - R scripts on the full statistical analyses of the data in html format.
6. PlueBaeten_analyses.Rmd - R scripts on the full statistical analyses of the data in R Markdown format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains R code and datasets for the sub-figures of Figure 2. The Figure 2A code makes a matrix based on Supplemental Table 6. The Figure 2B code reads in the csv data "figure2b.csv".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and scripts used for manuscript: High consistency and repeatability in the breeding migrations of a benthic shark.
Project title: High consistency and repeatability in the breeding migrations of a benthic shark
Date: 23/04/2024
Folders:
- 1_Raw_data
- Perpendicular_Point_068151, Sanctuary_Point_068088, SST raw data, sst_nc_files, IMOS_animal_measurements, IMOS_detections, PS&Syd&JB tags, rainfall_raw, sample_size, Point_Perpendicular_2013_2019, Sanctuary_Point_2013_2019, EAC_transport
- 2_Processed_data
- SST (anomaly, historic_sst, mean_sst_31_years, week_1992_sst:week_2022_sst including week_2019_complete_sst)
- Rain (weekly_rain, weekly_rainfall_completed)
- Clean (clean, cleaned_data, cleaned_gam, cleaned_pj_data)
- 3_Script_processing_data
- Plots (dual_axis_plot (Fig. 1 & Fig. 4).R, period_plot (Fig. 2).R, sd_plot (Fig. 5).R, sex_plot (Fig. 3).R)
- cleaned_data.R, cleaned_data_gam.R, weekly_rainfall_completed.R, descriptive_stats.R, sst.R, sst_2019b.R, sst_anomaly.R
- 4_Script_analyses
- gam.R, gam_eac.R, glm.R, lme.R, Repeatability.R
- 5_Output_doc
- Plots (arrival_dual_plot_with_anomaly (Fig. 1).png, period_plot (Fig.2).png, sex_arrival_departure (Fig. 3).png, departure_dual_plot_with_anomaly (Fig. 4).png, standard deviation plot (Fig. 5).png)
- Tables (gam_arrival_eac_selection_table.csv (Table S2), gam_departure_eac_selection_table (Table S5), gam_arrival_selection_table (Table. S3), gam_departure_selection_table (Table. S6), glm_arrival_selection_table, glm_departure_selection_table, lme_arrival_anova_table, lme_arrival_selection_table (Table S4), lme_departure_anova_table, lme_departure_selection_table (Table. S8))
Descriptions of scripts and files used:
- cleaned_data.R: script to extract detections of sharks at Jervis Bay, calculate arrival and departure dates over the seven breeding seasons, add sex and length for each individual, and extract moon phase (numerical value) and period of the day from arrival and departure times (a minimal sketch of the arrival/departure step is given after this list).
- IMOS_detections.csv: raw data file with detections of Port Jackson sharks over different sites in Australia.
- IMOS_animal_measurements.csv: raw data file with morphological data of Port Jackson sharks
- PS&Syd&JB tags: file with measurements and sex identification of sharks (different from IMOS, it was used to complete missing sex and length).
- cleaned_data.csv: file with arrival and departure dates of the final sample size of sharks (N=49) with missing sex and length for some individuals.
- clean.csv: completed file using the PS&Syd&JB tags file; note: tag ID 117393679 was wrongly identified as a male in IMOS and correctly identified as a female in the PS&Syd&JB tags file, as indicated by its large size.
- cleaned_pj_data: Final data file with arrival and departure dates, sex, length, moon phase (numerical) and period of the day.
- weekly_rainfall_completed.R: script to calculate average weekly rainfall and correlation between the two weather stations used (Point perpendicular and Sanctuary point).
- weekly_rain.csv: file with the corresponding week number (1-28) for each date (01-06-2013 to 13-12-2019)
- weekly_rainfall_completed.csv: file with week number (1-28), year (2013-2019) and weekly rainfall average completed with Sanctuary Point for week 2 of 2017
- Point_Perpendicular_2013_2019: Rainfall (mm) from 01-01-2013 to 31-12-2020 at the Point Perpendicular weather station
- Sanctuary_Point_2013_2019: Rainfall (mm) from 01-01-2013 to 31-12-2020 at the Sanctuary Point weather station
- IDCJAC0009_068088_2017_Data.csv: Rainfall (mm) from 01-01-2017 to 31-12-2017 at the Sanctuary Point weather station (to fill in missing value for average rainfall of week 2 of 2017)
- cleaned_data_gam.R: script to calculate weekly counts of sharks to run gam models and add weekly averages of rainfall and sst anomaly
- cleaned_pj_data.csv
- anomaly.csv: weekly (1-28) average sst anomalies for Jervis Bay (2013-2019)
- weekly_rainfall_completed.csv: weekly (1-28) average rainfall for Jervis Bay (2013-2019)
- sample_size.csv: file with the number of sharks tagged (13-49) for each year (2013-2019)
- sst.R: script to extract daily and weekly sst from IMOS nc files from 01-05 until 31-12 for the following years: 1992:2022 for Jervis Bay
- sst_raw_data: folder with all the raw weekly (1:28) csv files for each year (1992:2022) to fill in with sst data using the sst script
- sst_nc_files: folder with all the nc files downloaded from IMOS from the last 31 years (1992-2022) at the sensor (IMOS - SRS - SST - L3S-Single Sensor - 1 day - night time – Australia).
- SST: folder with the average weekly (1-28) sst data extracted from the nc files using the sst script for each of the 31 years (to calculate temperature anomaly).
- sst_2019b.R: script to extract daily and weekly sst from IMOS nc file for 2019 (missing value for week 19) for Jervis Bay
- week_2019_sst: weekly average sst 2019 with a missing value for week 19
- week_2019b_sst: sst data from 2019 with another sensor (IMOS – SRS – MODIS - 01 day - Ocean Colour-SST) to fill in the gap of week 19
- week_2019_complete_sst: completed average weekly sst data from the year 2019 for weeks 1-28.
- sst_anomaly.R: script to calculate mean weekly sst anomaly for the study period (2013-2019) using mean historic weekly sst (1992-2022)
- historic_sst.csv: mean weekly (1-28) and yearly (1992-2022) sst for Jervis Bay
- mean_sst_31_years.csv: mean weekly (1-28) sst across all years (1992-2022) for Jervis Bay
- anomaly.csv: mean weekly and yearly sst anomalies for the study period (2013-2019)
- Descriptive_stats.R: script to calculate minimum and maximum length of sharks, mean Julian arrival and departure dates per individual per year, mean Julian arrival and departure dates per year for all sharks (Table. S10), summary of standard deviation of julian arrival dates (Table. S9)
- cleaned_pj_data.csv
- gam.R: script used to run the Generalized additive model for rainfall and sea surface temperature
- cleaned_gam.csv
- glm.R: script used to run the Generalized linear mixed models for the period of the day and moon phase
- cleaned_pj_data.csv
- sample_size.csv
- lme.R: script used to run the Linear mixed model for sex and size
- cleaned_pj_data.csv
- Repeatability.R: script used to run the Repeatability for Julian arrival and Julian departure dates
- cleaned_pj_data.csv
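The full processing lives in cleaned_data.R; purely as an illustration of the arrival/departure step referenced in its description above, a minimal sketch under assumed column names (tag_id, detection_datetime, season), which are not the actual headers of IMOS_detections.csv:
  # minimal sketch, not cleaned_data.R: first and last detection per shark per season
  library(dplyr)
  detections <- read.csv("IMOS_detections.csv")
  arrivals <- detections %>%
    mutate(detection_datetime = as.POSIXct(detection_datetime, tz = "UTC")) %>%
    group_by(tag_id, season) %>%
    summarise(arrival = min(detection_datetime),
              departure = max(detection_datetime),
              .groups = "drop")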
This resource contains the data and scripts used for: Goeking, S. A. and D. G. Tarboton, (2022). Spatially distributed overstory and understory leaf area index estimated from forest inventory data. Water. https://doi.org/10.3390/w1415241.
Abstract from the paper: Forest change affects the relative magnitudes of hydrologic fluxes such as evapotranspiration (ET) and streamflow. However, much is unknown about the sensitivity of streamflow response to forest disturbance and recovery. Several physically based models recognize the different influences that overstory versus understory canopies exert on hydrologic processes, yet most input datasets consist of total leaf area index (LAI) rather than individual canopy strata. Here, we developed stratum-specific LAI datasets with the intent of improving the representation of vegetation for ecohydrologic modeling. We applied three pre-existing methods for estimating overstory LAI, and one new method for estimating both overstory and understory LAI, to measurements collected from a probability-based plot network established by the US Forest Service’s Forest Inventory and Analysis (FIA) program, for a modeling domain in Montana, MT, USA. We then combined plot-level LAI estimates with spatial datasets (i.e., biophysical and remote sensing predictors) in a machine learning algorithm (random forests) to produce annual gridded LAI datasets. Methods that estimate only overstory LAI tended to underestimate LAI relative to Landsat-based LAI (mean bias error ≥ 0.83), while the method that estimated both overstory and understory layers was most strongly correlated with Landsat-based LAI (r2 = 0.80 for total LAI, with mean bias error of -0.99). During 1984-2019, interannual variability of understory LAI exceeded that for overstory LAI; this variability may affect partitioning of precipitation to ET vs. runoff at annual timescales. We anticipate that distinguishing overstory and understory components of LAI will improve the ability of LAI-based models to simulate how forest change influences hydrologic processes.
This resource contains one CSV file, two shapefiles (each within a zip file), two R scripts, and multiple raster datasets. The two shapefiles represent the boundaries of the Middle Fork Flathead river and South Fork Flathead River watersheds. The raster datasets represent annual leaf area index (LAI) at 30 m resolution for the entire modeling domain used in this study. LAI was estimated using method LAI4, which produced separate overstory and understory LAI datasets. Filenames contain years, e.g., "LAI4_2019" is overstory LAI for 2019; "LAI4under_2019" is understory LAI for 2019.
The CSV files in this Resource contain annual time series of LAI and ET ratio (annual evapotranspiration divided by annual precipitation) for the South Fork Flathead River and Middle Fork Flathead River watersheds, 1984-2019. LAI methods represented in this time series are LAI1 and LAI4 from the paper. LAI1 consists of only overstory LAI, and LAI4 consists of overstory (LAI4), understory (LAI4_under), and total (LAI4_total) LAI. For each LAI estimation method, summary statistics of the entire watershed are included (min, first quartile, median, third quartile, and max).
The two R scripts (R language and environment for statistical computing) summarize Forest Inventory & Analysis (FIA) data from the FIA database (FIADB) to estimate LAI at FIA plots. 1) FIADB_queries_public.r: Script for compiling FIA plot measurements prior to estimating LAI 2) LAI_estimation_public: Script for estimating LAI at FIA plots using the four methods described in this paper
Before running the R scripts, users must obtain several FIADB tables (PLOT, COND, TREE, and P2VEG_SUBP_STRUCTURE; all four tables must be renamed with lower-case names, e.g., "plot"). These tables can be obtained using one of two methods: 1) By downloading CSV files for the appropriate U.S. state(s) from the FIA DataMart (https://apps.fs.usda.gov/fia/datamart/datamart.html). If this method is used, the CSV files must be imported (read) into R before proceeding. 2) By using r package 'rFIA' to download the tables from FIADB for the U.S. state(s) of interest.
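As an illustration of the second option, a minimal sketch using the rFIA package; the state abbreviation "MT" and the download directory are assumptions, and the DataMart CSV route simply replaces getFIA() with read.csv() on the downloaded files:
  # option 2 sketch: download the needed FIADB tables with rFIA
  library(rFIA)
  db <- getFIA(states = "MT", dir = "FIA_data",
               tables = c("PLOT", "COND", "TREE", "P2VEG_SUBP_STRUCTURE"))
  # assign the lower-case names expected by the scripts
  plot <- db$PLOT
  cond <- db$COND
  tree <- db$TREE
  p2veg_subp_structure <- db$P2VEG_SUBP_STRUCTURE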
Note that publicly available plot coordinates are accurate within 1 km and are not true plot locations, which are legally confidential to protect the integrity of the sample locations and the privacy of landowners. Access to true plot location data requires review by FIA's Spatial Data Services unit, who can be contacted at SM.FS.RMRSFIA_Help@usda.gov.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code used for the study : "From cultivar mixtures to allelic mixtures: opposite effects of allelic richness between genotypes and genotype richness in wheat".
The script "Manuscript_Analyses.R" contains all code for the statistical analysis presented in the manuscript (main text & supplementary information). This script uses files produced in the folder "Locus-by-locus analysis" as inputs, and "manhattan_custom.R" as a source function ("manhattan_custom.R" is used to highlight SNPs in a given interval and to write specified SNPs name on manhattan plots). The file "Traits_monocultures.csv" contains the 20 functional traits measured on the 179 monoculture plots (see Supplementary Methods for more information on trait measurement). This file is used as an input in the script "Manuscript_Analyses.R".
The "Locus-by-locus analysis" folder contains all analyses conducted to test the effect of allelic richness on the four variables of interest: Grain Yield (GY, g/m²), Spike Number per m² (SNb, nb spikes/m²), Thousand Kernel Weight (TKW, g), and Septoria tritici blotch (STB) severity. The locus-by-locus analysis is performed with the script "Allelic_richness_locus_by_locus_analysis.R". This analysis generates a list of .csv files with one file per chromosome. Each file contains the pvalues and estimated effect sizes of the tested SNPs for the given chromosome. These output files are stored in folders named after the variables for which the effect of allelic richness was tested ("RAW_GY", "RAW_SNb", "RAW_TKW", and "RAW_severity"). The script "Allelic_richness_locus_by_locus_output_processing.R" combines all .csv files into a single dataframe and produces three diagnostic plots: Manahattan plots, histograms of p-value distributions, and p-value q-q plots. p-value thresholds were computed based on a Family-Wise Error Rate of 5% using the Galwey correction. This is done in the "pvalue_thresholds" folder with the "Meff_computation.R" script. "Meff_computation.R" uses the "Meff_function.R" as a source function and generates "GY_thresholds.csv" and "STB_thresholds.csv" as outputs (these files contains different thresholds computed according to different methods but we only retained the Galwey method (most recent) for the analyses. Since GY, SNb, and TKW were analyzed with the same number of SNPs (~19K), we used the same significance threshold for the three variables ("GY_thresholds.csv"), whereas we computed a different thresholds for STB ("STB_thresholds.csv") for which we could only include ~6K SNPs in the analysis. The "geno_pos.csv" file contains the physical positions of the SNPs.
Upstream of the locus-by-locus analysis, phenotypic and genotypic files are prepared in the "Phenoytpic file preparation" and "Genotypic file preparation" folders, respectively.
The phenotypic file preparation includes the correction of yield-related variables (GY, SNb, and TKW) for spatial auto-correlation in the "Spatial_analyses_YLD_variables" folder, and the computation of plot-level variables from individual-level variables with the "Allelic_richness_phenotypic_file_prep.R" script. In this script, we compute both absolute plot values (termed "RAW_...") and relative plot values (termed "RYT_...", only for mixture plots). All phenotypic files have the same structure with the same first 6 columns: "focal" = identity of the focal genotype (the one for which the variable is measured; only relevant for variables measured at the individual level), "neighbor" = identity of the neighbor genotype (the neighbor of the genotype for which the variable is measured; only relevant for variables measured at the individual level), "pair" = identity of the genotypic pair (combines the identity of the focal and the neighbor genotypes), "assoc" = type of plot ("M" = monoculture or pure stand plot, "P" = mixture plot), "row" = position of the plot along the smallest dimension of the grid (see Figure 1), "column" = position of the plot along the largest dimension of the grid (see Figure 1).
The genotypic file preparation is done with the "Allelic_richness_genotypic_file_prep.R" script and includes SNP filtering, computation of matrices of allelic richness, and computation of matrices of genetic similarity between genotypic pairs. The analysis is done separately for yield-related variables and for STB severity, since the two types of variables were not measured on the same set of plots.
Cover crops provide many agroecosystem services, including weed suppression, which is partially exerted through release of allelopathic benzoxazinoid (BX) compounds. This research characterizes (1) changes in concentrations of BX compounds in shoots, roots, and soil at three growth stages (GS) of cereal rye (Secale cereale L.), and (2) their degradation over time following termination. Concentrations of the shoot dominant BX compounds, DIBOA-glc and DIBOA, were least at GS 83 (boot). The concentration of the root dominant BX compound, HMBOA-glc, was least at GS 54 (elongation). Rhizosphere soil BX concentrations were 1000 times smaller than in root tissues. Dominant compounds in soil were HMBOA-glc and HMBOA. Concentrations of BX compounds were similar for soil near root crowns and between rows. Soil BX concentrations following cereal rye termination declined exponentially over time in three of four treatments: incorporated shoots (S) and roots (R), no-till S+R (cereal rye rolled flat), and no-till R (shoots removed), but not in no-till S. On the day following cereal rye termination, soil concentrations of HMBOA-glc and HMBOA in these three treatments increased above initial concentrations. Concentrations of these two compounds decreased the fastest while DIBOA-glc declined the slowest (half-life of 4 d in no-till S+R soil). Placement of shoots on the surface of an area where cereal rye had not grown (no-till S) did not increase soil concentrations of BX compounds. The short duration and complex dynamics of BX compounds in soil prior to and following termination illustrate the limited window for enhancing weed suppression by cereal rye allelochemicals; valuable information for programs breeding for enhanced weed suppression. In addition to the data analyzed for this article, we also include the R code.
Resources in this dataset:
- FinalBXsForMatt-20200908.csv (BX data following termination): for each sample, gives the time, depth, location, and plot treatment, and then the compound concentrations. This is the principal data set analyzed with the R code (anal2-cleaned.r); see that code for use.
- soil2-20201123.csv (BX compounds from the 3rd sampling time before termination): these data are for comparison with the post-termination data. They were taken at the 3rd sampling time (pre-termination), a day prior to termination. Each sample is identified with a treatment, date, and plot location, in addition to the BX concentrations. See the R code (anal2-cleaned.r) for how this file is used.
- s2b.csv (soil location (within row versus between row) values of BX compounds): each row gives the average BX compound for each soil location (within row versus between row) for the second sample for each plot. These data are combined with bx3 (the data set read in from the file "FinalBXsForMatt-20200908.csv"). See the R code (anal2-cleaned.r) for use.
- anal2-cleaned.r (R code for analysis of the decay (post-termination) BX data): this is the R code used to analyze the termination data. It also creates and writes out some data subsets (used for analysis and plots) that are later read in. Recommended software: R version 3.6.3, https://www.R-project.org/
- tissues20210728b.csv (tissue BX compounds): data file holding results from a tissue analysis for BX compounds, in ug, from shoots and roots, and at various sampling times. Read into the R file anal1-cleaned.r, where it is used in a statistical analysis and to create figures.
- soil2-20201214.csv (BX compounds from soil with a live rye cover crop): BX compounds (in ng/g dry wt), by treatment, sampling time, date, and plot ID. These data are read into the R program anal1-cleaned.r for analysis and to create figures. These are soil samples taken from locations with a live rye cover crop.
- anal1-cleaned.r (R code for BX analyses of soil under rye and plant tissues): R code for analysis of the soil BX compounds under a live rye cover crop at different growing stages, and for the analysis of tissue BX compounds. In addition to statistical analyses, code in this file creates figures and some statistical output that is used to create a file that is later read in for figure creation (s2-CLD20220730-Stage.csv). Recommended software: R version 3.6.3, https://www.R-project.org/
- readme2.txt (description of data files for anal2-cleaned.r): describes the input files used in the R code in anal2-cleaned.r, including descriptions and formats for each field. The file also describes some output (results) files that were uploaded to this site. This is a plain ASCII text file.
- Estimates20201110.csv (estimates produced by anal2-cleaned.r from statistical modeling): estimates produced by anal2-cleaned.r from statistical modeling (see readme2.txt).
- CV20210412.csv (summary statistics from anal2-cleaned.r): summary statistics from anal2-cleaned.r, used for plots.
- RESCALE-20210412.csv (data summaries, same as CV20210412.csv, but rescaled): same as "CV20210412.csv" except the log of the data has been rescaled to minimum at least zero and maximum one; see readme2.txt.
- s2-CLD20220730-Stage.csv (statistical summaries for different stages): statistical summaries used for creating a figure (not used in the paper), used in anal1-cleaned.r; data for soil BX under living rye.
- readme1.txt (description of data files for anal1-cleaned.r): contains general descriptions of data imported into anal1-cleaned.r, and a description of each field. Also contains some descriptions of files output by anal1-cleaned.r, used to create tables or figures.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A routine was developed in R ('bathy_plots.R') to plot bathymetry data over time during individual CEAMARC events. This allows us to analyse benthic data in relation to habitat, i.e. whether we trawled over a slope or over relatively flat sea floor. Note that the depth range in the plots is autoscaled to the data, so a small range of depths appears as a scattering of points; as long as you check the depth scale, interpretation will be fine.
The R script needs two input files: '200708V3_one_minute.csv', a bathymetry data export from the underway PostgreSQL ship database, and 'events.csv', a stripped-down version of the events export from the shipboard events database. If you wish to run the code again, you may need to change the pathnames in the R script to the relevant locations. If you have opened the csv files in Excel at any stage and the R script gives an error, you may need to format the date/time columns as yyyy-mm-dd hh:mm:ss, save and close the file as csv without opening it again, and then run the R script.
However, output files are provided here for every CEAMARC event. Filenames contain a reference to the CEAMARC event ID. Files are in EPS format and can be viewed using Ghostview, which is available as a free download.
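The original 'bathy_plots.R' is not reproduced in this description; the hedged R sketch below only illustrates one plausible structure for such a routine: read the one-minute underway data and the events table, subset the bathymetry records falling inside each event's time window, and write one EPS plot per event with an autoscaled depth axis. The column names used here (date_time, depth, event_id, start_time, end_time) are assumptions and may differ from the actual exports.

# Illustrative sketch only (not the original bathy_plots.R); column names are assumed.
underway <- read.csv("200708V3_one_minute.csv", stringsAsFactors = FALSE)
events   <- read.csv("events.csv", stringsAsFactors = FALSE)

# Date/time columns are assumed to be in "yyyy-mm-dd hh:mm:ss" format
underway$date_time <- as.POSIXct(underway$date_time, tz = "UTC")
events$start_time  <- as.POSIXct(events$start_time,  tz = "UTC")
events$end_time    <- as.POSIXct(events$end_time,    tz = "UTC")

for (i in seq_len(nrow(events))) {
  ev  <- events[i, ]
  sub <- subset(underway, date_time >= ev$start_time & date_time <= ev$end_time)
  if (nrow(sub) == 0) next
  postscript(sprintf("bathy_event_%s.eps", ev$event_id),
             width = 8, height = 4, horizontal = FALSE)
  # Depth axis is autoscaled to the data, as in the original routine
  plot(sub$date_time, sub$depth, type = "l",
       xlab = "Time", ylab = "Depth (m)",
       main = paste("CEAMARC event", ev$event_id))
  dev.off()
}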
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This database contains nine spreadsheets titled: “2011_TMON_All.csv,” “2014_TMON_All.csv,” “2020_TMON_All.csv,” “Historical_Covers.csv,” “Modern_Covers_01.csv,” “Modern_Sommes.csv,” “Modern_Sommes_PresAbs.csv,” “Combined_HistModern.csv,” and “Chi_Comparison.csv.” There are also two R code files titled: “Management_Analysis_Dataverse” and “Monitored_Analysis_Dataverse.” Both R code files were compiled in R version 4.2.1 (2022-06-23 ucrt) -- "Funny-Looking Kid.” Together, the three files “2011_TMON_All.csv,” “2014_TMON_All.csv,” and “2020_TMON_All.csv” serve as the raw data compiled together in the R code file “Monitored_Analysis_Dataverse” for the Monitored dataset described in the publication. Metadata for the included nine spreadsheets is below:
Historical_Covers.csv - Historical cover classes of the species listed in this dataset were extracted from McCormick and Somes (1982) classification of plant communities in the 1972 Maryland Wetland Maps (https://geodata.md.gov/imap/rest/services/Hydrology/MD_WetlandMaps1972/MapServer). The maps provided codes for the wetland plant communities present at the sites based on interpretation of natural-color stereoscopic aerial photographs. The plant communities were verified by field-sampling.
Column 1 – Site: Unique site-plot identification code
Column 2 – Code: Community code assigned by McCormick and Somes (1982, see citation and access information below)
Column 3 – Dominants: Four-letter codes for dominant species present in the plant communities, per the community codes assigned by McCormick and Somes (1982)
Column 4 – Treatment: Indicates whether plot was under Short-Term or Continuous Management
Columns 5-15: Column names are four-letter species codes for species present at each unique site-plot combination. A value of 1 in a cell indicates that the species was present in McCormick and Somes’ (1982) dataset, and a missing value indicates that the species was not present in the 1982 dataset.
Modern_Covers.csv - Modern cover classes of species were sampled between September and October 2022 from 32 individual tidal wetland sites where Phragmites australis had been treated using herbicides.
Column 1 – Site: Unique site-plot identification code
Column 2 – Treatment: Indicates whether plot was under Short-Term or Continuous Management
Column 3 – Phrag_Cover: Phragmites australis cover class as defined by the Braun-Blanquet method
Column 4 – Spp_Rich: Number of species present in the unique site-plot
Columns 5-37: Column names are four-letter species codes for species present at each unique site-plot combination. The number present in the cell indicates the species’ cover class, as defined by the Braun-Blanquet method (1 = trace, 2 = 1-5%, 3 = 5-25%, 4 = 25-50%, 5 = 50-75%, 6 = 75-100%), in the unique site-plot.
Modern_Somes.csv - Modern cover classes of species were sampled between September and October 2022 from 32 individual tidal wetland sites where Phragmites australis had been treated using herbicides.
Column 1 – Site: Unique site-plot identification code
Column 2 – Treatment: Indicates whether plot was under Short-Term or Continuous Management
Columns 3-13: Column names are four-letter species codes for species present at each unique site-plot combination. The modern species present are constrained to only the species described in McCormick and Somes’ (1982) community codes. The number present in the cell indicates the species’ cover class, as defined by the Braun-Blanquet method (1 = trace, 2 = 1-5%, 3 = 5-25%, 4 = 25-50%, 5 = 50-75%, 6 = 75-100%), in the unique site-plot.
Modern_Somes_PresAbs.csv - Modern cover classes of species were sampled between September and October 2022 from 32 individual tidal wetland sites where Phragmites australis had been treated using herbicides.
Column 1 – Site: Unique site-plot identification code
Column 2 – Treatment: Indicates whether plot was under Short-Term or Continuous Management
Columns 3-13: Column names are four-letter species codes for species present at each unique site-plot combination. The modern species present are constrained to only the species described in McCormick and Somes’ (1982) community codes. A value of 1 in a cell indicates that the species was present in the 2022 survey, and a missing value indicates that the species was not present.
Combined_HistModern.csv - Historical cover classes for the species listed in this dataset were extracted from McCormick and Somes (1982) classification of plant communities in the 1972 Maryland Wetland Maps (https://geodata.md.gov/imap/rest/services/Hydrology/MD_WetlandMaps1972/MapServer). The maps provided codes for the wetland plant communities present at the sites based on interpretation of natural-color stereoscopic aerial photographs. The plant communities were verified by field-sampling. Modern cover classes of species were sampled between September and October 2022 from 32 individual tidal wetland sites where Phragmites australis had been treated using...
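As a hedged illustration only (this is not the deposited "Management_Analysis_Dataverse" or "Monitored_Analysis_Dataverse" code), the R sketch below converts the Braun-Blanquet cover classes described above into approximate midpoint percent covers and presence/absence values. The file name, the assumption that species classes start in column 5, and the 0.1% value used for 'trace' are all assumptions to be checked against the actual spreadsheets.

# Hedged sketch (not the deposited R code files): convert Braun-Blanquet cover
# classes (1 = trace, 2 = 1-5%, 3 = 5-25%, 4 = 25-50%, 5 = 50-75%, 6 = 75-100%)
# to approximate midpoint percent cover and to presence/absence.
# File name and column layout follow the description above but may differ.
covers <- read.csv("Modern_Covers_01.csv", check.names = FALSE)

# Range midpoints; 0.1% is used for "trace" as an assumption
midpoints <- c(`1` = 0.1, `2` = 3, `3` = 15, `4` = 37.5, `5` = 62.5, `6` = 87.5)

species_cols <- 5:ncol(covers)              # columns 5 onward hold species cover classes
class_to_pct <- function(x) ifelse(is.na(x), 0, midpoints[as.character(x)])

cover_pct <- as.data.frame(lapply(covers[species_cols], class_to_pct))
pres_abs  <- as.data.frame(lapply(covers[species_cols],
                                  function(x) as.integer(!is.na(x) & x > 0)))

# Per-plot summaries: species richness and summed midpoint cover
covers$Spp_Rich_check <- rowSums(pres_abs)
covers$Total_Cover    <- rowSums(cover_pct)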
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NB-IoT vs. LTE-M: Measurement Data of the Energy Consumption of LPWAN Technologies
This dataset contains the raw energy measurements as well as R scripts to reproduce the energy consumption plot for the corresponding paper.
Each .csv file contains a specific set of measurements and we provide a script to read, process and plot the contained data.
Figure 3
Mean energy consumption of the different phases for Authentication for NB-IoT and LTE-M.
Because the duration of Idle Connected in the measurement scripts was 30 seconds while that of Idle Not Connected was 60 seconds, the D-value and the mean power consumption are divided by 2.
Data – energy_measurements_fig3.csv
Code – fig3.R
Figure 4
Mean energy consumption of the different phases for Data Connection and Download for NB-IoT and LTE-M for 1KB of data in HTTP.
The delays between the measurements for Figure 4 were all 30 seconds long, but the identified Standby and Idle phases have different lengths. Therefore, the Idle-phase values for both access technologies have been normalized to 20 seconds each.
Data – energy_measurements_fig4.csv
Code – fig4.R
Figure 5
Mean energy consumption of the different phases for Data Connection and Download for HTTP and MQTT for 1KB of data in NB-IoT.
In this scenario the delays between the measurements differed again: for MQTT the delay was 150 seconds and for HTTP 30 seconds. Therefore, the data during the Idle and Standby (MQTT only) phases are normalized to 20 seconds and 10 seconds, respectively. During the MQTT Idle-phase measurements, the device disconnects. This is not taken into account in the evaluation, which is why these energy values are discarded for this figure.
Data – energy_measurements_fig5.csv
Code – fig5.R
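The deposited scripts (fig3.R, fig4.R, fig5.R) are the authoritative way to reproduce the figures; the R sketch below is only a hedged illustration of the general read/process/plot pattern and of the duration normalisation described above, where idle-phase energy is scaled to a common reference window (e.g. a 60-second Idle Not Connected measurement scaled to 30 seconds is equivalent to dividing by 2). The column names (phase, duration_s, energy_mJ, technology) are assumptions, not the actual schema of the .csv files.

# Hedged sketch of the read/process/plot pattern; column names are assumed,
# not taken from the actual energy_measurements_*.csv files.
meas <- read.csv("energy_measurements_fig3.csv")

# Normalise idle-phase energy to a common reference window, e.g. scale a
# 60 s "Idle Not Connected" measurement to 30 s (equivalent to dividing by 2).
ref_s <- 30
idle  <- meas$phase %in% c("Idle Connected", "Idle Not Connected")
meas$energy_norm_mJ <- meas$energy_mJ
meas$energy_norm_mJ[idle] <- meas$energy_mJ[idle] * ref_s / meas$duration_s[idle]

# Mean energy per phase and technology, then a simple grouped bar plot
agg  <- aggregate(energy_norm_mJ ~ phase + technology, data = meas, FUN = mean)
wide <- xtabs(energy_norm_mJ ~ technology + phase, data = agg)
barplot(wide, beside = TRUE, legend.text = rownames(wide),
        ylab = "Mean energy consumption (mJ)", xlab = "Phase")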
Contact
For questions or issues with this code, please contact Viktoria Vomhoff (viktoria.vomhoff@uni-wuerzburg.de) or any of the authors of the related publication.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Files “Herbs 160620.csv” and “Shrubs 160620.csv” contain data on the herbaceous and woody vascular plant community composition, respectively, recorded in 2012 in plots within the portion of the Palmerton Zinc Superfund Site (Palmerton, PA, USA) that was amended with different compost types and seed mixes in 2003. Detailed methods are described in Dietterich and Casper (2016). Column headings are described below.
Date: date of sampling.
Plot: one-acre plot to which a single combination of experimental treatments was applied.
Transect: transect, nested within plot, along which sampling quadrats were established.
Quadrat: quadrat, nested within transect, in which species identity and percent cover were assessed. Quadrats were 1 m2 squares for herbaceous species measurements, and 100 m2 circles for woody species measurements.
Shape: shape of the plot. R denotes 32 x 128 m rectangular plots, and S denotes 64 x 64 m square plots.
Herb.Shrub: Denotes whether a given quadrat was used to assess herbaceous (“Herb”) or woody (“Shrub”) vegetation.
Seed: Seed mixture applied to the plot. “C4,” “Annual,” and “Perennial” correspond to “2003 Seed Mix” 1, 2, and 3, respectively, in Table 1 of Dietterich and Casper (2016).
Compost: Compost type applied to the plot. “Lehigh” denotes Lehigh County compost, “Duck” denotes duck manure, “Straw” denotes straw mulch, “Mushroom” denotes mushroom compost, and “Sludge” denotes sewage sludge.
All other columns denote the percent cover of a given species recorded in that quadrat. Species abbreviations are elaborated in Table S1 of Dietterich and Casper (2016).
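For orientation, the minimal R sketch below shows one way such a file could be summarised; it assumes that the first eight columns are the metadata fields described above and that every remaining column is a species percent cover, which should be verified against the actual files.

# Minimal sketch, assuming the first eight columns are the metadata fields
# described above (Date, Plot, Transect, Quadrat, Shape, Herb.Shrub, Seed,
# Compost) and all remaining columns are species percent covers.
herbs <- read.csv("Herbs 160620.csv", check.names = FALSE)

meta_cols    <- 1:8
species_cols <- setdiff(seq_len(ncol(herbs)), meta_cols)

# Per-quadrat summaries: species richness and total percent cover
herbs$richness    <- rowSums(herbs[species_cols] > 0, na.rm = TRUE)
herbs$total_cover <- rowSums(herbs[species_cols], na.rm = TRUE)

# Mean total cover by seed mix and compost type
aggregate(total_cover ~ Seed + Compost, data = herbs, FUN = mean)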
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data are a harmonized collation of vegetation survey plot datasets from across Australia, representing 205,084 plots. Source data were obtained from the main custodians of vegetation plot information in Australia, being primarily state/territory and federal government agencies. The collated source data were harmonised to a common structured data format through customised scripts written in R. The data provided here are those for which the source data licenses enable adaptation and sharing, though data were also collated and harmonised from a number of sources that could not be included in this data release due to license restrictions. Important methodological attributes have been incorporated where possible, including the taxonomic scope of each survey and the size/shape of the survey plots used. This harmonised dataset may enable a wide variety of analyses that improve our understanding of Australian plant diversity and vegetation patterns.
Lineage: Given the absence of an existing large-scale harmonised plant community survey plot dataset for Australia, we obtained and collated data primarily from state/territory and federal agencies, which are the custodians of the largest survey datasets. In some cases we downloaded data directly from publicly accessible websites, while for others we required assistance and permission to obtain relevant data.
Given the different formats of the source data, we developed a simple, customised and structured data format to harmonise across sources, based broadly on the Veg-X schema (Wiser, et al. 2011), with existing standards for data fields used wherever possible (Veg-X, Darwin Core). Source data were harmonised to the common format using a customised script in R. Taxonomic nomenclature was standardised to the Australian Plant Census (CHAH 2022), using code adapted from Falster et al. (2021), with only vascular plant species retained.
Key methodological aspects of the component datasets were also incorporated, including the taxonomic scope of the vegetation survey (e.g. all vascular plants, dominant species only) and the size/configuration of the plot that was surveyed. Obtaining such information often involved identifying publications related to the data and cataloguing the methods described.
Data products
The data were formatted and prepared into the following files, linked by common identifiers:
• project.csv – this file describes attributes of the projects undertaken to sample survey plots. Most projects are associated with surveying multiple plots over space and time, using a common methodology.
• plot.csv – this file describes attributes of the plots that have been surveyed. A plot is a fixed area/location in space that may be surveyed at one or more times, for one or more attributes (e.g. plant species, soil attributes).
• plotObservation.csv – this file describes the attributes associated with the observations taken at a plot at a particular time (date). There may be multiple plot observations for a single plot.
• aggregateOrganismObservation.csv – this file describes the attributes of plant species observed in a specific plot observation, such as the species observed and any measure of abundance that was made.
• aggregateSoilObservation.csv – this file describes the attributes of the soil that were measured in a specific plot observation.
• speciesAttributes.csv – this file describes the attributes associated with species names included in the dataset, where the scientific name for each plant species is that accepted by the Australian Plant Census (CHAH 2022).
The contents (data fields) of each file listed above are described in the file: HAVPlot_Data_Format.csv
The sources of the plot data provided here are shown in the file: HAVPlot_source_citations.docx. Use of the data provided here should comply with the data license conditions of the source data.
A coded example (using R) for combining and manipulating the component HAVPlot data files is provided in the file: HAVPlot_data_query_example_R_code.R
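The deposited example script should be preferred; the snippet below is only a rough, hedged sketch of the kind of join it performs, linking plots, plot observations, and aggregate organism observations through shared identifier columns. The identifier names used here (plotID, plotObservationID) are assumptions and must be checked against HAVPlot_Data_Format.csv.

# Hedged sketch only; see the deposited example script for the authoritative code.
# Identifier column names (plotID, plotObservationID) are assumed here and should
# be checked against HAVPlot_Data_Format.csv.
plots     <- read.csv("plot.csv")
obs       <- read.csv("plotObservation.csv")
organisms <- read.csv("aggregateOrganismObservation.csv")

# Link each species record to its plot observation, then to the plot itself
obs_full <- merge(organisms, obs,   by = "plotObservationID")
obs_full <- merge(obs_full,  plots, by = "plotID")

# Example query: number of species records per plot
rec_per_plot <- aggregate(plotObservationID ~ plotID, data = obs_full, FUN = length)
head(rec_per_plot)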
Summary
The HAVPlot data comprise 213,101 observations across 205,084 plots. A summary of the full HAVPlot data is also available in Mokany et al. (2022).
References
CHAH (2022) Australian Plant Census. Centre of Australian National Biodiversity Research, Council of Heads of Australasian Herbaria (CHAH). https://id.biodiversity.org.au/tree/51354547
Falster D, et al. (2021) AusTraits, a curated plant trait database for the Australian flora. Scientific Data 8:254.
Mokany K, et al. (2022) Patterns and drivers of plant diversity across Australia. Ecography e06426. https://doi.org/10.1111/ecog.06426
Wiser SK, et al. (2011) Veg-X - an exchange standard for plot-based vegetation data. Journal of Vegetation Science 22:598-609.