License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Overview
This dataset is the repository for the following paper submitted to Data in Brief:
Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).
The Data in Brief article contains the supplement information and is the related data paper to:
Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).
Description/abstract
The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which has strained neighbouring countries like Jordan through the influx of Syrian refugees and has increased the population's vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.
Folder structure
The main folder after download contains all data; the following subfolders are stored as zipped files:
“code” stores the 9 code chunks described below, which read, extract, process, analyse, and visualize the data.
“MODIS_merged” contains the 16-day, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area (n=510), covering January 2001 to December 2022 plus January and February 2023.
“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).
“yield_productivity” contains .csv files of yield information for all countries listed above.
“population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).
“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.
“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders: “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.
Code structure
1_MODIS_NDVI_hdf_file_extraction.R
This first code chunk extracts MODIS data from the .hdf file format. The required packages must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., the one built into Google Chrome. Packages: terra. Download the MODIS data after registration from https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9 October 2023). The code reads a list of files, extracts the NDVI, and saves each layer to a single .tif file labelled “NDVI”. Because the study area is quite large, we have to load three spatially distinct time series and merge them later. Note that the time series are temporally consistent.
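A minimal sketch of this step, assuming terra's GDAL build can read HDF4 and that the NDVI layer can be found by name among the MOD13Q1 subdatasets; the folder names are placeholders, not the repository's actual paths:

```r
# Sketch of the NDVI extraction step (hypothetical folder names).
library(terra)

hdf_files <- list.files("your_directory_MODIS/raw", pattern = "\\.hdf$",
                        full.names = TRUE)

for (f in hdf_files) {
  r    <- rast(f)                               # load all subdatasets
  ndvi <- r[[grep("NDVI", names(r))[1]]]        # keep the NDVI layer
  out  <- sub("\\.hdf$", "_NDVI.tif", basename(f))
  writeRaster(ndvi, file.path("your_directory_MODIS/tif", out),
              overwrite = TRUE)
}
```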
2_MERGE_MODIS_tiles.R
In this code, we load and merge the three different stacks to produce a large, consistent time series of NDVI imagery across the study area. We use the package gtools to load the files in natural order (1, 2, 3, 4, 5, 6, etc.). We have three stacks, of which we first merge two (stack 1, stack 2) and store the result; we then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
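A sketch of the merge step under the same assumptions; the per-tile folder names (NDVI_h20v05, etc.) are hypothetical stand-ins for wherever the extracted .tif files live:

```r
# Sketch of the tile-merge step (hypothetical tile folders).
library(terra)
library(gtools)

f1 <- mixedsort(list.files("NDVI_h20v05", pattern = "\\.tif$", full.names = TRUE))
f2 <- mixedsort(list.files("NDVI_h21v05", pattern = "\\.tif$", full.names = TRUE))
f3 <- mixedsort(list.files("NDVI_h21v06", pattern = "\\.tif$", full.names = TRUE))

stack1 <- rast(f1)
stack2 <- rast(f2)
stack3 <- rast(f3)

# Merge stack 1 with stack 2 first, then the result with stack 3,
# writing one file per 16-day time step.
for (i in 1:nlyr(stack1)) {
  m <- merge(merge(stack1[[i]], stack2[[i]]), stack3[[i]])
  writeRaster(m, file.path("your_directory_MODIS/merged",
                           sprintf("NDVI_final_%d.tif", i)),
              overwrite = TRUE)
}
```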
3_CROP_MODIS_merged_tiles.R
Now we want to crop the merged MODIS tiles to our study area. We use a mask, provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced a single cropped NDVI time series from MODIS.
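The crop itself is a two-step crop-and-mask in terra; a short sketch using the repository's mask file:

```r
# Sketch of the crop step, using the mask shipped in the repository.
library(terra)
library(gtools)

mask_vec <- vect("mask/MERGED_LEVANT.shp")
merged   <- mixedsort(list.files("your_directory_MODIS/merged",
                                 pattern = "^NDVI_final.*\\.tif$",
                                 full.names = TRUE))

for (i in seq_along(merged)) {
  r <- mask(crop(rast(merged[i]), mask_vec), mask_vec)  # clip to the outline
  writeRaster(r, sprintf("NDVI_merged_clip_%d.tif", i), overwrite = TRUE)
}
```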
The repository provides the already clipped and merged NDVI datasets.
4_TREND_analysis_NDVI.R
Now, we want to perform trend analysis on the derived data. The data we load are tricky, as they contain images at a 16-day return period across each year for a period of 22 years. Growing season sums cover MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and flag all values significant at the 0.05 level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS, value 0.3).
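A condensed sketch of the per-pixel trend step for one season. The layer indices (here, layers 5-10 of each assumed 23-layer MODIS year standing in for MAM) and the lm-based slope/p-value extraction are illustrative assumptions, not the exact chunk from the repository:

```r
library(terra)
library(gtools)

ndvi  <- rast(mixedsort(list.files(".", pattern = "^NDVI_merged_clip.*\\.tif$")))
years <- 2001:2022

# Annually resolved MAM sums; layers 5-10 of each 23-layer MODIS year
# are assumed to fall in March-May.
mam_sum <- rast(lapply(seq_along(years), function(y) {
  sum(ndvi[[(y - 1) * 23 + 5:10]])
}))

# Per-pixel linear trend: slope and p-value of the seasonal sum against year.
trend_fun <- function(v) {
  if (all(is.na(v))) return(c(NA, NA))
  fit <- lm(v ~ years)
  c(coef(fit)[2], summary(fit)$coefficients[2, 4])
}
trend <- app(mam_sum, trend_fun)
names(trend) <- c("slope", "p_value")

# Keep only slopes significant at the 0.05 level.
sig_slope <- mask(trend$slope, trend$p_value < 0.05, maskvalues = FALSE)
```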
To increase comparability and show the amplitude of the trends, z-scores were calculated and plotted; they show the deviation of each value from the long-term mean. This normalization was applied to the NDVI values as well as to the GLDAS climate variables.
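Continuing the sketch above, the z-scores for the study-area mean of each annual sum are simply deviations from the 22-year mean in standard-deviation units:

```r
# Z-scores of the study-area mean of each annual MAM sum (mam_sum from above).
s <- global(mam_sum, "mean", na.rm = TRUE)$mean
z <- (s - mean(s)) / sd(s)
plot(2001:2022, z, type = "b", xlab = "", ylab = "z-score (MAM NDVI sum)")
```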
5_BUILT_UP_change_raster.R
Let us look at the landcover changes now. We are working with the terra package and get the raster data from https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023; 100 m resolution, global coverage). One can download the temporal coverage of interest and reclassify it using the code after cropping to the individual study area. Here, I summed the individual rasters to characterize the built-up change as continuous values between 1975 and 2022.
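A sketch of that crop-reclassify-sum sequence, assuming the GHSL epoch files sit in “raw_data” and that any positive built-up surface counts as built (the 0/1 threshold is my assumption, not the repository's reclassification table):

```r
library(terra)

mask_vec <- vect("mask/MERGED_LEVANT.shp")
files    <- list.files("built_up/raw_data", pattern = "\\.tif$", full.names = TRUE)

# Crop each GHSL epoch to the study area, reclassify to 0/1 built-up
# presence (assumed threshold: any positive value is built), then sum
# the epochs so long-established built-up cells score highest.
epochs <- lapply(files, function(f) {
  r <- mask(crop(rast(f), mask_vec), mask_vec)
  classify(r, rbind(c(-Inf, 0, 0), c(0, Inf, 1)))
})
built_change <- sum(rast(epochs))
writeRaster(built_change, "built_up/derived_data/Levant_built_up_change.tif",
            overwrite = TRUE)
```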
6_POPULATION_numbers_plot.R
For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.
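A minimal sketch of such a plot; the long-format column names (year, country, population) are assumptions about the .csv layout:

```r
library(ggplot2)

pop <- read.csv("population/Socio_cultural_political_development_database_FAO2023.csv")

# Assumed long-format columns: year, country, population.
ggplot(pop, aes(x = year, y = population, colour = country)) +
  geom_line() +
  labs(x = NULL, y = "Population", colour = NULL) +
  theme_minimal()
```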
7_YIELD_plot.R
In this section, we use the country productivity data from the supplement in the repository folder “yield_productivity” (e.g., "Jordan_yield.csv"). Each single-country yield dataset is plotted with ggplot, and the plots are combined using the patchwork package in R.
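A sketch of the combination step; only Jordan_yield.csv is named in the text, so the other file names and the year/yield columns are assumptions:

```r
library(ggplot2)
library(patchwork)

# Helper around the assumed per-country file layout (columns: year, yield).
plot_yield <- function(file, title) {
  d <- read.csv(file)
  ggplot(d, aes(year, yield)) +
    geom_line() +
    ggtitle(title) +
    theme_minimal()
}

p1 <- plot_yield("yield_productivity/Jordan_yield.csv",  "Jordan")
p2 <- plot_yield("yield_productivity/Lebanon_yield.csv", "Lebanon")
p3 <- plot_yield("yield_productivity/Israel_yield.csv",  "Israel")

(p1 | p2) / p3   # combine the single-country panels with patchwork
```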
8_GLDAS_read_extract_trend
The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9 October 2023). The raw data come in .nc file format, and individual variables can be extracted using a [“^variable name”] subset on the SpatRaster collection. Each time you run the code, this variable name must be adjusted to the variable of interest (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9 October 2023; alternatively, see the respective code chunk that reads a .nc file with the ncdf4 package in R, run print(nc), or call names() on the SpatRaster collection).
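A sketch of that read-and-subset pattern with terra; "Rainf_f_tavg" is one example GLDAS variable name and must be swapped per run, exactly as described above:

```r
library(terra)

nc_files <- list.files("GLDAS", pattern = "\\.nc4?$", full.names = TRUE)

gldas <- sds(nc_files[1])   # SpatRasterDataset, one entry per variable
names(gldas)                # list the available variable names

# Subset one variable across all monthly files; "Rainf_f_tavg" is an
# example name and must be adjusted per run.
rain <- rast(lapply(nc_files, function(f) {
  s <- sds(f)
  s[[grep("^Rainf_f_tavg", names(s))[1]]]
}))
```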
Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. Note that annual trends require the frequency of the time series to be set to value = 12. For rainfall, which is aggregated as annual sums rather than means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see the other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for the growing seasons, i.e., March-May (MAM), June-August (JJA), September-November (SON), and December-February (DJF, including January/February of the consecutive year).
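A tiny sketch of the frequency and sum/mean logic, with a placeholder series standing in for one GLDAS variable (the r.sum name mirrors the chunk described above):

```r
# Placeholder monthly series standing in for one GLDAS variable, 1975-2022.
v    <- rnorm(48 * 12)
v.ts <- ts(v, start = c(1975, 1), frequency = 12)   # frequency = 12 for annual trends

r.sum <- aggregate(v.ts, nfrequency = 1, FUN = sum) # one value per year
r.sum <- r.sum / 12   # annual means for temperature-like variables;
                      # drop this line (or divide by 1) for rainfall sums
```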
From the data, mean values over 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted, and values at the 95 % confidence level are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe thanks to the availability of the GLDAS variables.
Zooplankton Meter3 Data - MOCNESS Only
The Zooplankton Meter3 Database for the Georges Bank GLOBEC project was originally located in the laboratory of Ted Durbin at the Graduate School of Oceanography, University of Rhode Island. It was accessed via the U.S. GLOBEC Georges Bank data management system using SQLPlus network access to the database management system at URI. The data were cached and are served from the local computer.
A description of the original URI database is available online and includes the design and variable definitions. A version of this document is shown here.
Note: Our program's Data Acknowledgement Policy requires that any person making substantial use of a data set must communicate with the investigators who acquired the data prior to publication and anticipate that the data collectors will be co-authors of published results.
The following documentation applies to the data found locally on the WHOI GLOBEC Data Server.
The data are served as a hierarchy. The least changing variables are in higher order levels (e.g., cruise id, year, month, etc.), while variables that change the most are in the lower order levels (e.g., time of collection, net number, taxon collected, etc.). There are six levels within the database; variable names and descriptions are given in the metadata.
Most column variable names and instrument names were taken from the U.S. GLOBEC Georges Bank data thesaurus; those that were not follow the GLOBEC data protocols. The taxonomic code variable (taxon_code) is from the National Oceanographic Data Center's Taxonomic List, version 8. Taxonomic information is built into these ten-digit codes as they reflect the systematic nomenclature.
You may contact BCO-DMO for additional help.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset supersedes all earlier versions of 'Predicted Near Future Climate Change Impacts on the HGL of the ACT'. It incorporates HGL boundary and management area edits based on updated soil landscape mapping for the ACT.

The focus of this dataset is climate change impacts in the Australian Capital Territory. It contains digital spatial data developed to assist in land management decision making in the ACT.

The dataset contains an assessment of climate change impacts on 14 variables defined by the NARCliM (NSW/ACT Regional Climate Modelling) project for three selected regional climate projection ensembles (multimodel mean, CCCMA3.1-R2, ECHAM5-R3). Only near-future (1990-2009 to 2020-2039) projections were considered. Each variable was considered using annual and seasonal time periods. Field names in the dataset follow the format:

Field name = MODEL_NARCliM VARIABLE_TIME PERIOD

Values for each element of the field name are summarised as follows:

MODEL (near future, 1990-2009 to 2020-2039):
C – Consensus (NARCliM Multimodel Consensus Scenario)
W – Wetter (NARCliM CCCMA3.1-R2 Wetter Scenario)
D – Drier (NARCliM ECHAM5-R3 Drier Scenario)

NARCliM VARIABLE:
FFDI – Forest fire danger index
FF50 – Forest fire danger index above 50
FFBC – Forest fire danger index bias corrected
FFBC50 – Forest fire danger index bias corrected above 50
PRAC – Precipitation
PRACBC – Precipitation bias corrected
TAME – Temp mean
TAMX – Temp max
TAMN – Temp min
TAMXBC – Temp max bias corrected
TAMNBC – Temp min bias corrected
TAMX35 – Temp max bias corrected over 35
TAMN2 – Temp min bias corrected below 2
WSSM – Wind speed

TIME PERIOD:
A – Annual
D – DJF
M – MAM
J – JJA
S – SON

Hydrogeological landscape (HGL) unit boundaries developed as part of the broader ACT Hydrogeological Landscapes (HGL) Framework project were used to constrain the outputs for this climate change assessment in the ACT. In all, there are 25 HGL units defined. A weighted mean was used to calculate values for each HGL unit based on the proportions of corresponding 10 km gridded data from the NARCliM data set.

Spatial resolution for this dataset is 1:50 000.
historicalMisc is an experiment of CMIP5, the Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.

Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html
List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html
Output: time series per variable in model grid spatial resolution in netCDF format
Earth System model and simulation information: CIM repository

Entry names/titles of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.

Forcings used in individual ensemble runs (see the attached addinfo for more information):
r[1-5]i1p2 – LU: land-use change
r[1-5]i1p3 – SI: solar irradiance
r[1-5]i1p4 – AA: anthropogenic aerosols (a mixture of aerosols, not explicitly defined here)
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This data is used to examine the inflation-unemployment relationship for 18 countries after 1991. Inflation data are obtained from the World Bank database (https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG) and unemployment data from the International Labour Organization (http://www.ilo.org/wesodata/).
The analysis period differs across countries because of structural breaks determined by the single change point detection algorithm included in the changepoint package of Killick & Eckley (2014). Granger causality is tested with the Toda & Yamamoto (1995) procedure. Integration levels are determined with three stationarity tests. VAR models are run with the vars package (Pfaff, Stigler & Pfaff, 2018) without trend and constant terms. The cointegration test is conducted with the urca package (Pfaff, Zivot, Stigler & Pfaff, 2016).
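A hedged sketch of this pipeline in R; the file and column names are hypothetical, and the final causality() call is a simplification (the strict Toda-Yamamoto Wald test restricts only the first p lags of the causing variable, not all p + dmax lags):

```r
library(changepoint)
library(vars)

# Hypothetical file and column names; the repository stores one series per country.
d <- read.csv("country_data.csv")            # assumed columns: inflation, unemployment
y <- d[, c("inflation", "unemployment")]

# Single change point in the mean (AMOC = at most one change), as in
# Killick & Eckley (2014); analyse the post-break sample only.
cp <- cpt.mean(y$inflation, method = "AMOC")
y  <- y[(cpts(cp) + 1):nrow(y), ]

# Toda-Yamamoto: fit a VAR with p + dmax lags (dmax = maximum order of
# integration from the stationarity tests), no trend and no constant.
p    <- VARselect(y, lag.max = 8, type = "none")$selection[["AIC(n)"]]
dmax <- 1
fit  <- VAR(y, p = p + dmax, type = "none")

# Simplified Granger-causality check on the augmented VAR.
causality(fit, cause = "inflation")$Granger
```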
All data files are .csv files. The analyst needs to change the country index (variable name: j) to see individual results. Findings can be seen in the article.
Killick, R., & Eckley, I. (2014). changepoint: An R package for changepoint analysis. Journal of statistical software, 58(3), 1-19.
Pfaff, B., Stigler, M., & Pfaff, M. B. (2018). Package ‘vars’. https://cran.r-project.org/web/packages/vars/vars.pdf.
Pfaff, B., Zivot, E., Stigler, M., & Pfaff, M. B. (2016). Package ‘urca’. Unit root and cointegration tests for time series data. R package version, 1-2.
Toda, H. Y., & Yamamoto, T. (1995). Statistical inference in vector autoregressions with possibly integrated processes. Journal of econometrics, 66(1-2), 225-250.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Zooplankton Meter2 Data from GSO/URI - Bongo Nets only
The Zooplankton Meter2 Database for the Georges Bank GLOBEC project was originally located in the laboratory of Ted Durbin at the Graduate School of Oceanography, University of Rhode Island. It was accessed via the U.S. GLOBEC Georges Bank data management system using SQLPlus network access to the database management system at URI. Data were cached and are served from the local computer.
A description of the original URI database is online and includes the design and variable definitions. A version of this document is available at http://globec.whoi.edu/globec-dir/data_doc/zoo_square_meter_URI.html.
Note: Our program's Data Acknowledgement Policy requires that any person making substantial use of a data set must communicate with the investigators who acquired the data prior to publication and anticipate that the data collectors will be co-authors of published results.
The following documentation applies to the data found locally on the WHOI GLOBEC Data Server:
The data are served as a hierarchy. The least changing variables are in higher order levels (e.g., cruise id, year, month, etc.), while variables that change the most are in the lower order levels (e.g., time of collection, net number, taxon collected, etc.). There are six levels within the data; variable names and descriptions are given in the metadata.
Most column variable names and instrument names were taken from the U.S. GLOBEC Georges Bank data thesaurus; those that were not follow the GLOBEC data protocols. The taxonomic code variable (taxon_code) is from the National Oceanographic Data Center's Taxonomic List, version 8. Taxonomic information is built into these ten-digit codes as they reflect the systematic nomenclature.
You may contact BCO-DMO for additional help.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset provides 30-year averaged climate data for both historical and future periods, with a spatial resolution of 0.01° × 0.01°. Historical data (1991–2020) are based on the China Surface Climate Standard Dataset and were interpolated using ANUSPLIN software. Future climate data are derived from CMIP6 simulations, bias-corrected using the Delta downscaling method. The dataset includes 10 models (9 Global Climate Models, namely GCMs, and 1 ensemble model), 3 scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5), and 3 future periods (2021–2040, 2041–2070, 2071–2100). For each period (or scenario), 28 climate variables are provided, including 5 monthly basic climate variables (mean temperature, maximum temperature, minimum temperature, precipitation, and percentage of sunshine) and 23 bioclimatic variables derived from the basic variables (for details, see the dataset documentation file).

The data quality was strictly evaluated. The historical interpolations generated by the ANUSPLIN software showed a strong correlation with observations (all correlation coefficients above 0.91). The bias correction improved the accuracy of most original GCM simulations, reducing the bias by 0.69%–58.63%. This dataset aims to provide high-resolution, bias-corrected long-term historical and future climate data for climate and ecological research. All computations were performed using R, and the corresponding code can be found in the dataset folder “Code”.

All data are provided in GeoTIFF (.tif) format, where each file for the basic climate variables contains 12 bands, representing monthly data in ascending order (e.g., Band 1 corresponds to January). To facilitate data storage, all files are provided in compressed archives, following a consistent naming convention:
(1) Historical data: China_Variable_1km_1991–2020.tif, where Variable is the abbreviation of one of the 28 climate variables. Example: China_pr_1km_1991–2020.tif.
(2) Future data: China_Variable_Model_VariantLabel_1km_StartYear-EndYear_Scenario.tif, where Variable is one of the 28 climate variables; Model is the GCM name; VariantLabel is r1i1p1f1 in this study; StartYear-EndYear is the future period; and Scenario is the SSP climate scenario. Example: China_tasmin_MRI-ESM2-0_r1i1p1f1_1km_2071–2100_SSP585.tif.
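Reading one of these multi-band files in R is straightforward with terra; a small sketch using the example historical precipitation file named above:

```r
library(terra)

# Example file name following the convention above (historical precipitation).
pr  <- rast("China_pr_1km_1991–2020.tif")
nlyr(pr)         # 12 bands, January to December in ascending order
jan <- pr[[1]]   # Band 1 = January
plot(jan, main = "Mean January precipitation, 1991-2020")
```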
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
!!!WARNING!!! This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, the US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html), as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships.

Version 8 release notes: Adds 2019 data.
Version 7 release notes: Changes release notes description, does not change data.
Version 6 release notes: Adds 2018 data.
Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data for the ones done by NACJD. Adds data for 1991. Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.
Version 4 release notes: Adds data for 2017. Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year); this is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time; different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian'), which I made consistent. Made the 'population' column, which is the total population in that agency.
Version 3 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.
Version 2 release notes: Fixes bug where Philadelphia Police Department had an incorrect FIPS county code.

The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9-character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating whether the victim of each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.). The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so the data can be saved in a Stata format), making all character values lower case, and reordering columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
CitiesGOER is a database that provides environmental data for 52,602 cities and 48 environmental variables, including 38 bioclimatic variables, 8 soil variables and 2 topographic variables. Data were extracted from the same 30 arc-seconds global grid layers that were prepared when making the TreeGOER (Tree Globally Observed Environmental Ranges) database that is available from https://doi.org/10.5281/zenodo.7922927. Details on the preparations of these layers are provided by Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology, 00, 1–16. https://onlinelibrary.wiley.com/doi/10.1111/gcb.16914. CitiesGOER was designed to be used together with TreeGOER and possibly also with the GlobalUsefulNativeTrees database (Kindt et al. 2023) to allow users to filter suitable tree species based on environmental conditions of the planting site.
The identities and coordinates of cities were sourced from a data set with information for cities with a population size larger than 1000 that was created by Opendatasoft and made available from https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/table/?disjunctive.cou_name_en&sort=name. The data was downloaded on 22-JULY-2023 and afterwards filtered for cities with a population of 5000 or above. Cities where information on the country was missing were removed. The coordinates of cities were used to extract the environmental data via the terra package (Hijmans et al. 2022, version 1.6-47) in the R 4.2.1 environment.
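The extraction pattern described here is terra's point extraction; a minimal sketch with hypothetical file and column names standing in for the 30 arc-seconds grid layers and the filtered city table:

```r
library(terra)

# Hypothetical inputs standing in for one 30 arc-seconds grid layer and the
# filtered city table (assumed columns: name, lon, lat).
layer  <- rast("bio01_30s.tif")
cities <- read.csv("cities_over_5000.csv")

pts  <- vect(cities, geom = c("lon", "lat"), crs = "EPSG:4326")
vals <- extract(layer, pts)                 # one row per city
head(data.frame(city = cities$name, vals))
```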
Update 2023.08 provided median values from 23 Global Climate Models (GCMs) for Shared Socio-Economic Pathway (SSP) 1-2.6 and from 18 GCMs for SSP 3-7.0, both for the 2050s (2041-2060). Similar methods were used to calculate these median values as in the case studies for the TreeGOER manuscript (calculations were partially done via the BiodiversityR::ensemble.envirem.run function and with downscaled bioclimatic and monthly climate 2.5 arc-minutes future grid layers available from WorldClim 2.1).
The locations of the 52,602 cities are mapped in one of the series available from the TreeGOER Global Zones atlas that can be obtained from https://doi.org/10.5281/zenodo.8252756.
When using CitiesGOER in your work, cite this repository and the TreeGOER article (Kindt, 2023, Global Change Biology) referenced above.
The development of CitiesGOER was supported by the Darwin Initiative to project DAREX001 of Developing a Global Biodiversity Standard certification for tree-planting and restoration, by Norway’s International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia to the Provision of Adequate Tree Seed Portfolio project in Ethiopia, and by the Green Climate Fund through the IUCN-led Transforming the Eastern Province of Rwanda through Adaptation project.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com.

Version 15 release notes: Adds 2021 data.
Version 14 release notes: Adds 2020 data. Please note that the FBI has retired UCR data ending in 2020, so this will be the last Arrests by Age, Sex, and Race data they release.
Version 13 release notes: Changes R files from .rda to .rds. Fixes bug where the number_of_months_reported variable incorrectly was the largest of the number of months reported for a specific crime variable. For example, if theft was reported Jan-June and robbery was reported July-December in an agency, in total there were 12 months reported. But since each crime was only reported 6 months (assuming no other crime was reported more than 6 months of the year), the number_of_months_reported variable was incorrectly set at 6 months. Now it is the total number of months reported for any crime, so it would be set to 12 months in this example. Thank you to Nick Eubank for alerting me to this issue. Adds rows even when an agency reported zero arrests that month; all arrest values are set to zero for these rows.
Version 12 release notes: Adds 2019 data.
Version 11 release notes: Changes release notes description, does not change data.
Version 10 release notes: The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.
Version 9 release notes: For each offense, adds a variable indicating the number of months that offense was reported; these variables are labeled "num_months_[crime]" where [crime] is the offense name. These variables are generated from the number of times one or more arrests were reported per month for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests. The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number-of-months variable and the "number_of_months_reported" variable will both be 12 while every other offense's number-of-months variable will be 0. Adds data for 2017 and 2018.
Version 8 release notes: Adds annual data in R format. Changes project name to avoid confusing this data for the ones done by NACJD. Fixes bug where bookmaking was excluded as an arrest category. Changed the number of categories to include more offenses per category to have fewer total files. Added a "total_race" file for each category; this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.
Version 7 release notes: Adds 1974-1979 data. Adds monthly data (only totals by sex and race, not by age categories). All data now from the FBI, not NACJD. Changes some column names so all columns are <=32 characters to be usable in Stata. Changes how the number of months reported is calculated: it is now the number of unique months with arrest data reported; months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.
Version 6 release notes: Fixes bug where juvenile female columns had the same value as juvenile male columns.
Version 5 release notes: Removes support for SPSS and Excel data. Changes the crimes that are stored in each file; there are more files now with fewer crimes per file, and the files and their included crimes have been updated below. Adds in agencies that report 0 months of the year. Adds a column that indicates the number of months reported, generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime; they may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime. Removes data on runaways.
Version 4 release notes: Changes column names from "p